ds_resource_plugin_py_lib.common.resource.dataset¶
File: __init__.py
Region: ds_resource_plugin_py_lib/common/resource/dataset
Description¶
Dataset models, typed properties, and storage format helpers.
Submodules¶
- ds_resource_plugin_py_lib.common.resource.dataset.base
- ds_resource_plugin_py_lib.common.resource.dataset.decorators
- ds_resource_plugin_py_lib.common.resource.dataset.enums
- ds_resource_plugin_py_lib.common.resource.dataset.errors
- ds_resource_plugin_py_lib.common.resource.dataset.result
- ds_resource_plugin_py_lib.common.resource.dataset.storage_format
Classes¶
- Dataset: The ds workflow nested object which identifies data within a data store, such as table, files, folders and documents.
- DatasetInfo: NamedTuple that represents the dataset information.
- DatasetSettings: The object containing the settings of the dataset.
- TabularDataset: Tabular dataset object which identifies data within a data store, such as table/csv/json/parquet/parquetdataset/ and other documents.
- DatasetMethod: Allowed dataset operation names.
- OperationError: Structured error captured from a ResourceException.
- OperationInfo: Report produced by every dataset operation.
- The object containing the storage format of the dataset.
- Enum to define the storage format types.
Package Contents¶
- class ds_resource_plugin_py_lib.common.resource.dataset.Dataset[source]¶
Bases:
abc.ABC, ds_common_serde_py_lib.Serializable, Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]
The ds workflow nested object which identifies data within a data store, such as tables, files, folders, and documents.
You probably want to use the subclasses and not this class directly.
- id: uuid.UUID¶
- name: str¶
- description: str | None = None¶
- version: str¶
- settings: DatasetSettingsType¶
- linked_service: LinkedServiceType¶
- serializer: SerializerType | None = None¶
- deserializer: DeserializerType | None = None¶
- input: Any | None = None¶
- output: Any | None = None¶
- checkpoint: dict[str, Any]¶
- classmethod __init_subclass__(**kwargs: Any) -> None[source]¶
Initialize the subclass.
- Parameters:
kwargs – The keyword arguments.
- Returns:
None.
- __exit__(exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: types.TracebackType | None) -> None[source]¶
Context manager exit.
- Parameters:
exc_type – The type of the exception.
exc_value – The value of the exception.
traceback – The traceback of the exception.
- property supports_checkpoint: bool¶
Whether this provider supports incremental loads via self.checkpoint.
- property type: enum.StrEnum¶
Abstract property. Get the type of the dataset.
- abstractmethod create() -> None[source]¶
Insert all rows in self.input into the target as a single atomic transaction. Must not delete, update, or overwrite existing data.
- Raises:
CreateError – If the operation fails.
NotSupportedError – If the provider does not support create.
See also
Full contract: docs/DATASET_CONTRACT.md – create()
- abstractmethod read() -> None[source]¶
Read data from the source and assign it to self.output. Pagination within a single call is handled internally. Supports incremental loads via self.checkpoint.
- Raises:
ReadError – If the operation fails.
NotSupportedError – If the provider does not support read.
See also
Full contract: docs/DATASET_CONTRACT.md – read()
- abstractmethod update() -> None[source]¶
Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.
- Raises:
UpdateError – If the operation fails.
NotSupportedError – If the provider does not support update.
See also
Full contract: docs/DATASET_CONTRACT.md – update()
- abstractmethod upsert() -> None[source]¶
Insert rows that do not exist and update rows that do, matched by identity columns defined in self.settings. Atomic.
- Raises:
UpsertError – If the operation fails.
NotSupportedError – If the provider does not support upsert.
See also
Full contract: docs/DATASET_CONTRACT.md – upsert()
- abstractmethod delete() -> None[source]¶
Remove specific rows from the target matched by identity columns defined in self.settings. Atomic. Idempotent.
- Raises:
DeleteError – If the operation fails.
NotSupportedError – If the provider does not support delete.
See also
Full contract: docs/DATASET_CONTRACT.md – delete()
- abstractmethod purge() -> None[source]¶
Remove all content from the target. self.input is not used. Atomic. Idempotent.
- Raises:
PurgeError – If the operation fails.
NotSupportedError – If the provider does not support purge.
See also
Full contract: docs/DATASET_CONTRACT.md – purge()
- abstractmethod list() -> None[source]¶
Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.
- Raises:
ListError – If the operation fails.
NotSupportedError – If the provider does not support listing.
See also
Full contract: docs/DATASET_CONTRACT.md – list()
- abstractmethod rename() -> None[source]¶
Rename the resource in the backend. Atomic. Not idempotent.
- Raises:
RenameError – If the operation fails.
NotSupportedError – If the provider does not support renaming.
See also
Full contract: docs/DATASET_CONTRACT.md – rename()
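The write and read semantics described for Dataset can be illustrated with a small in-memory sketch. This is not the library's implementation: InMemoryDataset is a hypothetical stand-in that does not subclass the real Dataset, it assumes a single identity column, it raises ValueError where a provider would raise CreateError or UpdateError, and the "last_id" checkpoint key is an invented convention used only to show an incremental read.

```python
from typing import Any


class InMemoryDataset:
    """Hypothetical stand-in; it does NOT subclass the real Dataset."""

    def __init__(self, identity: str) -> None:
        self.identity = identity  # assumption: one identity column
        self.target: dict[Any, dict[str, Any]] = {}
        self.input: list[dict[str, Any]] = []
        self.output: list[dict[str, Any]] = []
        self.checkpoint: dict[str, Any] = {}

    def create(self) -> None:
        keys = [row[self.identity] for row in self.input]
        # Validate everything first so a failure leaves the target untouched.
        if len(set(keys)) != len(keys) or any(k in self.target for k in keys):
            raise ValueError("create() must not overwrite existing rows")
        self.target.update({row[self.identity]: dict(row) for row in self.input})

    def update(self) -> None:
        keys = [row[self.identity] for row in self.input]
        missing = [k for k in keys if k not in self.target]
        if missing:
            raise ValueError(f"update() must not insert new rows: {missing}")
        for row in self.input:
            self.target[row[self.identity]].update(row)

    def upsert(self) -> None:
        for row in self.input:
            self.target.setdefault(row[self.identity], {}).update(row)

    def delete(self) -> None:
        for row in self.input:
            self.target.pop(row[self.identity], None)  # absent keys are ignored

    def purge(self) -> None:
        self.target.clear()  # self.input is not consulted

    def read(self) -> None:
        # Incremental load: return only rows past the stored high-water mark.
        last = self.checkpoint.get("last_id", 0)
        self.output = [dict(r) for k, r in sorted(self.target.items()) if k > last]
        if self.output:
            self.checkpoint["last_id"] = self.output[-1][self.identity]


ds = InMemoryDataset(identity="id")
ds.input = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
ds.create()
ds.read()  # first read sees ids 1 and 2
first_read = ds.output

ds.input = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}]
ds.upsert()  # updates id 2, inserts id 3
ds.read()  # incremental: only id 3 is past the checkpoint
second_read = ds.output
```

Calling delete() twice with the same input leaves the target in the same state, which is what the contract means by idempotent. A second create() with an existing key fails without modifying anything, because validation runs before any write.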
- class ds_resource_plugin_py_lib.common.resource.dataset.DatasetInfo[source]¶
Bases:
NamedTuple
NamedTuple that represents the dataset information.
- type: str¶
- name: str¶
- class_name: str¶
- version: str¶
- description: str | None = None¶
- __str__() -> str[source]¶
Return a string representation of the dataset info.
- Returns:
A string representation of the dataset info.
- property key: tuple[str, str]¶
Return the composite key (type, version) for dictionary lookups.
- Returns:
A tuple containing the type and version.
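As a rough illustration of how the composite key could be used for dictionary lookups, the sketch below defines a stand-in NamedTuple with the documented fields. The class here is a local re-creation rather than the library class, and the type strings, names, and versions are invented sample values.

```python
from typing import NamedTuple, Optional


class DatasetInfo(NamedTuple):
    """Stand-in with the documented fields; not the library class."""

    type: str
    name: str
    class_name: str
    version: str
    description: Optional[str] = None

    @property
    def key(self) -> tuple[str, str]:
        # Composite (type, version) key, as described above.
        return (self.type, self.version)


# Hypothetical entries: two versions of the same dataset type.
infos = (
    DatasetInfo("example.table", "Example table", "ExampleTable", "1.0.0"),
    DatasetInfo("example.table", "Example table", "ExampleTableV2", "2.0.0"),
)
registry = {info.key: info for info in infos}

found = registry[("example.table", "2.0.0")]
```

Keying on (type, version) rather than type alone lets several versions of one dataset type coexist in the same mapping.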
- class ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings[source]¶
Bases:
ds_common_serde_py_lib.Serializable
The object containing the settings of the dataset.
- class ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[source]¶
Bases:
Dataset[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType], Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]
Tabular dataset object which identifies data within a data store, such as table, csv, json, parquet, parquetdataset, and other documents.
Both the input and the output of the dataset are pandas DataFrames.
- input: pandas.DataFrame¶
- output: pandas.DataFrame¶
- class ds_resource_plugin_py_lib.common.resource.dataset.DatasetMethod[source]¶
Bases:
enum.StrEnum
Allowed dataset operation names.
- CREATE = 'create'¶
Insert rows into the target. Atomic. Not idempotent.
- READ = 'read'¶
Read all data from the source into self.output. Idempotent.
- UPDATE = 'update'¶
Update existing rows matched by identity columns. Atomic. Idempotent.
- UPSERT = 'upsert'¶
Insert or update rows matched by identity columns. Atomic. Idempotent.
- DELETE = 'delete'¶
Remove specific rows matched by identity columns. Atomic. Idempotent.
- PURGE = 'purge'¶
Remove all content from the target. Atomic. Idempotent.
- LIST = 'list'¶
Discover available resources and populate self.output. Idempotent.
- RENAME = 'rename'¶
Rename a resource in the backend. Atomic. Not idempotent.
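One way a caller might use the idempotency notes above is to decide whether a failed operation can be retried blindly. The sketch below re-declares the enum locally as a stand-in so that it runs without the library; is_safe_to_retry and NOT_IDEMPOTENT are hypothetical names, and the fallback base class exists only for interpreters older than Python 3.11.

```python
try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # fallback for older interpreters
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class DatasetMethod(StrEnum):
    """Stand-in mirroring the documented members."""

    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    UPSERT = "upsert"
    DELETE = "delete"
    PURGE = "purge"
    LIST = "list"
    RENAME = "rename"


# Members documented above as "Not idempotent".
NOT_IDEMPOTENT = frozenset({DatasetMethod.CREATE, DatasetMethod.RENAME})


def is_safe_to_retry(method: str) -> bool:
    """Hypothetical helper: True when the method is documented as idempotent."""
    return DatasetMethod(method) not in NOT_IDEMPOTENT
```

Because the members are strings, a plain value such as "upsert" read from configuration compares equal to DatasetMethod.UPSERT, and an unknown name raises ValueError when passed to DatasetMethod(...).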
- class ds_resource_plugin_py_lib.common.resource.dataset.OperationError[source]¶
Bases:
ds_common_serde_py_lib.Serializable
Structured error captured from a ResourceException.
- message: str¶
The error message.
- code: str¶
The error code.
- status_code: int¶
The HTTP status code.
- details: dict[str, Any]¶
The error details.
- class ds_resource_plugin_py_lib.common.resource.dataset.OperationInfo[source]¶
Bases:
ds_common_serde_py_lib.Serializable
Report produced by every dataset operation.
Timing fields (started_at, ended_at, duration_ms) are populated automatically by the track_result decorator. Providers may set row_count, schema, or metadata inside their method; any value left at its default will be auto-derived from self.output after the method returns.
Accessible on the dataset instance as self.operation.
- method: ds_resource_plugin_py_lib.common.resource.dataset.enums.DatasetMethod | None = None¶
The method that was called.
- success: bool = False¶
Whether the method call was successful.
- error: OperationError | None = None¶
The error captured from a ResourceException.
- row_count: int = 0¶
The number of rows read, written, or discovered.
- started_at: datetime.datetime | None = None¶
The timestamp when the method started.
- ended_at: datetime.datetime | None = None¶
The timestamp when the method ended.
- duration_ms: float = 0.0¶
The duration of the method in milliseconds.
- schema: dict[str, Any] | None = None¶
The schema of the data.
- metadata: dict[str, Any]¶
The metadata of the data.
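The interaction between a dataset method and its operation report might look roughly like the sketch below. It is a guess at the described behaviour, not the library's track_result: the decorator and the OperationInfo dataclass are local stand-ins with a subset of the documented fields, the error is stored as a plain string instead of an OperationError, and re-raising the exception after recording it is an assumption.

```python
import functools
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Optional


@dataclass
class OperationInfo:
    """Stand-in with a subset of the documented fields."""

    method: Optional[str] = None
    success: bool = False
    error: Optional[str] = None  # the real field holds an OperationError
    row_count: int = 0
    started_at: Optional[datetime] = None
    ended_at: Optional[datetime] = None
    duration_ms: float = 0.0


def track_result(func: Callable[..., None]) -> Callable[..., None]:
    """Hypothetical re-creation of the decorator's described behaviour."""

    @functools.wraps(func)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> None:
        info = OperationInfo(method=func.__name__, started_at=datetime.now(timezone.utc))
        self.operation = info  # set first so the method itself can fill fields
        start = time.perf_counter()
        try:
            func(self, *args, **kwargs)
            info.success = True
            # Derive row_count only when the provider left the default.
            if info.row_count == 0 and self.output is not None:
                info.row_count = len(self.output)
        except Exception as exc:
            info.error = str(exc)
            raise  # assumption: the failure still propagates to the caller
        finally:
            info.ended_at = datetime.now(timezone.utc)
            info.duration_ms = (time.perf_counter() - start) * 1000.0

    return wrapper


class DemoDataset:
    """Minimal object exposing output and operation, for illustration only."""

    def __init__(self) -> None:
        self.output: Optional[list] = None
        self.operation = OperationInfo()

    @track_result
    def read(self) -> None:
        self.output = [{"id": 1}, {"id": 2}, {"id": 3}]


ds = DemoDataset()
ds.read()
report = ds.operation
```

Assigning the report to self.operation before the method runs is what lets a provider set row_count or other fields inside the method, with derivation from self.output applying only to values left at their defaults.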