ds_resource_plugin_py_lib.common.resource.dataset.base¶
File: base.py
Region: ds_resource_plugin_py_lib/common/resource/dataset
Description¶
Base dataset models and typed properties.
Attributes¶
- DatasetSettingsType
- LinkedServiceType
- SerializerType
- DeserializerType
Classes¶
- DatasetInfo – NamedTuple that represents the dataset information.
- DatasetSettings – The object containing the settings of the dataset.
- Dataset – The ds workflow nested object which identifies data within a data store.
- BinaryDataset – Binary dataset object which identifies data within a data store.
- TabularDataset – Tabular dataset object which identifies data within a data store.
Module Contents¶
- class ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetInfo[source]¶
Bases: NamedTuple
NamedTuple that represents the dataset information.
- type: str¶
- name: str¶
- class_name: str¶
- version: str¶
- description: str | None = None¶
- __str__() str[source]¶
Return a string representation of the dataset info.
- Returns:
A string representation of the dataset info.
- property key: tuple[str, str]¶
Return the composite key (type, version) for dictionary lookups.
- Returns:
A tuple containing the type and version.
- class ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetSettings[source]¶
Bases: ds_common_serde_py_lib.Serializable
The object containing the settings of the dataset.
- ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetSettingsType¶
- ds_resource_plugin_py_lib.common.resource.dataset.base.LinkedServiceType¶
- ds_resource_plugin_py_lib.common.resource.dataset.base.SerializerType¶
- ds_resource_plugin_py_lib.common.resource.dataset.base.DeserializerType¶
- class ds_resource_plugin_py_lib.common.resource.dataset.base.Dataset[source]¶
Bases: abc.ABC, ds_common_serde_py_lib.Serializable, Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]
The ds workflow nested object which identifies data within a data store, such as tables, files, folders and documents.
You probably want to use the subclasses and not this class directly.
- id: uuid.UUID¶
- name: str¶
- description: str | None = None¶
- version: str¶
- settings: DatasetSettingsType¶
- linked_service: LinkedServiceType¶
- serializer: SerializerType | None = None¶
- deserializer: DeserializerType | None = None¶
- input: Any | None = None¶
- output: Any | None = None¶
- checkpoint: dict[str, Any]¶
- classmethod __init_subclass__(**kwargs: Any) None[source]¶
Initialize the subclass.
- Parameters:
kwargs – The keyword arguments.
- __exit__(exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: types.TracebackType | None) None[source]¶
Context manager exit.
- Parameters:
exc_type – The type of the exception.
exc_value – The value of the exception.
traceback – The traceback of the exception.
- property supports_checkpoint: bool¶
Whether this provider supports incremental loads via self.checkpoint.
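The checkpoint dict is the state a provider carries between runs to implement incremental loads. A minimal sketch of the pattern, with a hypothetical in-memory row store and a `last_modified` watermark key (both are illustrative, not part of the library):

```python
from typing import Any

# Hypothetical rows in the backing store, each with an ISO-8601 modification timestamp.
ROWS = [
    {"id": 1, "modified": "2024-01-01T00:00:00+00:00"},
    {"id": 2, "modified": "2024-02-01T00:00:00+00:00"},
    {"id": 3, "modified": "2024-03-01T00:00:00+00:00"},
]


def incremental_read(checkpoint: dict[str, Any]) -> list[dict[str, Any]]:
    """Return rows modified after the checkpointed watermark, then advance it."""
    watermark = checkpoint.get("last_modified", "")
    new_rows = [row for row in ROWS if row["modified"] > watermark]
    if new_rows:
        checkpoint["last_modified"] = max(row["modified"] for row in new_rows)
    return new_rows


state: dict[str, Any] = {}
first = incremental_read(state)   # full load: all three rows
second = incremental_read(state)  # incremental: nothing new since the watermark
```

ISO-8601 timestamps with a fixed offset compare correctly as strings, which keeps the checkpoint JSON-serializable.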
- property type: enum.StrEnum¶
- Abstractmethod:
Get the type of the dataset.
- abstractmethod create() None[source]¶
Insert all rows in self.input into the target as a single atomic transaction. Must not delete, update, or overwrite existing data.
- Raises:
CreateError – If the operation fails.
NotSupportedError – If the provider does not support create.
See also
Full contract: docs/DATASET_CONTRACT.md – create()
- abstractmethod read() None[source]¶
Read data from the source and assign it to self.output. Pagination within a single call is handled internally. Supports incremental loads via self.checkpoint.
- Raises:
ReadError – If the operation fails.
NotSupportedError – If the provider does not support read.
See also
Full contract: docs/DATASET_CONTRACT.md – read()
- abstractmethod update() None[source]¶
Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.
- Raises:
UpdateError – If the operation fails.
NotSupportedError – If the provider does not support update.
See also
Full contract: docs/DATASET_CONTRACT.md – update()
- abstractmethod upsert() None[source]¶
Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.
- Raises:
UpsertError – If the operation fails.
NotSupportedError – If the provider does not support upsert.
See also
Full contract: docs/DATASET_CONTRACT.md – upsert()
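The upsert semantics above (incoming rows win on identity-column match, unmatched rows are appended) can be sketched with pandas. This is an illustration only; the real matching logic is provider-specific and the identity columns would come from self.settings:

```python
import pandas as pd


def upsert(target: pd.DataFrame, incoming: pd.DataFrame,
           keys: list[str]) -> pd.DataFrame:
    """Update rows whose identity columns match, insert the rest."""
    merged = pd.concat([target, incoming])
    # Keep the last occurrence per key: incoming rows win over existing ones.
    return merged.drop_duplicates(subset=keys, keep="last").reset_index(drop=True)


target = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
incoming = pd.DataFrame({"id": [2, 3], "value": ["B", "c"]})
result = upsert(target, incoming, keys=["id"])
# id 1 kept, id 2 updated to "B", id 3 inserted
```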
- abstractmethod delete() None[source]¶
Remove specific rows from the target matched by identity columns defined in self.settings. Atomic. Idempotent.
- Raises:
DeleteError – If the operation fails.
NotSupportedError – If the provider does not support delete.
See also
Full contract: docs/DATASET_CONTRACT.md – delete()
- abstractmethod purge() None[source]¶
Remove all content from the target. self.input is not used. Atomic. Idempotent.
- Raises:
PurgeError – If the operation fails.
NotSupportedError – If the provider does not support purge.
See also
Full contract: docs/DATASET_CONTRACT.md – purge()
- abstractmethod list() None[source]¶
Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.
- Raises:
ListError – If the operation fails.
NotSupportedError – If the provider does not support listing.
See also
Full contract: docs/DATASET_CONTRACT.md – list()
- abstractmethod rename() None[source]¶
Rename the resource in the backend. Atomic. Not idempotent.
- Raises:
RenameError – If the operation fails.
NotSupportedError – If the provider does not support renaming.
See also
Full contract: docs/DATASET_CONTRACT.md – rename()
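A concrete provider implements the abstract methods above against its backend, taking rows from self.input and placing results in self.output. A toy sketch against a structural stand-in for the ABC (the real base class additionally carries settings, linked_service, serializers and the Serializable machinery):

```python
import abc
from typing import Any


class DatasetSketch(abc.ABC):
    """Structural stand-in for the library's Dataset ABC (illustration only)."""

    def __init__(self) -> None:
        self.input: Any = None
        self.output: Any = None
        self.checkpoint: dict[str, Any] = {}

    @abc.abstractmethod
    def create(self) -> None: ...

    @abc.abstractmethod
    def read(self) -> None: ...


class InMemoryDataset(DatasetSketch):
    """Toy provider: a plain list stands in for the backing data store."""

    def __init__(self) -> None:
        super().__init__()
        self._store: list[Any] = []

    def create(self) -> None:
        # Insert all rows from self.input; never touch existing data.
        self._store.extend(self.input)

    def read(self) -> None:
        # Assign a copy of the store's contents to self.output.
        self.output = list(self._store)


ds = InMemoryDataset()
ds.input = [{"id": 1}, {"id": 2}]
ds.create()
ds.read()
```

A provider that does not support an operation would raise NotSupportedError instead of implementing it, per the contract above.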
- class ds_resource_plugin_py_lib.common.resource.dataset.base.BinaryDataset[source]¶
Bases: Dataset[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType], Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]
Binary dataset object which identifies data within a data store, such as files, folders and documents.
The input of the dataset is a binary file. The output of the dataset is a binary file.
- input: io.BytesIO¶
- output: io.BytesIO¶
- class ds_resource_plugin_py_lib.common.resource.dataset.base.TabularDataset[source]¶
Bases: Dataset[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType], Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]
Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets and other documents.
The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.
- input: pandas.DataFrame¶
- output: pandas.DataFrame¶
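TabularDataset narrows input and output to pandas DataFrames. A round-trip sketch with a hypothetical CSV-buffer-backed provider (the class name and buffer storage are illustrative, not part of the library):

```python
import io

import pandas as pd


class CsvTabularSketch:
    """Toy tabular provider: an in-memory CSV buffer stands in for the
    data store; input and output are pandas DataFrames."""

    def __init__(self) -> None:
        self._buffer = io.StringIO()
        self.input: pd.DataFrame | None = None
        self.output: pd.DataFrame | None = None

    def create(self) -> None:
        # Write self.input to the (empty) target as CSV.
        self._buffer = io.StringIO()
        self.input.to_csv(self._buffer, index=False)

    def read(self) -> None:
        # Read the target back into self.output.
        self._buffer.seek(0)
        self.output = pd.read_csv(self._buffer)


tab = CsvTabularSketch()
tab.input = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
tab.create()
tab.read()
```

A BinaryDataset provider would follow the same shape with io.BytesIO streams in place of DataFrames.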