ds_resource_plugin_py_lib.common.resource.dataset.base¶
File: base.py
Region: ds_resource_plugin_py_lib/common/resource/dataset
Description¶
Base dataset models and typed properties.
Attributes¶
- DatasetSettingsType
- LinkedServiceType
- SerializerType
- DeserializerType
Classes¶
- DatasetInfo – NamedTuple that represents the dataset information.
- DatasetSettings – The object containing the settings of the dataset.
- Dataset – The ds workflow nested object which identifies data within a data store.
- BinaryDataset – Binary dataset object which identifies data within a data store.
- TabularDataset – Tabular dataset object which identifies data within a data store.
Module Contents¶
- class ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetInfo[source]¶
Bases: NamedTuple
NamedTuple that represents the dataset information.
- type: str¶
- name: str¶
- class_name: str¶
- version: str¶
- description: str | None = None¶
- __str__() str[source]¶
Return a string representation of the dataset info.
- Returns:
A string representation of the dataset info.
- property key: tuple[str, str]¶
Return the composite key (type, version) for dictionary lookups.
- Returns:
A tuple containing the type and version.
- class ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetSettings[source]¶
Bases: ds_common_serde_py_lib.Serializable
The object containing the settings of the dataset.
- ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetSettingsType¶
- ds_resource_plugin_py_lib.common.resource.dataset.base.LinkedServiceType¶
- ds_resource_plugin_py_lib.common.resource.dataset.base.SerializerType¶
- ds_resource_plugin_py_lib.common.resource.dataset.base.DeserializerType¶
- class ds_resource_plugin_py_lib.common.resource.dataset.base.Dataset[source]¶
Bases: abc.ABC, ds_common_serde_py_lib.Serializable, Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]
The ds workflow nested object which identifies data within a data store, such as tables, files, folders and documents.
You probably want to use the subclasses and not this class directly.
- id: uuid.UUID¶
- name: str¶
- description: str | None = None¶
- version: str¶
- settings: DatasetSettingsType¶
- linked_service: LinkedServiceType¶
- serializer: SerializerType | None = None¶
- deserializer: DeserializerType | None = None¶
- input: Any | None = None¶
- output: Any | None = None¶
- checkpoint: dict[str, Any]¶
- classmethod __init_subclass__(**kwargs: Any) None[source]¶
Initialize the subclass.
- Parameters:
kwargs – The keyword arguments.
- __exit__(exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: types.TracebackType | None) None[source]¶
Context manager exit.
- Parameters:
exc_type – The type of the exception.
exc_value – The value of the exception.
traceback – The traceback of the exception.
- property supports_checkpoint: bool¶
Whether this provider supports incremental loads via self.checkpoint.
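The checkpoint dict is the state a provider carries between runs to implement incremental loads. A minimal sketch of the pattern, with a hypothetical in-memory row store and a `last_modified` watermark key (both are illustrative, not part of the library):

```python
from typing import Any

# Hypothetical rows in the backing store, each with an ISO-8601 modification timestamp.
ROWS = [
    {"id": 1, "modified": "2024-01-01T00:00:00+00:00"},
    {"id": 2, "modified": "2024-02-01T00:00:00+00:00"},
    {"id": 3, "modified": "2024-03-01T00:00:00+00:00"},
]


def incremental_read(checkpoint: dict[str, Any]) -> list[dict[str, Any]]:
    """Return rows modified after the checkpointed watermark, then advance it."""
    watermark = checkpoint.get("last_modified", "")
    new_rows = [row for row in ROWS if row["modified"] > watermark]
    if new_rows:
        checkpoint["last_modified"] = max(row["modified"] for row in new_rows)
    return new_rows


state: dict[str, Any] = {}
first = incremental_read(state)   # full load: all three rows
second = incremental_read(state)  # incremental: nothing new since the watermark
```

ISO-8601 timestamps with a fixed offset compare correctly as strings, which keeps the checkpoint JSON-serializable.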
- property type: enum.StrEnum¶
- Abstractmethod:
Get the type of the dataset.
- abstractmethod create() None[source]¶
Insert all rows in self.input into the target as a single atomic transaction. Must not delete, update, or overwrite existing data.
- Raises:
CreateError – If the operation fails.
NotSupportedError – If the provider does not support create.
See also
Full contract: docs/DATASET_CONTRACT.md – create()
- abstractmethod read() None[source]¶
Read data from the source and assign it to self.output. Pagination within a single call is handled internally. Supports incremental loads via self.checkpoint.
- Raises:
ReadError – If the operation fails.
NotSupportedError – If the provider does not support read.
See also
Full contract: docs/DATASET_CONTRACT.md – read()
- abstractmethod update() None[source]¶
Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.
- Raises:
UpdateError – If the operation fails.
NotSupportedError – If the provider does not support update.
See also
Full contract: docs/DATASET_CONTRACT.md – update()
- abstractmethod upsert() None[source]¶
Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.
- Raises:
UpsertError – If the operation fails.
NotSupportedError – If the provider does not support upsert.
See also
Full contract: docs/DATASET_CONTRACT.md – upsert()
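The upsert semantics above (incoming rows win on identity-column match, unmatched rows are appended) can be sketched with pandas. This is an illustration only; the real matching logic is provider-specific and the identity columns would come from self.settings:

```python
import pandas as pd


def upsert(target: pd.DataFrame, incoming: pd.DataFrame,
           keys: list[str]) -> pd.DataFrame:
    """Update rows whose identity columns match, insert the rest."""
    merged = pd.concat([target, incoming])
    # Keep the last occurrence per key: incoming rows win over existing ones.
    return merged.drop_duplicates(subset=keys, keep="last").reset_index(drop=True)


target = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
incoming = pd.DataFrame({"id": [2, 3], "value": ["B", "c"]})
result = upsert(target, incoming, keys=["id"])
# id 1 kept, id 2 updated to "B", id 3 inserted
```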
- abstractmethod delete() None[source]¶
Remove specific rows from the target matched by identity columns defined in self.settings. Atomic. Idempotent.
- Raises:
DeleteError – If the operation fails.
NotSupportedError – If the provider does not support delete.
See also
Full contract: docs/DATASET_CONTRACT.md – delete()
- abstractmethod purge() None[source]¶
Remove all content from the target. self.input is not used. Atomic. Idempotent.
- Raises:
PurgeError – If the operation fails.
NotSupportedError – If the provider does not support purge.
See also
Full contract: docs/DATASET_CONTRACT.md – purge()
- abstractmethod list() None[source]¶
Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.
- Raises:
ListError – If the operation fails.
NotSupportedError – If the provider does not support listing.
See also
Full contract: docs/DATASET_CONTRACT.md – list()
- abstractmethod rename() None[source]¶
Rename the resource in the backend. Atomic. Not idempotent.
- Raises:
RenameError – If the operation fails.
NotSupportedError – If the provider does not support renaming.
See also
Full contract: docs/DATASET_CONTRACT.md – rename()
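A concrete provider implements the abstract methods above against its backend, taking rows from self.input and placing results in self.output. A toy sketch against a structural stand-in for the ABC (the real base class additionally carries settings, linked_service, serializers and the Serializable machinery):

```python
import abc
from typing import Any


class DatasetSketch(abc.ABC):
    """Structural stand-in for the library's Dataset ABC (illustration only)."""

    def __init__(self) -> None:
        self.input: Any = None
        self.output: Any = None
        self.checkpoint: dict[str, Any] = {}

    @abc.abstractmethod
    def create(self) -> None: ...

    @abc.abstractmethod
    def read(self) -> None: ...


class InMemoryDataset(DatasetSketch):
    """Toy provider: a plain list stands in for the backing data store."""

    def __init__(self) -> None:
        super().__init__()
        self._store: list[Any] = []

    def create(self) -> None:
        # Insert all rows from self.input; never touch existing data.
        self._store.extend(self.input)

    def read(self) -> None:
        # Assign a copy of the store's contents to self.output.
        self.output = list(self._store)


ds = InMemoryDataset()
ds.input = [{"id": 1}, {"id": 2}]
ds.create()
ds.read()
```

A provider that does not support an operation would raise NotSupportedError instead of implementing it, per the contract above.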
- class ds_resource_plugin_py_lib.common.resource.dataset.base.BinaryDataset[source]¶
Bases: Dataset[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType], Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]
Binary dataset object which identifies data within a data store, such as files, folders and documents.
The input of the dataset is a binary file. The output of the dataset is a binary file.
- input: io.BytesIO¶
- output: io.BytesIO¶
- class ds_resource_plugin_py_lib.common.resource.dataset.base.TabularDataset[source]¶
Bases: Dataset[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType], Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]
Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets and other documents.
The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.
- input: pandas.DataFrame¶
- output: pandas.DataFrame¶
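TabularDataset narrows input and output to pandas DataFrames. A round-trip sketch with a hypothetical CSV-buffer-backed provider (the class name and buffer storage are illustrative, not part of the library):

```python
import io

import pandas as pd


class CsvTabularSketch:
    """Toy tabular provider: an in-memory CSV buffer stands in for the
    data store; input and output are pandas DataFrames."""

    def __init__(self) -> None:
        self._buffer = io.StringIO()
        self.input: pd.DataFrame | None = None
        self.output: pd.DataFrame | None = None

    def create(self) -> None:
        # Write self.input to the (empty) target as CSV.
        self._buffer = io.StringIO()
        self.input.to_csv(self._buffer, index=False)

    def read(self) -> None:
        # Read the target back into self.output.
        self._buffer.seek(0)
        self.output = pd.read_csv(self._buffer)


tab = CsvTabularSketch()
tab.input = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
tab.create()
tab.read()
```

A BinaryDataset provider would follow the same shape with io.BytesIO streams in place of DataFrames.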