ds_resource_plugin_py_lib.common.resource.dataset.base

File: base.py Region: ds_resource_plugin_py_lib/common/resource/dataset

Description

Base dataset models and typed properties.

Attributes

DatasetSettingsType

LinkedServiceType

SerializerType

DeserializerType

Classes

DatasetInfo

NamedTuple that represents the dataset information.

DatasetSettings

The object containing the settings of the dataset.

Dataset

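The ds workflow nested object which identifies data within a data store, such as tables, files, folders and documents.

BinaryDataset

Binary dataset object which identifies data within a data store, such as files, folders and documents.

TabularDataset

Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets, and other documents.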

Module Contents

class ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetInfo

Bases: NamedTuple

NamedTuple that represents the dataset information.

type: str
name: str
class_name: str
version: str
description: str | None = None
__str__() → str

Return a string representation of the dataset info.

Returns:

A string representation of the dataset info.

property key: tuple[str, str]

Return the composite key (type, version) for dictionary lookups.

Returns:

A tuple containing the type and version.
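The composite key lends itself to registry-style lookups. The sketch below re-declares a stand-in DatasetInfo with the fields listed above so that it runs without the library installed; the registry dict and the example values are illustrative assumptions, not part of the documented API.

```python
from __future__ import annotations

from typing import NamedTuple


class DatasetInfo(NamedTuple):
    """Stand-in mirroring the documented fields (not the library class)."""

    type: str
    name: str
    class_name: str
    version: str
    description: str | None = None

    @property
    def key(self) -> tuple[str, str]:
        # Composite key (type, version), as described above.
        return (self.type, self.version)


# Hypothetical registry keyed by (type, version); all values are made up.
info = DatasetInfo(
    type="example.table",
    name="Example",
    class_name="ExampleDataset",
    version="1.0.0",
)
registry = {info.key: info}

# Look the entry up again by its composite key.
found = registry[("example.table", "1.0.0")]
```

Because the key is a plain tuple of strings, it is hashable and can be built by a caller without constructing a DatasetInfo first.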

class ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetSettings

Bases: ds_common_serde_py_lib.Serializable

The object containing the settings of the dataset.

ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetSettingsType
ds_resource_plugin_py_lib.common.resource.dataset.base.LinkedServiceType
ds_resource_plugin_py_lib.common.resource.dataset.base.SerializerType
ds_resource_plugin_py_lib.common.resource.dataset.base.DeserializerType
class ds_resource_plugin_py_lib.common.resource.dataset.base.Dataset

Bases: abc.ABC, ds_common_serde_py_lib.Serializable, Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]

The ds workflow nested object which identifies data within a data store, such as tables, files, folders and documents.

You probably want to use the subclasses and not this class directly.

id: uuid.UUID
name: str
description: str | None = None
version: str
settings: DatasetSettingsType
linked_service: LinkedServiceType
serializer: SerializerType | None = None
deserializer: DeserializerType | None = None
input: Any | None = None
output: Any | None = None
checkpoint: dict[str, Any]
operation: ds_resource_plugin_py_lib.common.resource.dataset.result.OperationInfo
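
The type parameters let a concrete subclass pin the types of attributes such as settings and linked_service. A minimal sketch of that pattern follows, using stand-in classes rather than the library's own: DemoBase, DemoSettings, DemoLinkedService, and DemoDataset are hypothetical names, only two of the four parameters are modelled, and the real type variables may carry bounds not shown here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

# Hypothetical stand-ins for LinkedServiceType and DatasetSettingsType.
LinkedServiceT = TypeVar("LinkedServiceT")
SettingsT = TypeVar("SettingsT")


@dataclass
class DemoLinkedService:
    url: str


@dataclass
class DemoSettings:
    table: str


class DemoBase(ABC, Generic[LinkedServiceT, SettingsT]):
    """Stand-in for an abstract, generic dataset base (not the library class)."""

    def __init__(self, linked_service: LinkedServiceT, settings: SettingsT) -> None:
        self.linked_service = linked_service
        self.settings = settings
        self.output: Any | None = None

    @abstractmethod
    def read(self) -> None:
        """Populate self.output."""


class DemoDataset(DemoBase[DemoLinkedService, DemoSettings]):
    def read(self) -> None:
        # A type checker sees self.settings as DemoSettings here.
        self.output = f"{self.linked_service.url}/{self.settings.table}"


dataset = DemoDataset(DemoLinkedService("memory://demo"), DemoSettings("orders"))
dataset.read()
```

Instantiating DemoBase directly raises TypeError because read() is abstract, which matches the advice above to use subclasses.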
classmethod __init_subclass__(**kwargs: Any) → None

Initialize the subclass.

Parameters:

kwargs – The keyword arguments.

Returns:

None.

__enter__() → Self

Context manager enter.

Returns:

The dataset.

__exit__(exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: types.TracebackType | None) → None

Context manager exit.

Parameters:
  • exc_type – The type of the exception.

  • exc_value – The value of the exception.

  • traceback – The traceback of the exception.
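
The context manager protocol above can be sketched with a stand-in class. This is not the library's implementation: DemoDataset is a hypothetical name, and calling close() from __exit__ is an assumption that this page does not confirm.

```python
from __future__ import annotations

from types import TracebackType


class DemoDataset:
    """Stand-in, not the library class."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> DemoDataset:
        # Return the dataset itself so it can be bound with "as".
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc_value: BaseException | None,
        traceback: TracebackType | None,
    ) -> None:
        # Assumption: leaving the block releases resources via close().
        # Returning None lets any exception from the block propagate.
        self.close()


with DemoDataset() as ds:
    pass  # dataset operations would go here
```

With this shape, resources are released even when the body of the with block raises.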

property supports_checkpoint: bool

Whether this provider supports incremental loads via self.checkpoint.

abstract property type: enum.StrEnum

Get the type of the dataset.

abstractmethod create() → None

Insert all rows in self.input into the target as a single atomic transaction. Must not delete, update, or overwrite existing data.

Raises:

See also

Full contract: docs/DATASET_CONTRACT.md, create()
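
One way to read the create() description is "append only, all or nothing". The sketch below illustrates that reading on an in-memory dict that stands in for a real target. create_rows and its parameters are hypothetical, and raising ValueError on a key collision is an assumption; the contract document may specify different behaviour.

```python
from typing import Any


def create_rows(
    target: dict[Any, dict[str, Any]],
    rows: list[dict[str, Any]],
    key: str = "id",
) -> None:
    """Insert every row or none of them; never overwrite an existing row."""
    staged: dict[Any, dict[str, Any]] = {}
    for row in rows:
        row_key = row[key]
        if row_key in target or row_key in staged:
            # Assumption: a collision aborts the whole batch.
            raise ValueError(f"row with {key}={row_key!r} already exists")
        staged[row_key] = dict(row)
    # Commit only after every row has been validated.
    target.update(staged)


table = {1: {"id": 1, "name": "a"}}
create_rows(table, [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}])
```

Validating the whole batch before touching the target is what keeps a failed call from leaving partial data behind in this sketch.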

abstractmethod read() → None

Read data from the source and assign it to self.output. Pagination within a single call is handled internally. Supports incremental loads via self.checkpoint.

Raises:

See also

Full contract: docs/DATASET_CONTRACT.md, read()
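
An incremental read driven by self.checkpoint might look like the following sketch. It uses a plain list as the source and a list of dicts in place of a DataFrame; DemoReader, SOURCE, the "seq" column, and the "last_seq" checkpoint key are all hypothetical, since this page does not define the checkpoint's layout.

```python
from __future__ import annotations

from typing import Any

# Hypothetical source; "seq" is a monotonically increasing column.
SOURCE: list[dict[str, Any]] = [
    {"id": 1, "seq": 10},
    {"id": 2, "seq": 20},
    {"id": 3, "seq": 30},
]


class DemoReader:
    """Stand-in, not the library class."""

    def __init__(self) -> None:
        self.checkpoint: dict[str, Any] = {}
        self.output: list[dict[str, Any]] | None = None

    @property
    def supports_checkpoint(self) -> bool:
        return True

    def read(self) -> None:
        # Only fetch rows newer than the last recorded position.
        last_seq = self.checkpoint.get("last_seq", 0)
        rows = [row for row in SOURCE if row["seq"] > last_seq]
        self.output = rows
        if rows:
            # Advance the checkpoint so the next call skips these rows.
            self.checkpoint["last_seq"] = max(row["seq"] for row in rows)


reader = DemoReader()
reader.read()   # first call returns every row
first_batch = reader.output
reader.read()   # second call finds nothing new
second_batch = reader.output
```

Persisting the checkpoint dict between runs is what would turn the second call into a cheap no-op in a real pipeline.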

abstractmethod update() → None

Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.

Raises:

See also

Full contract: docs/DATASET_CONTRACT.md, update()

abstractmethod upsert() → None

Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.

Raises:

See also

Full contract: docs/DATASET_CONTRACT.md, upsert()
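
Matching on identity columns can be illustrated without a backend. In this sketch, lists of dicts stand in for DataFrames, and upsert_rows and its identity parameter are hypothetical names; how identity columns are actually declared in self.settings is not shown on this page.

```python
from typing import Any

Row = dict[str, Any]


def upsert_rows(target: list[Row], rows: list[Row], identity: list[str]) -> list[Row]:
    """Return a new table with matching rows updated and new rows appended."""

    def row_key(row: Row) -> tuple[Any, ...]:
        return tuple(row[column] for column in identity)

    result = [dict(row) for row in target]
    position = {row_key(row): index for index, row in enumerate(result)}
    for row in rows:
        key = row_key(row)
        if key in position:
            # Existing identity: merge the new values over the old row.
            result[position[key]].update(row)
        else:
            # Unknown identity: append as a new row.
            position[key] = len(result)
            result.append(dict(row))
    return result


existing = [{"id": 1, "value": "a"}]
incoming = [{"id": 1, "value": "b"}, {"id": 2, "value": "c"}]
merged = upsert_rows(existing, incoming, identity=["id"])
```

Building a new table and swapping it in at the end, rather than mutating the target row by row, is one simple way to approximate the "Atomic" requirement in memory.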

abstractmethod delete() → None

Remove specific rows from the target matched by identity columns defined in self.settings. Atomic. Idempotent.

Raises:

See also

Full contract: docs/DATASET_CONTRACT.md, delete()
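
Idempotency here means that repeating the same delete leaves the target in the same state as running it once. A small in-memory sketch, with delete_rows and its identity parameter as hypothetical names and lists of dicts standing in for DataFrames:

```python
from typing import Any

Row = dict[str, Any]


def delete_rows(target: list[Row], rows: list[Row], identity: list[str]) -> list[Row]:
    """Return the target without rows whose identity columns match any input row."""
    doomed = {tuple(row[column] for column in identity) for row in rows}
    return [
        row
        for row in target
        if tuple(row[column] for column in identity) not in doomed
    ]


table = [{"id": 1}, {"id": 2}, {"id": 3}]
once = delete_rows(table, [{"id": 2}], identity=["id"])
twice = delete_rows(once, [{"id": 2}], identity=["id"])
```

The second call finds nothing to remove and does not raise, so once and twice hold the same rows.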

abstractmethod purge() → None

Remove all content from the target. self.input is not used. Atomic. Idempotent.

Raises:

See also

Full contract: docs/DATASET_CONTRACT.md, purge()

abstractmethod list() → None

Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.

Raises:

See also

Full contract: docs/DATASET_CONTRACT.md, list()

abstractmethod rename() → None

Rename the resource in the backend. Atomic. Not idempotent.

Raises:

See also

Full contract: docs/DATASET_CONTRACT.md, rename()

abstractmethod close() → None

Release any connections, sessions, or handles held by the linked service. Must not raise if already closed. Idempotent.

See also

Full contract: docs/DATASET_CONTRACT.md, close()
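
A close() that tolerates repeated calls can be sketched as follows. DemoConnection and DemoDataset are hypothetical stand-ins; the connection deliberately raises on a second close so the guard in the dataset is visible.

```python
from __future__ import annotations


class DemoConnection:
    """Hypothetical handle that refuses to be closed twice."""

    def __init__(self) -> None:
        self.is_open = True

    def close(self) -> None:
        if not self.is_open:
            raise RuntimeError("connection already closed")
        self.is_open = False


class DemoDataset:
    """Stand-in, not the library class."""

    def __init__(self, connection: DemoConnection) -> None:
        self._connection: DemoConnection | None = connection

    def close(self) -> None:
        # Detach the handle first so a second call sees None and does nothing.
        connection, self._connection = self._connection, None
        if connection is not None:
            connection.close()


connection = DemoConnection()
dataset = DemoDataset(connection)
dataset.close()
dataset.close()  # safe: no handle left to release
```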

class ds_resource_plugin_py_lib.common.resource.dataset.base.BinaryDataset

Bases: Dataset[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType], Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]

Binary dataset object which identifies data within a data store, such as files, folders and documents.

The input of the dataset is a binary file. The output of the dataset is a binary file.

input: io.BytesIO
output: io.BytesIO
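
Since both input and output are io.BytesIO objects, stream position matters when handing them over. The sketch below uses only the standard library; the store dict is a hypothetical stand-in for a backend, and the variable names are illustrative.

```python
import io

# Build a payload the way a caller might before assigning it to `input`.
payload = io.BytesIO()
payload.write(b"example bytes")
payload.seek(0)  # rewind, otherwise read() would return b""

# Hypothetical backend: persist the bytes, then hand them back as a stream.
store = {"blob": payload.read()}
output = io.BytesIO(store["blob"])  # a fresh stream starts at position 0

content = output.read()
```

getvalue() returns the full buffer regardless of position, which makes it a convenient alternative to seek(0) followed by read().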
class ds_resource_plugin_py_lib.common.resource.dataset.base.TabularDataset

Bases: Dataset[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType], Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]

Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets, and other documents.

The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.

input: pandas.DataFrame
output: pandas.DataFrame