ds_resource_plugin_py_lib.common.resource.dataset

File: __init__.py Region: ds_resource_plugin_py_lib/common/resource/dataset

Description

Dataset models, typed properties, and storage format helpers.

Classes

Dataset

The ds workflow nested object which identifies data within a data store, such as tables, files, folders, and documents.

DatasetInfo

NamedTuple that represents the dataset information.

DatasetSettings

The object containing the settings of the dataset.

TabularDataset

Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, and other documents.

DatasetMethod

Allowed dataset operation names.

OperationError

Structured error captured from a ResourceException.

OperationInfo

Report produced by every dataset operation.

DatasetStorageFormat

The object containing the storage format of the dataset.

DatasetStorageFormatType

Enum to define the storage format types.

Package Contents

class ds_resource_plugin_py_lib.common.resource.dataset.Dataset[source]

Bases: abc.ABC, ds_common_serde_py_lib.Serializable, Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]

The ds workflow nested object which identifies data within a data store, such as tables, files, folders, and documents.

You probably want to use the subclasses and not this class directly.

id: uuid.UUID
name: str
description: str | None = None
version: str
settings: DatasetSettingsType
linked_service: LinkedServiceType
serializer: SerializerType | None = None
deserializer: DeserializerType | None = None
input: Any | None = None
output: Any | None = None
checkpoint: dict[str, Any]
operation: ds_resource_plugin_py_lib.common.resource.dataset.result.OperationInfo
classmethod __init_subclass__(**kwargs: Any) None[source]

Initialize the subclass.

Parameters:

kwargs – The keyword arguments.

Returns:

None.

__enter__() Self[source]

Context manager enter.

Returns:

The dataset.

__exit__(exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: types.TracebackType | None) None[source]

Context manager exit.

Parameters:
  • exc_type – The type of the exception.

  • exc_value – The value of the exception.

  • traceback – The traceback of the exception.
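Because __enter__() returns the dataset and __exit__() is responsible for releasing resources, datasets are meant to be used in a with block. This minimal stand-in (hypothetical; the real base class is Dataset) sketches the protocol, assuming __exit__ delegates to close():

```python
class _DemoDataset:
    """Minimal stand-in for the documented context-manager protocol."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self):
        # __enter__() returns the dataset itself (Self)
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # release resources regardless of whether an exception occurred
        self.close()

    def close(self) -> None:
        # safe to call more than once, per the close() contract
        self.closed = True


with _DemoDataset() as ds:
    assert not ds.closed  # still open inside the block
# on leaving the block, __exit__ has called close()
```

The same pattern applies to any concrete subclass: resources held by the linked service are released even when the body raises.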

property supports_checkpoint: bool

Whether this provider supports incremental loads via self.checkpoint.
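For providers that do support checkpoints, a caller can persist self.checkpoint between runs so each read() resumes where the previous one stopped. The sketch below is a hypothetical provider (the offset key and batch size are illustrative assumptions, not part of the documented contract):

```python
class _CheckpointedReader:
    """Hypothetical provider: read() resumes from self.checkpoint."""

    def __init__(self) -> None:
        self.checkpoint: dict = {}       # persisted between runs by the caller
        self._source = list(range(10))   # stands in for the remote data
        self.output: list | None = None

    @property
    def supports_checkpoint(self) -> bool:
        return True

    def read(self) -> None:
        # resume from the last recorded offset; a real provider might
        # store a timestamp or change token instead
        start = self.checkpoint.get("offset", 0)
        batch = self._source[start:start + 4]
        self.output = batch
        self.checkpoint["offset"] = start + len(batch)


reader = _CheckpointedReader()
reader.read()   # first batch: rows 0-3
reader.read()   # second batch resumes at row 4
```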

abstract property type: enum.StrEnum

Get the type of the dataset.

abstractmethod create() None[source]

Insert all rows in self.input into the target as a single atomic transaction. Must not delete, update, or overwrite existing data.

See also

Full contract: create() in docs/DATASET_CONTRACT.md

abstractmethod read() None[source]

Read data from the source and assign it to self.output. Pagination within a single call is handled internally. Supports incremental loads via self.checkpoint.

See also

Full contract: read() in docs/DATASET_CONTRACT.md

abstractmethod update() None[source]

Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.

See also

Full contract: update() in docs/DATASET_CONTRACT.md

abstractmethod upsert() None[source]

Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.

See also

Full contract: upsert() in docs/DATASET_CONTRACT.md

abstractmethod delete() None[source]

Remove specific rows from the target matched by identity columns defined in self.settings. Atomic. Idempotent.

See also

Full contract: delete() in docs/DATASET_CONTRACT.md

abstractmethod purge() None[source]

Remove all content from the target. self.input is not used. Atomic. Idempotent.

See also

Full contract: purge() in docs/DATASET_CONTRACT.md

abstractmethod list() None[source]

Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.

See also

Full contract: list() in docs/DATASET_CONTRACT.md

abstractmethod rename() None[source]

Rename the resource in the backend. Atomic. Not idempotent.

See also

Full contract: rename() in docs/DATASET_CONTRACT.md

abstractmethod close() None[source]

Release any connections, sessions, or handles held by the linked service. Must not raise if already closed. Idempotent.

See also

Full contract: close() in docs/DATASET_CONTRACT.md

class ds_resource_plugin_py_lib.common.resource.dataset.DatasetInfo[source]

Bases: NamedTuple

NamedTuple that represents the dataset information.

type: str
name: str
class_name: str
version: str
description: str | None = None
__str__() str[source]

Return a string representation of the dataset info.

Returns:

A string representation of the dataset info.

property key: tuple[str, str]

Return the composite key (type, version) for dictionary lookups.

Returns:

A tuple containing the type and version.

class ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings[source]

Bases: ds_common_serde_py_lib.Serializable

The object containing the settings of the dataset.

class ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[source]

Bases: Dataset[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType], Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]

Tabular dataset object which identifies data within a data store, such as tables and CSV, JSON, Parquet, or Parquet-dataset files, among other documents.

The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.

input: pandas.DataFrame
output: pandas.DataFrame
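The DataFrame convention means a provider writes whatever it reads into self.output and consumes self.input for write operations. A hypothetical sketch (the fabricated rows and column names are illustrative, not a real provider):

```python
import pandas as pd


class _DemoTabularDataset:
    """Hypothetical provider illustrating the DataFrame in/out convention."""

    def __init__(self) -> None:
        self.input: pd.DataFrame | None = None
        self.output: pd.DataFrame | None = None

    def read(self) -> None:
        # a real provider would pull from its linked service and
        # deserialize via its configured deserializer
        self.output = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})


ds = _DemoTabularDataset()
ds.input = pd.DataFrame({"id": [3], "name": ["c"]})  # rows a create() would insert
ds.read()
```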
class ds_resource_plugin_py_lib.common.resource.dataset.DatasetMethod[source]

Bases: enum.StrEnum

Allowed dataset operation names.

CREATE = 'create'

Insert rows into the target. Atomic. Not idempotent.

READ = 'read'

Read all data from the source into self.output. Idempotent.

UPDATE = 'update'

Update existing rows matched by identity columns. Atomic. Idempotent.

UPSERT = 'upsert'

Insert or update rows matched by identity columns. Atomic. Idempotent.

DELETE = 'delete'

Remove specific rows matched by identity columns. Atomic. Idempotent.

PURGE = 'purge'

Remove all content from the target. Atomic. Idempotent.

LIST = 'list'

Discover available resources and populate self.output. Idempotent.

RENAME = 'rename'

Rename a resource in the backend. Atomic. Not idempotent.

static all_values() frozenset[str][source]

Return all operation method values as a frozen set.
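all_values() is handy for validating an operation name before dispatching it. The stand-in below mirrors the documented members (it derives from str + Enum for portability; the real class uses enum.StrEnum, Python 3.11+):

```python
import enum


class DatasetMethod(str, enum.Enum):
    """Stand-in; the real class derives from enum.StrEnum."""
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    UPSERT = "upsert"
    DELETE = "delete"
    PURGE = "purge"
    LIST = "list"
    RENAME = "rename"

    @staticmethod
    def all_values() -> frozenset[str]:
        # frozen set of the string values, for validating requested operations
        return frozenset(m.value for m in DatasetMethod)


requested = "upsert"
allowed = requested in DatasetMethod.all_values()
```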

class ds_resource_plugin_py_lib.common.resource.dataset.OperationError[source]

Bases: ds_common_serde_py_lib.Serializable

Structured error captured from a ResourceException.

message: str

The error message.

code: str

The error code.

status_code: int

The HTTP status code.

details: dict[str, Any]

The error details.

class ds_resource_plugin_py_lib.common.resource.dataset.OperationInfo[source]

Bases: ds_common_serde_py_lib.Serializable

Report produced by every dataset operation.

Timing fields (started_at, ended_at, duration_ms) are populated automatically by the track_result decorator. Providers may set row_count, schema, or metadata inside their method; any value left at its default will be auto-derived from self.output after the method returns.

Accessible on the dataset instance as self.operation.

method: ds_resource_plugin_py_lib.common.resource.dataset.enums.DatasetMethod | None = None

The method that was called.

success: bool = False

Whether the method call was successful.

error: OperationError | None = None

The error captured from a ResourceException.

row_count: int = 0

The number of rows read, written, or discovered.

started_at: datetime.datetime | None = None

The timestamp when the method started.

ended_at: datetime.datetime | None = None

The timestamp when the method ended.

duration_ms: float = 0.0

The duration of the method in milliseconds.

schema: dict[str, Any] | None = None

The schema of the data.

metadata: dict[str, Any]

The metadata of the data.

class ds_resource_plugin_py_lib.common.resource.dataset.DatasetStorageFormat[source]

Bases: ds_common_serde_py_lib.Serializable

The object containing the storage format of the dataset.

type: DatasetStorageFormatType
args: dict[str, Any]
class ds_resource_plugin_py_lib.common.resource.dataset.DatasetStorageFormatType[source]

Bases: enum.StrEnum

Enum to define the storage format types.

PARQUET = 'parquet'
CSV = 'csv'
JSON = 'json'
EXCEL = 'excel'
SEMI_STRUCTURED_JSON = 'semi-structured-json'
XML = 'xml'
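A storage format pairs one of these types with free-form args. The sketch below uses stand-in classes (str + Enum instead of the real enum.StrEnum), and the args keys shown are illustrative reader options, not a documented schema:

```python
import enum
from dataclasses import dataclass, field
from typing import Any


class DatasetStorageFormatType(str, enum.Enum):
    """Stand-in; the real class derives from enum.StrEnum."""
    PARQUET = "parquet"
    CSV = "csv"
    JSON = "json"
    EXCEL = "excel"
    SEMI_STRUCTURED_JSON = "semi-structured-json"
    XML = "xml"


@dataclass
class DatasetStorageFormat:
    """Stand-in pairing a format type with format-specific arguments."""
    type: DatasetStorageFormatType
    args: dict[str, Any] = field(default_factory=dict)


# hypothetical CSV format with typical reader options
fmt = DatasetStorageFormat(
    type=DatasetStorageFormatType.CSV,
    args={"sep": ";", "encoding": "utf-8"},
)
```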