ds_resource_plugin_py_lib.common.resource.dataset

File: __init__.py Region: ds_resource_plugin_py_lib/common/resource/dataset

Description

Dataset models, typed properties, and storage format helpers.

Classes

Dataset

The ds workflow nested object which identifies data within a data store, such as tables, files, folders, and documents.

DatasetInfo

NamedTuple that represents the dataset information.

DatasetSettings

The object containing the settings of the dataset.

TabularDataset

Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, and other documents.

DatasetMethod

Allowed dataset operation names.

OperationError

Structured error captured from a ResourceException.

OperationInfo

Report produced by every dataset operation.

DatasetStorageFormat

The object containing the storage format of the dataset.

DatasetStorageFormatType

Enum to define the storage format types.

Package Contents

class ds_resource_plugin_py_lib.common.resource.dataset.Dataset[source]

Bases: abc.ABC, ds_common_serde_py_lib.Serializable, Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]

The ds workflow nested object which identifies data within a data store, such as tables, files, folders, and documents.

You probably want to use the subclasses and not this class directly.

id: uuid.UUID
name: str
description: str | None = None
version: str
settings: DatasetSettingsType
linked_service: LinkedServiceType
serializer: SerializerType | None = None
deserializer: DeserializerType | None = None
input: Any | None = None
output: Any | None = None
checkpoint: dict[str, Any]
operation: ds_resource_plugin_py_lib.common.resource.dataset.result.OperationInfo
classmethod __init_subclass__(**kwargs: Any) None[source]

Initialize the subclass.

Parameters:

kwargs – The keyword arguments.

Returns:

None.

__enter__() Self[source]

Context manager enter.

Returns:

The dataset.

__exit__(exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: types.TracebackType | None) None[source]

Context manager exit.

Parameters:
  • exc_type – The type of the exception.

  • exc_value – The value of the exception.

  • traceback – The traceback of the exception.
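Because __enter__() returns the dataset and __exit__() is responsible for releasing resources, datasets are meant to be used in a with block. This minimal stand-in (hypothetical; the real base class is Dataset) sketches the protocol, assuming __exit__ delegates to close():

```python
class _DemoDataset:
    """Minimal stand-in for the documented context-manager protocol."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self):
        # __enter__() returns the dataset itself (Self)
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # release resources regardless of whether an exception occurred
        self.close()

    def close(self) -> None:
        # safe to call more than once, per the close() contract
        self.closed = True


with _DemoDataset() as ds:
    assert not ds.closed  # still open inside the block
# on leaving the block, __exit__ has called close()
```

The same pattern applies to any concrete subclass: resources held by the linked service are released even when the body raises.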

property supports_checkpoint: bool

Whether this provider supports incremental loads via self.checkpoint.
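For providers that do support checkpoints, a caller can persist self.checkpoint between runs so each read() resumes where the previous one stopped. The sketch below is a hypothetical provider (the offset key and batch size are illustrative assumptions, not part of the documented contract):

```python
class _CheckpointedReader:
    """Hypothetical provider: read() resumes from self.checkpoint."""

    def __init__(self) -> None:
        self.checkpoint: dict = {}       # persisted between runs by the caller
        self._source = list(range(10))   # stands in for the remote data
        self.output: list | None = None

    @property
    def supports_checkpoint(self) -> bool:
        return True

    def read(self) -> None:
        # resume from the last recorded offset; a real provider might
        # store a timestamp or change token instead
        start = self.checkpoint.get("offset", 0)
        batch = self._source[start:start + 4]
        self.output = batch
        self.checkpoint["offset"] = start + len(batch)


reader = _CheckpointedReader()
reader.read()   # first batch: rows 0-3
reader.read()   # second batch resumes at row 4
```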

abstract property type: enum.StrEnum

Get the type of the dataset.

abstractmethod create() None[source]

Insert all rows in self.input into the target as a single atomic transaction. Must not delete, update, or overwrite existing data.

See also

Full contract: create() in docs/DATASET_CONTRACT.md

abstractmethod read() None[source]

Read data from the source and assign it to self.output. Pagination within a single call is handled internally. Supports incremental loads via self.checkpoint.

See also

Full contract: read() in docs/DATASET_CONTRACT.md

abstractmethod update() None[source]

Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.

See also

Full contract: update() in docs/DATASET_CONTRACT.md

abstractmethod upsert() None[source]

Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.

See also

Full contract: upsert() in docs/DATASET_CONTRACT.md

abstractmethod delete() None[source]

Remove specific rows from the target matched by identity columns defined in self.settings. Atomic. Idempotent.

See also

Full contract: delete() in docs/DATASET_CONTRACT.md

abstractmethod purge() None[source]

Remove all content from the target. self.input is not used. Atomic. Idempotent.

See also

Full contract: purge() in docs/DATASET_CONTRACT.md

abstractmethod list() None[source]

Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.

See also

Full contract: list() in docs/DATASET_CONTRACT.md

abstractmethod rename() None[source]

Rename the resource in the backend. Atomic. Not idempotent.

See also

Full contract: rename() in docs/DATASET_CONTRACT.md

abstractmethod close() None[source]

Release any connections, sessions, or handles held by the linked service. Must not raise if already closed. Idempotent.

See also

Full contract: close() in docs/DATASET_CONTRACT.md

class ds_resource_plugin_py_lib.common.resource.dataset.DatasetInfo[source]

Bases: NamedTuple

NamedTuple that represents the dataset information.

type: str
name: str
class_name: str
version: str
description: str | None = None
__str__() str[source]

Return a string representation of the dataset info.

Returns:

A string representation of the dataset info.

property key: tuple[str, str]

Return the composite key (type, version) for dictionary lookups.

Returns:

A tuple containing the type and version.

class ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings[source]

Bases: ds_common_serde_py_lib.Serializable

The object containing the settings of the dataset.

class ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[source]

Bases: Dataset[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType], Generic[LinkedServiceType, DatasetSettingsType, SerializerType, DeserializerType]

Tabular dataset object which identifies data within a data store, such as tables and CSV, JSON, Parquet, or Parquet-dataset files, among other documents.

The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.

input: pandas.DataFrame
output: pandas.DataFrame
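The DataFrame convention means a provider writes whatever it reads into self.output and consumes self.input for write operations. A hypothetical sketch (the fabricated rows and column names are illustrative, not a real provider):

```python
import pandas as pd


class _DemoTabularDataset:
    """Hypothetical provider illustrating the DataFrame in/out convention."""

    def __init__(self) -> None:
        self.input: pd.DataFrame | None = None
        self.output: pd.DataFrame | None = None

    def read(self) -> None:
        # a real provider would pull from its linked service and
        # deserialize via its configured deserializer
        self.output = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})


ds = _DemoTabularDataset()
ds.input = pd.DataFrame({"id": [3], "name": ["c"]})  # rows a create() would insert
ds.read()
```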
class ds_resource_plugin_py_lib.common.resource.dataset.DatasetMethod[source]

Bases: enum.StrEnum

Allowed dataset operation names.

CREATE = 'create'

Insert rows into the target. Atomic. Not idempotent.

READ = 'read'

Read all data from the source into self.output. Idempotent.

UPDATE = 'update'

Update existing rows matched by identity columns. Atomic. Idempotent.

UPSERT = 'upsert'

Insert or update rows matched by identity columns. Atomic. Idempotent.

DELETE = 'delete'

Remove specific rows matched by identity columns. Atomic. Idempotent.

PURGE = 'purge'

Remove all content from the target. Atomic. Idempotent.

LIST = 'list'

Discover available resources and populate self.output. Idempotent.

RENAME = 'rename'

Rename a resource in the backend. Atomic. Not idempotent.

static all_values() frozenset[str][source]

Return all operation method values as a frozen set.
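all_values() is handy for validating an operation name before dispatching it. The stand-in below mirrors the documented members (it derives from str + Enum for portability; the real class uses enum.StrEnum, Python 3.11+):

```python
import enum


class DatasetMethod(str, enum.Enum):
    """Stand-in; the real class derives from enum.StrEnum."""
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    UPSERT = "upsert"
    DELETE = "delete"
    PURGE = "purge"
    LIST = "list"
    RENAME = "rename"

    @staticmethod
    def all_values() -> frozenset[str]:
        # frozen set of the string values, for validating requested operations
        return frozenset(m.value for m in DatasetMethod)


requested = "upsert"
allowed = requested in DatasetMethod.all_values()
```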

class ds_resource_plugin_py_lib.common.resource.dataset.OperationError[source]

Bases: ds_common_serde_py_lib.Serializable

Structured error captured from a ResourceException.

message: str

The error message.

code: str

The error code.

status_code: int

The HTTP status code.

details: dict[str, Any]

The error details.

class ds_resource_plugin_py_lib.common.resource.dataset.OperationInfo[source]

Bases: ds_common_serde_py_lib.Serializable

Report produced by every dataset operation.

Timing fields (started_at, ended_at, duration_ms) are populated automatically by the track_result decorator. Providers may set row_count, schema, or metadata inside their method; any value left at its default will be auto-derived from self.output after the method returns.

Accessible on the dataset instance as self.operation.

method: ds_resource_plugin_py_lib.common.resource.dataset.enums.DatasetMethod | None = None

The method that was called.

success: bool = False

Whether the method call was successful.

error: OperationError | None = None

The error captured from a ResourceException.

row_count: int = 0

The number of rows read, written, or discovered.

started_at: datetime.datetime | None = None

The timestamp when the method started.

ended_at: datetime.datetime | None = None

The timestamp when the method ended.

duration_ms: float = 0.0

The duration of the method in milliseconds.

schema: dict[str, Any] | None = None

The schema of the data.

metadata: dict[str, Any]

The metadata of the data.

class ds_resource_plugin_py_lib.common.resource.dataset.DatasetStorageFormat[source]

Bases: ds_common_serde_py_lib.Serializable

The object containing the storage format of the dataset.

type: DatasetStorageFormatType
args: dict[str, Any]
class ds_resource_plugin_py_lib.common.resource.dataset.DatasetStorageFormatType[source]

Bases: enum.StrEnum

Enum to define the storage format types.

PARQUET = 'parquet'
CSV = 'csv'
JSON = 'json'
EXCEL = 'excel'
SEMI_STRUCTURED_JSON = 'semi-structured-json'
XML = 'xml'
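A storage format pairs one of these types with free-form args. The sketch below uses stand-in classes (str + Enum instead of the real enum.StrEnum), and the args keys shown are illustrative reader options, not a documented schema:

```python
import enum
from dataclasses import dataclass, field
from typing import Any


class DatasetStorageFormatType(str, enum.Enum):
    """Stand-in; the real class derives from enum.StrEnum."""
    PARQUET = "parquet"
    CSV = "csv"
    JSON = "json"
    EXCEL = "excel"
    SEMI_STRUCTURED_JSON = "semi-structured-json"
    XML = "xml"


@dataclass
class DatasetStorageFormat:
    """Stand-in pairing a format type with format-specific arguments."""
    type: DatasetStorageFormatType
    args: dict[str, Any] = field(default_factory=dict)


# hypothetical CSV format with typical reader options
fmt = DatasetStorageFormat(
    type=DatasetStorageFormatType.CSV,
    args={"sep": ";", "encoding": "utf-8"},
)
```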