ds_resource_plugin_py_lib.common.resource.dataset
=================================================

.. py:module:: ds_resource_plugin_py_lib.common.resource.dataset

.. autoapi-nested-parse::

   **File:** ``__init__.py``
   **Region:** ``ds_resource_plugin_py_lib/common/resource/dataset``

   Description
   -----------
   Dataset models, typed properties, and storage format helpers.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/base/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/decorators/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/enums/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/errors/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/result/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/storage_format/index


Classes
-------

.. autoapisummary::

   ds_resource_plugin_py_lib.common.resource.dataset.Dataset
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetInfo
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings
   ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetMethod
   ds_resource_plugin_py_lib.common.resource.dataset.OperationError
   ds_resource_plugin_py_lib.common.resource.dataset.OperationInfo
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetStorageFormat
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetStorageFormatType


Package Contents
----------------

.. py:class:: Dataset

   Bases: :py:obj:`abc.ABC`, :py:obj:`ds_common_serde_py_lib.Serializable`, :py:obj:`Generic`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ]


   The ds workflow nested object which identifies data within a data store,
   such as tables, files, folders, and documents.

   You probably want to use the subclasses and not this class directly.


   .. py:attribute:: id
      :type:  uuid.UUID


   .. py:attribute:: name
      :type:  str


   .. py:attribute:: description
      :type:  str | None
      :value: None


   .. py:attribute:: version
      :type:  str


   .. py:attribute:: settings
      :type:  DatasetSettingsType


   .. py:attribute:: linked_service
      :type:  LinkedServiceType


   .. py:attribute:: serializer
      :type:  SerializerType | None
      :value: None


   .. py:attribute:: deserializer
      :type:  DeserializerType | None
      :value: None


   .. py:attribute:: input
      :type:  Any | None
      :value: None


   .. py:attribute:: output
      :type:  Any | None
      :value: None


   .. py:attribute:: checkpoint
      :type:  dict[str, Any]


   .. py:attribute:: operation
      :type:  ds_resource_plugin_py_lib.common.resource.dataset.result.OperationInfo


   .. py:method:: __init_subclass__(**kwargs: Any) -> None
      :classmethod:


      Initialize the subclass.

      :param kwargs: The keyword arguments.

      :returns: The subclass.



   .. py:method:: __enter__() -> Self

      Context manager enter.

      :returns: The dataset.



   .. py:method:: __exit__(exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: types.TracebackType | None) -> None

      Context manager exit.

      :param exc_type: The type of the exception.
      :param exc_value: The value of the exception.
      :param traceback: The traceback of the exception.



   .. py:property:: supports_checkpoint
      :type: bool


      Whether this provider supports incremental loads via ``self.checkpoint``.



   .. py:property:: type
      :type: enum.StrEnum

      :abstractmethod:


      Get the type of the dataset.



   .. py:method:: create() -> None
      :abstractmethod:


      Insert all rows in ``self.input`` into the target as a single atomic
      transaction. Must not delete, update, or overwrite existing data.

      :raises CreateError: If the operation fails.
      :raises NotSupportedError: If the provider does not support create.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``create()``



   .. py:method:: read() -> None
      :abstractmethod:


      Read data from the source and assign it to ``self.output``.
      Pagination within a single call is handled internally.
      Supports incremental loads via ``self.checkpoint``.

      :raises ReadError: If the operation fails.
      :raises NotSupportedError: If the provider does not support read.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``read()``



   .. py:method:: update() -> None
      :abstractmethod:


      Update existing rows in the target matched by identity columns
      defined in ``self.settings``. Atomic. Must not insert new rows.

      :raises UpdateError: If the operation fails.
      :raises NotSupportedError: If the provider does not support update.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``update()``



   .. py:method:: upsert() -> None
      :abstractmethod:


      Insert rows that do not exist, update rows that do, matched by
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``



   .. py:method:: delete() -> None
      :abstractmethod:


      Remove specific rows from the target matched by identity columns
      defined in ``self.settings``. Atomic. Idempotent.

      :raises DeleteError: If the operation fails.
      :raises NotSupportedError: If the provider does not support delete.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``delete()``



   .. py:method:: purge() -> None
      :abstractmethod:


      Remove all content from the target. ``self.input`` is not used.
      Atomic. Idempotent.

      :raises PurgeError: If the operation fails.
      :raises NotSupportedError: If the provider does not support purge.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``purge()``



   .. py:method:: list() -> None
      :abstractmethod:


      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``



   .. py:method:: rename() -> None
      :abstractmethod:


      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``



   .. py:method:: close() -> None
      :abstractmethod:


      Release any connections, sessions, or handles held by the linked
      service. Must not raise if already closed. Idempotent.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``close()``



.. py:class:: DatasetInfo

   Bases: :py:obj:`NamedTuple`


   NamedTuple that represents the dataset information.


   .. py:attribute:: type
      :type:  str


   .. py:attribute:: name
      :type:  str


   .. py:attribute:: class_name
      :type:  str


   .. py:attribute:: version
      :type:  str


   .. py:attribute:: description
      :type:  str | None
      :value: None


   .. py:method:: __str__() -> str

      Return a string representation of the dataset info.

      :returns: A string representation of the dataset info.



   .. py:property:: key
      :type: tuple[str, str]


      Return the composite key (type, version) for dictionary lookups.

      :returns: A tuple containing the type and version.



.. py:class:: DatasetSettings

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`


   The object containing the settings of the dataset.


.. py:class:: TabularDataset

   Bases: :py:obj:`Dataset`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ], :py:obj:`Generic`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ]


   Tabular dataset object which identifies data within a data store,
   such as table, csv, json, parquet, parquetdataset, and other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.


   .. py:attribute:: input
      :type:  pandas.DataFrame


   .. py:attribute:: output
      :type:  pandas.DataFrame


.. py:class:: DatasetMethod

   Bases: :py:obj:`enum.StrEnum`


   Allowed dataset operation names.


   .. py:attribute:: CREATE
      :value: 'create'


      Insert rows into the target. Atomic. Not idempotent.



   .. py:attribute:: READ
      :value: 'read'


      Read all data from the source into ``self.output``. Idempotent.



   .. py:attribute:: UPDATE
      :value: 'update'


      Update existing rows matched by identity columns. Atomic. Idempotent.



   .. py:attribute:: UPSERT
      :value: 'upsert'


      Insert or update rows matched by identity columns. Atomic. Idempotent.



   .. py:attribute:: DELETE
      :value: 'delete'


      Remove specific rows matched by identity columns. Atomic. Idempotent.



   .. py:attribute:: PURGE
      :value: 'purge'


      Remove all content from the target. Atomic. Idempotent.



   .. py:attribute:: LIST
      :value: 'list'


      Discover available resources and populate ``self.output``. Idempotent.



   .. py:attribute:: RENAME
      :value: 'rename'


      Rename a resource in the backend. Atomic. Not idempotent.



   .. py:method:: all_values() -> frozenset[str]
      :staticmethod:


      Return all operation method values as a frozen set.



.. py:class:: OperationError

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`


   Structured error captured from a ``ResourceException``.


   .. py:attribute:: message
      :type:  str

      The error message.



   .. py:attribute:: code
      :type:  str

      The error code.



   .. py:attribute:: status_code
      :type:  int

      The HTTP status code.



   .. py:attribute:: details
      :type:  dict[str, Any]

      The error details.



.. py:class:: OperationInfo

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`


   Report produced by every dataset operation.

   Timing fields (``started_at``, ``ended_at``, ``duration_ms``) are
   populated automatically by the ``track_result`` decorator.

   Providers may set ``row_count``, ``schema``, or ``metadata`` inside
   their method; any value left at its default will be auto-derived
   from ``self.output`` after the method returns.

   Accessible on the dataset instance as ``self.operation``.


   .. py:attribute:: method
      :type:  ds_resource_plugin_py_lib.common.resource.dataset.enums.DatasetMethod | None
      :value: None


      The method that was called.



   .. py:attribute:: success
      :type:  bool
      :value: False


      Whether the method call was successful.



   .. py:attribute:: error
      :type:  OperationError | None
      :value: None


      The error captured from a ``ResourceException``.



   .. py:attribute:: row_count
      :type:  int
      :value: 0


      The number of rows read, written, or discovered.



   .. py:attribute:: started_at
      :type:  datetime.datetime | None
      :value: None


      The timestamp when the method started.



   .. py:attribute:: ended_at
      :type:  datetime.datetime | None
      :value: None


      The timestamp when the method ended.



   .. py:attribute:: duration_ms
      :type:  float
      :value: 0.0


      The duration of the method in milliseconds.



   .. py:attribute:: schema
      :type:  dict[str, Any] | None
      :value: None


      The schema of the data.



   .. py:attribute:: metadata
      :type:  dict[str, Any]

      The metadata of the data.



.. py:class:: DatasetStorageFormat

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`


   The object containing the storage format of the dataset.


   .. py:attribute:: type
      :type:  DatasetStorageFormatType


   .. py:attribute:: args
      :type:  dict[str, Any]


.. py:class:: DatasetStorageFormatType

   Bases: :py:obj:`enum.StrEnum`


   Enum to define the storage format types.


   .. py:attribute:: PARQUET
      :value: 'parquet'


   .. py:attribute:: CSV
      :value: 'csv'


   .. py:attribute:: JSON
      :value: 'json'


   .. py:attribute:: EXCEL
      :value: 'excel'


   .. py:attribute:: SEMI_STRUCTURED_JSON
      :value: 'semi-structured-json'


   .. py:attribute:: XML
      :value: 'xml'
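
The ``DatasetInfo.key`` property and ``DatasetMethod.all_values()`` documented
above can be illustrated with a minimal sketch. The two classes below are
hypothetical stand-ins that only mirror the documented fields and values; they
are not imported from ``ds_resource_plugin_py_lib``, and the stand-in enum
derives from ``str`` and ``Enum`` rather than ``enum.StrEnum`` so the sketch
also runs on older Python versions. The dataset type ``example.table``, the
class name ``ExampleTableDataset``, and the ``registry`` dictionary are
assumptions made for illustration.

.. code-block:: python

   from __future__ import annotations

   from enum import Enum
   from typing import NamedTuple


   class DatasetInfo(NamedTuple):
       """Hypothetical stand-in with the fields documented above."""

       type: str
       name: str
       class_name: str
       version: str
       description: str | None = None

       @property
       def key(self) -> tuple[str, str]:
           # Composite (type, version) key, as the documented property describes.
           return (self.type, self.version)


   class DatasetMethod(str, Enum):
       """Hypothetical stand-in; the documented class derives from enum.StrEnum."""

       CREATE = "create"
       READ = "read"
       UPDATE = "update"
       UPSERT = "upsert"
       DELETE = "delete"
       PURGE = "purge"
       LIST = "list"
       RENAME = "rename"

       @staticmethod
       def all_values() -> frozenset[str]:
           # Collect every operation name into an immutable set.
           return frozenset(member.value for member in DatasetMethod)


   # Register a dataset description under its composite key (assumed usage).
   info = DatasetInfo(
       type="example.table",  # hypothetical dataset type
       name="Example table",
       class_name="ExampleTableDataset",  # hypothetical class name
       version="1.0.0",
   )
   registry = {info.key: info}

   # Validate a requested operation name before dispatching it.
   requested = "upsert"
   is_known = requested in DatasetMethod.all_values()

Keying the lookup on ``(type, version)`` would let several versions of the same
dataset type coexist in one dictionary, which appears to be the purpose of the
composite key.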