ds_resource_plugin_py_lib.common.resource.dataset
=================================================

.. py:module:: ds_resource_plugin_py_lib.common.resource.dataset

.. autoapi-nested-parse::

   **File:** ``__init__.py``
   **Region:** ``ds_resource_plugin_py_lib/common/resource/dataset``

   Description
   -----------
   Dataset models, typed properties, and storage format helpers.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/base/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/decorators/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/enums/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/errors/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/result/index
   /autoapi/ds_resource_plugin_py_lib/common/resource/dataset/storage_format/index


Classes
-------

.. autoapisummary::

   ds_resource_plugin_py_lib.common.resource.dataset.Dataset
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetInfo
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings
   ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetMethod
   ds_resource_plugin_py_lib.common.resource.dataset.OperationError
   ds_resource_plugin_py_lib.common.resource.dataset.OperationInfo
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetStorageFormat
   ds_resource_plugin_py_lib.common.resource.dataset.DatasetStorageFormatType


Package Contents
----------------

.. py:class:: Dataset

   Bases: :py:obj:`abc.ABC`, :py:obj:`ds_common_serde_py_lib.Serializable`, :py:obj:`Generic`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ]


   The ds workflow nested object that identifies data within a data store,
   such as tables, files, folders, and documents.

   This class is abstract; in most cases, use one of its subclasses rather
   than this class directly.


   .. py:attribute:: id
      :type:  uuid.UUID


   .. py:attribute:: name
      :type:  str


   .. py:attribute:: description
      :type:  str | None
      :value: None


   .. py:attribute:: version
      :type:  str


   .. py:attribute:: settings
      :type:  DatasetSettingsType


   .. py:attribute:: linked_service
      :type:  LinkedServiceType


   .. py:attribute:: serializer
      :type:  SerializerType | None
      :value: None


   .. py:attribute:: deserializer
      :type:  DeserializerType | None
      :value: None


   .. py:attribute:: input
      :type:  Any | None
      :value: None


   .. py:attribute:: output
      :type:  Any | None
      :value: None


   .. py:attribute:: checkpoint
      :type:  dict[str, Any]


   .. py:attribute:: operation
      :type:  ds_resource_plugin_py_lib.common.resource.dataset.result.OperationInfo


   .. py:method:: __init_subclass__(**kwargs: Any) -> None
      :classmethod:


      Hook invoked when a subclass of ``Dataset`` is defined.

      :param kwargs: Keyword arguments forwarded from the class definition.


   .. py:method:: __enter__() -> Self

      Context manager enter.

      :returns: The dataset.


   .. py:method:: __exit__(exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: types.TracebackType | None) -> None

      Context manager exit.

      :param exc_type: The type of the exception.
      :param exc_value: The value of the exception.
      :param traceback: The traceback of the exception.


   .. py:property:: supports_checkpoint
      :type: bool


      Whether this provider supports incremental loads via ``self.checkpoint``.


   .. py:property:: type
      :type: enum.StrEnum
      :abstractmethod:


      Get the type of the dataset.


   .. py:method:: create() -> None
      :abstractmethod:


      Insert all rows in ``self.input`` into the target as a single atomic
      transaction. Must not delete, update, or overwrite existing data.

      :raises CreateError: If the operation fails.
      :raises NotSupportedError: If the provider does not support create.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``create()``


   .. py:method:: read() -> None
      :abstractmethod:


      Read data from the source and assign it to ``self.output``.
      Pagination within a single call is handled internally.
      Supports incremental loads via ``self.checkpoint``.

      :raises ReadError: If the operation fails.
      :raises NotSupportedError: If the provider does not support read.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``read()``


   .. py:method:: update() -> None
      :abstractmethod:


      Update existing rows in the target matched by identity columns
      defined in ``self.settings``. Atomic. Must not insert new rows.

      :raises UpdateError: If the operation fails.
      :raises NotSupportedError: If the provider does not support update.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``update()``


   .. py:method:: upsert() -> None
      :abstractmethod:


      Insert rows that do not exist, update rows that do, matched by
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``


   .. py:method:: delete() -> None
      :abstractmethod:


      Remove specific rows from the target matched by identity columns
      defined in ``self.settings``. Atomic. Idempotent.

      :raises DeleteError: If the operation fails.
      :raises NotSupportedError: If the provider does not support delete.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``delete()``


   .. py:method:: purge() -> None
      :abstractmethod:


      Remove all content from the target. ``self.input`` is not used.
      Atomic. Idempotent.

      :raises PurgeError: If the operation fails.
      :raises NotSupportedError: If the provider does not support purge.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``purge()``


   .. py:method:: list() -> None
      :abstractmethod:


      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``


   .. py:method:: rename() -> None
      :abstractmethod:


      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``


   .. py:method:: close() -> None
      :abstractmethod:


      Release any connections, sessions, or handles held by the linked
      service. Must not raise if already closed. Idempotent.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``close()``
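
The operation contract described for ``Dataset`` can be illustrated with a
small in-memory stand-in. This is only a sketch under stated assumptions:
``InMemoryDataset``, its ``rows`` store, the ``id`` identity column, and the
``last_id`` checkpoint key are hypothetical and are not part of this package.
A real provider would subclass ``Dataset`` and raise the package's own error
types (``CreateError`` and so on) instead of ``ValueError``.

```python
from typing import Any


class InMemoryDataset:
    """Hypothetical stand-in that mimics the Dataset contract with a dict."""

    def __init__(self) -> None:
        self.rows: dict[int, dict[str, Any]] = {}  # target, keyed by identity
        self.input: list[dict[str, Any]] | None = None
        self.output: list[dict[str, Any]] | None = None
        self.checkpoint: dict[str, Any] = {}
        self.closed = False

    def __enter__(self) -> "InMemoryDataset":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.close()

    def create(self) -> None:
        # Validate before writing so the insert is all-or-nothing.
        new = {row["id"]: row for row in self.input or []}
        if new.keys() & self.rows.keys():
            raise ValueError("create() must not overwrite existing rows")
        self.rows.update(new)

    def upsert(self) -> None:
        # Insert missing rows and replace existing ones, matched by "id".
        self.rows.update({row["id"]: row for row in self.input or []})

    def delete(self) -> None:
        # Idempotent: deleting an absent row is not an error.
        for row in self.input or []:
            self.rows.pop(row["id"], None)

    def read(self) -> None:
        # Incremental load: only return rows past the stored checkpoint.
        last = self.checkpoint.get("last_id", 0)
        self.output = [r for k, r in sorted(self.rows.items()) if k > last]
        if self.output:
            self.checkpoint["last_id"] = self.output[-1]["id"]

    def close(self) -> None:
        # Safe to call more than once.
        self.closed = True


# Usage: the context manager closes the dataset on exit.
with InMemoryDataset() as ds:
    ds.input = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    ds.create()
    ds.read()
```

After the ``with`` block, ``ds.output`` holds both rows and ``ds.checkpoint``
records the last identity seen, so a second ``read()`` returns nothing new.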


.. py:class:: DatasetInfo

   Bases: :py:obj:`NamedTuple`


   NamedTuple that represents the dataset information.


   .. py:attribute:: type
      :type:  str


   .. py:attribute:: name
      :type:  str


   .. py:attribute:: class_name
      :type:  str


   .. py:attribute:: version
      :type:  str


   .. py:attribute:: description
      :type:  str | None
      :value: None


   .. py:method:: __str__() -> str

      Return a string representation of the dataset info.

      :returns: A string representation of the dataset info.


   .. py:property:: key
      :type: tuple[str, str]


      Return the composite key (type, version) for dictionary lookups.

      :returns: A tuple containing the type and version.


.. py:class:: DatasetSettings

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`


   The object containing the settings of the dataset.


.. py:class:: TabularDataset

   Bases: :py:obj:`Dataset`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ], :py:obj:`Generic`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ]


   Tabular dataset object that identifies data within a data store,
   such as tables, CSV, JSON, Parquet files, Parquet datasets, and other
   documents.

   Both ``input`` and ``output`` are pandas DataFrames.


   .. py:attribute:: input
      :type:  pandas.DataFrame


   .. py:attribute:: output
      :type:  pandas.DataFrame


.. py:class:: DatasetMethod

   Bases: :py:obj:`enum.StrEnum`


   Allowed dataset operation names.


   .. py:attribute:: CREATE
      :value: 'create'


      Insert rows into the target. Atomic. Not idempotent.


   .. py:attribute:: READ
      :value: 'read'


      Read data from the source into ``self.output``. Idempotent.


   .. py:attribute:: UPDATE
      :value: 'update'


      Update existing rows matched by identity columns. Atomic. Idempotent.


   .. py:attribute:: UPSERT
      :value: 'upsert'


      Insert or update rows matched by identity columns. Atomic. Idempotent.


   .. py:attribute:: DELETE
      :value: 'delete'


      Remove specific rows matched by identity columns. Atomic. Idempotent.


   .. py:attribute:: PURGE
      :value: 'purge'


      Remove all content from the target. Atomic. Idempotent.


   .. py:attribute:: LIST
      :value: 'list'


      Discover available resources and populate ``self.output``. Idempotent.


   .. py:attribute:: RENAME
      :value: 'rename'


      Rename a resource in the backend. Atomic. Not idempotent.


   .. py:method:: all_values() -> frozenset[str]
      :staticmethod:


      Return all operation method values as a frozen set.
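
One plausible use of ``all_values()`` is validating an operation name before
dispatching to it. The sketch below re-declares the enum locally so it runs
on its own; ``parse_method`` is a hypothetical helper, and the body of
``all_values()`` is an assumption about how such a method could be written,
not the package's implementation.

```python
try:
    from enum import StrEnum
except ImportError:  # Python < 3.11: minimal fallback for this sketch only
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class DatasetMethod(StrEnum):
    """Stand-in with the documented members and values."""

    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    UPSERT = "upsert"
    DELETE = "delete"
    PURGE = "purge"
    LIST = "list"
    RENAME = "rename"

    @staticmethod
    def all_values() -> frozenset[str]:
        return frozenset(member.value for member in DatasetMethod)


def parse_method(name: str) -> DatasetMethod:
    """Hypothetical helper: validate a user-supplied operation name."""
    if name not in DatasetMethod.all_values():
        raise ValueError(f"unknown dataset method: {name!r}")
    return DatasetMethod(name)


method = parse_method("upsert")
```

Since the members are strings, ``method == "upsert"`` holds, which keeps
comparisons against configuration values simple.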


.. py:class:: OperationError

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`


   Structured error captured from a ``ResourceException``.


   .. py:attribute:: message
      :type:  str

      The error message.


   .. py:attribute:: code
      :type:  str

      The error code.


   .. py:attribute:: status_code
      :type:  int

      The HTTP status code.


   .. py:attribute:: details
      :type:  dict[str, Any]

      The error details.


.. py:class:: OperationInfo

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`


   Report produced by every dataset operation.

   Timing fields (``started_at``, ``ended_at``, ``duration_ms``) are
   populated automatically by the ``track_result`` decorator.  Providers
   may set ``row_count``, ``schema``, or ``metadata`` inside their
   method; any value left at its default will be auto-derived from
   ``self.output`` after the method returns.

   Accessible on the dataset instance as ``self.operation``.


   .. py:attribute:: method
      :type:  ds_resource_plugin_py_lib.common.resource.dataset.enums.DatasetMethod | None
      :value: None


      The method that was called.


   .. py:attribute:: success
      :type:  bool
      :value: False


      Whether the method call was successful.


   .. py:attribute:: error
      :type:  OperationError | None
      :value: None


      The error captured from a ``ResourceException``.


   .. py:attribute:: row_count
      :type:  int
      :value: 0


      The number of rows read, written, or discovered.


   .. py:attribute:: started_at
      :type:  datetime.datetime | None
      :value: None


      The timestamp when the method started.


   .. py:attribute:: ended_at
      :type:  datetime.datetime | None
      :value: None


      The timestamp when the method ended.


   .. py:attribute:: duration_ms
      :type:  float
      :value: 0.0


      The duration of the method in milliseconds.


   .. py:attribute:: schema
      :type:  dict[str, Any] | None
      :value: None


      The schema of the data.


   .. py:attribute:: metadata
      :type:  dict[str, Any]

      The metadata of the data.


.. py:class:: DatasetStorageFormat

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`


   The object containing the storage format of the dataset.


   .. py:attribute:: type
      :type:  DatasetStorageFormatType


   .. py:attribute:: args
      :type:  dict[str, Any]


.. py:class:: DatasetStorageFormatType

   Bases: :py:obj:`enum.StrEnum`


   Enum to define the storage format types.


   .. py:attribute:: PARQUET
      :value: 'parquet'


   .. py:attribute:: CSV
      :value: 'csv'


   .. py:attribute:: JSON
      :value: 'json'


   .. py:attribute:: EXCEL
      :value: 'excel'


   .. py:attribute:: SEMI_STRUCTURED_JSON
      :value: 'semi-structured-json'


   .. py:attribute:: XML
      :value: 'xml'
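
A storage format pairs a ``type`` with an ``args`` dictionary. One way such
a pair could drive parsing is sketched below using only the standard
library. The assumption that ``args`` is forwarded as keyword arguments to
the reader is not confirmed by this reference, ``parse`` is a hypothetical
helper, and only two of the documented format types are covered.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Any

try:
    from enum import StrEnum
except ImportError:  # Python < 3.11: minimal fallback for this sketch only
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class DatasetStorageFormatType(StrEnum):
    """Stand-in with a subset of the documented values."""

    CSV = "csv"
    JSON = "json"


@dataclass
class DatasetStorageFormat:
    """Stand-in mirroring the documented ``type`` and ``args`` fields."""

    type: DatasetStorageFormatType
    args: dict[str, Any] = field(default_factory=dict)


def parse(text: str, fmt: DatasetStorageFormat) -> list[dict[str, Any]]:
    """Hypothetical helper: pick a reader by type, forward args as kwargs."""
    if fmt.type is DatasetStorageFormatType.CSV:
        return list(csv.DictReader(io.StringIO(text), **fmt.args))
    if fmt.type is DatasetStorageFormatType.JSON:
        return json.loads(text, **fmt.args)
    raise ValueError(f"unsupported storage format: {fmt.type}")


csv_format = DatasetStorageFormat(
    DatasetStorageFormatType.CSV, {"delimiter": ";"}
)
csv_rows = parse("a;b\n1;2\n", csv_format)

json_format = DatasetStorageFormat(DatasetStorageFormatType.JSON)
json_rows = parse('[{"a": 1, "b": 2}]', json_format)
```

Keeping reader options in ``args`` means the same dataset code can handle a
semicolon-delimited file or a default one by changing configuration only.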