ds_resource_plugin_py_lib.common.resource.dataset.base
======================================================

.. py:module:: ds_resource_plugin_py_lib.common.resource.dataset.base

.. autoapi-nested-parse::

   **File:** ``base.py``
   **Region:** ``ds_resource_plugin_py_lib/common/resource/dataset``

   Description
   -----------
   Base dataset models and typed properties.


Attributes
----------

.. autoapisummary::

   ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetSettingsType
   ds_resource_plugin_py_lib.common.resource.dataset.base.LinkedServiceType
   ds_resource_plugin_py_lib.common.resource.dataset.base.SerializerType
   ds_resource_plugin_py_lib.common.resource.dataset.base.DeserializerType


Classes
-------

.. autoapisummary::

   ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetInfo
   ds_resource_plugin_py_lib.common.resource.dataset.base.DatasetSettings
   ds_resource_plugin_py_lib.common.resource.dataset.base.Dataset
   ds_resource_plugin_py_lib.common.resource.dataset.base.BinaryDataset
   ds_resource_plugin_py_lib.common.resource.dataset.base.TabularDataset


Module Contents
---------------

.. py:class:: DatasetInfo

   Bases: :py:obj:`NamedTuple`

   NamedTuple that represents the dataset information.

   .. py:attribute:: type
      :type: str

   .. py:attribute:: name
      :type: str

   .. py:attribute:: class_name
      :type: str

   .. py:attribute:: version
      :type: str

   .. py:attribute:: description
      :type: str | None
      :value: None

   .. py:method:: __str__() -> str

      Return a string representation of the dataset info.

      :returns: A string representation of the dataset info.

   .. py:property:: key
      :type: tuple[str, str]

      Return the composite key (type, version) for dictionary lookups.

      :returns: A tuple containing the type and version.


.. py:class:: DatasetSettings

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`

   The object containing the settings of the dataset.


.. py:data:: DatasetSettingsType

.. py:data:: LinkedServiceType

.. py:data:: SerializerType

.. py:data:: DeserializerType

.. py:class:: Dataset

   Bases: :py:obj:`abc.ABC`, :py:obj:`ds_common_serde_py_lib.Serializable`, :py:obj:`Generic`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ]

   The ds workflow nested object which identifies data within a data store,
   such as tables, files, folders, and documents.

   You probably want to use the subclasses and not this class directly.

   .. py:attribute:: id
      :type: uuid.UUID

   .. py:attribute:: name
      :type: str

   .. py:attribute:: description
      :type: str | None
      :value: None

   .. py:attribute:: version
      :type: str

   .. py:attribute:: settings
      :type: DatasetSettingsType

   .. py:attribute:: linked_service
      :type: LinkedServiceType

   .. py:attribute:: serializer
      :type: SerializerType | None
      :value: None

   .. py:attribute:: deserializer
      :type: DeserializerType | None
      :value: None

   .. py:attribute:: input
      :type: Any | None
      :value: None

   .. py:attribute:: output
      :type: Any | None
      :value: None

   .. py:attribute:: checkpoint
      :type: dict[str, Any]

   .. py:attribute:: operation
      :type: ds_resource_plugin_py_lib.common.resource.dataset.result.OperationInfo

   .. py:method:: __init_subclass__(**kwargs: Any) -> None
      :classmethod:

      Initialize the subclass.

      :param kwargs: The keyword arguments.

      :returns: The subclass.

   .. py:method:: __enter__() -> Self

      Context manager enter.

      :returns: The dataset.

   .. py:method:: __exit__(exc_type: type[BaseException] | None, exc_value: BaseException | None, traceback: types.TracebackType | None) -> None

      Context manager exit.

      :param exc_type: The type of the exception.
      :param exc_value: The value of the exception.
      :param traceback: The traceback of the exception.

   .. py:property:: supports_checkpoint
      :type: bool

      Whether this provider supports incremental loads via ``self.checkpoint``.

   .. py:property:: type
      :type: enum.StrEnum
      :abstractmethod:

      Get the type of the dataset.

   .. py:method:: create() -> None
      :abstractmethod:

      Insert all rows in ``self.input`` into the target as a single atomic
      transaction. Must not delete, update, or overwrite existing data.

      :raises CreateError: If the operation fails.
      :raises NotSupportedError: If the provider does not support create.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``create()``

   .. py:method:: read() -> None
      :abstractmethod:

      Read data from the source and assign it to ``self.output``.
      Pagination within a single call is handled internally.
      Supports incremental loads via ``self.checkpoint``.

      :raises ReadError: If the operation fails.
      :raises NotSupportedError: If the provider does not support read.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``read()``

   .. py:method:: update() -> None
      :abstractmethod:

      Update existing rows in the target matched by identity columns
      defined in ``self.settings``. Atomic. Must not insert new rows.

      :raises UpdateError: If the operation fails.
      :raises NotSupportedError: If the provider does not support update.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``update()``

   .. py:method:: upsert() -> None
      :abstractmethod:

      Insert rows that do not exist, update rows that do, matched by
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``

   .. py:method:: delete() -> None
      :abstractmethod:

      Remove specific rows from the target matched by identity columns
      defined in ``self.settings``. Atomic. Idempotent.

      :raises DeleteError: If the operation fails.
      :raises NotSupportedError: If the provider does not support delete.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``delete()``

   .. py:method:: purge() -> None
      :abstractmethod:

      Remove all content from the target. ``self.input`` is not used.
      Atomic. Idempotent.

      :raises PurgeError: If the operation fails.
      :raises NotSupportedError: If the provider does not support purge.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``purge()``

   .. py:method:: list() -> None
      :abstractmethod:

      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``

   .. py:method:: rename() -> None
      :abstractmethod:

      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``

   .. py:method:: close() -> None
      :abstractmethod:

      Release any connections, sessions, or handles held by the linked
      service. Must not raise if already closed. Idempotent.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``close()``


.. py:class:: BinaryDataset

   Bases: :py:obj:`Dataset`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ], :py:obj:`Generic`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ]

   Binary dataset object which identifies data within a data store,
   such as files, folders, and documents.

   The input of the dataset is a binary file.
   The output of the dataset is a binary file.

   .. py:attribute:: input
      :type: io.BytesIO

   .. py:attribute:: output
      :type: io.BytesIO


.. py:class:: TabularDataset

   Bases: :py:obj:`Dataset`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ], :py:obj:`Generic`\ [\ :py:obj:`LinkedServiceType`\ , :py:obj:`DatasetSettingsType`\ , :py:obj:`SerializerType`\ , :py:obj:`DeserializerType`\ ]

   Tabular dataset object which identifies data within a data store,
   such as table/csv/json/parquet/parquetdataset and other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: input
      :type: pandas.DataFrame

   .. py:attribute:: output
      :type: pandas.DataFrame
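
The contract described above (``create()`` only adds data, ``read()`` fills
``self.output``, ``purge()`` and ``close()`` are idempotent, and
``DatasetInfo.key`` serves as a dictionary key) can be illustrated with a
small stand-alone sketch. It does not import this package: ``DatasetInfo``
here is a local stand-in that mirrors the documented fields, and
``InMemoryBinaryDataset``, ``REGISTRY``, and ``demo`` are hypothetical names
introduced only for illustration. Calling ``close()`` from ``__exit__`` is an
assumption, not documented behaviour of the real base class.

.. code-block:: python

   from __future__ import annotations

   import io
   from typing import NamedTuple, Optional


   class DatasetInfo(NamedTuple):
       """Local stand-in mirroring the documented fields (not the library class)."""

       type: str
       name: str
       class_name: str
       version: str
       description: Optional[str] = None

       @property
       def key(self) -> tuple[str, str]:
           # Composite (type, version) key, as documented for dictionary lookups.
           return (self.type, self.version)


   class InMemoryBinaryDataset:
       """Hypothetical in-memory dataset that follows the documented contract."""

       def __init__(self) -> None:
           self.input: Optional[io.BytesIO] = None
           self.output: Optional[io.BytesIO] = None
           self._chunks: list[bytes] = []
           self._closed = False

       def __enter__(self) -> "InMemoryBinaryDataset":
           return self

       def __exit__(self, exc_type, exc_value, traceback) -> None:
           # Assumption: leaving the context releases resources via close().
           self.close()

       def create(self) -> None:
           # Only appends; existing content is never deleted or overwritten.
           if self.input is None:
               raise ValueError("input must be set before create()")
           self._chunks.append(self.input.getvalue())

       def read(self) -> None:
           # Assigns the result to self.output instead of returning it.
           self.output = io.BytesIO(b"".join(self._chunks))

       def purge(self) -> None:
           # Removes all content; calling it twice has the same effect as once.
           self._chunks.clear()

       def close(self) -> None:
           # Must not raise when already closed.
           self._closed = True


   # Hypothetical registry keyed by the composite (type, version) key.
   REGISTRY: dict[tuple[str, str], type] = {}
   INFO = DatasetInfo(
       type="memory",
       name="demo",
       class_name="InMemoryBinaryDataset",
       version="1.0.0",
   )
   REGISTRY[INFO.key] = InMemoryBinaryDataset


   def demo() -> bytes:
       """Look up the class by key, write two payloads, and read them back."""
       dataset_cls = REGISTRY[("memory", "1.0.0")]
       with dataset_cls() as dataset:
           for payload in (b"hello ", b"world"):
               dataset.input = io.BytesIO(payload)
               dataset.create()
           dataset.read()
           return dataset.output.getvalue()

A real provider would subclass ``BinaryDataset`` or ``TabularDataset`` and
raise the documented error types (for example ``CreateError`` or
``NotSupportedError``) rather than ``ValueError``; the sketch keeps to the
standard library so that it runs on its own.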