ds_provider_azure_py_lib.dataset
================================

.. py:module:: ds_provider_azure_py_lib.dataset

.. autoapi-nested-parse::

   **File:** ``__init__.py``
   **Region:** ``ds_provider_azure_py_lib/dataset``

   Azure Datasets: Table and Blob

   This module implements datasets for Azure.

   Example (AzureTable):
       >>> azure_table = AzureTable(
       ...     settings=AzureTableDatasetSettings(
       ...         table_name="users",
       ...         partition_key="partition_key",
       ...         row_key="row_key",
       ...         query_filter="additional query filter",
       ...         delete_table=False,
       ...     ),
       ...     linked_service=AzureLinkedService(
       ...         settings=AzureLinkedServiceSettings(
       ...             account_name="account name",
       ...             access_key="access key"
       ...         ),
       ...     ),
       ... )
       >>> azure_table.read()
       >>> table_data = azure_table.output

   Example (AzureBlob):
       >>> azure_blob = AzureBlob(
       ...     deserializer=AzureBlobDeserializer(format=DatasetStorageFormatType.CSV),
       ...     serializer=AzureBlobSerializer(format=DatasetStorageFormatType.CSV),
       ...     settings=AzureBlobDatasetSettings(
       ...         container_name="my-container",
       ...         blob_name="path/to/example_file.csv",
       ...         prefix=None,  # for multiple blobs, provide a prefix instead of blob_name
       ...         create=CreateSettings(
       ...             overwrite_blob_if_exists=True,  # overwrite existing blob or raise an error.
       ...             new_container=True  # create container if missing or raise an error.
       ...         ),
       ...         purge=DeleteSettings(
       ...             delete_container=True  # delete the container or only delete the blob
       ...         ),
       ...     ),
       ...     linked_service=AzureLinkedService(
       ...         settings=AzureLinkedServiceSettings(
       ...             account_name="account name",
       ...             access_key="access key"
       ...         ),
       ...         id=uuid.uuid4(),
       ...         name="testazurepackage",
       ...         version="0.0.1",
       ...         description="testazurepackage",
       ...     ),
       ...     id=uuid.uuid4(),
       ...     name="testazurepackage",
       ...     version="0.0.1",
       ...     description="testazurepackage"
       ... )
       >>> azure_blob.read()
       >>> blob_data = azure_blob.output


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_provider_azure_py_lib/dataset/blob/index
   /autoapi/ds_provider_azure_py_lib/dataset/table/index


Classes
-------

.. autoapisummary::

   ds_provider_azure_py_lib.dataset.AzureBlob
   ds_provider_azure_py_lib.dataset.AzureBlobDatasetSettings
   ds_provider_azure_py_lib.dataset.AzureTable
   ds_provider_azure_py_lib.dataset.AzureTableDatasetSettings


Package Contents
----------------

.. py:class:: AzureBlob

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.base.TabularDataset`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureBlobDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureBlobDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store,
   such as table/csv/json/parquet/parquetdataset/ and other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: AzureLinkedServiceType

   .. py:attribute:: settings
      :type: AzureBlobDatasetSettingsType

   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None

   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None

   .. py:property:: type
      :type: ds_provider_azure_py_lib.enums.ResourceType

      Get the type of the dataset.

      :returns: ResourceType

   .. py:method:: _list_blobs(prefix: str) -> azure.core.paging.ItemPaged[azure.storage.blob.BlobProperties]

      List all blobs in the container with a specific prefix.

      :param prefix: a string prefix to match one or multiple blobs.

      :returns: An iterable of BlobProperties matching the prefix.
      :rtype: ItemPaged[BlobProperties]

   .. py:method:: _read_blob(blob: str) -> pandas.DataFrame

      Read a specific blob in the container.

      :param blob: name of the blob to read.

      :returns: content of the blob as a DataFrame.
      :rtype: pd.DataFrame

   .. py:method:: _read_blobs(prefix: str) -> pandas.DataFrame

      Read all blobs in the container with a specific prefix.

      :param prefix: a string prefix to match one or multiple blobs.

      :returns: Content of all blobs concatenated as a DataFrame.
      :rtype: pd.DataFrame

   .. py:method:: _create_container() -> None

      Create a container in the Azure Blob Storage.

      :raises CreateError: If the container creation fails.

      :returns: None

   .. py:method:: _create_blob(stream: bytes, blob: str) -> None

      Create a specific blob in the container.

      :param stream: data stream to upload to the blob.
      :param blob: name of the blob to create.

      :raises CreateError: If the blob creation fails.

      :returns: None

   .. py:method:: _delete_blob(blob: str) -> pandas.DataFrame

      Delete a specific blob in the container.

      :param blob: name of the blob to delete.

      :returns: Empty DataFrame upon successful deletion.
      :rtype: pd.DataFrame

      :raises DeleteError: If the blob deletion fails.

   .. py:method:: _delete_blobs(prefix: str) -> pandas.DataFrame

      Delete all blobs in the container with a specific prefix.

      :param prefix: a string prefix to match one or multiple blobs.

      :returns: Empty DataFrame upon successful deletion of all blobs.
      :rtype: pd.DataFrame

      :raises DeleteError: If one or more blob deletions fail.

   .. py:method:: read(**_kwargs: Any) -> None

      Read Azure Blob Storage dataset.

      :param _kwargs: Additional keyword arguments to pass to the request.

      :returns: None

      :raises ReadError: If reading the blob(s) fails.

   .. py:method:: create(**_kwargs: Any) -> None

      Create a blob in the container.

      :param _kwargs: Additional keyword arguments to pass to the request. (not used)

      :returns: None

      :raises CreateError: If the blob creation fails.

   .. py:method:: update() -> NoReturn

      Update existing rows in the target matched by identity columns
      defined in ``self.settings``. Atomic. Must not insert new rows.

      :raises UpdateError: If the operation fails.
      :raises NotSupportedError: If the provider does not support update.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``update()``

   .. py:method:: list() -> NoReturn

      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``

   .. py:method:: purge(**_kwargs: Any) -> None

      Purge (remove all content from) the container.

      For Azure Blob Storage, this deletes all blobs from the container,
      leaving the container empty. The container itself is not deleted.

      :param _kwargs: Additional keyword arguments to pass to the request. (not used)

      :returns: None

      :raises DeleteError: If the purge operation fails.

   .. py:method:: upsert() -> NoReturn

      Insert rows that do not exist, update rows that do, matched by
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``

   .. py:method:: delete(**_kwargs: Any) -> None

      Delete specific blob(s) or the entire container from Azure Blob Storage.

      For Azure Blob Storage, a "row" is a blob. This method deletes:

      - A specific blob, by blob_name
      - Multiple blobs, by prefix
      - The entire container, if delete_container=True and no blob_name/prefix is provided

      :param _kwargs: Additional keyword arguments to pass to the request. (not used)

      :returns: None

      :raises DeleteError: If the deletion fails or the requirements are not met.

   .. py:method:: rename() -> NoReturn

      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``

   .. py:method:: close() -> None

      No-op: the linked service does not need to be closed. Present only to comply with the interface.

      :returns: None

   .. py:method:: concat(dfs: list[pandas.DataFrame]) -> pandas.DataFrame
      :staticmethod:

      Concatenate a list of DataFrames into a single DataFrame.

      :param dfs: DataFrames to concatenate.

      :returns: Concatenated DataFrame or empty DataFrame if input list is empty.
      :rtype: DataFrame

   .. py:method:: get_details() -> dict[str, Any]

      Get details of the dataset.

      :returns: Details of the dataset.
      :rtype: Dict[str, Any]


.. py:class:: AzureBlobDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   Settings for Azure Blob Storage dataset operations.

   Provide either `blob_name` or `prefix` for read()/delete();
   if both are specified, only `blob_name` is considered.
   `prefix` is not used for create(); create() can be called only with `blob_name`.
   If `create` is not passed, the default is to attempt to create the container if it does not exist.
   `delete()` removes specific blob(s) by name or prefix.

   .. py:attribute:: container_name
      :type: str

   .. py:attribute:: blob_name
      :type: str | None
      :value: None

   .. py:attribute:: prefix
      :type: str | None
      :value: None

   .. py:attribute:: create
      :type: CreateSettings

   .. py:attribute:: purge
      :type: PurgeSettings


.. py:class:: AzureTable

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureTableDatasetSettingsType`\ , :py:obj:`ds_provider_azure_py_lib.serde.AzureTableSerializer`\ , :py:obj:`ds_provider_azure_py_lib.serde.AzureTableDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureTableDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store,
   such as table/csv/json/parquet/parquetdataset/ and other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: AzureLinkedServiceType

   .. py:attribute:: settings
      :type: AzureTableDatasetSettingsType

   .. py:method:: __post_init__() -> None

   .. py:property:: type
      :type: ds_provider_azure_py_lib.enums.ResourceType

      Get the type of the Dataset.

      :returns: ResourceType

   .. py:method:: _prepare_content(content: pandas.DataFrame) -> dict[str, Any]

      Ensure that the content is provided and is in the correct format.

      :param content: The content to prepare.
      :type content: pd.DataFrame

      :returns: The prepared content.
      :rtype: dict

      :raises DatasetException: If the content is not a DataFrame, is empty, or does not contain required columns.

   .. py:method:: _get_table_client() -> azure.data.tables.TableClient

      Return a TableClient for the currently configured table.

      :returns: TableClient

   .. py:method:: _build_transaction_from_input(operation: str, params: collections.abc.Mapping[str, Any] | None = None) -> list[TransactionEntry]

      Build a list of transaction entries from self.input.

      :param operation: The operation to perform, named as expected by
                        TableClient.submit_transaction, e.g. "create", "upsert", "delete".
      :type operation: str
      :param params: Optional params dict passed as the third item in the tuple (when required),
                     e.g. {"mode": UpdateMode.REPLACE}.

      :returns: list[TransactionEntry]

      :raises CreateError: If there is an error preparing content for creation.
      :raises UpdateError: If there is an error preparing content for update.
      :raises DeleteError: If there is an error preparing content for deletion.
      :raises DatasetException: If there is a general error preparing content.

   .. py:method:: _submit_transaction(transaction: collections.abc.Iterable[TransactionEntry], error_cls: type[ds_resource_plugin_py_lib.common.resource.dataset.errors.DatasetException]) -> None

      Submit the transaction and map TableTransactionError to the provided error_cls.

      :param transaction: The transaction to submit.
      :type transaction: Iterable[TransactionEntry]
      :param error_cls: The exception class to raise on error.
      :type error_cls: builtins.type[DatasetException]

      :raises error_cls: If submitting the transaction fails.

   .. py:method:: _delete_table() -> None

      Delete the entire table from Azure Table Storage.

      :returns: None

      :raises DeleteError: If the table could not be deleted.

   .. py:method:: _create_table() -> None

      Create a table in Azure Table Storage if it does not exist.

      :returns: None

      :raises CreateError: If the table could not be created due to an error other than it already existing.

   .. py:method:: read(**_kwargs: Any) -> None

      Read Azure Table Storage dataset.

      :param _kwargs: Additional keyword arguments

      :returns: None

      :raises ReadError: If there is an error reading from Azure Table Storage.

   .. py:method:: create(**_kwargs: Any) -> None

      Create an entity in Azure Table Storage.

      :returns: None

      :raises CreateError: If the entity could not be created.

   .. py:method:: update(**_kwargs: Any) -> None

      Update an entity in Azure Table Storage.

      :returns: None

   .. py:method:: delete(**_kwargs: Any) -> None

      Delete specific entities from Azure Table Storage.

      Only entities specified in `self.input` are deleted,
      matched by PartitionKey and RowKey.

      :param _kwargs: Additional keyword arguments

      :returns: None

      :raises DeleteError: If there is an error deleting from Azure Table Storage.

   .. py:method:: rename() -> NoReturn

      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``

   .. py:method:: close() -> None

      No-op: the linked service does not need to be closed. Present only to comply with the interface.

      :returns: None

   .. py:method:: list() -> NoReturn

      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``

   .. py:method:: purge(**_kwargs: Any) -> None

      Purge all entities from the table or drop the entire table.

      If `delete_table=True` in settings, deletes the entire table.
      Otherwise, deletes all entities from the table, leaving it empty.

      :returns: None

      :raises DeleteError: If there is an error purging from Azure Table Storage.

   .. py:method:: upsert(**_kwargs: Any) -> None

      Insert rows that do not exist, update rows that do, matched by
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``

   .. py:method:: get_details() -> dict[str, Any]

      Get details about the dataset.

      :returns: dict[str, Any]


.. py:class:: AzureTableDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   Settings for Azure Table Storage dataset operations.

   The `read` settings contain read-specific configuration that only applies to
   the read() operation, not to create(), delete(), update(), etc.

   .. py:attribute:: table_name
      :type: str

   .. py:attribute:: purge
      :type: PurgeSettings

      Purge-specific settings. Only applies to the purge() operation.

   .. py:attribute:: read
      :type: ReadSettings

      Read-specific settings. Only applies to the read() operation.
      By default, read() applies no filter.
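
The transaction entries that ``AzureTable._build_transaction_from_input`` is documented to return can be pictured as plain tuples of ``(operation, entity)`` or ``(operation, entity, params)``. The following is a minimal sketch under stated assumptions, not the library's implementation: ``build_transaction`` is a hypothetical helper, the rows are plain dicts rather than ``self.input``, ``ValueError`` stands in for the library's error classes, and the string ``"replace"`` stands in for ``UpdateMode.REPLACE`` so the example runs without the Azure SDK.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# Hypothetical alias; the real TransactionEntry type is defined by the library.
TransactionEntry = tuple


def build_transaction(
    rows: list[dict[str, Any]],
    operation: str,
    params: Mapping[str, Any] | None = None,
) -> list[TransactionEntry]:
    """Turn row dicts into (operation, entity[, params]) tuples."""
    entries: list[TransactionEntry] = []
    for row in rows:
        # Entities are matched by PartitionKey and RowKey, so both must be present.
        missing = {"PartitionKey", "RowKey"}.difference(row)
        if missing:
            # Assumption: the library raises its own error types here instead.
            raise ValueError(f"row is missing required keys: {sorted(missing)}")
        entity = dict(row)  # copy so the caller's row is not shared
        if params:
            # Optional params travel as the third tuple item.
            entries.append((operation, entity, dict(params)))
        else:
            entries.append((operation, entity))
    return entries


rows = [
    {"PartitionKey": "users", "RowKey": "1", "name": "Ada"},
    {"PartitionKey": "users", "RowKey": "2", "name": "Grace"},
]
# "replace" is a stand-in for UpdateMode.REPLACE.
transaction = build_transaction(rows, "upsert", {"mode": "replace"})
```

A list shaped like ``transaction`` is what would then be handed to ``_submit_transaction``; whether the library validates keys at this stage or earlier in ``_prepare_content`` is not stated in this reference.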