ds_provider_azure_py_lib
========================

.. py:module:: ds_provider_azure_py_lib

.. autoapi-nested-parse::

   **File:** ``__init__.py``
   **Region:** ``ds-provider-azure-py-lib``

   Description
   -----------
   A Python package from the ds-provider-azure-py-lib library.

   .. rubric:: Example

   .. code-block:: python

       from ds_provider_azure_py_lib import __version__

       print(f"Package version: {__version__}")


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_provider_azure_py_lib/dataset/index
   /autoapi/ds_provider_azure_py_lib/enums/index
   /autoapi/ds_provider_azure_py_lib/linked_service/index
   /autoapi/ds_provider_azure_py_lib/serde/index


Attributes
----------

.. autoapisummary::

   ds_provider_azure_py_lib.__version__


Classes
-------

.. autoapisummary::

   ds_provider_azure_py_lib.AzureBlob
   ds_provider_azure_py_lib.AzureBlobDatasetSettings
   ds_provider_azure_py_lib.AzureTable
   ds_provider_azure_py_lib.AzureTableDatasetSettings
   ds_provider_azure_py_lib.AzureLinkedService
   ds_provider_azure_py_lib.AzureLinkedServiceSettings


Package Contents
----------------

.. py:class:: AzureBlob

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.base.TabularDataset`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureBlobDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureBlobDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store,
   such as table/csv/json/parquet/parquetdataset and other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: AzureLinkedServiceType

   .. py:attribute:: settings
      :type: AzureBlobDatasetSettingsType

   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None

   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None

   .. py:property:: type
      :type: ds_provider_azure_py_lib.enums.ResourceType

      Get the type of the dataset.

      :returns: ResourceType

   .. py:method:: _list_blobs(prefix: str) -> azure.core.paging.ItemPaged[azure.storage.blob.BlobProperties]

      List all blobs in the container with a specific prefix.

      :param prefix: A string prefix to match one or multiple blobs.
      :returns: An iterable of BlobProperties matching the prefix.
      :rtype: ItemPaged[BlobProperties]

   .. py:method:: _read_blob(blob: str) -> pandas.DataFrame

      Read a specific blob in the container.

      :param blob: Name of the blob to read.
      :returns: Content of the blob as a DataFrame.
      :rtype: pd.DataFrame

   .. py:method:: _read_blobs(prefix: str) -> pandas.DataFrame

      Read all blobs in the container with a specific prefix.

      :param prefix: A string prefix to match one or multiple blobs.
      :returns: Content of all blobs concatenated as a DataFrame.
      :rtype: pd.DataFrame

   .. py:method:: _create_container() -> None

      Create a container in Azure Blob Storage.

      :raises CreateError: If the container creation fails.
      :returns: None

   .. py:method:: _create_blob(stream: bytes, blob: str) -> None

      Create a specific blob in the container.

      :param stream: Data stream to upload to the blob.
      :param blob: Name of the blob to create.
      :raises CreateError: If the blob creation fails.
      :returns: None

   .. py:method:: _delete_blob(blob: str) -> pandas.DataFrame

      Delete a specific blob in the container.

      :param blob: Name of the blob to delete.
      :returns: Empty DataFrame upon successful deletion.
      :rtype: pd.DataFrame
      :raises DeleteError: If the blob deletion fails.

   .. py:method:: _delete_blobs(prefix: str) -> pandas.DataFrame

      Delete all blobs in the container with a specific prefix.

      :param prefix: A string prefix to match one or multiple blobs.
      :returns: Empty DataFrame upon successful deletion of all blobs.
      :rtype: pd.DataFrame
      :raises DeleteError: If one or more blob deletions fail.

   .. py:method:: read(**_kwargs: Any) -> None

      Read Azure Blob Storage dataset.

      :param _kwargs: Additional keyword arguments to pass to the request.
      :returns: None
      :raises ReadError: If reading the blob(s) fails.

   .. py:method:: create(**_kwargs: Any) -> None

      Create a blob in the container.

      :param _kwargs: Additional keyword arguments to pass to the request. (not used)
      :returns: None
      :raises CreateError: If the blob creation fails.

   .. py:method:: update() -> NoReturn

      Update existing rows in the target matched by identity columns
      defined in ``self.settings``. Atomic. Must not insert new rows.

      :raises UpdateError: If the operation fails.
      :raises NotSupportedError: If the provider does not support update.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``update()``

   .. py:method:: list() -> NoReturn

      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``

   .. py:method:: purge(**_kwargs: Any) -> None

      Purge (remove all content from) the container.

      For Azure Blob Storage, this deletes all blobs from the container,
      leaving the container empty. The container itself is not deleted.

      :param _kwargs: Additional keyword arguments to pass to the request. (not used)
      :returns: None
      :raises DeleteError: If the purge operation fails.

   .. py:method:: upsert() -> NoReturn

      Insert rows that do not exist, update rows that do, matched by
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``

   .. py:method:: delete(**_kwargs: Any) -> None

      Delete specific blob(s) or the entire container from Azure Blob Storage.

      For Azure Blob Storage, a "row" is a blob. This method deletes:

      - Specific blob by blob_name
      - Multiple blobs by prefix
      - Entire container if delete_container=True and no blob_name/prefix provided

      :param _kwargs: Additional keyword arguments to pass to the request. (not used)
      :returns: None
      :raises DeleteError: If the deletion fails or requirements are not met.

   .. py:method:: rename() -> NoReturn

      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``

   .. py:method:: close() -> None

      No need to close the linked service. Present only to comply with the interface.

      :returns: None

   .. py:method:: concat(dfs: list[pandas.DataFrame]) -> pandas.DataFrame
      :staticmethod:

      Concatenate a list of DataFrames into a single DataFrame.

      :param dfs: DataFrames to concatenate.
      :returns: Concatenated DataFrame or empty DataFrame if input list is empty.
      :rtype: DataFrame

   .. py:method:: get_details() -> dict[str, Any]

      Get details of the dataset.

      :returns: Details of the dataset.
      :rtype: Dict[str, Any]


.. py:class:: AzureBlobDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   Settings for Azure Blob Storage dataset operations.

   One of ``blob_name`` or ``prefix`` must be provided for ``read()``/``delete()``;
   if both are specified, only ``blob_name`` is considered.
   ``prefix`` is not used for ``create()``, which can be called only with ``blob_name``.
   By default (if ``create`` is not passed), ``create()`` attempts to create the
   container if it does not exist.
   ``delete()`` removes specific blob(s) by name or prefix.

   .. py:attribute:: container_name
      :type: str

   .. py:attribute:: blob_name
      :type: str | None
      :value: None

   .. py:attribute:: prefix
      :type: str | None
      :value: None

   .. py:attribute:: create
      :type: CreateSettings

   .. py:attribute:: purge
      :type: PurgeSettings


.. py:class:: AzureTable

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureTableDatasetSettingsType`\ , :py:obj:`ds_provider_azure_py_lib.serde.AzureTableSerializer`\ , :py:obj:`ds_provider_azure_py_lib.serde.AzureTableDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureTableDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store,
   such as table/csv/json/parquet/parquetdataset and other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: AzureLinkedServiceType

   .. py:attribute:: settings
      :type: AzureTableDatasetSettingsType

   .. py:method:: __post_init__() -> None

   .. py:property:: type
      :type: ds_provider_azure_py_lib.enums.ResourceType

      Get the type of the dataset.

      :returns: ResourceType

   .. py:method:: _prepare_content(content: pandas.DataFrame) -> dict[str, Any]

      Ensure that the content is provided and is in the correct format.

      :param content: The content to prepare.
      :type content: pd.DataFrame
      :returns: The prepared content.
      :rtype: dict
      :raises DatasetException: If the content is not a DataFrame, is empty, or does not contain required columns.

   .. py:method:: _get_table_client() -> azure.data.tables.TableClient

      Return a TableClient for the currently configured table.

      :returns: TableClient

   .. py:method:: _build_transaction_from_input(operation: str, params: collections.abc.Mapping[str, Any] | None = None) -> list[TransactionEntry]

      Build a list of transaction entries from ``self.input``.

      :param operation: The operation to perform, named as expected by
         ``TableClient.submit_transaction``, e.g. "create", "upsert", "delete".
      :type operation: str
      :param params: Optional params dict passed as the third item in the tuple
         (when required), e.g. ``{"mode": UpdateMode.REPLACE}``.
      :returns: list[TransactionEntry]
      :raises CreateError: If there is an error preparing content for creation.
      :raises UpdateError: If there is an error preparing content for update.
      :raises DeleteError: If there is an error preparing content for deletion.
      :raises DatasetException: If there is a general error preparing content.

   .. py:method:: _submit_transaction(transaction: collections.abc.Iterable[TransactionEntry], error_cls: type[ds_resource_plugin_py_lib.common.resource.dataset.errors.DatasetException]) -> None

      Submit the transaction and map ``TableTransactionError`` to the provided ``error_cls``.

      :param transaction: The transaction to submit.
      :type transaction: Iterable[TransactionEntry]
      :param error_cls: The exception class to raise on error.
      :type error_cls: builtins.type[DatasetException]
      :raises error_cls: If submitting the transaction fails.

   .. py:method:: _delete_table() -> None

      Delete the entire table from Azure Table Storage.

      :returns: None
      :raises DeleteError: If the table could not be deleted.

   .. py:method:: _create_table() -> None

      Create a table in Azure Table Storage if it does not exist.

      :returns: None
      :raises CreateError: If the table could not be created due to an error other than it already existing.

   .. py:method:: read(**_kwargs: Any) -> None

      Read Azure Table Storage dataset.

      :param _kwargs: Additional keyword arguments
      :returns: None
      :raises ReadError: If there is an error reading from Azure Table Storage.

   .. py:method:: create(**_kwargs: Any) -> None

      Create an entity in Azure Table Storage.

      :returns: None
      :raises CreateError: If the entity could not be created.

   .. py:method:: update(**_kwargs: Any) -> None

      Update an entity in Azure Table Storage.

      :returns: None

   .. py:method:: delete(**_kwargs: Any) -> None

      Delete specific entities from Azure Table Storage.

      Only entities specified in ``self.input`` are deleted, matched by
      PartitionKey and RowKey.

      :param _kwargs: Additional keyword arguments
      :returns: None
      :raises DeleteError: If there is an error deleting from Azure Table Storage.

   .. py:method:: rename() -> NoReturn

      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``

   .. py:method:: close() -> None

      No need to close the linked service. Present only to comply with the interface.

      :returns: None

   .. py:method:: list() -> NoReturn

      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``

   .. py:method:: purge(**_kwargs: Any) -> None

      Purge all entities from the table or drop the entire table.

      If ``delete_table=True`` in settings, deletes the entire table.
      Otherwise, deletes all entities from the table, leaving it empty.

      :returns: None
      :raises DeleteError: If there is an error purging from Azure Table Storage.

   .. py:method:: upsert(**_kwargs: Any) -> None

      Insert rows that do not exist, update rows that do, matched by
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``

   .. py:method:: get_details() -> dict[str, Any]

      Get details about the dataset.

      :returns: dict[str, Any]


.. py:class:: AzureTableDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   Settings for Azure Table Storage dataset operations.

   The ``read`` settings contain read-specific configuration that only applies
   to the ``read()`` operation, not to ``create()``, ``delete()``, ``update()``, etc.

   .. py:attribute:: table_name
      :type: str

   .. py:attribute:: purge
      :type: PurgeSettings

      Purge-specific settings. Only applies to the ``purge()`` operation.

   .. py:attribute:: read
      :type: ReadSettings

      Read-specific settings. Only applies to the ``read()`` operation.
      By default, ``read()`` reads without a filter.


.. py:class:: AzureLinkedService

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.linked_service.LinkedService`\ [\ :py:obj:`AzureLinkedServiceSettingsType`\ ], :py:obj:`Generic`\ [\ :py:obj:`AzureLinkedServiceSettingsType`\ ]

   Linked service for connecting to Azure Storage.

   .. py:attribute:: settings
      :type: AzureLinkedServiceSettingsType

   .. py:attribute:: _blob_service_client
      :type: azure.storage.blob.BlobServiceClient | None
      :value: None

   .. py:attribute:: _table_service_client
      :type: azure.data.tables.TableServiceClient | None
      :value: None

   .. py:attribute:: _credential
      :type: azure.core.credentials.AzureNamedKeyCredential | None
      :value: None

   .. py:method:: check_settings_is_set() -> None

      Check if settings are set correctly.

      :returns: None
      :raises AttributeError: If settings are not set correctly.

   .. py:property:: type
      :type: ds_provider_azure_py_lib.enums.ResourceType

      Get the type of the linked service.

      :returns: ResourceType

   .. py:property:: connection
      :type: AzureLinkedServiceConnection

      Get the connection object for the Azure Storage account.

      :returns: AzureLinkedServiceConnection

   .. py:property:: blob_service_client
      :type: azure.storage.blob.BlobServiceClient

      Get the BlobServiceClient instance.

      :returns: BlobServiceClient
      :raises ConnectionError: If blob service client is not connected.

   .. py:property:: table_service_client
      :type: azure.data.tables.TableServiceClient

      Get the TableServiceClient instance.

      :returns: TableServiceClient
      :raises ConnectionError: If table service client is not connected.

   .. py:method:: get_blob_service() -> azure.storage.blob.BlobServiceClient

      Connect to the Blob service of the Azure Storage account.

      :returns: BlobServiceClient

   .. py:method:: get_table_service() -> azure.data.tables.TableServiceClient

      Connect to the Table service of the Azure Storage account.

      :returns: TableServiceClient

   .. py:method:: connect() -> None

      Connect to Azure Storage (Blob and Table), ensuring both service clients are initialized.

      :returns: None

   .. py:method:: test_connection() -> tuple[bool, str]

      Test the connection to Azure Storage (Blob or Table).

      :returns: tuple[bool, str]

   .. py:method:: close() -> None

      No need to close the linked service. Present only to comply with the interface.

      :returns: None

   .. py:method:: __enter__() -> AzureLinkedService[AzureLinkedServiceSettingsType]

      Enter context manager.

      :returns: Returns self for use in a ``with`` statement.
      :rtype: AzureLinkedService

   .. py:method:: __exit__(exc_type: object, exc_val: object, exc_tb: object) -> None

      Exit context manager and close the connection.

      :param exc_type: Exception type if an exception occurred.
      :param exc_val: Exception value if an exception occurred.
      :param exc_tb: Exception traceback if an exception occurred.
      :returns: None


.. py:class:: AzureLinkedServiceSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.linked_service.LinkedServiceSettings`

   The object containing the Azure linked service settings.

   .. py:attribute:: account_name
      :type: str

   .. py:attribute:: access_key
      :type: str


.. py:data:: __version__
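
.. rubric:: Example

The ``blob_name``/``prefix`` precedence described for ``AzureBlobDatasetSettings`` can be sketched in plain Python. This is a minimal, hypothetical illustration and not the library's implementation: ``BlobTarget`` and ``resolve_target`` are names invented here, and ``ValueError`` stands in for whichever error (for example ``ReadError`` or ``DeleteError``) the library actually raises.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BlobTarget:
    """Hypothetical stand-in for the blob_name/prefix pair of the settings."""

    blob_name: Optional[str] = None
    prefix: Optional[str] = None


def resolve_target(target: BlobTarget) -> Tuple[str, str]:
    """Pick what read()/delete() would act on, per the documented rule.

    blob_name takes precedence when both fields are set; a missing
    blob_name falls back to prefix; neither set is an error.
    """
    if target.blob_name is not None:
        return ("blob", target.blob_name)
    if target.prefix is not None:
        return ("prefix", target.prefix)
    # Assumption: the real library raises its own error type here.
    raise ValueError("either blob_name or prefix must be provided")


# Both set: only blob_name is considered.
both = resolve_target(BlobTarget(blob_name="report.csv", prefix="2024/"))
# Only prefix set: every blob under the prefix is targeted.
by_prefix = resolve_target(BlobTarget(prefix="2024/"))
```

Returning a tagged tuple keeps the single-blob and prefix cases explicit, so a caller can dispatch to a single-blob helper or a prefix helper without re-checking the settings.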