ds_provider_azure_py_lib.dataset.blob
=====================================

.. py:module:: ds_provider_azure_py_lib.dataset.blob

.. autoapi-nested-parse::

   **File:** ``blob.py``
   **Region:** ``ds_provider_azure_py_lib/dataset/blob``

   Azure Blob Dataset

   This module implements a blob dataset for Azure.

   .. rubric:: Example

   >>> azure_blob = AzureBlob(
   ...     deserializer=AzureBlobDeserializer(format=DatasetStorageFormatType.CSV),
   ...     serializer=AzureBlobSerializer(format=DatasetStorageFormatType.CSV),
   ...     settings=AzureBlobDatasetSettings(
   ...         container_name="my-container",
   ...         blob_name="path/to/example_file.csv",
   ...         prefix=None,  # for multiple blobs, provide a prefix instead of blob_name
   ...         create=CreateSettings(
   ...             overwrite_blob_if_exists=True,  # overwrite the blob that already exists or raise an error
   ...             new_container=True  # create a new container or raise an error
   ...         ),
   ...         purge=PurgeSettings(
   ...             delete_container=True  # confirm deletion of the container via delete() method
   ...         ),
   ...     ),
   ...     linked_service=AzureLinkedService(
   ...         settings=AzureLinkedServiceSettings(
   ...             account_name="account name",
   ...             access_key="access key"
   ...         ),
   ...         id=uuid.uuid4(),
   ...         name="testazurepackage",
   ...         version="0.0.1",
   ...         description="testazurepackage",
   ...     ),
   ...     id=uuid.uuid4(),
   ...     name="testazurepackage",
   ...     version="0.0.1",
   ...     description="testazurepackage"
   ... )
   >>> azure_blob.read()
   >>> blob_data = azure_blob.output


Attributes
----------

.. autoapisummary::

   ds_provider_azure_py_lib.dataset.blob.logger
   ds_provider_azure_py_lib.dataset.blob.AzureBlobDatasetSettingsType
   ds_provider_azure_py_lib.dataset.blob.AzureLinkedServiceType


Classes
-------

.. autoapisummary::

   ds_provider_azure_py_lib.dataset.blob.CreateSettings
   ds_provider_azure_py_lib.dataset.blob.PurgeSettings
   ds_provider_azure_py_lib.dataset.blob.AzureBlobDatasetSettings
   ds_provider_azure_py_lib.dataset.blob.AzureBlob


Module Contents
---------------

.. py:data:: logger

.. py:class:: CreateSettings

   Settings for create operations.


   .. py:attribute:: overwrite_blob_if_exists
      :type: bool
      :value: True

      Controls whether to overwrite an existing blob when a name conflict occurs.
      If True, the create operation overwrites the existing blob with the new content.
      If False, the create operation raises an error if a blob with the same name already exists.


   .. py:attribute:: new_container
      :type: bool
      :value: True

      Confirm creation of a new container if it does not already exist.
      If True, the create operation attempts to create the container if it does not exist.
      If False, the create operation raises an error if the container does not exist.


.. py:class:: PurgeSettings

   Settings for purge operations.


   .. py:attribute:: delete_container
      :type: bool
      :value: False

      Confirm deletion of the entire container when purge() is called.
      If True, delete() will delete the container.
      If False, delete() will remove all blobs from the container but keep the container itself.


.. py:class:: AzureBlobDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   Settings for Azure Blob Storage dataset operations.

   Provide either `blob_name` or `prefix` for read()/delete(); if both are specified,
   only `blob_name` is considered.
   create() does not use `prefix`; it can be called only with `blob_name`.
   If `create` is not passed, the default settings attempt to create the container
   if it does not exist.
   `delete()` removes specific blob(s) by name or prefix.


   .. py:attribute:: container_name
      :type: str


   .. py:attribute:: blob_name
      :type: str | None
      :value: None


   .. py:attribute:: prefix
      :type: str | None
      :value: None


   .. py:attribute:: create
      :type: CreateSettings


   .. py:attribute:: purge
      :type: PurgeSettings


.. py:data:: AzureBlobDatasetSettingsType

.. py:data:: AzureLinkedServiceType

.. py:class:: AzureBlob

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.base.TabularDataset`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureBlobDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`AzureLinkedServiceType`\ , :py:obj:`AzureBlobDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store,
   such as table/csv/json/parquet/parquetdataset and other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.


   .. py:attribute:: linked_service
      :type: AzureLinkedServiceType


   .. py:attribute:: settings
      :type: AzureBlobDatasetSettingsType


   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None


   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None


   .. py:property:: type
      :type: ds_provider_azure_py_lib.enums.ResourceType

      Get the type of the dataset.

      :returns: ResourceType


   .. py:method:: _list_blobs(prefix: str) -> azure.core.paging.ItemPaged[azure.storage.blob.BlobProperties]

      List all blobs in the container with a specific prefix.

      :param prefix: A string prefix to match one or multiple blobs.

      :returns: An iterable of BlobProperties matching the prefix.
      :rtype: ItemPaged[BlobProperties]


   .. py:method:: _read_blob(blob: str) -> pandas.DataFrame

      Read a specific blob in the container.

      :param blob: Name of the blob to read.

      :returns: Content of the blob as a DataFrame.
      :rtype: pd.DataFrame


   .. py:method:: _read_blobs(prefix: str) -> pandas.DataFrame

      Read all blobs in the container with a specific prefix.

      :param prefix: A string prefix to match one or multiple blobs.

      :returns: Content of all blobs concatenated as a DataFrame.
      :rtype: pd.DataFrame


   .. py:method:: _create_container() -> None

      Create a container in Azure Blob Storage.

      :raises CreateError: If the container creation fails.

      :returns: None


   .. py:method:: _create_blob(stream: bytes, blob: str) -> None

      Create a specific blob in the container.

      :param stream: Data stream to upload to the blob.
      :param blob: Name of the blob to create.

      :raises CreateError: If the blob creation fails.

      :returns: None


   .. py:method:: _delete_blob(blob: str) -> pandas.DataFrame

      Delete a specific blob in the container.

      :param blob: Name of the blob to delete.

      :returns: Empty DataFrame upon successful deletion.
      :rtype: pd.DataFrame

      :raises DeleteError: If the blob deletion fails.


   .. py:method:: _delete_blobs(prefix: str) -> pandas.DataFrame

      Delete all blobs in the container with a specific prefix.

      :param prefix: A string prefix to match one or multiple blobs.

      :returns: Empty DataFrame upon successful deletion of all blobs.
      :rtype: pd.DataFrame

      :raises DeleteError: If one or more blob deletions fail.


   .. py:method:: read(**_kwargs: Any) -> None

      Read Azure Blob Storage dataset.

      :param _kwargs: Additional keyword arguments to pass to the request.

      :returns: None

      :raises ReadError: If reading the blob(s) fails.


   .. py:method:: create(**_kwargs: Any) -> None

      Create a blob in the container.

      :param _kwargs: Additional keyword arguments to pass to the request. (not used)

      :returns: None

      :raises CreateError: If the blob creation fails.


   .. py:method:: update() -> NoReturn

      Update existing rows in the target matched by identity columns
      defined in ``self.settings``. Atomic. Must not insert new rows.

      :raises UpdateError: If the operation fails.
      :raises NotSupportedError: If the provider does not support update.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``update()``


   .. py:method:: list() -> NoReturn

      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``


   .. py:method:: purge(**_kwargs: Any) -> None

      Purge (remove all content from) the container.

      For Azure Blob Storage, this deletes all blobs from the container,
      leaving the container empty. The container itself is not deleted.

      :param _kwargs: Additional keyword arguments to pass to the request. (not used)

      :returns: None

      :raises DeleteError: If the purge operation fails.


   .. py:method:: upsert() -> NoReturn

      Insert rows that do not exist, update rows that do, matched by
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``


   .. py:method:: delete(**_kwargs: Any) -> None

      Delete specific blob(s) or the entire container from Azure Blob Storage.

      For Azure Blob Storage, a "row" is a blob. This method deletes:

      - A specific blob, by blob_name
      - Multiple blobs, by prefix
      - The entire container, if delete_container=True and no blob_name/prefix is provided

      :param _kwargs: Additional keyword arguments to pass to the request. (not used)

      :returns: None

      :raises DeleteError: If the deletion fails or the requirements are not met.


   .. py:method:: rename() -> NoReturn

      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``


   .. py:method:: close() -> None

      No need to close the linked service; this method exists only to comply with the interface.

      :returns: None


   .. py:method:: concat(dfs: list[pandas.DataFrame]) -> pandas.DataFrame
      :staticmethod:

      Concatenate a list of DataFrames into a single DataFrame.

      :param dfs: DataFrames to concatenate.

      :returns: Concatenated DataFrame, or an empty DataFrame if the input list is empty.
      :rtype: DataFrame


   .. py:method:: get_details() -> dict[str, Any]

      Get details of the dataset.

      :returns: Details of the dataset.
      :rtype: Dict[str, Any]
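
The ``blob_name``/``prefix`` precedence described for ``AzureBlobDatasetSettings`` can be illustrated with a small standalone sketch. This is not the library's implementation: ``BlobTargetSketch``, ``resolve_read_target`` and ``resolve_create_target`` are hypothetical names, and raising ``ValueError`` is an assumption (the real methods are documented as raising ``ReadError`` or ``CreateError``). It only mirrors the documented rules: ``blob_name`` takes precedence when both fields are set, and create() works only with ``blob_name``.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BlobTargetSketch:
    """Hypothetical stand-in for the targeting fields of AzureBlobDatasetSettings."""

    container_name: str
    blob_name: Optional[str] = None
    prefix: Optional[str] = None


def resolve_read_target(settings: BlobTargetSketch) -> Tuple[str, str]:
    """Pick what a read would address: a single blob or every blob under a prefix."""
    if settings.blob_name is not None:
        # Documented rule: when both fields are set, only blob_name is considered.
        return ("blob", settings.blob_name)
    if settings.prefix is not None:
        return ("prefix", settings.prefix)
    # Assumption: the sketch raises ValueError; the library documents ReadError.
    raise ValueError("either blob_name or prefix must be provided")


def resolve_create_target(settings: BlobTargetSketch) -> str:
    """Pick the blob a create would write; prefix is never used for create."""
    if settings.blob_name is None:
        # Assumption: the sketch raises ValueError; the library documents CreateError.
        raise ValueError("create requires blob_name; prefix is not used")
    return settings.blob_name


if __name__ == "__main__":
    both = BlobTargetSketch("my-container", blob_name="path/to/example_file.csv", prefix="path/")
    print(resolve_read_target(both))
    print(resolve_create_target(both))
```

Keeping the precedence check in one place makes the "both provided" case explicit rather than leaving it to the order in which a caller happens to inspect the fields.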