ds_provider_azure_py_lib.dataset.blob
File: blob.py
Region: ds_provider_azure_py_lib/dataset/blob
Azure Blob Dataset
This module implements a blob dataset for Azure Blob Storage.
Example
>>> import uuid
>>> azure_blob = AzureBlob(
...     deserializer=AzureBlobDeserializer(format=DatasetStorageFormatType.CSV),
...     serializer=AzureBlobSerializer(format=DatasetStorageFormatType.CSV),
...     settings=AzureBlobDatasetSettings(
...         container_name="my-container",
...         blob_name="path/to/example_file.csv",
...         prefix=None,  # for multiple blobs, provide a prefix instead of blob_name
...         create=CreateSettings(
...             overwrite_blob_if_exists=True,  # overwrite an existing blob; if False, raise an error
...             new_container=True,  # create the container if it is missing; if False, raise an error
...         ),
...         purge=PurgeSettings(
...             delete_container=True,  # confirm that the container itself may be deleted
...         ),
...     ),
...     linked_service=AzureLinkedService(
...         settings=AzureLinkedServiceSettings(
...             account_name="account name",
...             access_key="access key",
...         ),
...         id=uuid.uuid4(),
...         name="testazurepackage",
...         version="0.0.1",
...         description="testazurepackage",
...     ),
...     id=uuid.uuid4(),
...     name="testazurepackage",
...     version="0.0.1",
...     description="testazurepackage",
... )
>>> azure_blob.read()  # returns None; the result is stored on the dataset
>>> blob_data = azure_blob.output
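Attributes
logger |
AzureBlobDatasetSettingsType |
AzureLinkedServiceType |
Classes
CreateSettings | Settings for create operations.
PurgeSettings | Settings for purge operations.
AzureBlobDatasetSettings | Settings for Azure Blob Storage dataset operations.
AzureBlob | Tabular dataset object which identifies data within a data store, such as table/csv/json/parquet/parquetdataset/ and other documents.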
Module Contents
- ds_provider_azure_py_lib.dataset.blob.logger
- class ds_provider_azure_py_lib.dataset.blob.CreateSettings
Settings for create operations.
- overwrite_blob_if_exists: bool = True
Controls whether an existing blob is overwritten when the names conflict. If True, the create operation overwrites the existing blob with the new content. If False, the create operation raises an error if a blob with the same name already exists.
- new_container: bool = True
Confirms creation of the container if it does not already exist. If True, the create operation attempts to create the missing container. If False, the create operation raises an error if the container does not exist.
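The two flags can be read as a pair of guards checked before an upload. The following is a minimal sketch under stated assumptions, not the library's implementation: `FakeStore` and `create_blob` are hypothetical names, the store is an in-memory dictionary, and the built-in `LookupError` and `FileExistsError` stand in for the library's CreateError.

```python
from dataclasses import dataclass, field


@dataclass
class CreateSettings:
    """Mirror of the two documented flags and their defaults."""
    overwrite_blob_if_exists: bool = True
    new_container: bool = True


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in: container name -> {blob name: bytes}."""
    containers: dict[str, dict[str, bytes]] = field(default_factory=dict)


def create_blob(store: FakeStore, container: str, blob: str, data: bytes,
                settings: CreateSettings) -> None:
    """Apply the two guards, then write the blob (hypothetical helper)."""
    if container not in store.containers:
        if not settings.new_container:
            # Assumption: the real library raises CreateError here.
            raise LookupError(f"container {container!r} does not exist")
        store.containers[container] = {}
    blobs = store.containers[container]
    if blob in blobs and not settings.overwrite_blob_if_exists:
        # Assumption: the real library raises CreateError here.
        raise FileExistsError(f"blob {blob!r} already exists")
    blobs[blob] = data
```

With both defaults left at True, repeated calls with the same blob name simply replace the content, which makes the create step safe to rerun.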
- class ds_provider_azure_py_lib.dataset.blob.PurgeSettings
Settings for purge operations.
- delete_container: bool = False
Confirm deletion of the entire container when purge() is called. If True, delete() will delete the container. If False, delete() will remove all blobs from the container but keep the container itself.
- class ds_provider_azure_py_lib.dataset.blob.AzureBlobDatasetSettings
Bases: ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings
Settings for Azure Blob Storage dataset operations.
For read() and delete(), provide either blob_name or prefix; if both are given, only blob_name is used. create() does not use prefix and can be called only with blob_name. If the create settings are not passed, create() attempts by default to create the container if it does not exist. delete() removes specific blob(s) by name or prefix.
- container_name: str
- blob_name: str | None = None
- prefix: str | None = None
- create: CreateSettings
- purge: PurgeSettings
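The precedence between blob_name and prefix described for read() can be sketched as a small resolver. This is an illustration only: `resolve_target` is a hypothetical helper, and raising `ValueError` when neither field is set is an assumption, since the page does not say which error the library raises in that case.

```python
from typing import Literal, Optional, Tuple

Target = Tuple[Literal["blob", "prefix"], str]


def resolve_target(blob_name: Optional[str], prefix: Optional[str]) -> Target:
    """Pick what read() would act on; blob_name wins when both are set."""
    if blob_name is not None:
        return ("blob", blob_name)
    if prefix is not None:
        return ("prefix", prefix)
    # Assumption: the library reports this case with its own error type.
    raise ValueError("either blob_name or prefix must be provided")
```

For instance, `resolve_target("path/to/example_file.csv", "path/to/")` selects the single blob and ignores the prefix.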
- ds_provider_azure_py_lib.dataset.blob.AzureBlobDatasetSettingsType
- ds_provider_azure_py_lib.dataset.blob.AzureLinkedServiceType
- class ds_provider_azure_py_lib.dataset.blob.AzureBlob
Bases: ds_resource_plugin_py_lib.common.resource.dataset.base.TabularDataset[AzureLinkedServiceType, AzureBlobDatasetSettingsType, ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer, ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer], Generic[AzureLinkedServiceType, AzureBlobDatasetSettingsType]
Tabular dataset object which identifies data within a data store, such as table/csv/json/parquet/parquetdataset/ and other documents.
The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.
- linked_service: AzureLinkedServiceType
- settings: AzureBlobDatasetSettingsType
- serializer: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None
- deserializer: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None
- property type: ds_provider_azure_py_lib.enums.ResourceType
Get the type of the dataset.
- Returns:
ResourceType
- _list_blobs(prefix: str) -> azure.core.paging.ItemPaged[azure.storage.blob.BlobProperties]
List all blobs in the container with a specific prefix.
- Parameters:
prefix – a string prefix to match one or multiple blobs.
- Returns:
An iterable of BlobProperties matching the prefix.
- Return type:
ItemPaged[BlobProperties]
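Prefix listing of this kind maps naturally onto the `name_starts_with` argument of `ContainerClient.list_blobs` in the azure-storage-blob SDK. Whether `_list_blobs` is implemented exactly this way is an assumption. To keep the sketch runnable without Azure credentials, `FakeContainerClient` and `FakeBlobProperties` are hypothetical stand-ins that mimic only the parts used here, and `list_blob_names` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class FakeBlobProperties:
    """Hypothetical stand-in for azure.storage.blob.BlobProperties (name only)."""
    name: str


class FakeContainerClient:
    """Hypothetical stand-in that mimics ContainerClient.list_blobs."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names = sorted(names)

    def list_blobs(self, name_starts_with: Optional[str] = None) -> Iterator[FakeBlobProperties]:
        for name in self._names:
            if name_starts_with is None or name.startswith(name_starts_with):
                yield FakeBlobProperties(name)


def list_blob_names(container_client, prefix: str) -> list[str]:
    """Return the names of all blobs whose name starts with `prefix`."""
    # With the real SDK, container_client would be an azure.storage.blob.ContainerClient.
    return [props.name for props in container_client.list_blobs(name_starts_with=prefix)]
```

The match is a plain string prefix, not a directory test, so a prefix of `path/to` would also match a blob named `path/total.csv`; ending the prefix with `/` avoids that.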
- _read_blob(blob: str) -> pandas.DataFrame
Read a specific blob in the container.
- Parameters:
blob – name of the blob to read.
- Returns:
content of the blob as a DataFrame.
- Return type:
pd.DataFrame
- _read_blobs(prefix: str) -> pandas.DataFrame
Read all blobs in the container with a specific prefix.
- Parameters:
prefix – a string prefix to match one or multiple blobs.
- Returns:
Content of all blobs concatenated as a DataFrame.
- Return type:
pd.DataFrame
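Reading by prefix amounts to deserializing each matching blob and appending its rows to one table. The sketch below shows that idea for CSV content only. It deliberately swaps pandas for the standard-library `csv` module so it runs without dependencies, so the result is a list of row dictionaries rather than a DataFrame. The `blobs` mapping and `read_blobs` function are hypothetical, and processing blobs in sorted name order is an assumption.

```python
import csv
import io


def read_blobs(blobs: dict[str, bytes], prefix: str) -> list[dict[str, str]]:
    """Concatenate the rows of every CSV blob whose name starts with `prefix`."""
    rows: list[dict[str, str]] = []
    for name in sorted(blobs):  # assumption: deterministic, name-sorted order
        if not name.startswith(prefix):
            continue
        text = blobs[name].decode("utf-8")
        # Each blob carries its own header line; DictReader consumes it.
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows
```

A prefix that matches nothing yields an empty result here; how the library itself treats that case is not stated on this page.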
- _create_container() -> None
Create a container in Azure Blob Storage.
- Raises:
CreateError – If the container creation fails.
- Returns:
None
- _create_blob(stream: bytes, blob: str) -> None
Create a specific blob in the container.
- Parameters:
stream – data stream to upload to the blob.
blob – name of the blob to create.
- Raises:
CreateError – If the blob creation fails.
- Returns:
None
- _delete_blob(blob: str) -> pandas.DataFrame
Delete a specific blob in the container.
- Parameters:
blob – name of the blob to delete.
- Returns:
Empty DataFrame upon successful deletion.
- Return type:
pd.DataFrame
- Raises:
DeleteError – If the blob deletion fails.
- _delete_blobs(prefix: str) -> pandas.DataFrame
Delete all blobs in the container with a specific prefix.
- Parameters:
prefix – a string prefix to match one or multiple blobs.
- Returns:
Empty DataFrame upon successful deletion of all blobs.
- Return type:
pd.DataFrame
- Raises:
DeleteError – If one or more blob deletions fail.
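One way to honor "one or more blob deletions fail" is to attempt every deletion, collect the failures, and raise a single error at the end. The sketch below illustrates that pattern under assumptions: the `DeleteError` class defined here is a hypothetical stand-in for the library's exception, `delete_blobs` and the `delete_one` callback are illustrative names, and the page does not confirm whether the library continues after the first failure or stops immediately.

```python
from typing import Callable, Iterable


class DeleteError(Exception):
    """Hypothetical stand-in for the library's DeleteError."""

    def __init__(self, failures: dict[str, Exception]) -> None:
        super().__init__(f"failed to delete {len(failures)} blob(s): {sorted(failures)}")
        self.failures = failures


def delete_blobs(names: Iterable[str], delete_one: Callable[[str], None]) -> list[str]:
    """Try to delete every blob; raise once at the end if any deletion failed."""
    deleted: list[str] = []
    failures: dict[str, Exception] = {}
    for name in names:
        try:
            delete_one(name)
        except Exception as exc:  # collect the failure and keep going
            failures[name] = exc
        else:
            deleted.append(name)
    if failures:
        raise DeleteError(failures)
    return deleted
```

Collecting failures keeps a single bad blob from blocking the rest of the batch, at the cost of a partially applied delete when the error is finally raised.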
- read(**_kwargs: Any) -> None
Read Azure Blob Storage dataset.
- Parameters:
_kwargs – Additional keyword arguments to pass to the request.
- Returns:
None
- Raises:
ReadError – If reading the blob(s) fails.
- create(**_kwargs: Any) -> None
Create a blob in the container.
- Parameters:
_kwargs – Additional keyword arguments to pass to the request. (not used)
- Returns:
None
- Raises:
CreateError – If the blob creation fails.
- update() -> NoReturn
Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.
- Raises:
UpdateError – If the operation fails.
NotSupportedError – If the provider does not support update.
See also
Full contract: docs/DATASET_CONTRACT.md – update()
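A return annotation of NoReturn tells type checkers that the method never returns normally, which is the usual way to mark a contract method that a provider does not implement. The sketch below shows the pattern only. `BlobLikeDataset` is a hypothetical class, the `NotSupportedError` defined here is a stand-in for the library's exception, and the assumption that this provider's update() always raises it is inferred from the signature rather than stated on this page.

```python
from typing import NoReturn


class NotSupportedError(Exception):
    """Hypothetical stand-in for the library's NotSupportedError."""


class BlobLikeDataset:
    """Hypothetical dataset that opts out of row-level updates."""

    def update(self) -> NoReturn:
        # Assumption: blobs have no identity columns to match rows on,
        # so the operation is rejected unconditionally.
        raise NotSupportedError("update is not supported by this provider")
```

Callers that work across providers can catch NotSupportedError and fall back to another operation, such as rewriting the whole blob with create().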
- list() -> NoReturn
Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.
- Raises:
ListError – If the operation fails.
NotSupportedError – If the provider does not support listing.
See also
Full contract: docs/DATASET_CONTRACT.md – list()
- purge(**_kwargs: Any) -> None
Purge (remove all content from) the container.
For Azure Blob Storage, this deletes all blobs from the container, leaving the container empty. The container itself is not deleted.
- Parameters:
_kwargs – Additional keyword arguments to pass to the request. (not used)
- Returns:
None
- Raises:
DeleteError – If the purge operation fails.
- upsert() -> NoReturn
Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.
- Raises:
UpsertError – If the operation fails.
NotSupportedError – If the provider does not support upsert.
See also
Full contract: docs/DATASET_CONTRACT.md – upsert()
- delete(**_kwargs: Any) -> None
Delete specific blob(s) or the entire container from Azure Blob Storage.
For Azure Blob Storage, a “row” is a blob. This method deletes:
- A specific blob, by blob_name
- Multiple blobs, by prefix
- The entire container, if delete_container=True and no blob_name/prefix is provided
- Parameters:
_kwargs – Additional keyword arguments to pass to the request. (not used)
- Returns:
None
- Raises:
DeleteError – If the deletion fails or the requirements are not met.
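The three cases listed for delete() can be read as a dispatch on the settings. The sketch below only plans the action and touches no storage. `plan_delete` is a hypothetical helper, giving blob_name precedence over prefix follows the rule stated for AzureBlobDatasetSettings, and raising `ValueError` when no case applies is an assumption standing in for the library's DeleteError.

```python
from typing import Optional, Tuple


def plan_delete(blob_name: Optional[str], prefix: Optional[str],
                delete_container: bool) -> Tuple[str, Optional[str]]:
    """Return (action, argument) describing what delete() would remove."""
    if blob_name is not None:
        return ("delete_blob", blob_name)
    if prefix is not None:
        return ("delete_blobs_by_prefix", prefix)
    if delete_container:
        return ("delete_container", None)
    # Assumption: the library signals unmet requirements with DeleteError.
    raise ValueError("nothing to delete: set blob_name, prefix, or delete_container=True")
```

Note that in this reading delete_container matters only when neither blob_name nor prefix is set, so a stray prefix prevents an accidental container deletion.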
- rename() -> NoReturn
Rename the resource in the backend. Atomic. Not idempotent.
- Raises:
RenameError – If the operation fails.
NotSupportedError – If the provider does not support renaming.
See also
Full contract: docs/DATASET_CONTRACT.md – rename()
- close() -> None
No-op: the linked service does not need to be closed. The method exists only to comply with the dataset interface.
- Returns:
None