ds_provider_azure_py_lib

File: __init__.py Region: ds-provider-azure-py-lib

Description

A Python package from the ds-provider-azure-py-lib library.

Example

from ds_provider_azure_py_lib import __version__

print(f"Package version: {__version__}")

Submodules

Attributes

__version__

Classes

AzureBlob

Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets, and other documents.

AzureBlobDatasetSettings

Settings for Azure Blob Storage dataset operations.

AzureTable

Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets, and other documents.

AzureTableDatasetSettings

Settings for Azure Table Storage dataset operations.

AzureLinkedService

Linked service for connecting to Azure Storage (Blob and Table).

AzureLinkedServiceSettings

The object containing the Azure linked service settings.

Package Contents

class ds_provider_azure_py_lib.AzureBlob

Bases: ds_resource_plugin_py_lib.common.resource.dataset.base.TabularDataset[AzureLinkedServiceType, AzureBlobDatasetSettingsType, ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer, ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer], Generic[AzureLinkedServiceType, AzureBlobDatasetSettingsType]

Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets, and other documents.

The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.

linked_service: AzureLinkedServiceType
settings: AzureBlobDatasetSettingsType
serializer: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None
deserializer: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None
property type: ds_provider_azure_py_lib.enums.ResourceType

Get the type of the dataset.

Returns:

ResourceType

_list_blobs(prefix: str) → azure.core.paging.ItemPaged[azure.storage.blob.BlobProperties]

List all blobs in the container with a specific prefix.

Parameters:

prefix – a string prefix to match one or multiple blobs.

Returns:

An iterable of BlobProperties matching the prefix.

Return type:

ItemPaged[BlobProperties]

_read_blob(blob: str) → pandas.DataFrame

Read a specific blob in the container.

Parameters:

blob – name of the blob to read.

Returns:

content of the blob as a DataFrame.

Return type:

pd.DataFrame

_read_blobs(prefix: str) → pandas.DataFrame

Read all blobs in the container with a specific prefix.

Parameters:

prefix – a string prefix to match one or multiple blobs.

Returns:

Content of all blobs concatenated as a DataFrame.

Return type:

pd.DataFrame

_create_container() → None

Create a container in the Azure Blob Storage.

Raises:

CreateError – If the container creation fails.

Returns:

None

_create_blob(stream: bytes, blob: str) → None

Create a specific blob in the container.

Parameters:
  • stream – data stream to upload to the blob.

  • blob – name of the blob to create.

Raises:

CreateError – If the blob creation fails.

Returns:

None

_delete_blob(blob: str) → pandas.DataFrame

Delete a specific blob in the container.

Parameters:

blob – name of the blob to delete.

Returns:

Empty DataFrame upon successful deletion.

Return type:

pd.DataFrame

Raises:

DeleteError – If the blob deletion fails.

_delete_blobs(prefix: str) → pandas.DataFrame

Delete all blobs in the container with a specific prefix.

Parameters:

prefix – a string prefix to match one or multiple blobs.

Returns:

Empty DataFrame upon successful deletion of all blobs.

Return type:

pd.DataFrame

Raises:

DeleteError – If one or more blob deletions fail.
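
The "delete everything under a prefix, then fail if any deletion failed" behaviour can be sketched against an in-memory dict instead of Azure. Everything here is a hypothetical stand-in chosen for illustration: the DeleteError class, the delete_by_prefix helper, and the locked parameter (which only simulates a deletion the backend refuses). It is not this package's implementation.

```python
from __future__ import annotations


class DeleteError(Exception):
    """Hypothetical stand-in for the package's DeleteError."""


def delete_by_prefix(
    store: dict[str, bytes],
    prefix: str,
    locked: frozenset[str] = frozenset(),
) -> list[str]:
    """Delete every key that starts with prefix; raise once at the end if any failed."""
    deleted: list[str] = []
    failed: list[str] = []
    # Snapshot the matching names first so the dict is not mutated while iterating.
    for name in [n for n in store if n.startswith(prefix)]:
        if name in locked:  # simulated refusal by the backend
            failed.append(name)
            continue
        del store[name]
        deleted.append(name)
    if failed:
        raise DeleteError(f"failed to delete {len(failed)} blob(s): {sorted(failed)}")
    return deleted


store = {"raw/a.csv": b"1", "raw/b.csv": b"2", "curated/c.csv": b"3"}
removed = delete_by_prefix(store, "raw/")
print(removed, list(store))
```

Collecting failures and raising a single error at the end means one bad blob does not stop the remaining deletions from being attempted; whether the real method behaves this way is an assumption.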

read(**_kwargs: Any) → None

Read Azure Blob Storage dataset.

Parameters:

_kwargs – Additional keyword arguments to pass to the request.

Returns:

None

Raises:

ReadError – If reading the blob(s) fails.

create(**_kwargs: Any) → None

Create a blob in the container.

Parameters:

_kwargs – Additional keyword arguments to pass to the request. (not used)

Returns:

None

Raises:

CreateError – If the blob creation fails.

update() → NoReturn

Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.

Raises:
  • UpdateError – If the operation fails.

  • NotSupportedError – If the provider does not support update.

See also

Full contract: docs/DATASET_CONTRACT.md, update()

list() → NoReturn

Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.

Raises:
  • ListError – If the operation fails.

  • NotSupportedError – If the provider does not support listing.

See also

Full contract: docs/DATASET_CONTRACT.md, list()

purge(**_kwargs: Any) → None

Purge (remove all content from) the container.

For Azure Blob Storage, this deletes all blobs from the container, leaving the container empty. The container itself is not deleted.

Parameters:

_kwargs – Additional keyword arguments to pass to the request. (not used)

Returns:

None

Raises:

DeleteError – If the purge operation fails.

upsert() → NoReturn

Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.

Raises:
  • UpsertError – If the operation fails.

  • NotSupportedError – If the provider does not support upsert.

See also

Full contract: docs/DATASET_CONTRACT.md, upsert()

delete(**_kwargs: Any) → None

Delete specific blob(s) or the entire container from Azure Blob Storage.

For Azure Blob Storage, a “row” is a blob. This method deletes:
  • A specific blob, identified by blob_name.

  • Multiple blobs, matched by prefix.

  • The entire container, if delete_container=True and neither blob_name nor prefix is provided.

Parameters:

_kwargs – Additional keyword arguments to pass to the request. (not used)

Returns:

None

Raises:

DeleteError – If the deletion fails or the requirements above are not met.
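
The three delete modes listed for this method can be read as a simple precedence rule. The sketch below is illustrative only: DeleteRequest and resolve_delete_target are hypothetical names, and where delete_container actually lives in the settings is not stated in this reference. The blob_name-over-prefix precedence follows the AzureBlobDatasetSettings description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeleteRequest:
    """Hypothetical stand-in for the settings fields that drive delete()."""

    blob_name: str | None = None
    prefix: str | None = None
    delete_container: bool = False


def resolve_delete_target(req: DeleteRequest) -> tuple[str, str | None]:
    """Return (kind, value) describing what a delete would remove."""
    if req.blob_name:
        # blob_name wins when both blob_name and prefix are set.
        return ("blob", req.blob_name)
    if req.prefix:
        return ("prefix", req.prefix)
    if req.delete_container:
        # Only reached when neither blob_name nor prefix is provided.
        return ("container", None)
    raise ValueError("nothing to delete: set blob_name, prefix, or delete_container")


print(resolve_delete_target(DeleteRequest(blob_name="a.csv", prefix="raw/")))
print(resolve_delete_target(DeleteRequest(delete_container=True)))
```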

rename() → NoReturn

Rename the resource in the backend. Atomic. Not idempotent.

Raises:
  • RenameError – If the operation fails.

  • NotSupportedError – If the provider does not support renaming.

See also

Full contract: docs/DATASET_CONTRACT.md, rename()

close() → None

No-op: the linked service does not need to be closed. Provided only to comply with the interface.

Returns:

None

static concat(dfs: list[pandas.DataFrame]) → pandas.DataFrame

Concatenate a list of DataFrames into a single DataFrame.

Parameters:

dfs – DataFrames to concatenate.

Returns:

Concatenated DataFrame or empty DataFrame if input list is empty.

Return type:

DataFrame

get_details() → dict[str, Any]

Get details of the dataset.

Returns:

Details of the dataset.

Return type:

Dict[str, Any]

class ds_provider_azure_py_lib.AzureBlobDatasetSettings

Bases: ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings

Settings for Azure Blob Storage dataset operations.

Provide either blob_name or prefix for read() and delete(); if both are set, only blob_name is used. create() ignores prefix and can be called only with blob_name. By default (when no create settings are passed), create() attempts to create the container if it does not exist. delete() removes specific blob(s) by name or prefix.

container_name: str
blob_name: str | None = None
prefix: str | None = None
create: CreateSettings
purge: PurgeSettings
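
The create() rules above (blob_name required, prefix ignored, container created by default) can be sketched as a small planning function. This is an assumption-laden illustration: CreateOptions, its create_container field, BlobSettings, and plan_create are hypothetical names, since this reference does not list the fields of CreateSettings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CreateOptions:
    """Hypothetical stand-in for CreateSettings; the field name is an assumption."""

    create_container: bool = True  # default: try to create the container first


@dataclass
class BlobSettings:
    """Hypothetical, reduced mirror of AzureBlobDatasetSettings."""

    container_name: str
    blob_name: str | None = None
    prefix: str | None = None
    create: CreateOptions = field(default_factory=CreateOptions)


def plan_create(settings: BlobSettings) -> list[str]:
    """Return the ordered steps a create() call would take under the rules above."""
    if not settings.blob_name:
        # prefix is ignored for create(); a concrete blob name is required.
        raise ValueError("create() requires blob_name")
    steps: list[str] = []
    if settings.create.create_container:
        steps.append(f"ensure container {settings.container_name!r} exists")
    steps.append(f"upload blob {settings.blob_name!r}")
    return steps


plan = plan_create(BlobSettings(container_name="landing", blob_name="sales.csv"))
print(plan)
```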

class ds_provider_azure_py_lib.AzureTable

Bases: ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[AzureLinkedServiceType, AzureTableDatasetSettingsType, ds_provider_azure_py_lib.serde.AzureTableSerializer, ds_provider_azure_py_lib.serde.AzureTableDeserializer], Generic[AzureLinkedServiceType, AzureTableDatasetSettingsType]

Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets, and other documents.

The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.

linked_service: AzureLinkedServiceType
settings: AzureTableDatasetSettingsType
__post_init__() → None
property type: ds_provider_azure_py_lib.enums.ResourceType

Get the type of the dataset.

Returns:

ResourceType

_prepare_content(content: pandas.DataFrame) → dict[str, Any]

Ensure that the content is provided and is in the correct format.

Parameters:

content (pd.DataFrame) – The content to prepare.

Returns:

The prepared content.

Return type:

dict

Raises:

DatasetException – If the content is not a DataFrame, is empty, or does not contain required columns.

_get_table_client() → azure.data.tables.TableClient

Return a TableClient for the currently configured table.

Returns:

TableClient

_build_transaction_from_input(operation: str, params: collections.abc.Mapping[str, Any] | None = None) → list[TransactionEntry]

Build a list of transaction entries from self.input. The operation argument is the operation name expected by TableClient.submit_transaction, e.g. “create”, “upsert”, or “delete”.

Parameters:
  • operation (str) – The operation to perform.

  • params – Optional dict of parameters, passed as the third item of each entry tuple when required, e.g. {“mode”: UpdateMode.REPLACE}.

Returns:

list[TransactionEntry]

Raises:
  • CreateError – If there is an error preparing content for creation.

  • UpdateError – If there is an error preparing content for update.

  • DeleteError – If there is an error preparing content for deletion.

  • DatasetException – If there is a general error preparing content.
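
To make the entry shape concrete: TableClient.submit_transaction accepts tuples of (operation, entity) or (operation, entity, params). The sketch below builds such tuples from plain row dicts rather than from self.input; build_transaction is a hypothetical helper, and a plain string stands in for the SDK's UpdateMode enum so the example needs no Azure packages.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def build_transaction(
    rows: list[dict[str, Any]],
    operation: str,
    params: Mapping[str, Any] | None = None,
) -> list[tuple]:
    """Turn row dicts into (operation, entity[, params]) tuples."""
    entries: list[tuple] = []
    for row in rows:
        # Every Azure Table entity is addressed by PartitionKey and RowKey.
        missing = {"PartitionKey", "RowKey"} - row.keys()
        if missing:
            raise ValueError(f"row is missing required keys: {sorted(missing)}")
        entity = dict(row)  # copy so later changes to the row do not leak in
        if params is None:
            entries.append((operation, entity))
        else:
            entries.append((operation, entity, dict(params)))
    return entries


rows = [{"PartitionKey": "eu", "RowKey": "1", "amount": 10}]
tx = build_transaction(rows, "upsert", {"mode": "replace"})
print(tx)
```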

_submit_transaction(transaction: collections.abc.Iterable[TransactionEntry], error_cls: type[ds_resource_plugin_py_lib.common.resource.dataset.errors.DatasetException]) → None

Submit the transaction and map TableTransactionError to the provided error_cls.

Parameters:
  • transaction (Iterable[TransactionEntry]) – The transaction to submit.

  • error_cls (builtins.type[DatasetException]) – The exception class to raise on error.

Raises:

error_cls – If submitting the transaction fails.
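
The error-mapping idea, catching the backend's transaction error and re-raising it as a caller-chosen exception class, can be sketched without the Azure SDK. All names below are hypothetical stand-ins: BackendTransactionError plays the role of TableTransactionError, and submit takes the sending function as an argument instead of using a TableClient.

```python
from collections.abc import Callable, Iterable


class DatasetException(Exception):
    """Hypothetical stand-in for the plugin library's base dataset error."""


class UpdateError(DatasetException):
    """Hypothetical stand-in for UpdateError."""


class BackendTransactionError(Exception):
    """Stand-in for the SDK's TableTransactionError."""


def submit(
    send: Callable[[list], None],
    transaction: Iterable[tuple],
    error_cls: type[DatasetException],
) -> None:
    """Send the transaction; translate backend failures into error_cls."""
    try:
        send(list(transaction))
    except BackendTransactionError as exc:
        # "from exc" keeps the original error available as __cause__.
        raise error_cls(f"transaction failed: {exc}") from exc


def failing_send(entries: list) -> None:
    raise BackendTransactionError("batch rejected")


try:
    submit(failing_send, [("update", {"PartitionKey": "eu", "RowKey": "1"})], UpdateError)
except UpdateError as err:
    print(type(err).__name__, "caused by", type(err.__cause__).__name__)
```

Passing the exception class in lets create, update, and delete paths share one submission routine while still raising their own error types.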

_delete_table() → None

Deletes the entire table from Azure Table Storage.

Returns:

None

Raises:

DeleteError – If the table could not be deleted.

_create_table() → None

Creates a table in Azure Table Storage if it does not exist.

Returns:

None

Raises:

CreateError – If the table could not be created due to an error other than it already existing.
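
A minimal sketch of the "create if missing, tolerate already-exists, fail on anything else" pattern described above. AlreadyExists, CreateError, ensure_table, and FakeTables are hypothetical stand-ins for illustration; the real method works against Azure Table Storage and its SDK exceptions.

```python
from __future__ import annotations

from collections.abc import Callable


class AlreadyExists(Exception):
    """Stand-in for the SDK's 'resource already exists' error."""


class CreateError(Exception):
    """Hypothetical stand-in for the package's CreateError."""


def ensure_table(create: Callable[[str], None], name: str) -> bool:
    """Return True if the table was created, False if it already existed."""
    try:
        create(name)
    except AlreadyExists:
        return False  # an existing table is not an error
    except Exception as exc:
        raise CreateError(f"could not create table {name!r}") from exc
    return True


class FakeTables:
    """In-memory stand-in for a table service, with simulated name validation."""

    def __init__(self) -> None:
        self.names: set[str] = set()

    def create(self, name: str) -> None:
        if not name.isalnum():
            raise ValueError("invalid table name")
        if name in self.names:
            raise AlreadyExists(name)
        self.names.add(name)


tables = FakeTables()
first = ensure_table(tables.create, "events")
second = ensure_table(tables.create, "events")
print(first, second)
```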

read(**_kwargs: Any) → None

Read Azure Table Storage dataset.

Parameters:

_kwargs – Additional keyword arguments.

Returns:

None

Raises:

ReadError – If there is an error reading from Azure Table Storage.

create(**_kwargs: Any) → None

Create an entity in Azure Table Storage.

Returns:

None

Raises:

CreateError – If the entity could not be created.

update(**_kwargs: Any) → None

Update an entity in Azure Table Storage.

Returns:

None

delete(**_kwargs: Any) → None

Delete specific entities from Azure Table Storage.

Only entities specified in self.input are deleted, matched by PartitionKey and RowKey.

Parameters:

_kwargs – Additional keyword arguments.

Returns:

None

Raises:

DeleteError – If there is an error deleting from Azure Table Storage.

rename() → NoReturn

Rename the resource in the backend. Atomic. Not idempotent.

Raises:
  • RenameError – If the operation fails.

  • NotSupportedError – If the provider does not support renaming.

See also

Full contract: docs/DATASET_CONTRACT.md, rename()

close() → None

No-op: the linked service does not need to be closed. Provided only to comply with the interface.

Returns:

None

list() → NoReturn

Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.

Raises:
  • ListError – If the operation fails.

  • NotSupportedError – If the provider does not support listing.

See also

Full contract: docs/DATASET_CONTRACT.md, list()

purge(**_kwargs: Any) → None

Purge all entities from the table or drop the entire table.

If delete_table=True in settings, deletes the entire table. Otherwise, deletes all entities from the table, leaving it empty.

Returns:

None

Raises:

DeleteError – If there is an error purging from Azure Table Storage.

upsert(**_kwargs: Any) → None

Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.

Raises:
  • UpsertError – If the operation fails.

  • NotSupportedError – If the provider does not support upsert.

See also

Full contract: docs/DATASET_CONTRACT.md, upsert()

get_details() → dict[str, Any]

Get details about the dataset.

Returns:

dict[str, Any]

class ds_provider_azure_py_lib.AzureTableDatasetSettings

Bases: ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings

Settings for Azure Table Storage dataset operations.

The read settings contain read-specific configuration that applies only to the read() operation, not to create(), delete(), update(), etc.

table_name: str
purge: PurgeSettings

Purge-specific settings. Only applies to the purge() operation.

read: ReadSettings

Read-specific settings. Only applies to the read() operation.

By default, read() reads without a filter.

class ds_provider_azure_py_lib.AzureLinkedService

Bases: ds_resource_plugin_py_lib.common.resource.linked_service.LinkedService[AzureLinkedServiceSettingsType], Generic[AzureLinkedServiceSettingsType]

Linked service for connecting to Azure Storage (Blob and Table).

settings: AzureLinkedServiceSettingsType
_blob_service_client: azure.storage.blob.BlobServiceClient | None = None
_table_service_client: azure.data.tables.TableServiceClient | None = None
_credential: azure.core.credentials.AzureNamedKeyCredential | None = None
check_settings_is_set() → None

Check if settings are set correctly.

Returns:

None

Raises:

AttributeError – If settings are not set correctly.

property type: ds_provider_azure_py_lib.enums.ResourceType

Get the type of the linked service.

Returns:

ResourceType

property connection: AzureLinkedServiceConnection

Get the connection object for the Azure Storage account.

Returns:

AzureLinkedServiceConnection

property blob_service_client: azure.storage.blob.BlobServiceClient

Get the BlobServiceClient instance.

Returns:

BlobServiceClient

Raises:

ConnectionError – If blob service client is not connected.

property table_service_client: azure.data.tables.TableServiceClient

Get the TableServiceClient instance.

Returns:

TableServiceClient

Raises:

ConnectionError – If table service client is not connected.
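
The client properties above follow a guard pattern: the private attribute starts as None, and the public property refuses to return it until a connection has been made. A minimal sketch, assuming hypothetical names (StorageService, a placeholder client object) and using Python's built-in ConnectionError, which may differ from the ConnectionError this package raises:

```python
from __future__ import annotations


class StorageService:
    """Hypothetical service holding one lazily created client."""

    def __init__(self) -> None:
        self._client: object | None = None  # mirrors the "| None = None" attributes

    @property
    def client(self) -> object:
        """Return the client, or fail loudly if connect() was never called."""
        if self._client is None:
            raise ConnectionError("client is not connected; call connect() first")
        return self._client

    def connect(self) -> None:
        """Create the client once; repeated calls keep the same instance."""
        if self._client is None:
            self._client = object()  # placeholder for a real SDK client


service = StorageService()
try:
    service.client
except ConnectionError as err:
    print("before connect:", err)

service.connect()
print("after connect:", service.client is not None)
```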

get_blob_service() → azure.storage.blob.BlobServiceClient

Connect to Azure Blob Storage.

Returns:

BlobServiceClient

get_table_service() → azure.data.tables.TableServiceClient

Connect to Azure Table Storage.

Returns:

TableServiceClient

connect() → None

Connect to Azure Storage (Blob and Table), ensuring both service clients are initialized.

Returns:

None

test_connection() → tuple[bool, str]

Test the connection to Azure Storage (Blob or Table).

Returns:

tuple[bool, str]

close() → None

No-op: the linked service does not need to be closed. Provided only to comply with the interface.

Returns:

None

__enter__() → AzureLinkedService[AzureLinkedServiceSettingsType]

Enter context manager.

Returns:

self, for use in a with statement.

Return type:

AzureLinkedService

__exit__(exc_type: object, exc_val: object, exc_tb: object) → None

Exit context manager and close the connection.

Parameters:
  • exc_type – Exception type if an exception occurred.

  • exc_val – Exception value if an exception occurred.

  • exc_tb – Exception traceback if an exception occurred.

Returns:

None
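
The context-manager protocol described by __enter__ and __exit__ can be sketched with a small stand-in class. DemoService is hypothetical, and its close() sets a flag purely to make the call visible; the real close() is documented as having nothing to release.

```python
class DemoService:
    """Hypothetical service that records whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "DemoService":
        return self  # the object bound by "with ... as name"

    def __exit__(self, exc_type: object, exc_val: object, exc_tb: object) -> None:
        self.close()
        # Returning None (falsy) lets any exception from the with-body propagate.


service = DemoService()
try:
    with service as svc:
        assert svc is service
        raise RuntimeError("failure inside the with-body")
except RuntimeError:
    pass

print("closed after with-block:", service.closed)
```

Because __exit__ returns None, errors raised inside the block are not swallowed, and close() still runs on the way out.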

class ds_provider_azure_py_lib.AzureLinkedServiceSettings

Bases: ds_resource_plugin_py_lib.common.resource.linked_service.LinkedServiceSettings

The object containing the Azure linked service settings.

account_name: str
access_key: str
ds_provider_azure_py_lib.__version__