ds_provider_azure_py_lib.dataset.table

File: table.py Region: ds_provider_azure_py_lib/dataset/table

Azure Dataset - Table Storage

This module implements a dataset for Azure Table Storage, allowing for CRUD operations on table entities using pandas DataFrames for data representation.

Example

>>> import uuid
>>> azure_table = AzureTable(
...     settings=AzureTableDatasetSettings(
...         table_name="users",
...         read=ReadSettings(
...             query_filter="PartitionKey eq 'partition_key' and RowKey eq 'row_key'",
...         ),
...         purge=PurgeSettings(delete_table=False),
...     ),
...     linked_service=AzureLinkedService(
...         settings=AzureLinkedServiceSettings(
...             account_name="account name",
...             access_key="access key",
...         ),
...         id=uuid.uuid4(),
...         name="testazurepackage",
...         version="0.0.1",
...         description="testazurepackage",
...     ),
...     id=uuid.uuid4(),
...     name="testazurepackage",
...     version="0.0.1",
...     description="testazurepackage",
... )
>>> azure_table.read()
>>> table_data = azure_table.output

Attributes

logger

TransactionEntry

AzureTableDatasetSettingsType

AzureLinkedServiceType

Classes

ReadSettings

Settings specific to the read() operation.

PurgeSettings

Settings specific to the purge() operation.

AzureTableDatasetSettings

Settings for Azure Table Storage dataset operations.

AzureTable

Tabular dataset object which identifies data within a data store.

Module Contents

ds_provider_azure_py_lib.dataset.table.logger
ds_provider_azure_py_lib.dataset.table.TransactionEntry
class ds_provider_azure_py_lib.dataset.table.ReadSettings

Settings specific to the read() operation.

These settings apply only when reading data from the table. They do not affect other operations such as create(), delete(), update(), or rename().

query_filter: str | None = None

An OData-compliant string to filter the entities returned by the read() operation. If None, no filter is applied and all entities are returned.

Example: "PartitionKey eq '{self.partition_key}' and RowKey eq '{self.row_key}'"
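As an illustration of the filter format above, the following minimal sketch builds such a string from two key values. The helper name build_key_filter is hypothetical and not part of this module. It assumes the standard OData rule that a single quote inside a string literal is escaped by doubling it.

```python
def build_key_filter(partition_key: str, row_key: str) -> str:
    """Return an OData filter matching one entity by its keys.

    Hypothetical helper for illustration only; not part of this module.
    """

    def quote(value: str) -> str:
        # OData string literals escape a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    return f"PartitionKey eq {quote(partition_key)} and RowKey eq {quote(row_key)}"


# The resulting string could be supplied as ReadSettings.query_filter.
query_filter = build_key_filter("users", "o'brien")
print(query_filter)
```

Escaping the values before interpolating them keeps a key that contains a quote from breaking the filter expression.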

class ds_provider_azure_py_lib.dataset.table.PurgeSettings

Settings specific to the purge() operation.

These settings apply only when purging data from the table. They do not affect other operations such as create(), read(), update(), or rename().

delete_table: bool = False

If True, the entire table will be deleted when purge() is called. If False, only the table content will be deleted.

class ds_provider_azure_py_lib.dataset.table.AzureTableDatasetSettings

Bases: ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings

Settings for Azure Table Storage dataset operations.

The read settings contain read-specific configuration that applies only to the read() operation, not to create(), delete(), update(), etc.

table_name: str
purge: PurgeSettings

Purge-specific settings. Only applies to the purge() operation.

read: ReadSettings

Read-specific settings. Only applies to the read() operation.

By default, read() applies no filter and returns all entities.

ds_provider_azure_py_lib.dataset.table.AzureTableDatasetSettingsType
ds_provider_azure_py_lib.dataset.table.AzureLinkedServiceType
class ds_provider_azure_py_lib.dataset.table.AzureTable

Bases: ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[AzureLinkedServiceType, AzureTableDatasetSettingsType, ds_provider_azure_py_lib.serde.AzureTableSerializer, ds_provider_azure_py_lib.serde.AzureTableDeserializer], Generic[AzureLinkedServiceType, AzureTableDatasetSettingsType]

Tabular dataset object which identifies data within a data store, such as a table, CSV, JSON, Parquet, or Parquet dataset, and other documents.

The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.

linked_service: AzureLinkedServiceType
settings: AzureTableDatasetSettingsType
__post_init__() → None
property type: ds_provider_azure_py_lib.enums.ResourceType

Get the type of the Dataset.

Returns:

ResourceType

_prepare_content(content: pandas.DataFrame) → dict[str, Any]

Ensure that the content is provided and is in the correct format.

Parameters:

content (pd.DataFrame) – The content to prepare.

Returns:

The prepared content.

Return type:

dict

Raises:

DatasetException – If the content is not a DataFrame, is empty, or does not contain required columns.

_get_table_client() → azure.data.tables.TableClient

Return a TableClient for the currently configured table.

Returns:

TableClient

_build_transaction_from_input(operation: str, params: collections.abc.Mapping[str, Any] | None = None) → list[TransactionEntry]

Build a list of transaction entries from self.input.

Parameters:
  • operation (str) – The operation name as expected by TableClient.submit_transaction, e.g. "create", "upsert", or "delete".

  • params (Mapping[str, Any] | None) – Optional parameters passed as the third item of each entry tuple when the operation requires them, e.g. {"mode": UpdateMode.REPLACE}.

Returns:

list[TransactionEntry]

Raises:
  • CreateError – If there is an error preparing content for creation.

  • UpdateError – If there is an error preparing content for update.

  • DeleteError – If there is an error preparing content for deletion.

  • DatasetException – If there is a general error preparing content.
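The entry shape described above can be sketched as follows. This is not the module's implementation: build_transaction is a hypothetical stand-in that takes plain row dicts instead of self.input, assumes each entry is an (operation, entity) or (operation, entity, params) tuple, and uses the string "replace" where the real code would pass an UpdateMode value.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# Assumed entry shape: (operation, entity) or (operation, entity, params).
TransactionEntry = tuple[Any, ...]


def build_transaction(
    rows: list[dict[str, Any]],
    operation: str,
    params: Mapping[str, Any] | None = None,
) -> list[TransactionEntry]:
    """Turn row dicts into transaction entries (illustrative stand-in)."""
    entries: list[TransactionEntry] = []
    for row in rows:
        # Every entity needs both keys to be addressable in the table.
        if "PartitionKey" not in row or "RowKey" not in row:
            raise ValueError("each row needs PartitionKey and RowKey")
        if params is None:
            entries.append((operation, row))
        else:
            # Params travel as the third item of the tuple.
            entries.append((operation, row, dict(params)))
    return entries


rows = [{"PartitionKey": "users", "RowKey": "1", "name": "Ada"}]
create_entries = build_transaction(rows, "create")
upsert_entries = build_transaction(rows, "upsert", {"mode": "replace"})
```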

_submit_transaction(transaction: collections.abc.Iterable[TransactionEntry], error_cls: type[ds_resource_plugin_py_lib.common.resource.dataset.errors.DatasetException]) → None

Submit the transaction and map TableTransactionError to the provided error_cls.

Parameters:
  • transaction (Iterable[TransactionEntry]) – The transaction to submit.

  • error_cls (builtins.type[DatasetException]) – The exception class to raise on error.

Raises:

error_cls – If submitting the transaction fails.
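The error mapping described above might look roughly like the sketch below. All names here are stand-ins defined for illustration: TransactionFailure plays the role of the SDK's TableTransactionError, the exception classes mimic the library's hierarchy, and submit_with_mapping is a hypothetical function rather than the actual method.

```python
from collections.abc import Callable


class TransactionFailure(Exception):
    """Stand-in for the SDK's TableTransactionError (assumption)."""


class DatasetException(Exception):
    """Stand-in for the library's base dataset error (assumption)."""


class CreateError(DatasetException):
    """Stand-in for the library's CreateError (assumption)."""


def submit_with_mapping(
    submit: Callable[[], None],
    error_cls: type[DatasetException],
) -> None:
    """Run submit() and re-raise transaction failures as error_cls."""
    try:
        submit()
    except TransactionFailure as exc:
        # Chain the original exception so the root cause stays visible.
        raise error_cls(f"transaction failed: {exc}") from exc


def failing_submit() -> None:
    raise TransactionFailure("entity already exists")
```

Passing the exception class as an argument lets create(), update(), and delete() share one submission path while still raising their own error types.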

_delete_table() → None

Deletes the entire table from Azure Table Storage.

Returns:

None

Raises:

DeleteError – If the table could not be deleted.

_create_table() → None

Creates a table in Azure Table Storage if it does not exist.

Returns:

None

Raises:

CreateError – If the table could not be created due to an error other than it already existing.

read(**_kwargs: Any) → None

Read Azure Table Storage dataset.

Parameters:

_kwargs – Additional keyword arguments

Returns:

None

Raises:

ReadError – If there is an error reading from Azure Table Storage.

create(**_kwargs: Any) → None

Create an entity in Azure Table Storage.

Returns:

None

Raises:

CreateError – If the entity could not be created.

update(**_kwargs: Any) → None

Update an entity in Azure Table Storage.

Returns:

None

delete(**_kwargs: Any) → None

Delete specific entities from Azure Table Storage.

Only entities specified in self.input are deleted, matched by PartitionKey and RowKey.

Parameters:

_kwargs – Additional keyword arguments

Returns:

None

Raises:

DeleteError – If there is an error deleting from Azure Table Storage.

rename() → NoReturn

Rename the resource in the backend. Atomic. Not idempotent.

Raises:
  • RenameError – If the operation fails.

  • NotSupportedError – If the provider does not support renaming.

See also

Full contract: docs/DATASET_CONTRACT.md – rename()

close() → None

No-op. The linked service does not need to be closed; this method exists only to comply with the interface.

Returns:

None

list() → NoReturn

Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.

Raises:
  • ListError – If the operation fails.

  • NotSupportedError – If the provider does not support listing.

See also

Full contract: docs/DATASET_CONTRACT.md – list()

purge(**_kwargs: Any) → None

Purge all entities from the table or drop the entire table.

If delete_table=True in settings, deletes the entire table. Otherwise, deletes all entities from the table, leaving it empty.

Returns:

None

Raises:

DeleteError – If there is an error purging from Azure Table Storage.
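The two purge modes described above can be illustrated with an in-memory stand-in. This is only a sketch of the documented behaviour, not the module's code: the store is a plain dict of tables, and purge_table is a hypothetical function name.

```python
from typing import Any

# In-memory stand-in: table name -> {(PartitionKey, RowKey): entity}.
Store = dict[str, dict[tuple[str, str], dict[str, Any]]]


def purge_table(store: Store, table_name: str, delete_table: bool = False) -> None:
    """Illustrate the two purge modes (hypothetical helper)."""
    if delete_table:
        # Drop the table itself.
        store.pop(table_name, None)
    else:
        # Keep the table but remove every entity in it.
        store[table_name].clear()


store: Store = {"users": {("p1", "1"): {"name": "Ada"}}}

purge_table(store, "users")  # table remains, now empty
emptied = dict(store)

purge_table(store, "users", delete_table=True)  # table is gone
```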

upsert(**_kwargs: Any) → None

Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.

Raises:
  • UpsertError – If the operation fails.

  • NotSupportedError – If the provider does not support upsert.

See also

Full contract: docs/DATASET_CONTRACT.md – upsert()
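The insert-or-update behaviour described for upsert() can be sketched on an in-memory table. The sketch assumes the identity columns are PartitionKey and RowKey, as for delete(), and it replaces a stored entity wholesale; whether the real method merges or replaces properties is not stated here. upsert_rows is a hypothetical name.

```python
from typing import Any

Key = tuple[str, str]


def upsert_rows(table: dict[Key, dict[str, Any]], rows: list[dict[str, Any]]) -> None:
    """Insert rows that are missing and overwrite rows that exist."""
    for row in rows:
        # Assumed identity columns: PartitionKey and RowKey.
        key = (row["PartitionKey"], row["RowKey"])
        # Replace semantics (assumption): the new row overwrites the old one.
        table[key] = dict(row)


table: dict[Key, dict[str, Any]] = {
    ("users", "1"): {"PartitionKey": "users", "RowKey": "1", "name": "Ada"},
}
upsert_rows(
    table,
    [
        {"PartitionKey": "users", "RowKey": "1", "name": "Ada L."},  # update
        {"PartitionKey": "users", "RowKey": "2", "name": "Grace"},  # insert
    ],
)
```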

get_details() → dict[str, Any]

Get details about the dataset.

Returns:

dict[str, Any]