ds_provider_grasp_py_lib.dataset

File: __init__.py Region: ds_provider_grasp_py_lib/dataset

Grasp Datasets

This module provides access to both Grasp Cart and Grasp Ingress datasets.

Example

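>>> import uuid
>>> from ds_provider_grasp_py_lib.dataset import GraspCartDataset, GraspCartDatasetSettings
>>> # The serializer, deserializer, format enum, and linked-service classes used
>>> # below must also be imported; their module paths are not shown on this page.
>>> dataset = GraspCartDataset(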
...     id=uuid.uuid4(),
...     name="cart-dataset",
...     version="1.0.0",
...     deserializer=PandasDeserializer(format=DatasetStorageFormatType.JSON),
...     serializer=PandasSerializer(format=DatasetStorageFormatType.JSON),
...     settings=GraspCartDatasetSettings(
...         owner_id="owner_id",
...         product_group_name="product_group_name",
...         product_name="product_name",
...         version="version",
...         include_history=True,
...     ),
...     linked_service=GraspAwsLinkedService(
...         id=uuid.uuid4(),
...         name="aws-linked-service",
...         version="1.0.0",
...         settings=GraspAwsLinkedServiceSettings(
...             access_key_id="access_key_id",
...             access_key_secret="access_key_secret",
...             region="region",
...         ),
...     ),
... )
>>> dataset.read()
>>> data = dataset.output

Classes

GraspCartDataset

Tabular dataset object which identifies data within a data store, such as a table, CSV, JSON, Parquet, or Parquet dataset, and other documents.

GraspCartDatasetSettings

Settings for Grasp Cart dataset operations.

GraspIngressDataset

Tabular dataset object which identifies data within a data store, such as a table, CSV, JSON, Parquet, or Parquet dataset, and other documents.

GraspIngressDatasetSettings

Settings for Grasp Ingress dataset operations.

Package Contents

class ds_provider_grasp_py_lib.dataset.GraspCartDataset

Bases: ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[AWSLinkedServiceType, GraspCartDatasetSettingsType, ds_resource_plugin_py_lib.common.serde.serialize.AwsWranglerSerializer, ds_resource_plugin_py_lib.common.serde.deserialize.AwsWranglerDeserializer], Generic[AWSLinkedServiceType, GraspCartDatasetSettingsType]

Tabular dataset object which identifies data within a data store, such as a table, CSV, JSON, Parquet, or Parquet dataset, and other documents.

Both the input and the output of the dataset are pandas DataFrames.

linked_service: AWSLinkedServiceType
settings: GraspCartDatasetSettingsType
__post_init__() -> None
property type: ds_provider_grasp_py_lib.enums.ResourceType

Get the type of the dataset.

_get_s3_path(tenant_id: str) -> str
create() -> None

Insert all rows in self.input into the target as a single atomic transaction. Must not delete, update, or overwrite existing data.

Raises:
  • CreateError – If the operation fails.

  • NotSupportedError – If the provider does not support create.

See also

Full contract: docs/DATASET_CONTRACT.md, create()
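
The insert-only rule of create() can be illustrated with a minimal in-memory sketch. This is not the library's implementation: `create_rows`, the `identity` argument, and the use of `ValueError` in place of `CreateError` are assumptions made for illustration, and a pandas DataFrame stands in for the real target.

```python
import pandas as pd


def create_rows(target: pd.DataFrame, rows: pd.DataFrame, identity: list[str]) -> pd.DataFrame:
    """Insert-only sketch: return target plus rows, or raise and change nothing."""
    # Assumption: "existing data" is detected by matching identity columns.
    clashes = target.merge(rows[identity].drop_duplicates(), on=identity, how="inner")
    if not clashes.empty:
        # ValueError is a stand-in for the library's CreateError.
        raise ValueError("create must not overwrite existing rows")
    # The result is a new frame, so a failure above leaves target untouched.
    return pd.concat([target, rows], ignore_index=True)


target = pd.DataFrame({"id": [1, 2], "qty": [10, 20]})
created = create_rows(target, pd.DataFrame({"id": [3], "qty": [30]}), ["id"])
```

Checking for clashes before building the result is what makes the sketch all-or-nothing: either every row is added or none is.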

read() -> None

Read data from the Grasp Cart dataset.

Raises:
  • ReadError – If the read operation fails, including when no files are found at the S3 path or when the S3 path is invalid.

delete() -> NoReturn

Remove specific rows from the target matched by identity columns defined in self.settings. Atomic. Idempotent.

Raises:
  • DeleteError – If the operation fails.

  • NotSupportedError – If the provider does not support delete.

See also

Full contract: docs/DATASET_CONTRACT.md, delete()
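
What "matched by identity columns" and "idempotent" mean for delete() can be shown with a small pandas sketch. The helper `delete_rows` and its `keys` and `identity` arguments are hypothetical names, not part of the library; the frame stands in for the real target.

```python
import pandas as pd


def delete_rows(target: pd.DataFrame, keys: pd.DataFrame, identity: list[str]) -> pd.DataFrame:
    """Return target without the rows whose identity columns match keys."""
    # Left-join with an indicator column, then keep rows that found no match.
    marked = target.merge(
        keys[identity].drop_duplicates(), on=identity, how="left", indicator=True
    )
    kept = marked[marked["_merge"] == "left_only"].drop(columns="_merge")
    return kept.reset_index(drop=True)


target = pd.DataFrame({"id": [1, 2, 3], "qty": [10, 20, 30]})
keys = pd.DataFrame({"id": [2]})
once = delete_rows(target, keys, ["id"])
twice = delete_rows(once, keys, ["id"])  # second call finds nothing left to remove
```

Running the same delete twice yields the same frame as running it once, which is the idempotency the contract asks for.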

update() -> NoReturn

Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.

Raises:
  • UpdateError – If the operation fails.

  • NotSupportedError – If the provider does not support update.

See also

Full contract: docs/DATASET_CONTRACT.md, update()
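
The "must not insert new rows" rule of update() can be sketched as follows. `update_rows` and `identity` are hypothetical names, and silently skipping unknown keys is an assumption: the contract text above does not say whether an unmatched row is ignored or treated as an error.

```python
import pandas as pd


def update_rows(target: pd.DataFrame, rows: pd.DataFrame, identity: list[str]) -> pd.DataFrame:
    """Overwrite rows whose identity matches; never add rows."""
    indexed = target.set_index(identity)
    incoming = rows.set_index(identity)
    # Only keys already present in the target are eligible for update.
    known = incoming.index.intersection(indexed.index)
    indexed.loc[known, incoming.columns] = incoming.loc[known]
    return indexed.reset_index()


target = pd.DataFrame({"id": [1, 2], "qty": [10, 20]})
rows = pd.DataFrame({"id": [2, 9], "qty": [25, 99]})  # id 9 does not exist in target
updated = update_rows(target, rows, ["id"])
```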

upsert() -> NoReturn

Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.

Raises:
  • UpsertError – If the operation fails.

  • NotSupportedError – If the provider does not support upsert.

See also

Full contract: docs/DATASET_CONTRACT.md, upsert()
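
A minimal sketch of upsert semantics, assuming the target and the incoming rows share the same columns. `upsert_rows` and `identity` are hypothetical names used only for illustration, and the DataFrame stands in for the real target.

```python
import pandas as pd


def upsert_rows(target: pd.DataFrame, rows: pd.DataFrame, identity: list[str]) -> pd.DataFrame:
    """Insert rows with new keys and replace rows with existing keys."""
    # Incoming rows are appended last so they win when keys collide.
    combined = pd.concat([target, rows], ignore_index=True)
    deduped = combined.drop_duplicates(subset=identity, keep="last")
    return deduped.reset_index(drop=True)


target = pd.DataFrame({"id": [1, 2], "qty": [10, 20]})
rows = pd.DataFrame({"id": [2, 3], "qty": [25, 30]})
upserted = upsert_rows(target, rows, ["id"])
```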

rename() -> NoReturn

Rename the resource in the backend. Atomic. Not idempotent.

Raises:
  • RenameError – If the operation fails.

  • NotSupportedError – If the provider does not support renaming.

See also

Full contract: docs/DATASET_CONTRACT.md, rename()

purge() -> NoReturn

Remove all content from the target. self.input is not used. Atomic. Idempotent.

Raises:
  • PurgeError – If the operation fails.

  • NotSupportedError – If the provider does not support purge.

See also

Full contract: docs/DATASET_CONTRACT.md, purge()

list() -> NoReturn

Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.

Raises:
  • ListError – If the operation fails.

  • NotSupportedError – If the provider does not support listing.

See also

Full contract: docs/DATASET_CONTRACT.md, list()
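
The delete, update, upsert, rename, purge, and list methods are annotated with NoReturn, which in Python typing marks a function that never returns normally. One plausible reading, which this page does not confirm, is that these operations always raise NotSupportedError for this dataset. The sketch below shows how a caller might guard such a call; `NotSupportedError` and `ReadOnlyDataset` here are local stand-ins, not the library's classes.

```python
from typing import NoReturn


class NotSupportedError(Exception):
    """Hypothetical stand-in for the library's NotSupportedError."""


class ReadOnlyDataset:
    """Hypothetical dataset that supports reading but not deleting."""

    def delete(self) -> NoReturn:
        # NoReturn tells type checkers this call never completes normally.
        raise NotSupportedError("delete is not supported by this provider")


def try_delete(dataset: ReadOnlyDataset) -> str:
    """Attempt a delete and report the outcome instead of propagating."""
    try:
        dataset.delete()
    except NotSupportedError as exc:
        return f"skipped: {exc}"


message = try_delete(ReadOnlyDataset())
```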

close() -> None

Close the dataset.

class ds_provider_grasp_py_lib.dataset.GraspCartDatasetSettings

Bases: ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings

Settings for Grasp Cart dataset operations.

owner_id: str

The owner ID of the cart.

product_group_name: str

The product group name of the cart.

product_name: str

The product name of the cart.

version: str = '1.0'

The version of the cart.

include_history: bool = False

Whether to include history in the cart.
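
The field list above implies that three values must always be supplied while `version` and `include_history` fall back to defaults. The stand-in dataclass below mirrors only the documented names and defaults; `CartSettingsSketch` is hypothetical, and whether the real settings class is a dataclass is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CartSettingsSketch:
    """Hypothetical mirror of the documented GraspCartDatasetSettings fields."""

    owner_id: str
    product_group_name: str
    product_name: str
    version: str = "1.0"           # documented default
    include_history: bool = False  # documented default


# Only the three required fields are passed; the rest use their defaults.
settings = CartSettingsSketch(
    owner_id="owner-1",
    product_group_name="group-a",
    product_name="product-x",
)
```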

class ds_provider_grasp_py_lib.dataset.GraspIngressDataset

Bases: ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[AWSLinkedServiceType, GraspIngressDatasetSettingsType, ds_resource_plugin_py_lib.common.serde.serialize.AwsWranglerSerializer, ds_resource_plugin_py_lib.common.serde.deserialize.AwsWranglerDeserializer], Generic[AWSLinkedServiceType, GraspIngressDatasetSettingsType]

Tabular dataset object which identifies data within a data store, such as a table, CSV, JSON, Parquet, or Parquet dataset, and other documents.

Both the input and the output of the dataset are pandas DataFrames.

linked_service: AWSLinkedServiceType
settings: GraspIngressDatasetSettingsType
serializer: ds_resource_plugin_py_lib.common.serde.serialize.AwsWranglerSerializer | None
deserializer: ds_resource_plugin_py_lib.common.serde.deserialize.AwsWranglerDeserializer | None
property type: ds_provider_grasp_py_lib.enums.ResourceType

Get the type of the dataset.

_get_s3_path(tenant_id: str, session_id: str) -> str

Get the S3 path for the Grasp Ingress dataset.

Returns:

The S3 path for the Grasp Ingress dataset.

Return type:

str

create() -> None

Insert all rows in self.input into the target as a single atomic transaction. Must not delete, update, or overwrite existing data.

Raises:
  • CreateError – If the operation fails.

  • NotSupportedError – If the provider does not support create.

See also

Full contract: docs/DATASET_CONTRACT.md, create()

read() -> None

Read data from the Grasp Ingress dataset.

Raises:
  • ReadError – If the read operation fails, including when no files are found at the S3 path or when the S3 path is invalid.

delete() -> NoReturn

Remove specific rows from the target matched by identity columns defined in self.settings. Atomic. Idempotent.

Raises:
  • DeleteError – If the operation fails.

  • NotSupportedError – If the provider does not support delete.

See also

Full contract: docs/DATASET_CONTRACT.md, delete()

update() -> NoReturn

Update existing rows in the target matched by identity columns defined in self.settings. Atomic. Must not insert new rows.

Raises:
  • UpdateError – If the operation fails.

  • NotSupportedError – If the provider does not support update.

See also

Full contract: docs/DATASET_CONTRACT.md, update()

upsert() -> NoReturn

Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.

Raises:
  • UpsertError – If the operation fails.

  • NotSupportedError – If the provider does not support upsert.

See also

Full contract: docs/DATASET_CONTRACT.md, upsert()

rename() -> NoReturn

Rename the resource in the backend. Atomic. Not idempotent.

Raises:
  • RenameError – If the operation fails.

  • NotSupportedError – If the provider does not support renaming.

See also

Full contract: docs/DATASET_CONTRACT.md, rename()

purge() -> NoReturn

Remove all content from the target. self.input is not used. Atomic. Idempotent.

Raises:
  • PurgeError – If the operation fails.

  • NotSupportedError – If the provider does not support purge.

See also

Full contract: docs/DATASET_CONTRACT.md, purge()

list() -> NoReturn

Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.

Raises:
  • ListError – If the operation fails.

  • NotSupportedError – If the provider does not support listing.

See also

Full contract: docs/DATASET_CONTRACT.md, list()

close() -> None

Close the dataset.

class ds_provider_grasp_py_lib.dataset.GraspIngressDatasetSettings

Bases: ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings

Settings for Grasp Ingress dataset operations.