ds_provider_simployer_py_lib.dataset

File: __init__.py Region: ds_provider_simployer_py_lib/dataset

Description

This module implements a dataset for Simployer APIs, focusing on Simployer-specific data products and parameters rather than generic HTTP concerns.

Includes custom serializers/deserializers tailored to Simployer’s API contract.

Classes

SimployerDataset

Tabular dataset object which identifies data within a data store.

SimployerDatasetSettings

The object containing the settings of the dataset.

Package Contents

class ds_provider_simployer_py_lib.dataset.SimployerDataset[source]

Bases: ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[SimployerLinkedServiceType, SimployerDatasetSettingsType, ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer, ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer], Generic[SimployerLinkedServiceType, SimployerDatasetSettingsType]

Tabular dataset object which identifies data within a data store, such as a table, CSV, JSON, Parquet file, Parquet dataset, or other documents.

The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.

linked_service: SimployerLinkedServiceType
settings: SimployerDatasetSettingsType
serializer: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None
deserializer: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None
property type: ds_provider_simployer_py_lib.enums.ResourceType

Get the type of the dataset.

property supports_checkpoint: bool

Whether this provider supports incremental loads via self.checkpoint.

This implementation uses a simple dictionary-based checkpoint structure to support resuming paginated reads:

  • On a full load, self.checkpoint is expected to be empty ({}) or None. In this case, read() starts from page 1.

  • After each successfully read page, read() sets self.checkpoint = {"last_page": page}, where page is the last completed page number.

  • On a subsequent run, if self.checkpoint contains a "last_page" entry, read() resumes from last_page + 1 and continues fetching data from the Simployer API.

This allows consumers to perform incremental loads by persisting and reusing the checkpoint between executions, avoiding re-reading pages that were already processed successfully.

Returns:

True if checkpointing is supported, False otherwise.

Return type:

bool
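The resume behaviour described above can be sketched with a minimal stand-in (the `_StubDataset` class and the three-page loop below are illustrative only, not the real SimployerDataset):

```python
import json

class _StubDataset:
    """Illustrative stand-in for SimployerDataset's checkpoint handling."""

    def __init__(self, checkpoint=None):
        # A full load starts with an empty checkpoint ({} or None).
        self.checkpoint = checkpoint or {}

    def read(self):
        # Resume from last_page + 1; a full load therefore starts at page 1.
        start = self.checkpoint.get("last_page", 0) + 1
        for page in range(start, start + 3):  # pretend three pages are fetched
            # After each successfully read page, record the last completed page.
            self.checkpoint = {"last_page": page}

# First run: full load from page 1, then persist the checkpoint.
ds = _StubDataset()
ds.read()
saved = json.dumps(ds.checkpoint)  # persisted between executions

# Later run: reuse the checkpoint, resuming from last_page + 1.
ds_resumed = _StubDataset(json.loads(saved))
ds_resumed.read()
```

Persisting the checkpoint as JSON mirrors the contract above: the first run ends with `{"last_page": 3}`, and the resumed run continues from page 4 rather than re-reading completed pages.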

read() → None[source]

Read data from the requested endpoint of the Simployer API.

Raises:

ReadError – If reading data fails.

_read_single_resource(session: Any) → None[source]

Fetch a single resource by ID.

_read_collection(session: Any) → None[source]

Fetch a collection with pagination and update the checkpoint with the max 'updated' value.

create() → None[source]

Insert rows into Simployer API.

Reads from self.input (which must be a pandas DataFrame) and POSTs to the configured endpoint. Results are stored in self.output.

Input Requirement:
  • self.input must be a pandas DataFrame with columns matching the Simployer endpoint schema.

  • Only one record per create() call is allowed (one row in the DataFrame).

  • Users must convert their data (dict, JSON, etc.) to a DataFrame before assigning to self.input.

Example

import pandas as pd

data = {
    "firstName": "John",
    "lastName": "Doe",
    "primaryEmail": "john.doe@example.com",
    "affiliatedOrganizationId": "org-12345",
}
dataset.input = pd.DataFrame([data])
dataset.create()

The caller can also provide raw data and use the deserializer to convert:

dataset.input = dataset.deserializer.deserialize(raw_bytes)
dataset.create()

Raises:
  • NotSupportedError – If the configured data product does not support create (POST).

  • CreateError – If creating records fails.

delete() → None[source]

Delete rows from Simployer API.

Reads from self.input (which must be a pandas DataFrame) and issues a DELETE to the configured endpoint. Results are stored in self.output.

Capacity Limit:

The Simployer API accepts only 1 record per DELETE request. If self.input contains more than 1 row, this method raises DeleteError. The caller must batch: split self.input into single-row chunks and call delete() once per chunk.

Input Requirement:
  • self.input must be a pandas DataFrame.

  • Only one record per delete() call is allowed (one row in the DataFrame).

  • Users must convert their data (dict, JSON, etc.) to a DataFrame before assigning to self.input.

Example

import pandas as pd

dataset.input = pd.DataFrame([{"id": "12345"}])
dataset.delete()
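Because the API accepts only one record per request, a multi-row DataFrame must be split into single-row chunks, one delete() call per chunk. A minimal batching sketch, with a hypothetical stub in place of a configured dataset:

```python
import pandas as pd

class _StubDataset:
    """Illustrative stand-in that records each single-row delete."""

    def __init__(self):
        self.input = None
        self.deleted = []

    def delete(self):
        self.deleted.append(self.input)

dataset = _StubDataset()
rows = pd.DataFrame([{"id": "12345"}, {"id": "67890"}])

# Split into single-row chunks to respect the one-record capacity limit.
for i in range(len(rows)):
    dataset.input = rows.iloc[[i]]  # double brackets keep a DataFrame
    dataset.delete()
```

The same chunking pattern applies to update(), which has the identical one-row capacity limit.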

Per contract:
  • Empty input is a no-op (returns immediately without contacting the backend).

  • Deleting a row that does not exist is not an error and is idempotent.

Raises:
  • NotSupportedError – If the configured data product does not support delete (DELETE).

  • DeleteError – If the input exceeds capacity (more than 1 row) or if deletion fails.

update() → None[source]

Update an existing row in Simployer API.

Reads from self.input (which must be a pandas DataFrame) and PUTs to the configured endpoint. Results are stored in self.output.

Capacity Limit:

The Simployer API accepts only 1 record per PUT request. If self.input contains more than 1 row, this method raises UpdateError. The caller must batch: split self.input into single-row chunks and call update() once per chunk.

Input Requirement:
  • self.input must be a pandas DataFrame with one row.

  • Only one record per update() call is allowed (one row in the DataFrame).

  • Users must convert their data (dict, JSON, etc.) to a DataFrame before assigning to self.input.

Example

import pandas as pd

dataset.input = pd.DataFrame([{"id": "12345", "status": "active"}])
dataset.update()

Per contract:
  • Empty input is a no-op (returns immediately without contacting the backend).

  • Must not insert new rows (non-existent resources result in an error).

  • Idempotent: Yes. Updating a row to the same values has no effect.

Raises:
  • NotSupportedError – If the configured data product does not support update (PUT).

  • UpdateError – If the input exceeds capacity (more than 1 row) or if the update fails.

close() → None[source]

Release any resources held by the dataset.

For Simployer, the dataset holds no resources directly. Connection lifecycle is managed by the linked service.

rename() → None[source]

Rename the resource in the backend. Atomic. Not idempotent.

Raises:
  • RenameError – If the operation fails.

  • NotSupportedError – If the provider does not support renaming.

See also

Full contract: docs/DATASET_CONTRACT.md, rename()

list() → None[source]

Discover available resources and populate self.output with a DataFrame of resources and their metadata. Idempotent.

Raises:
  • ListError – If the operation fails.

  • NotSupportedError – If the provider does not support listing.

See also

Full contract: docs/DATASET_CONTRACT.md, list()

upsert() → None[source]

Insert rows that do not exist, update rows that do, matched by identity columns defined in self.settings. Atomic.

Raises:
  • UpsertError – If the operation fails.

  • NotSupportedError – If the provider does not support upsert.

See also

Full contract: docs/DATASET_CONTRACT.md, upsert()

purge() → None[source]

Remove all content from the target. self.input is not used. Atomic. Idempotent.

Raises:
  • PurgeError – If the operation fails.

  • NotSupportedError – If the provider does not support purge.

See also

Full contract: docs/DATASET_CONTRACT.md, purge()

_build_params(page: int) → dict[str, Any][source]

Build query parameters for the API request.

_build_checkpoint(last_page: int) → dict[str, Any][source]

Build checkpoint dictionary for incremental load support.

Sets from_date to the ISO-8601 string representation of the maximum 'updated' value in self.output if available. This ensures the checkpoint is JSON-serializable.

Checkpoint structure:
{
    "last_page": int,
    "page_size": int,
    "to_date": str | None (ISO-8601 format),
    "from_date": str | None (ISO-8601 format),
    "data_product": str
}
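A sketch of how such a checkpoint might be assembled (the values below are illustrative); note that from_date is converted to an ISO-8601 string so the dictionary round-trips through JSON:

```python
import json
import pandas as pd

# Illustrative output with an 'updated' timestamp column.
output = pd.DataFrame(
    {"updated": pd.to_datetime(["2024-01-01T10:00:00", "2024-03-05T08:30:00"])}
)

# Use the max 'updated' value as from_date, as an ISO-8601 string.
from_date = output["updated"].max().isoformat() if "updated" in output else None

checkpoint = {
    "last_page": 3,
    "page_size": 100,
    "to_date": None,
    "from_date": from_date,
    "data_product": "employees",
}

# Every value is JSON-serializable, so the checkpoint can be persisted.
serialized = json.dumps(checkpoint)
```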

_build_url(data_product: ds_provider_simployer_py_lib.enums.SimployerDataProducts, mode: str = 'read') → str[source]

Construct the API endpoint URL based on the data product and operation mode. Handles path parameters for read, create, update, and delete operations.

Parameters:
  • data_product – The SimployerDataProducts enum value indicating which API endpoint to target.

  • mode – Operation mode ("read", "create", "update", "delete").

Returns:

The full URL for the API request.

Raises:
  • ReadError – If mode is "read" and the endpoint is missing or a parameter cannot be resolved.

  • CreateError – If mode is "create" and the endpoint is missing or a parameter cannot be resolved.

  • UpdateError – If mode is "update" and the endpoint is missing or a parameter cannot be resolved.

  • DeleteError – If mode is "delete" and the endpoint is missing or a parameter cannot be resolved.
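Path-parameter handling of this kind can be illustrated with a simple template substitution. The host and endpoint template below are made-up placeholders, not Simployer's real API surface:

```python
# Illustrative only: a hypothetical base URL and endpoint template.
base_url = "https://api.example.com/v1"
endpoint_template = "employees/{id}"  # "{id}" is a path parameter

def build_url(base: str, template: str, **params: str) -> str:
    """Fill path parameters in the template and join with the base URL."""
    return f"{base.rstrip('/')}/{template.format(**params)}"

url = build_url(base_url, endpoint_template, id="12345")
```

When a required path parameter cannot be resolved, `str.format` raises `KeyError`, which a real implementation would translate into the mode-specific error listed above.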

class ds_provider_simployer_py_lib.dataset.SimployerDatasetSettings[source]

Bases: ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings

The object containing the settings of the dataset.

data_product: ds_provider_simployer_py_lib.enums.SimployerDataProducts | None = None

Data product associated with this dataset (e.g., "employees").

Used to determine the API endpoint and other settings.

read: ReadSettings

Settings for read().