ds_provider_simployer_py_lib.dataset
====================================

.. py:module:: ds_provider_simployer_py_lib.dataset

.. autoapi-nested-parse::

   **File:** ``__init__.py``

   **Region:** ``ds_provider_simployer_py_lib/dataset``

   Description
   -----------

   This module implements a dataset for Simployer APIs, focusing on
   Simployer-specific data products and parameters rather than generic HTTP
   concerns. It includes custom serializers/deserializers tailored to
   Simployer's API contract.

Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_provider_simployer_py_lib/dataset/simployer/index

Classes
-------

.. autoapisummary::

   ds_provider_simployer_py_lib.dataset.SimployerDataset
   ds_provider_simployer_py_lib.dataset.SimployerDatasetSettings

Package Contents
----------------

.. py:class:: SimployerDataset

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset`\ [\ :py:obj:`SimployerLinkedServiceType`\ , :py:obj:`SimployerDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`SimployerLinkedServiceType`\ , :py:obj:`SimployerDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store, such as
   a table, CSV, JSON, Parquet file, Parquet dataset, or other document.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: SimployerLinkedServiceType

   .. py:attribute:: settings
      :type: SimployerDatasetSettingsType

   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None

   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None

   .. py:property:: type
      :type: ds_provider_simployer_py_lib.enums.ResourceType

      Get the type of the dataset.

   .. py:property:: supports_checkpoint
      :type: bool

      Whether this provider supports incremental loads via ``self.checkpoint``.

      This implementation uses a simple dictionary-based checkpoint structure
      to support resuming paginated reads:

      - On a full load, ``self.checkpoint`` is expected to be empty (``{}``)
        or ``None``. In this case, :meth:`read` starts from page ``1``.
      - After each successfully read page, :meth:`read` sets
        ``self.checkpoint = {"last_page": page}``, where ``page`` is the last
        completed page number.
      - On a subsequent run, if ``self.checkpoint`` contains a
        ``"last_page"`` entry, :meth:`read` resumes from ``last_page + 1``
        and continues fetching data from the Simployer API.

      This allows consumers to perform incremental loads by persisting and
      reusing the checkpoint between executions, avoiding re-reading pages
      that were already processed successfully.

      :returns: True if checkpointing is supported, False otherwise.
      :rtype: bool

   .. py:method:: read() -> None

      Read data from the requested endpoint of the Simployer API.

      :raises ReadError: If reading data fails.

   .. py:method:: _read_single_resource(session: Any) -> None

      Fetch a single resource by ID.

   .. py:method:: _read_collection(session: Any) -> None

      Fetch a collection with pagination and update the checkpoint with the
      maximum ``updated`` value.

   .. py:method:: create() -> None

      Insert rows into the Simployer API.

      Reads from ``self.input`` (which must be a pandas DataFrame) and POSTs
      to the configured endpoint. Results are stored in ``self.output``.

      Input requirements:

      - ``self.input`` must be a pandas DataFrame with columns matching the
        Simployer endpoint schema.
      - Only one record per ``create()`` call is allowed (one row in the
        DataFrame).
      - Users must convert their data (dict, JSON, etc.) to a DataFrame
        before assigning it to ``self.input``.

      .. rubric:: Example

      .. code-block:: python

         import pandas as pd

         data = {
             "firstName": "John",
             "lastName": "Doe",
             "primaryEmail": "john.doe@example.com",
             "affiliatedOrganizationId": "org-12345",
         }
         dataset.input = pd.DataFrame([data])
         dataset.create()

      The caller can also provide raw data and use the deserializer to
      convert it:

      .. code-block:: python

         dataset.input = dataset.deserializer.deserialize(raw_bytes)
         dataset.create()

      :raises NotSupportedError: If the configured data product does not
         support create (POST).
      :raises CreateError: If creating records fails.

   .. py:method:: delete() -> None

      Delete rows from the Simployer API.

      Reads from ``self.input`` (which must be a pandas DataFrame) and issues
      a DELETE to the configured endpoint. Results are stored in
      ``self.output``.

      Capacity limit: the Simployer API accepts only one record per DELETE
      request. If ``self.input`` contains more than one row, this method
      raises ``DeleteError``. The caller must batch: split ``self.input``
      into single-row chunks and call ``delete()`` once per chunk.

      Input requirements:

      - ``self.input`` must be a pandas DataFrame.
      - Only one record per ``delete()`` call is allowed (one row in the
        DataFrame).
      - Users must convert their data (dict, JSON, etc.) to a DataFrame
        before assigning it to ``self.input``.

      .. rubric:: Example

      .. code-block:: python

         import pandas as pd

         dataset.input = pd.DataFrame([{"id": "12345"}])
         dataset.delete()

      Per contract:

      - Empty input is a no-op (returns immediately without contacting the
        backend).
      - Deleting a row that does not exist is not an error; the operation is
        idempotent.

      :raises NotSupportedError: If the configured data product does not
         support delete (DELETE).
      :raises DeleteError: If the input exceeds capacity (more than one row)
         or if deletion fails.

   .. py:method:: update() -> None

      Update an existing row in the Simployer API.

      Reads from ``self.input`` (which must be a pandas DataFrame) and PUTs
      to the configured endpoint. Results are stored in ``self.output``.

      Capacity limit: the Simployer API accepts only one record per PUT
      request. If ``self.input`` contains more than one row, this method
      raises ``UpdateError``.
      The caller must batch: split ``self.input`` into single-row chunks and
      call ``update()`` once per chunk.

      Input requirements:

      - ``self.input`` must be a pandas DataFrame with one row.
      - Only one record per ``update()`` call is allowed (one row in the
        DataFrame).
      - Users must convert their data (dict, JSON, etc.) to a DataFrame
        before assigning it to ``self.input``.

      .. rubric:: Example

      .. code-block:: python

         import pandas as pd

         dataset.input = pd.DataFrame([{"id": "12345", "status": "active"}])
         dataset.update()

      Per contract:

      - Empty input is a no-op (returns immediately without contacting the
        backend).
      - Must not insert new rows (non-existent resources result in an error).
      - Idempotent: updating a row to the same values has no effect.

      :raises NotSupportedError: If the configured data product does not
         support update (PUT).
      :raises UpdateError: If the input exceeds capacity (more than one row)
         or if the update fails.

   .. py:method:: close() -> None

      Release any resources held by the dataset.

      For Simployer, the dataset holds no resources directly. The connection
      lifecycle is managed by the linked service.

   .. py:method:: rename() -> None

      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso::

         Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``

   .. py:method:: list() -> None

      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso::

         Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``

   .. py:method:: upsert() -> None

      Insert rows that do not exist and update rows that do, matched by the
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso::

         Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``

   .. py:method:: purge() -> None

      Remove all content from the target. ``self.input`` is not used.
      Atomic. Idempotent.

      :raises PurgeError: If the operation fails.
      :raises NotSupportedError: If the provider does not support purge.

      .. seealso::

         Full contract: ``docs/DATASET_CONTRACT.md`` -- ``purge()``

   .. py:method:: _build_params(page: int) -> dict[str, Any]

      Build query parameters for the API request.

   .. py:method:: _build_checkpoint(last_page: int) -> dict[str, Any]

      Build the checkpoint dictionary for incremental load support.

      Sets ``from_date`` to the ISO-8601 string representation of the
      maximum ``updated`` value in ``self.output`` if available. This
      ensures the checkpoint is JSON-serializable.

      Checkpoint structure::

         {
             "last_page": int,
             "page_size": int,
             "to_date": str | None (ISO-8601 format),
             "from_date": str | None (ISO-8601 format),
             "data_product": str
         }

   .. py:method:: _build_url(data_product: ds_provider_simployer_py_lib.enums.SimployerDataProducts, mode: str = 'read') -> str

      Construct the API endpoint URL based on the data product and operation
      mode. Handles path parameters for read, create, update, and delete
      operations.

      :param data_product: The SimployerDataProducts enum value indicating
         which API endpoint to target.
      :param mode: Operation mode (``"read"``, ``"create"``, ``"update"``,
         ``"delete"``).
      :return: The full URL for the API request.
      :raises ReadError: If mode is ``"read"`` and the endpoint is missing or
         a parameter cannot be resolved.
      :raises CreateError: If mode is ``"create"`` and the endpoint is missing
         or a parameter cannot be resolved.
      :raises UpdateError: If mode is ``"update"`` and the endpoint is missing
         or a parameter cannot be resolved.
      :raises DeleteError: If mode is ``"delete"`` and the endpoint is missing
         or a parameter cannot be resolved.

.. py:class:: SimployerDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   The object containing the settings of the dataset.

   .. py:attribute:: data_product
      :type: ds_provider_simployer_py_lib.enums.SimployerDataProducts | None
      :value: None

      Data product associated with this dataset (e.g., ``"employees"``).
      Used to determine the API endpoint and other settings.

   .. py:attribute:: read
      :type: ReadSettings

      Settings for ``read()``.