ds_provider_simployer_py_lib.dataset
====================================

.. py:module:: ds_provider_simployer_py_lib.dataset

.. autoapi-nested-parse::

   **File:** ``__init__.py``

   **Region:** ``ds_provider_simployer_py_lib/dataset``

   Description
   -----------

   This module implements a dataset for Simployer APIs, focusing on
   Simployer-specific data products and parameters rather than generic HTTP
   concerns. It includes custom serializers/deserializers tailored to
   Simployer's API contract.

Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_provider_simployer_py_lib/dataset/simployer/index

Classes
-------

.. autoapisummary::

   ds_provider_simployer_py_lib.dataset.SimployerDataset
   ds_provider_simployer_py_lib.dataset.SimployerDatasetSettings

Package Contents
----------------

.. py:class:: SimployerDataset

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset`\ [\ :py:obj:`SimployerLinkedServiceType`\ , :py:obj:`SimployerDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`SimployerLinkedServiceType`\ , :py:obj:`SimployerDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store, such as
   a table, CSV, JSON, Parquet file, Parquet dataset, or other document.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: SimployerLinkedServiceType

   .. py:attribute:: settings
      :type: SimployerDatasetSettingsType

   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None

   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None

   .. py:property:: type
      :type: ds_provider_simployer_py_lib.enums.ResourceType

      Get the type of the dataset.

   .. py:property:: supports_checkpoint
      :type: bool

      Whether this provider supports incremental loads via ``self.checkpoint``.

      This implementation uses a simple dictionary-based checkpoint structure
      to support resuming paginated reads:

      - On a full load, ``self.checkpoint`` is expected to be empty (``{}``)
        or ``None``. In this case, :meth:`read` starts from page ``1``.
      - After each successfully read page, :meth:`read` sets
        ``self.checkpoint = {"last_page": page}``, where ``page`` is the last
        completed page number.
      - On a subsequent run, if ``self.checkpoint`` contains a
        ``"last_page"`` entry, :meth:`read` resumes from ``last_page + 1``
        and continues fetching data from the Simployer API.

      This allows consumers to perform incremental loads by persisting and
      reusing the checkpoint between executions, avoiding re-reading pages
      that were already processed successfully.

      :returns: True if checkpointing is supported, False otherwise.
      :rtype: bool

   .. py:method:: read() -> None

      Read data from the requested endpoint of the Simployer API.

      :raises ReadError: If reading data fails.

   .. py:method:: _read_single_resource(session: Any) -> None

      Fetch a single resource by ID.

   .. py:method:: _read_collection(session: Any) -> None

      Fetch a collection with pagination and update the checkpoint with the
      maximum ``updated`` value.

   .. py:method:: create() -> None

      Insert rows into the Simployer API.

      Reads from ``self.input`` (which must be a pandas DataFrame) and POSTs
      to the configured endpoint. Results are stored in ``self.output``.

      Input requirements:

      - ``self.input`` must be a pandas DataFrame with columns matching the
        Simployer endpoint schema.
      - Only one record per ``create()`` call is allowed (one row in the
        DataFrame).
      - Users must convert their data (dict, JSON, etc.) to a DataFrame
        before assigning it to ``self.input``.

      .. rubric:: Example

      .. code-block:: python

         import pandas as pd

         data = {
             "firstName": "John",
             "lastName": "Doe",
             "primaryEmail": "john.doe@example.com",
             "affiliatedOrganizationId": "org-12345",
         }
         dataset.input = pd.DataFrame([data])
         dataset.create()

      The caller can also provide raw data and use the deserializer to
      convert it:

      .. code-block:: python

         dataset.input = dataset.deserializer.deserialize(raw_bytes)
         dataset.create()

      :raises NotSupportedError: If the configured data product does not
         support create (POST).
      :raises CreateError: If creating records fails.

   .. py:method:: delete() -> None

      Delete rows from the Simployer API.

      Reads from ``self.input`` (which must be a pandas DataFrame) and issues
      a DELETE to the configured endpoint. Results are stored in
      ``self.output``.

      Capacity limit: the Simployer API accepts only one record per DELETE
      request. If ``self.input`` contains more than one row, this method
      raises ``DeleteError``. The caller must batch: split ``self.input``
      into single-row chunks and call ``delete()`` once per chunk.

      Input requirements:

      - ``self.input`` must be a pandas DataFrame.
      - Only one record per ``delete()`` call is allowed (one row in the
        DataFrame).
      - Users must convert their data (dict, JSON, etc.) to a DataFrame
        before assigning it to ``self.input``.

      .. rubric:: Example

      .. code-block:: python

         import pandas as pd

         dataset.input = pd.DataFrame([{"id": "12345"}])
         dataset.delete()

      Per contract:

      - Empty input is a no-op (returns immediately without contacting the
        backend).
      - Deleting a row that does not exist is not an error; the operation is
        idempotent.

      :raises NotSupportedError: If the configured data product does not
         support delete (DELETE).
      :raises DeleteError: If the input exceeds capacity (more than one row)
         or if deletion fails.

   .. py:method:: update() -> None

      Update an existing row in the Simployer API.

      Reads from ``self.input`` (which must be a pandas DataFrame) and PUTs
      to the configured endpoint. Results are stored in ``self.output``.

      Capacity limit: the Simployer API accepts only one record per PUT
      request. If ``self.input`` contains more than one row, this method
      raises ``UpdateError``.
      The caller must batch: split ``self.input`` into single-row chunks and
      call ``update()`` once per chunk.

      Input requirements:

      - ``self.input`` must be a pandas DataFrame with one row.
      - Only one record per ``update()`` call is allowed (one row in the
        DataFrame).
      - Users must convert their data (dict, JSON, etc.) to a DataFrame
        before assigning it to ``self.input``.

      .. rubric:: Example

      .. code-block:: python

         import pandas as pd

         dataset.input = pd.DataFrame([{"id": "12345", "status": "active"}])
         dataset.update()

      Per contract:

      - Empty input is a no-op (returns immediately without contacting the
        backend).
      - Must not insert new rows (non-existent resources result in an error).
      - Idempotent: updating a row to the same values has no effect.

      :raises NotSupportedError: If the configured data product does not
         support update (PUT).
      :raises UpdateError: If the input exceeds capacity (more than one row)
         or if the update fails.

   .. py:method:: close() -> None

      Release any resources held by the dataset.

      For Simployer, the dataset holds no resources directly. The connection
      lifecycle is managed by the linked service.

   .. py:method:: rename() -> None

      Rename the resource in the backend. Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso::

         Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``

   .. py:method:: list() -> None

      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata. Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso::

         Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``

   .. py:method:: upsert() -> None

      Insert rows that do not exist and update rows that do, matched by the
      identity columns defined in ``self.settings``. Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso::

         Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``

   .. py:method:: purge() -> None

      Remove all content from the target. ``self.input`` is not used.
      Atomic. Idempotent.

      :raises PurgeError: If the operation fails.
      :raises NotSupportedError: If the provider does not support purge.

      .. seealso::

         Full contract: ``docs/DATASET_CONTRACT.md`` -- ``purge()``

   .. py:method:: _build_params(page: int) -> dict[str, Any]

      Build query parameters for the API request.

   .. py:method:: _build_checkpoint(last_page: int) -> dict[str, Any]

      Build the checkpoint dictionary for incremental load support.

      Sets ``from_date`` to the ISO-8601 string representation of the
      maximum ``updated`` value in ``self.output`` if available. This
      ensures the checkpoint is JSON-serializable.

      Checkpoint structure::

         {
             "last_page": int,
             "page_size": int,
             "to_date": str | None (ISO-8601 format),
             "from_date": str | None (ISO-8601 format),
             "data_product": str
         }

   .. py:method:: _build_url(data_product: ds_provider_simployer_py_lib.enums.SimployerDataProducts, mode: str = 'read') -> str

      Construct the API endpoint URL based on the data product and operation
      mode. Handles path parameters for read, create, update, and delete
      operations.

      :param data_product: The SimployerDataProducts enum value indicating
         which API endpoint to target.
      :param mode: Operation mode (``"read"``, ``"create"``, ``"update"``,
         ``"delete"``).
      :return: The full URL for the API request.
      :raises ReadError: If mode is ``"read"`` and the endpoint is missing or
         a parameter cannot be resolved.
      :raises CreateError: If mode is ``"create"`` and the endpoint is missing
         or a parameter cannot be resolved.
      :raises UpdateError: If mode is ``"update"`` and the endpoint is missing
         or a parameter cannot be resolved.
      :raises DeleteError: If mode is ``"delete"`` and the endpoint is missing
         or a parameter cannot be resolved.

.. py:class:: SimployerDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   The object containing the settings of the dataset.

   .. py:attribute:: data_product
      :type: ds_provider_simployer_py_lib.enums.SimployerDataProducts | None
      :value: None

      Data product associated with this dataset (e.g., ``"employees"``).
      Used to determine the API endpoint and other settings.

   .. py:attribute:: read
      :type: ReadSettings

      Settings for ``read()``.