ds_provider_simployer_py_lib.dataset.simployer
==============================================

.. py:module:: ds_provider_simployer_py_lib.dataset.simployer

.. autoapi-nested-parse::

   **File:** ``simployer.py``

   **Region:** ``ds_provider_simployer_py_lib/dataset/simployer.py``

   Simployer Dataset

   This module implements a dataset for the Simployer APIs.

   .. rubric:: Example

   >>> from uuid import uuid4
   >>> dataset = SimployerDataset(
   ...     id=uuid4(),
   ...     name="employees_dataset",
   ...     version="1.0.0",
   ...     settings=SimployerDatasetSettings(
   ...         data_product=SimployerDataProducts.EMPLOYEES,
   ...         read=ReadSettings(page_size=100),
   ...     ),
   ...     linked_service=SimployerLinkedService(
   ...         id=uuid4(),
   ...         name="simployer_connection",
   ...         version="1.0.0",
   ...         settings=SimployerLinkedServiceSettings(
   ...             client_id="your_client_id",
   ...             client_secret="your_client_secret",
   ...         ),
   ...     ),
   ... )
   >>> linked_service = dataset.linked_service
   >>> linked_service.connect()
   >>> dataset.read()
   >>> data = dataset.output

Attributes
----------

.. autoapisummary::

   ds_provider_simployer_py_lib.dataset.simployer.logger
   ds_provider_simployer_py_lib.dataset.simployer.SimployerDatasetSettingsType
   ds_provider_simployer_py_lib.dataset.simployer.SimployerLinkedServiceType

Classes
-------

.. autoapisummary::

   ds_provider_simployer_py_lib.dataset.simployer.ReadSettings
   ds_provider_simployer_py_lib.dataset.simployer.SimployerDatasetSettings
   ds_provider_simployer_py_lib.dataset.simployer.SimployerDataset

Module Contents
---------------

.. py:data:: logger

.. py:class:: ReadSettings

   Bases: :py:obj:`ds_common_serde_py_lib.Serializable`

   Settings specific to the ``read()`` operation.

   These settings apply only when reading data from the API; they do not
   affect ``create()``, ``update()``, or ``delete()`` operations.

   When ``resource_id`` is set, pagination settings are ignored and a single
   GET request is made to the resource-specific endpoint
   (e.g., ``/v1/persons/{id}``). Note: not all endpoints support
   single-record lookup (e.g., ``/v1/employees``).

   .. py:attribute:: resource_id
      :type: str | None
      :value: None

      If set, ``read()`` performs a single-resource lookup by ID
      (e.g., ``/v1/persons/{id}``). If ``None``, ``read()`` queries the
      collection endpoint (e.g., ``/v1/employees``) using the read settings
      (pagination, filters, etc.). Note: not all data products support
      single-record lookup.

   .. py:attribute:: page
      :type: int
      :value: 1

      Page number for pagination. Default is 1.

   .. py:attribute:: page_size
      :type: int
      :value: 100

      Number of records per page for pagination. Default is 100.

   .. py:attribute:: from_date
      :type: str | None
      :value: None

      Start date for filtering data, in ``YYYY-MM-DD`` format. Optional.

   .. py:attribute:: to_date
      :type: str | None
      :value: None

      End date for filtering data, in ``YYYY-MM-DD`` format. Optional.

   .. py:attribute:: filters
      :type: dict[str, Any] | None
      :value: None

      Additional filters for the API request. Optional.

.. py:class:: SimployerDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   The object containing the settings of the dataset.

   .. py:attribute:: data_product
      :type: ds_provider_simployer_py_lib.enums.SimployerDataProducts | None
      :value: None

      Data product associated with this dataset (e.g., ``"employees"``).
      Used to determine the API endpoint and other settings.

   .. py:attribute:: read
      :type: ReadSettings

      Settings for ``read()``.

.. py:data:: SimployerDatasetSettingsType

.. py:data:: SimployerLinkedServiceType

.. py:class:: SimployerDataset

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset`\ [\ :py:obj:`SimployerLinkedServiceType`\ , :py:obj:`SimployerDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`SimployerLinkedServiceType`\ , :py:obj:`SimployerDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store, such as
   a table, CSV, JSON, Parquet file, Parquet dataset, or other document.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: SimployerLinkedServiceType

   .. py:attribute:: settings
      :type: SimployerDatasetSettingsType

   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None

   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None

   .. py:property:: type
      :type: ds_provider_simployer_py_lib.enums.ResourceType

      Get the type of the dataset.

   .. py:property:: supports_checkpoint
      :type: bool

      Whether this provider supports incremental loads via ``self.checkpoint``.

      This implementation uses a simple dictionary-based checkpoint structure
      to support resuming paginated reads:

      - On a full load, ``self.checkpoint`` is expected to be empty (``{}``)
        or ``None``. In this case, :meth:`read` starts from page ``1``.
      - After each successfully read page, :meth:`read` sets
        ``self.checkpoint = {"last_page": page}``, where ``page`` is the last
        completed page number.
      - On a subsequent run, if ``self.checkpoint`` contains a
        ``"last_page"`` entry, :meth:`read` resumes from ``last_page + 1``
        and continues fetching data from the Simployer API.

      This allows consumers to perform incremental loads by persisting and
      reusing the checkpoint between executions, avoiding re-reading pages
      that were already processed successfully.

      :returns: True if checkpointing is supported, False otherwise.
      :rtype: bool

   .. py:method:: read() -> None

      Read data from the requested endpoint of the Simployer API.

      :raises ReadError: If reading data fails.

   .. py:method:: _read_single_resource(session: Any) -> None

      Fetch a single resource by ID.

   .. py:method:: _read_collection(session: Any) -> None

      Fetch a collection with pagination and update the checkpoint with the
      maximum ``updated`` value.

   .. py:method:: create() -> None

      Insert rows into the Simployer API.

      Reads from ``self.input`` (which must be a pandas DataFrame) and POSTs
      to the configured endpoint. Results are stored in ``self.output``.

      Input requirements:

      - ``self.input`` must be a pandas DataFrame with columns matching the
        Simployer endpoint schema.
      - Only one record per ``create()`` call is allowed (one row in the
        DataFrame).
      - Users must convert their data (dict, JSON, etc.) to a DataFrame
        before assigning it to ``self.input``.

      .. rubric:: Example

      .. code-block:: python

         import pandas as pd

         data = {
             "firstName": "John",
             "lastName": "Doe",
             "primaryEmail": "john.doe@example.com",
             "affiliatedOrganizationId": "org-12345",
         }
         dataset.input = pd.DataFrame([data])
         dataset.create()

      The caller can also provide raw data and use the deserializer to
      convert it:

      .. code-block:: python

         dataset.input = dataset.deserializer.deserialize(raw_bytes)
         dataset.create()

      :raises NotSupportedError: If the configured data product does not
         support create (POST).
      :raises CreateError: If creating records fails.

   .. py:method:: delete() -> None

      Delete rows from the Simployer API.

      Reads from ``self.input`` (which must be a pandas DataFrame) and issues
      a DELETE to the configured endpoint. Results are stored in
      ``self.output``.

      Capacity limit: the Simployer API accepts only one record per DELETE
      request. If ``self.input`` contains more than one row, this method
      raises ``DeleteError``. The caller must batch: split ``self.input``
      into single-row chunks and call ``delete()`` once per chunk.

      Input requirements:

      - ``self.input`` must be a pandas DataFrame.
      - Only one record per ``delete()`` call is allowed (one row in the
        DataFrame).
      - Users must convert their data (dict, JSON, etc.) to a DataFrame
        before assigning it to ``self.input``.

      .. rubric:: Example

      .. code-block:: python

         import pandas as pd

         dataset.input = pd.DataFrame([{"id": "12345"}])
         dataset.delete()

      Per contract:

      - Empty input is a no-op (returns immediately without contacting the
        backend).
      - Deleting a row that does not exist is not an error; the operation is
        idempotent.

      :raises NotSupportedError: If the configured data product does not
         support delete (DELETE).
      :raises DeleteError: If the input exceeds capacity (more than one row)
         or if deletion fails.

   .. py:method:: update() -> None

      Update an existing row in the Simployer API.

      Reads from ``self.input`` (which must be a pandas DataFrame) and PUTs
      to the configured endpoint. Results are stored in ``self.output``.

      Capacity limit: the Simployer API accepts only one record per PUT
      request. If ``self.input`` contains more than one row, this method
      raises ``UpdateError``. The caller must batch: split ``self.input``
      into single-row chunks and call ``update()`` once per chunk.

      Input requirements:

      - ``self.input`` must be a pandas DataFrame with one row.
      - Only one record per ``update()`` call is allowed (one row in the
        DataFrame).
      - Users must convert their data (dict, JSON, etc.) to a DataFrame
        before assigning it to ``self.input``.

      .. rubric:: Example

      .. code-block:: python

         import pandas as pd

         dataset.input = pd.DataFrame([{"id": "12345", "status": "active"}])
         dataset.update()

      Per contract:

      - Empty input is a no-op (returns immediately without contacting the
        backend).
      - Must not insert new rows (non-existent resources result in an error).
      - Idempotent: yes. Updating a row to the same values has no effect.

      :raises NotSupportedError: If the configured data product does not
         support update (PUT).
      :raises UpdateError: If the input exceeds capacity (more than one row)
         or if the update fails.

   .. py:method:: close() -> None

      Release any resources held by the dataset.

      For Simployer, the dataset holds no resources directly. The connection
      lifecycle is managed by the linked service.

   .. py:method:: rename() -> None

      Rename the resource in the backend.

      Atomic. Not idempotent.

      :raises RenameError: If the operation fails.
      :raises NotSupportedError: If the provider does not support renaming.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``rename()``

   .. py:method:: list() -> None

      Discover available resources and populate ``self.output`` with a
      DataFrame of resources and their metadata.

      Idempotent.

      :raises ListError: If the operation fails.
      :raises NotSupportedError: If the provider does not support listing.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``list()``

   .. py:method:: upsert() -> None

      Insert rows that do not exist and update rows that do, matched by the
      identity columns defined in ``self.settings``.

      Atomic.

      :raises UpsertError: If the operation fails.
      :raises NotSupportedError: If the provider does not support upsert.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``upsert()``

   .. py:method:: purge() -> None

      Remove all content from the target. ``self.input`` is not used.

      Atomic. Idempotent.

      :raises PurgeError: If the operation fails.
      :raises NotSupportedError: If the provider does not support purge.

      .. seealso:: Full contract: ``docs/DATASET_CONTRACT.md`` -- ``purge()``

   .. py:method:: _build_params(page: int) -> dict[str, Any]

      Build query parameters for the API request.

   .. py:method:: _build_checkpoint(last_page: int) -> dict[str, Any]

      Build the checkpoint dictionary for incremental-load support.

      Sets ``from_date`` to the ISO-8601 string representation of the maximum
      ``updated`` value in ``self.output``, if available. This ensures the
      checkpoint is JSON-serializable.

      Checkpoint structure::

         {
             "last_page": int,
             "page_size": int,
             "to_date": str | None,    # ISO-8601 format
             "from_date": str | None,  # ISO-8601 format
             "data_product": str,
         }

   .. py:method:: _build_url(data_product: ds_provider_simployer_py_lib.enums.SimployerDataProducts, mode: str = 'read') -> str

      Construct the API endpoint URL based on the data product and operation
      mode. Handles path parameters for read, create, update, and delete
      operations.

      :param data_product: The SimployerDataProducts enum value indicating
         which API endpoint to target.
      :param mode: Operation mode (``"read"``, ``"create"``, ``"update"``,
         ``"delete"``).
      :return: The full URL for the API request.
      :raises ReadError: If mode is ``"read"`` and the endpoint is missing or
         a parameter cannot be resolved.
      :raises CreateError: If mode is ``"create"`` and the endpoint is
         missing or a parameter cannot be resolved.
      :raises UpdateError: If mode is ``"update"`` and the endpoint is
         missing or a parameter cannot be resolved.
      :raises DeleteError: If mode is ``"delete"`` and the endpoint is
         missing or a parameter cannot be resolved.