ds_provider_microsoft_py_lib.dataset
====================================

.. py:module:: ds_provider_microsoft_py_lib.dataset

.. autoapi-nested-parse::

   **File:** ``__init__.py``

   **Region:** ``ds-provider-microsoft-py-lib/dataset``

   Dataset module for the Microsoft provider.

   Example:
       >>> dataset = MsSqlTable(
       ...     linked_service=MsSqlLinkedService(...),
       ...     settings=MsSqlTableDatasetSettings(
       ...         table="your_table_name",
       ...         schema="your_schema_name",
       ...         read=ReadSettings(
       ...             limit=10,
       ...             columns=["id", "color", "score", "active"],
       ...             filters={"active": True},
       ...             order_by=[("score", "desc"), "id"],
       ...         ),
       ...     )
       ... )
       >>> dataset.read()

Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_provider_microsoft_py_lib/dataset/mssql/index

Classes
-------

.. autoapisummary::

   ds_provider_microsoft_py_lib.dataset.MsSqlTable
   ds_provider_microsoft_py_lib.dataset.MsSqlTableDatasetSettings

Package Contents
----------------

.. py:class:: MsSqlTable

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset`\ [\ :py:obj:`MsSqlLinkedServiceType`\ , :py:obj:`MsSqlTableDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`MsSqlLinkedServiceType`\ , :py:obj:`MsSqlTableDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store, such as a
   table, CSV, JSON, or Parquet file, a Parquet dataset, or other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: MsSqlLinkedServiceType

   .. py:attribute:: settings
      :type: MsSqlTableDatasetSettingsType

   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None

   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None

   .. py:property:: type
      :type: ds_provider_microsoft_py_lib.enums.ResourceType

      Get the type of the Dataset.

      :returns: ResourceType

   .. py:method:: create(**_kwargs: Any) -> None

      Create/write data to the specified table.

      Writes ``self.input`` (a pandas DataFrame) to the database table with the
      configured create settings (mode, etc.).

      :param _kwargs: Additional keyword arguments to pass to the request.
      :raises ConnectionError: If the connection fails.
      :raises CreateError: If the create operation fails.

   .. py:method:: read(**_kwargs: Any) -> None

      Read rows from the configured table into ``self.output``.

      :param _kwargs: Additional keyword arguments for interface compatibility.
      :returns: None
      :raises ReadError: If reading data fails.

   .. py:method:: purge(**_kwargs: Any) -> None

      Remove all content from the target table.

      Drops the entire table, leaving the structure empty. Per contract, the
      target is empty after ``purge()`` returns. This is idempotent -- purging
      an already-empty (or non-existent) table is a no-op.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises ConnectionError: If the connection is not established.
      :raises PurgeError: If the purge operation fails.

   .. py:method:: delete(**_kwargs: Any) -> None

      Delete specific rows from the target table.

      Removes only the rows in ``self.input``, matched by all columns as
      identity. Per contract: empty input is a no-op (returns immediately).
      Deleting a row that does not exist is not an error.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises ConnectionError: If the connection is not established.
      :raises DeleteError: If the delete operation fails.
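   A minimal sketch of the write path, continuing the module-level example above.
   Staging the DataFrame by assigning ``dataset.input`` is an assumption drawn
   from the ``create()`` and ``delete()`` docstrings, not an attribute documented
   on this page:

       >>> import pandas as pd
       >>> rows = pd.DataFrame(
       ...     {
       ...         "id": [1, 2],
       ...         "color": ["red", "blue"],
       ...         "score": [0.9, 0.4],
       ...         "active": [True, False],
       ...     }
       ... )
       >>> dataset.input = rows          # assumed staging attribute for write operations
       >>> dataset.create()              # writes the rows using the configured create settings
       >>> dataset.input = rows.head(1)
       >>> dataset.delete()              # removes matching rows only; all columns act as identity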
   .. py:method:: update(**_kwargs: Any) -> None

      Update existing rows in the target table.

      This operation is not supported for SQL Server datasets at this time.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises NotSupportedError: Always -- update is not supported.

   .. py:method:: rename(**_kwargs: Any) -> None

      Rename a resource (table) in the backend.

      This operation is not supported for SQL Server datasets at this time.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises NotSupportedError: Always -- rename is not supported.

   .. py:method:: close() -> None

      Clean up the connection to the backend.

      Per contract: must be safe to call multiple times and never raise.

      :returns: None

   .. py:method:: list(**_kwargs: Any) -> None

      Discover available resources (tables) in the schema.

      Uses SQLAlchemy's Inspector to reflect and retrieve all tables in the
      configured schema with their metadata (type: table or view).

      :param _kwargs: Additional keyword arguments (ignored).
      :raises ConnectionError: If the connection is not established.
      :raises ListError: If the list operation fails.

   .. py:method:: upsert(**_kwargs: Any) -> None

      Insert or update rows in the target table.

      This operation is not supported for SQL Server datasets at this time.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises NotSupportedError: Always -- upsert is not supported.

   .. py:method:: _get_table() -> sqlalchemy.Table

      Get the SQLAlchemy Table object for the configured schema and table.

      :returns: The SQLAlchemy Table object.
      :rtype: Table

   .. py:method:: _pandas_dtype_to_sqlalchemy(dtypes: pandas.Series) -> dict[str, Any]
      :staticmethod:

      Convert pandas dtypes Series to a dict mapping column names to SQLAlchemy types.

      :param dtypes: Pandas Series where index is column names and values are dtypes.
      :returns: Dictionary mapping column names to SQLAlchemy types.
      :rtype: dict[str, Any]

   .. py:method:: _validate_column(table: sqlalchemy.Table, column_name: str) -> None

      Validate that a column exists in the table.

      :param table: The SQLAlchemy Table object.
      :param column_name: The name of the column to validate.
      :raises ValueError: If the column doesn't exist in the table.

   .. py:method:: _validate_columns(table: sqlalchemy.Table, column_names: collections.abc.Sequence[str]) -> None

      Validate that all requested columns exist in the reflected table.

      :param table: Reflected SQLAlchemy table.
      :param column_names: Column names to validate.
      :returns: None
      :raises ValidationError: If one or more columns do not exist in the table.

   .. py:method:: _build_select_columns(table: sqlalchemy.Table) -> sqlalchemy.sql.Select[Any]

      Build a SELECT statement for configured columns or all columns.

      :param table: Reflected SQLAlchemy table.
      :returns: SELECT statement with chosen columns.
      :rtype: Select[Any]
      :raises ValidationError: If any selected column does not exist.

   .. py:method:: _build_filters(stmt: sqlalchemy.sql.Select[Any], table: sqlalchemy.Table) -> sqlalchemy.sql.Select[Any]

      Apply equality filters from read settings to the SELECT statement.

      :param stmt: Current SELECT statement.
      :param table: Reflected SQLAlchemy table.
      :returns: SELECT statement with WHERE conditions applied.
      :rtype: Select[Any]
      :raises ValidationError: If any filter column does not exist.

   .. py:method:: _build_order_by(stmt: sqlalchemy.sql.Select[Any], table: sqlalchemy.Table) -> sqlalchemy.sql.Select[Any]

      Apply ORDER BY clauses from read settings to the SELECT statement.

      :param stmt: Current SELECT statement.
      :param table: Reflected SQLAlchemy table.
      :returns: SELECT statement with ORDER BY applied.
      :rtype: Select[Any]
      :raises ValidationError: If any order-by column does not exist.
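   The ``_build_select_columns()``, ``_build_filters()``, and ``_build_order_by()``
   helpers above compose the read query from the read settings. As an independent
   illustration in plain SQLAlchemy (not the provider's code), the settings from
   the module-level example translate roughly into the following statement:

   .. code-block:: python

      import sqlalchemy as sa

      metadata = sa.MetaData()
      table = sa.Table(
          "your_table_name",
          metadata,
          sa.Column("id", sa.Integer),
          sa.Column("color", sa.String(32)),
          sa.Column("score", sa.Float),
          sa.Column("active", sa.Boolean),
          schema="your_schema_name",
      )

      # columns=["id", "color", "score", "active"]
      stmt = sa.select(table.c.id, table.c.color, table.c.score, table.c.active)
      # filters={"active": True} become equality WHERE conditions
      stmt = stmt.where(table.c.active == True)  # noqa: E712
      # order_by=[("score", "desc"), "id"]
      stmt = stmt.order_by(table.c.score.desc(), table.c.id)
      # limit=10
      stmt = stmt.limit(10)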
   .. py:method:: _quote_identifier(name: str) -> str

      Quote identifiers safely for SQL Server using SQLAlchemy's identifier preparer.

      Reject identifiers containing obvious injection primitives like quotes,
      semicolons, or brackets before quoting.

      :param name: The identifier name to quote.
      :returns: The safely quoted identifier.
      :rtype: str
      :raises ValueError: If the identifier contains unsafe characters.

   .. py:method:: get_details() -> dict[str, Any]

      Get details about the dataset.

      Constructs and returns a dictionary containing metadata about the current
      dataset configuration, including table name, schema name, and optional
      query filters and delete settings.

      :returns: A dictionary containing:

                - table_name (str): The name of the target table
                - schema_name (str): The schema containing the table
                - query_filter (Any, optional): Filter criteria if specified
                - delete_table (str, optional): Delete table setting if specified
      :rtype: dict[str, Any]

   .. py:method:: _is_na_scalar(v: Any) -> bool
      :staticmethod:

      Check whether *v* is a scalar NA value (NaN, NaT, None, pd.NA).

      ``pd.isna()`` returns an array-like result for non-scalar inputs (list,
      tuple, dict, ndarray), which makes a bare ``if pd.isna(v)`` raise
      ``ValueError: The truth value of an array is ambiguous``. This helper
      guards against that by only calling ``pd.isna`` on values that are known
      to be scalar.

      :param v: Any value from a record dict.
      :returns: ``True`` when *v* is a scalar NA-like value.
      :rtype: bool

   .. py:method:: _sanitize_records(records: collections.abc.Sequence[dict[collections.abc.Hashable, Any]]) -> collections.abc.Sequence[dict[collections.abc.Hashable, Any]]
      :staticmethod:

      Replace NaN and NaT values with None in record dicts.

      SQL Server rejects ``float('nan')`` over the TDS/ODBC protocol with
      *"The supplied value is not a valid instance of data type float"*.
      Converting these sentinel values to ``None`` causes SQLAlchemy to emit
      proper SQL ``NULL`` parameters instead.

      Non-scalar values (lists, tuples, dicts, ndarrays) are left as-is because
      ``pd.isna()`` returns an array-like result for them, which cannot be
      evaluated as a boolean.

      :param records: Row dicts produced by ``DataFrame.to_dict(orient="records")``.
      :returns: The same rows with NaN/NaT replaced by None.
      :rtype: Sequence[dict[Hashable, Any]]

   .. py:method:: _get_identity_columns(table: sqlalchemy.Table) -> collections.abc.Sequence[str]
      :staticmethod:

      Return the names of identity (auto-increment) columns on *table*.

      :param table: A reflected or constructed SQLAlchemy Table.
      :returns: Column names that have an identity property.
      :rtype: Sequence[str]

   .. py:method:: _set_identity_insert(conn: Any, *, enabled: bool) -> None

      Toggle ``IDENTITY_INSERT`` for the configured table.

      :param conn: Active SQLAlchemy connection.
      :param enabled: ``True`` to turn identity insert ON, ``False`` for OFF.

   .. py:method:: _copy_into_table(conn: Any, table: sqlalchemy.Table, content: pandas.DataFrame) -> None

      Insert rows from a DataFrame into a SQL Server table.

      Handles identity-column awareness (toggling ``IDENTITY_INSERT``) and
      sanitises NaN / NaT values so that SQL Server receives valid parameters.

      :param conn: SQLAlchemy connection inside an active transaction.
      :param table: SQLAlchemy Table object (metadata only).
      :param content: DataFrame containing rows to insert.
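   The NaN/NaT handling performed by ``_is_na_scalar()`` and ``_sanitize_records()``
   can be pictured in isolation. The snippet below is an independent sketch, not
   the provider's implementation; the helper name ``sanitize_record`` is invented
   for the example:

   .. code-block:: python

      import numpy as np
      import pandas as pd

      df = pd.DataFrame(
          {
              "id": [1, 2],
              "score": [0.5, np.nan],
              "seen": [pd.Timestamp("2024-01-01"), pd.NaT],
          }
      )
      records = df.to_dict(orient="records")

      def sanitize_record(record: dict) -> dict:
          # Only scalar values are passed to pd.isna(), so list/array/dict cells never
          # trigger "The truth value of an array is ambiguous"; scalar NA becomes None.
          return {
              key: None if pd.api.types.is_scalar(value) and pd.isna(value) else value
              for key, value in record.items()
          }

      sanitized = [sanitize_record(record) for record in records]
      # The second row now carries None for "score" and "seen", which SQLAlchemy
      # sends to SQL Server as NULL parameters instead of NaN/NaT.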
   .. py:method:: _resolve_create_primary_key_columns(content: pandas.DataFrame) -> collections.abc.Sequence[str] | None

      Resolve and validate create-time primary key columns.

      :param content: Input DataFrame used for table creation.
      :returns: Primary key columns for new table creation.
      :rtype: Sequence[str] | None
      :raises ValidationError: If ``primary_key`` is enabled but columns are invalid.

   .. py:method:: _build_table_from_input(content: pandas.DataFrame) -> sqlalchemy.Table

      Build a SQLAlchemy Table definition from input DataFrame dtypes.

      :param content: Input DataFrame to build the table from.
      :returns: SQLAlchemy Table definition.
      :rtype: Table

   .. py:method:: _output_from_empty_input() -> pandas.DataFrame

      Build a consistent empty-operation output while preserving input schema.

      :returns: Empty dataframe or a schema-preserving input copy.
      :rtype: pd.DataFrame

   .. py:method:: _validate_read_settings() -> None

      Validate read settings before query construction.

      :returns: None
      :raises ValidationError: If limit or order direction is invalid.

.. py:class:: MsSqlTableDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   The object containing the settings of the dataset.

   .. py:attribute:: table
      :type: str

      Table name for dataset operations.

   .. py:attribute:: schema
      :type: str

      Schema for dataset operations.

   .. py:attribute:: read
      :type: ReadSettings

      Settings for read().

   .. py:attribute:: create
      :type: CreateSettings

      Settings for create().
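The dtype-driven table construction described for ``_pandas_dtype_to_sqlalchemy()``
and ``_build_table_from_input()`` above can be pictured with plain pandas and
SQLAlchemy. The sketch below is an independent illustration; the concrete type
choices are assumptions, not the provider's actual mapping:

.. code-block:: python

   import pandas as pd
   import sqlalchemy as sa

   df = pd.DataFrame(
       {"id": [1, 2], "color": ["red", "blue"], "score": [0.9, 0.4], "active": [True, False]}
   )

   def sqlalchemy_type_for(dtype) -> sa.types.TypeEngine:
       # Assumed mapping for the example; the provider's real mapping may differ.
       if pd.api.types.is_bool_dtype(dtype):
           return sa.Boolean()
       if pd.api.types.is_integer_dtype(dtype):
           return sa.Integer()
       if pd.api.types.is_float_dtype(dtype):
           return sa.Float()
       if pd.api.types.is_datetime64_any_dtype(dtype):
           return sa.DateTime()
       return sa.String()  # fall back for object/string columns

   metadata = sa.MetaData()
   table = sa.Table(
       "your_table_name",
       metadata,
       *(sa.Column(name, sqlalchemy_type_for(dtype)) for name, dtype in df.dtypes.items()),
       schema="your_schema_name",
   )
   # Resulting columns: id Integer, color String, score Float, active Boolean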