ds_provider_microsoft_py_lib.dataset
====================================

.. py:module:: ds_provider_microsoft_py_lib.dataset

.. autoapi-nested-parse::

   **File:** ``__init__.py``

   **Region:** ``ds-provider-microsoft-py-lib/dataset``

   Dataset module for the Microsoft provider.

   Example:
       >>> dataset = MsSqlTable(
       ...     linked_service=MsSqlLinkedService(...),
       ...     settings=MsSqlTableDatasetSettings(
       ...         table="your_table_name",
       ...         schema="your_schema_name",
       ...         read=ReadSettings(
       ...             limit=10,
       ...             columns=["id", "color", "score", "active"],
       ...             filters={"active": True},
       ...             order_by=[("score", "desc"), "id"],
       ...         ),
       ...     )
       ... )
       >>> dataset.read()

Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_provider_microsoft_py_lib/dataset/mssql/index

Classes
-------

.. autoapisummary::

   ds_provider_microsoft_py_lib.dataset.MsSqlTable
   ds_provider_microsoft_py_lib.dataset.MsSqlTableDatasetSettings

Package Contents
----------------

.. py:class:: MsSqlTable

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset`\ [\ :py:obj:`MsSqlLinkedServiceType`\ , :py:obj:`MsSqlTableDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`MsSqlLinkedServiceType`\ , :py:obj:`MsSqlTableDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store, such as a
   table, CSV, JSON, or Parquet file, a Parquet dataset, or other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: MsSqlLinkedServiceType

   .. py:attribute:: settings
      :type: MsSqlTableDatasetSettingsType

   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer | None

   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer | None

   .. py:property:: type
      :type: ds_provider_microsoft_py_lib.enums.ResourceType

      Get the type of the Dataset.

      :returns: ResourceType

   .. py:method:: create(**_kwargs: Any) -> None

      Create/write data to the specified table.

      Writes ``self.input`` (a pandas DataFrame) to the database table with the
      configured create settings (mode, etc.).

      :param _kwargs: Additional keyword arguments to pass to the request.
      :raises ConnectionError: If the connection fails.
      :raises CreateError: If the create operation fails.

   .. py:method:: read(**_kwargs: Any) -> None

      Read rows from the configured table into ``self.output``.

      :param _kwargs: Additional keyword arguments for interface compatibility.
      :returns: None
      :raises ReadError: If reading data fails.

   .. py:method:: purge(**_kwargs: Any) -> None

      Remove all content from the target table.

      Drops the entire table, leaving the structure empty. Per contract, the
      target is empty after ``purge()`` returns. This is idempotent -- purging
      an already-empty (or non-existent) table is a no-op.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises ConnectionError: If the connection is not established.
      :raises PurgeError: If the purge operation fails.

   .. py:method:: delete(**_kwargs: Any) -> None

      Delete specific rows from the target table.

      Removes only the rows in ``self.input``, matched by all columns as
      identity. Per contract: empty input is a no-op (returns immediately).
      Deleting a row that does not exist is not an error.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises ConnectionError: If the connection is not established.
      :raises DeleteError: If the delete operation fails.
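   A minimal sketch of the write path, continuing the module-level example above.
   Staging the DataFrame by assigning ``dataset.input`` is an assumption drawn
   from the ``create()`` and ``delete()`` docstrings, not an attribute documented
   on this page:

       >>> import pandas as pd
       >>> rows = pd.DataFrame(
       ...     {
       ...         "id": [1, 2],
       ...         "color": ["red", "blue"],
       ...         "score": [0.9, 0.4],
       ...         "active": [True, False],
       ...     }
       ... )
       >>> dataset.input = rows          # assumed staging attribute for write operations
       >>> dataset.create()              # writes the rows using the configured create settings
       >>> dataset.input = rows.head(1)
       >>> dataset.delete()              # removes matching rows only; all columns act as identity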
   .. py:method:: update(**_kwargs: Any) -> None

      Update existing rows in the target table.

      This operation is not supported for SQL Server datasets at this time.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises NotSupportedError: Always -- update is not supported.

   .. py:method:: rename(**_kwargs: Any) -> None

      Rename a resource (table) in the backend.

      This operation is not supported for SQL Server datasets at this time.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises NotSupportedError: Always -- rename is not supported.

   .. py:method:: close() -> None

      Clean up the connection to the backend.

      Per contract: must be safe to call multiple times and never raise.

      :returns: None

   .. py:method:: list(**_kwargs: Any) -> None

      Discover available resources (tables) in the schema.

      Uses SQLAlchemy's Inspector to reflect and retrieve all tables in the
      configured schema with their metadata (type: table or view).

      :param _kwargs: Additional keyword arguments (ignored).
      :raises ConnectionError: If the connection is not established.
      :raises ListError: If the list operation fails.

   .. py:method:: upsert(**_kwargs: Any) -> None

      Insert or update rows in the target table.

      This operation is not supported for SQL Server datasets at this time.

      :param _kwargs: Additional keyword arguments (ignored).
      :raises NotSupportedError: Always -- upsert is not supported.

   .. py:method:: _get_table() -> sqlalchemy.Table

      Get the SQLAlchemy Table object for the configured schema and table.

      :returns: The SQLAlchemy Table object.
      :rtype: Table

   .. py:method:: _pandas_dtype_to_sqlalchemy(dtypes: pandas.Series) -> dict[str, Any]
      :staticmethod:

      Convert pandas dtypes Series to a dict mapping column names to SQLAlchemy types.

      :param dtypes: Pandas Series where index is column names and values are dtypes.
      :returns: Dictionary mapping column names to SQLAlchemy types.
      :rtype: dict[str, Any]

   .. py:method:: _validate_column(table: sqlalchemy.Table, column_name: str) -> None

      Validate that a column exists in the table.

      :param table: The SQLAlchemy Table object.
      :param column_name: The name of the column to validate.
      :raises ValueError: If the column doesn't exist in the table.

   .. py:method:: _validate_columns(table: sqlalchemy.Table, column_names: collections.abc.Sequence[str]) -> None

      Validate that all requested columns exist in the reflected table.

      :param table: Reflected SQLAlchemy table.
      :param column_names: Column names to validate.
      :returns: None
      :raises ValidationError: If one or more columns do not exist in the table.

   .. py:method:: _build_select_columns(table: sqlalchemy.Table) -> sqlalchemy.sql.Select[Any]

      Build a SELECT statement for configured columns or all columns.

      :param table: Reflected SQLAlchemy table.
      :returns: SELECT statement with chosen columns.
      :rtype: Select[Any]
      :raises ValidationError: If any selected column does not exist.

   .. py:method:: _build_filters(stmt: sqlalchemy.sql.Select[Any], table: sqlalchemy.Table) -> sqlalchemy.sql.Select[Any]

      Apply equality filters from read settings to the SELECT statement.

      :param stmt: Current SELECT statement.
      :param table: Reflected SQLAlchemy table.
      :returns: SELECT statement with WHERE conditions applied.
      :rtype: Select[Any]
      :raises ValidationError: If any filter column does not exist.

   .. py:method:: _build_order_by(stmt: sqlalchemy.sql.Select[Any], table: sqlalchemy.Table) -> sqlalchemy.sql.Select[Any]

      Apply ORDER BY clauses from read settings to the SELECT statement.

      :param stmt: Current SELECT statement.
      :param table: Reflected SQLAlchemy table.
      :returns: SELECT statement with ORDER BY applied.
      :rtype: Select[Any]
      :raises ValidationError: If any order-by column does not exist.
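   The ``_build_select_columns()``, ``_build_filters()``, and ``_build_order_by()``
   helpers above compose the read query from the read settings. As an independent
   illustration in plain SQLAlchemy (not the provider's code), the settings from
   the module-level example translate roughly into the following statement:

   .. code-block:: python

      import sqlalchemy as sa

      metadata = sa.MetaData()
      table = sa.Table(
          "your_table_name",
          metadata,
          sa.Column("id", sa.Integer),
          sa.Column("color", sa.String(32)),
          sa.Column("score", sa.Float),
          sa.Column("active", sa.Boolean),
          schema="your_schema_name",
      )

      # columns=["id", "color", "score", "active"]
      stmt = sa.select(table.c.id, table.c.color, table.c.score, table.c.active)
      # filters={"active": True} become equality WHERE conditions
      stmt = stmt.where(table.c.active == True)  # noqa: E712
      # order_by=[("score", "desc"), "id"]
      stmt = stmt.order_by(table.c.score.desc(), table.c.id)
      # limit=10
      stmt = stmt.limit(10)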
   .. py:method:: _quote_identifier(name: str) -> str

      Quote identifiers safely for SQL Server using SQLAlchemy's identifier preparer.

      Reject identifiers containing obvious injection primitives like quotes,
      semicolons, or brackets before quoting.

      :param name: The identifier name to quote.
      :returns: The safely quoted identifier.
      :rtype: str
      :raises ValueError: If the identifier contains unsafe characters.

   .. py:method:: get_details() -> dict[str, Any]

      Get details about the dataset.

      Constructs and returns a dictionary containing metadata about the current
      dataset configuration, including table name, schema name, and optional
      query filters and delete settings.

      :returns: A dictionary containing:

                - table_name (str): The name of the target table
                - schema_name (str): The schema containing the table
                - query_filter (Any, optional): Filter criteria if specified
                - delete_table (str, optional): Delete table setting if specified
      :rtype: dict[str, Any]

   .. py:method:: _is_na_scalar(v: Any) -> bool
      :staticmethod:

      Check whether *v* is a scalar NA value (NaN, NaT, None, pd.NA).

      ``pd.isna()`` returns an array-like result for non-scalar inputs (list,
      tuple, dict, ndarray), which makes a bare ``if pd.isna(v)`` raise
      ``ValueError: The truth value of an array is ambiguous``. This helper
      guards against that by only calling ``pd.isna`` on values that are known
      to be scalar.

      :param v: Any value from a record dict.
      :returns: ``True`` when *v* is a scalar NA-like value.
      :rtype: bool

   .. py:method:: _sanitize_records(records: collections.abc.Sequence[dict[collections.abc.Hashable, Any]]) -> collections.abc.Sequence[dict[collections.abc.Hashable, Any]]
      :staticmethod:

      Replace NaN and NaT values with None in record dicts.

      SQL Server rejects ``float('nan')`` over the TDS/ODBC protocol with
      *"The supplied value is not a valid instance of data type float"*.
      Converting these sentinel values to ``None`` causes SQLAlchemy to emit
      proper SQL ``NULL`` parameters instead.

      Non-scalar values (lists, tuples, dicts, ndarrays) are left as-is because
      ``pd.isna()`` returns an array-like result for them, which cannot be
      evaluated as a boolean.

      :param records: Row dicts produced by ``DataFrame.to_dict(orient="records")``.
      :returns: The same rows with NaN/NaT replaced by None.
      :rtype: Sequence[dict[Hashable, Any]]

   .. py:method:: _get_identity_columns(table: sqlalchemy.Table) -> collections.abc.Sequence[str]
      :staticmethod:

      Return the names of identity (auto-increment) columns on *table*.

      :param table: A reflected or constructed SQLAlchemy Table.
      :returns: Column names that have an identity property.
      :rtype: Sequence[str]

   .. py:method:: _set_identity_insert(conn: Any, *, enabled: bool) -> None

      Toggle ``IDENTITY_INSERT`` for the configured table.

      :param conn: Active SQLAlchemy connection.
      :param enabled: ``True`` to turn identity insert ON, ``False`` for OFF.

   .. py:method:: _copy_into_table(conn: Any, table: sqlalchemy.Table, content: pandas.DataFrame) -> None

      Insert rows from a DataFrame into a SQL Server table.

      Handles identity-column awareness (toggling ``IDENTITY_INSERT``) and
      sanitises NaN / NaT values so that SQL Server receives valid parameters.

      :param conn: SQLAlchemy connection inside an active transaction.
      :param table: SQLAlchemy Table object (metadata only).
      :param content: DataFrame containing rows to insert.
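   The NaN/NaT handling performed by ``_is_na_scalar()`` and ``_sanitize_records()``
   can be pictured in isolation. The snippet below is an independent sketch, not
   the provider's implementation; the helper name ``sanitize_record`` is invented
   for the example:

   .. code-block:: python

      import numpy as np
      import pandas as pd

      df = pd.DataFrame(
          {
              "id": [1, 2],
              "score": [0.5, np.nan],
              "seen": [pd.Timestamp("2024-01-01"), pd.NaT],
          }
      )
      records = df.to_dict(orient="records")

      def sanitize_record(record: dict) -> dict:
          # Only scalar values are passed to pd.isna(), so list/array/dict cells never
          # trigger "The truth value of an array is ambiguous"; scalar NA becomes None.
          return {
              key: None if pd.api.types.is_scalar(value) and pd.isna(value) else value
              for key, value in record.items()
          }

      sanitized = [sanitize_record(record) for record in records]
      # The second row now carries None for "score" and "seen", which SQLAlchemy
      # sends to SQL Server as NULL parameters instead of NaN/NaT.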
   .. py:method:: _resolve_create_primary_key_columns(content: pandas.DataFrame) -> collections.abc.Sequence[str] | None

      Resolve and validate create-time primary key columns.

      :param content: Input DataFrame used for table creation.
      :returns: Primary key columns for new table creation.
      :rtype: Sequence[str] | None
      :raises ValidationError: If ``primary_key`` is enabled but columns are invalid.

   .. py:method:: _build_table_from_input(content: pandas.DataFrame) -> sqlalchemy.Table

      Build a SQLAlchemy Table definition from input DataFrame dtypes.

      :param content: Input DataFrame to build the table from.
      :returns: SQLAlchemy Table definition.
      :rtype: Table

   .. py:method:: _output_from_empty_input() -> pandas.DataFrame

      Build a consistent empty-operation output while preserving input schema.

      :returns: Empty dataframe or a schema-preserving input copy.
      :rtype: pd.DataFrame

   .. py:method:: _validate_read_settings() -> None

      Validate read settings before query construction.

      :returns: None
      :raises ValidationError: If limit or order direction is invalid.

.. py:class:: MsSqlTableDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   The object containing the settings of the dataset.

   .. py:attribute:: table
      :type: str

      Table name for dataset operations.

   .. py:attribute:: schema
      :type: str

      Schema for dataset operations.

   .. py:attribute:: read
      :type: ReadSettings

      Settings for read().

   .. py:attribute:: create
      :type: CreateSettings

      Settings for create().
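The dtype-driven table construction described for ``_pandas_dtype_to_sqlalchemy()``
and ``_build_table_from_input()`` above can be pictured with plain pandas and
SQLAlchemy. The sketch below is an independent illustration; the concrete type
choices are assumptions, not the provider's actual mapping:

.. code-block:: python

   import pandas as pd
   import sqlalchemy as sa

   df = pd.DataFrame(
       {"id": [1, 2], "color": ["red", "blue"], "score": [0.9, 0.4], "active": [True, False]}
   )

   def sqlalchemy_type_for(dtype) -> sa.types.TypeEngine:
       # Assumed mapping for the example; the provider's real mapping may differ.
       if pd.api.types.is_bool_dtype(dtype):
           return sa.Boolean()
       if pd.api.types.is_integer_dtype(dtype):
           return sa.Integer()
       if pd.api.types.is_float_dtype(dtype):
           return sa.Float()
       if pd.api.types.is_datetime64_any_dtype(dtype):
           return sa.DateTime()
       return sa.String()  # fall back for object/string columns

   metadata = sa.MetaData()
   table = sa.Table(
       "your_table_name",
       metadata,
       *(sa.Column(name, sqlalchemy_type_for(dtype)) for name, dtype in df.dtypes.items()),
       schema="your_schema_name",
   )
   # Resulting columns: id Integer, color String, score Float, active Boolean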