ds_protocol_sftp_py_lib
=======================

.. py:module:: ds_protocol_sftp_py_lib

.. autoapi-nested-parse::

   **File:** ``__init__.py``
   **Region:** ``ds-protocol-sftp-py-lib``

   Description
   -----------
   A Python package from the ds-protocol-sftp-py-lib library.

   .. rubric:: Example

   .. code-block:: python

       from ds_protocol_sftp_py_lib import __version__

       print(f"Package version: {__version__}")


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_protocol_sftp_py_lib/dataset/index
   /autoapi/ds_protocol_sftp_py_lib/enums/index
   /autoapi/ds_protocol_sftp_py_lib/errors/index
   /autoapi/ds_protocol_sftp_py_lib/linked_service/index


Attributes
----------

.. autoapisummary::

   ds_protocol_sftp_py_lib.__version__


Classes
-------

.. autoapisummary::

   ds_protocol_sftp_py_lib.SftpDataset
   ds_protocol_sftp_py_lib.SftpDatasetSettings
   ds_protocol_sftp_py_lib.SftpLinkedService
   ds_protocol_sftp_py_lib.SftpLinkedServiceSettings


Package Contents
----------------

.. py:class:: SftpDataset

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset`\ [\ :py:obj:`SftpLinkedServiceType`\ , :py:obj:`SftpDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`SftpLinkedServiceType`\ , :py:obj:`SftpDatasetSettingsType`\ ]

   Tabular dataset object which identifies data within a data store,
   such as table/csv/json/parquet/parquetdataset and other documents.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: SftpLinkedServiceType

   .. py:attribute:: settings
      :type: SftpDatasetSettingsType

   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer

   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer

   .. py:property:: type
      :type: ds_protocol_sftp_py_lib.enums.ResourceType

      Get the type of the dataset.

   .. py:method:: read() -> None

      Read files from the SFTP server.

      :returns: The output is stored in the `output` attribute as a DataFrame containing the contents of the matched files.
      :rtype: None

      :raises ReadError: If there is an error reading from the SFTP dataset.

   .. py:method:: create() -> None

      Create data on the SFTP server.

      .. note::

         This method is **not idempotent**. If called multiple times with the same parameters,
         it will raise a CreateError if the file already exists. If a network or server error
         occurs after the file is created but before the method returns, retrying may result
         in a CreateError due to the file's existence. Orchestration and retry policies should
         account for this non-idempotent behavior.

      :returns: None

      :raises CreateError: If there is an error creating the dataset on the SFTP server, or if the file already exists.

   .. py:method:: update() -> None

      Update operation is not supported in this provider.

      :returns: None

      :raises NotSupportedError: Always raised since update is not supported for SftpDataset.

   .. py:method:: upsert() -> None

      Upsert a file on the SFTP server. If the file already exists, it will be overwritten.

      :returns: None

      :raises UpsertError: If there is an error upserting the dataset on the SFTP server.

   .. py:method:: delete() -> None

      Delete operation is not supported in this provider.

      :returns: None

      :raises NotSupportedError: Always raised since delete is not supported for SftpDataset.

   .. py:method:: purge() -> None

      Purge the dataset, deleting all files matching the pattern from the SFTP server.

      :returns: None

      :raises PurgeError: If there is an error purging files from the SFTP server.

   .. py:method:: list() -> None

      List the files in the directory on the SFTP server based on the specified pattern and settings.

      :returns: The output is stored in the `output` attribute as a DataFrame containing the file information.
      :rtype: None

      :raises ListError: If there is an error listing the files in the SFTP dataset.

   .. py:method:: rename() -> None

      Rename operation is not supported in this provider.

      :returns: None

      :raises NotSupportedError: Always raised since rename is not supported for SftpDataset.

   .. py:method:: close() -> None

      Close any open connections or resources.

      :returns: None

   .. py:method:: _get_folder_and_file_path() -> str

      Get combined path of folder_path and file_name, using forward slashes.
      This ensures consistent path formatting across Windows, Linux, and macOS.
      It also replaces any Windows-style backslashes with forward slashes.

      :returns: The full file path as a POSIX-style string.
      :rtype: str

   .. py:method:: _ensure_file_does_not_exist(remote_path: str) -> None

      Ensure the target file does not already exist on the SFTP server.

      :param remote_path: Full target file path on the SFTP server.
      :type remote_path: str

      :raises FileExistsError: If the target file already exists.

   .. py:method:: _list_directory(path: str) -> list[paramiko.SFTPAttributes]

      List the files in the specified directory on the SFTP server.

      :param path: The directory path to list files from.
      :type path: str

      :returns: A list of SFTPAttributes for the files in the directory.
      :rtype: list[SFTPAttributes]

   .. py:method:: _get_files_by_pattern(path: str, fnmatch_pattern: str) -> list[paramiko.SFTPAttributes]

      Get files from the SFTP server that match the specified pattern.

      :param path: The directory path to search for files.
      :type path: str
      :param fnmatch_pattern: The pattern to match file names against.
      :type fnmatch_pattern: str

      :returns: A list of SFTPAttributes for the matching files.
      :rtype: list[SFTPAttributes]

   .. py:method:: _ensure_sftp_directory(remote_directory: str, max_depth: int = 20) -> None

      Ensure that the specified directory exists on the SFTP server. If it does not exist, create it.

      :param remote_directory: The directory path to ensure on the SFTP server.
      :type remote_directory: str
      :param max_depth: The maximum directory depth to traverse when ensuring the directory exists. Default is 20.
      :type max_depth: int

      :returns: None

      :raises CreateError: If the maximum directory depth is exceeded while ensuring the SFTP directory.

   .. py:method:: _read_files_as_dataframe(files: list[paramiko.SFTPAttributes]) -> pandas.DataFrame

      Read the dataset from the SFTP server as a dataframe.

      :param files: List of SFTPAttributes for the files to read.
      :type files: list[SFTPAttributes]

      :returns: The combined data from the files as a single DataFrame.
      :rtype: pd.DataFrame

   .. py:method:: _list_directory_files(files: list[paramiko.SFTPAttributes]) -> pandas.DataFrame

      List the files in the directory as a dataframe.

      :param files: List of SFTPAttributes for the files to list.
      :type files: list[SFTPAttributes]

      :returns: A dataframe containing the file information.
      :rtype: pd.DataFrame


.. py:class:: SftpDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   Settings for the SFTP dataset.

   .. py:attribute:: folder_path
      :type: str

      Path to the folder containing the file(s) to read/write on the SFTP server.

   .. py:attribute:: file_name
      :type: str

      Name of the file to read/write on the SFTP server.

   .. py:attribute:: list
      :type: ListSettings

      Settings for listing the SFTP dataset.


.. py:class:: SftpLinkedService

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.linked_service.LinkedService`\ [\ :py:obj:`SftpLinkedServiceSettingsType`\ ], :py:obj:`Generic`\ [\ :py:obj:`SftpLinkedServiceSettingsType`\ ]

   SFTP Linked Service implementation.

   .. attribute:: settings

      Linked service settings.

      :type: SftpLinkedServiceSettingsType

   .. attribute:: _connection

      Underlying SFTP client connection.

      :type: SFTPClient | None

   .. attribute:: _sftp

      Sftp provider instance.

      :type: Sftp | None

   .. py:attribute:: settings
      :type: SftpLinkedServiceSettingsType

   .. py:attribute:: _sftp
      :type: ds_protocol_sftp_py_lib.utils.sftp.provider.Sftp | None
      :value: None

   .. py:property:: type
      :type: ds_protocol_sftp_py_lib.enums.ResourceType

      Get the type of linked service.

      :returns: The type of the linked service.
      :rtype: ResourceType

   .. py:property:: connection
      :type: ds_protocol_sftp_py_lib.utils.sftp.provider.Sftp

      Get the SFTP client connection.

      :returns: The active SFTP client connection.
      :rtype: Sftp

      :raises ConnectionError: If the connection is not initialized.

   .. py:method:: _init_sftp() -> ds_protocol_sftp_py_lib.utils.sftp.provider.Sftp

      Initialize the Sftp client.

      :returns: An initialized Sftp provider instance.
      :rtype: Sftp

   .. py:method:: connect() -> None

      Initialize the Sftp client instance if not already initialized.

      :raises ConnectionError: If connection fails.
      :raises AuthenticationError: If authentication fails.

   .. py:method:: test_connection() -> tuple[bool, str]

      Perform a lightweight health check against the SFTP backend.

      Uses the SFTP client's listdir method to check connectivity and authentication.

      :returns: - (True, message) if successful.
                - (False, error message) otherwise.
      :rtype: tuple[bool, str]

   .. py:method:: close() -> None

      Close the linked service.

      Sets the _sftp attribute to None to indicate the connection is closed.

      :raises ConnectionError: If closing the SFTP connection fails.


.. py:class:: SftpLinkedServiceSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.linked_service.LinkedServiceSettings`

   Settings for SFTP Linked Service connections.

   .. attribute:: host

      SFTP server hostname.

      :type: str

   .. attribute:: username

      Username for authentication.

      :type: str

   .. attribute:: password

      Password for authentication.

      :type: str | None

   .. attribute:: private_key

      Private key for authentication.

      :type: str | None

   .. attribute:: passphrase

      Passphrase for private key.

      :type: str | None

   .. attribute:: timeout

      Connection timeout in seconds.

      :type: float | None

   .. attribute:: host_key_fingerprint

      Expected host key fingerprint.

      :type: str | None

   .. attribute:: host_key_validation

      Whether to validate host key.

      :type: bool

   .. attribute:: port

      SFTP server port.

      :type: int

   .. py:attribute:: host
      :type: str

      Hostname or IP address of the SFTP server.

   .. py:attribute:: username
      :type: str

      Username for authentication.

   .. py:attribute:: password
      :type: str | None
      :value: None

      Password for authentication.

   .. py:attribute:: private_key
      :type: str | None
      :value: None

      Private key for authentication.

   .. py:attribute:: passphrase
      :type: str | None
      :value: None

      Passphrase for private key.

   .. py:attribute:: timeout
      :type: float | None
      :value: None

      Connection timeout in seconds.

   .. py:attribute:: host_key_fingerprint
      :type: str | None
      :value: None

      Expected host key fingerprint (base64-encoded MD5, as produced by Paramiko's get_fingerprint(); e.g., 'AbCdEfGhIjKlMnOpQrStUvWxYz0123456789abcdEf==').

   .. py:attribute:: host_key_validation
      :type: bool
      :value: True

      Whether to validate host key.

   .. py:attribute:: port
      :type: int
      :value: 22

      SFTP server port.


.. py:data:: __version__
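
.. rubric:: Example

The ``host_key_fingerprint`` setting is described as a base64-encoded MD5 digest, as produced by Paramiko's ``get_fingerprint()``. The following is a minimal sketch of how such a value might be derived from the base64 key field of a ``known_hosts``-style entry, using only the standard library. The helper name ``md5_fingerprint_b64`` and the sample key blob are hypothetical and not part of this package. The sketch assumes the fingerprint is the MD5 digest of the raw public key blob.

.. code-block:: python

    import base64
    import hashlib


    def md5_fingerprint_b64(public_key_b64: str) -> str:
        """Return the base64-encoded MD5 digest of a base64-encoded key blob."""
        # Decode the key blob as it appears in the key field of a known_hosts entry.
        key_blob = base64.b64decode(public_key_b64)
        # MD5 over the raw blob (assumed to match Paramiko's get_fingerprint()).
        digest = hashlib.md5(key_blob).digest()
        # Encode the 16-byte digest as base64 text.
        return base64.b64encode(digest).decode("ascii")


    # Hypothetical key blob for illustration only; this is not a real host key.
    sample_blob_b64 = base64.b64encode(b"example-public-key-blob").decode("ascii")
    fingerprint = md5_fingerprint_b64(sample_blob_b64)
    print(fingerprint)

Because an MD5 digest is 16 bytes long, a value produced this way is always 24 base64 characters ending in ``==``. Whether the library compares fingerprints in exactly this form should be confirmed against its source before relying on it.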