ds_protocol_sftp_py_lib.dataset
===============================

.. py:module:: ds_protocol_sftp_py_lib.dataset

.. autoapi-nested-parse::

   **File**: `__init__.py`
   **Region**: `ds_protocol_sftp_py_lib/dataset`

   SFTP Dataset

   This module implements the SFTP Dataset, a dataset that reads data from and writes data to an SFTP server.

   .. rubric:: Example

   >>> import uuid
   >>> from ds_protocol_sftp_py_lib.dataset import SftpDataset, SftpDatasetSettings
   >>> from ds_resource_plugin_py_lib.common.serde.deserialize import PandasDeserializer
   >>> from ds_resource_plugin_py_lib.common.serde.serialize import PandasSerializer
   >>> # SftpLinkedService and SftpLinkedServiceSettings must also be imported
   >>> # from this package; their module path is not shown in this reference.
   >>> dataset = SftpDataset(
   ...     id=uuid.uuid4(),
   ...     name="My SFTP Dataset",
   ...     version="1.0",
   ...     deserializer=PandasDeserializer(),
   ...     serializer=PandasSerializer(),
   ...     settings=SftpDatasetSettings(
   ...         folder_path="/path/to",
   ...         file_name="dataset.csv",
   ...     ),
   ...     linked_service=SftpLinkedService(
   ...         id=uuid.uuid4(),
   ...         name="My SFTP Linked Service",
   ...         version="1.0.0",
   ...         settings=SftpLinkedServiceSettings(
   ...             host="sftp.example.com",
   ...             port=22,
   ...             username="username",
   ...             password="password",
   ...         ),
   ...     ),
   ... )
   >>> dataset.read()
   >>> data = dataset.output


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/ds_protocol_sftp_py_lib/dataset/sftp/index


Classes
-------

.. autoapisummary::

   ds_protocol_sftp_py_lib.dataset.SftpDataset
   ds_protocol_sftp_py_lib.dataset.SftpDatasetSettings


Package Contents
----------------

.. py:class:: SftpDataset

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset`\ [\ :py:obj:`SftpLinkedServiceType`\ , :py:obj:`SftpDatasetSettingsType`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer`\ , :py:obj:`ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer`\ ], :py:obj:`Generic`\ [\ :py:obj:`SftpLinkedServiceType`\ , :py:obj:`SftpDatasetSettingsType`\ ]

   Tabular dataset object that identifies data within a data store, such as a table, CSV, JSON, Parquet, Parquet dataset, or other document.

   The input of the dataset is a pandas DataFrame.
   The output of the dataset is a pandas DataFrame.

   .. py:attribute:: linked_service
      :type: SftpLinkedServiceType

   .. py:attribute:: settings
      :type: SftpDatasetSettingsType

   .. py:attribute:: serializer
      :type: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer

   .. py:attribute:: deserializer
      :type: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer

   .. py:property:: type
      :type: ds_protocol_sftp_py_lib.enums.ResourceType

      Get the type of the dataset.

   .. py:method:: read() -> None

      Read files from the SFTP server.

      :returns: None. The result is stored in the `output` attribute as a DataFrame containing the contents of the matched files.
      :rtype: None
      :raises ReadError: If there is an error reading from the SFTP dataset.

   .. py:method:: create() -> None

      Create data on the SFTP server.

      .. note::

         This method is **not idempotent**. Calling it again with the same parameters raises a CreateError, because the file already exists. The same applies to retries: if a network or server error occurs after the file is created but before the method returns, a retry may raise a CreateError due to the file's existence. Orchestration and retry policies should account for this behavior.

      :returns: None
      :raises CreateError: If there is an error creating the dataset on the SFTP server, or if the file already exists.

   .. py:method:: update() -> None

      The update operation is not supported by this provider.

      :returns: None
      :raises NotSupportedError: Always raised, since update is not supported for SftpDataset.

   .. py:method:: upsert() -> None

      Upsert a file on the SFTP server. If the file already exists, it is overwritten.

      :returns: None
      :raises UpsertError: If there is an error upserting the dataset on the SFTP server.

   .. py:method:: delete() -> None

      The delete operation is not supported by this provider.

      :returns: None
      :raises NotSupportedError: Always raised, since delete is not supported for SftpDataset.

   .. py:method:: purge() -> None

      Purge the dataset, deleting all files matching the pattern from the SFTP server.

      :returns: None
      :raises PurgeError: If there is an error purging files from the SFTP server.

   .. py:method:: list() -> None

      List the files in the directory on the SFTP server based on the specified pattern and settings.

      :returns: None. The result is stored in the `output` attribute as a DataFrame containing the file information.
      :rtype: None
      :raises ListError: If there is an error listing the files in the SFTP dataset.

   .. py:method:: rename() -> None

      The rename operation is not supported by this provider.

      :returns: None
      :raises NotSupportedError: Always raised, since rename is not supported for SftpDataset.

   .. py:method:: close() -> None

      Close any open connections or resources.

      :returns: None

   .. py:method:: _get_folder_and_file_path() -> str

      Get the combined path of folder_path and file_name, using forward slashes.
      Any Windows-style backslashes are replaced with forward slashes, so the path is formatted consistently across Windows, Linux, and macOS.

      :returns: The full file path as a POSIX-style string.
      :rtype: str

   .. py:method:: _ensure_file_does_not_exist(remote_path: str) -> None

      Ensure the target file does not already exist on the SFTP server.

      :param remote_path: Full target file path on the SFTP server.
      :type remote_path: str
      :raises FileExistsError: If the target file already exists.

   .. py:method:: _list_directory(path: str) -> list[paramiko.SFTPAttributes]

      List the files in the specified directory on the SFTP server.

      :param path: The directory path to list files from.
      :type path: str
      :returns: A list of SFTPAttributes for the files in the directory.
      :rtype: list[paramiko.SFTPAttributes]

   .. py:method:: _get_files_by_pattern(path: str, fnmatch_pattern: str) -> list[paramiko.SFTPAttributes]

      Get files from the SFTP server that match the specified pattern.

      :param path: The directory path to search for files.
      :type path: str
      :param fnmatch_pattern: The pattern to match file names against.
      :type fnmatch_pattern: str
      :returns: A list of SFTPAttributes for the matching files.
      :rtype: list[paramiko.SFTPAttributes]

   .. py:method:: _ensure_sftp_directory(remote_directory: str, max_depth: int = 20) -> None

      Ensure that the specified directory exists on the SFTP server, creating it if it does not.

      :param remote_directory: The directory path to ensure on the SFTP server.
      :type remote_directory: str
      :param max_depth: The maximum directory depth to traverse when ensuring the directory exists. Default is 20.
      :type max_depth: int
      :returns: None
      :raises CreateError: If the maximum directory depth is exceeded while ensuring the SFTP directory.

   .. py:method:: _read_files_as_dataframe(files: list[paramiko.SFTPAttributes]) -> pandas.DataFrame

      Read the dataset from the SFTP server as a DataFrame.

      :param files: List of SFTPAttributes for the files to read.
      :type files: list[paramiko.SFTPAttributes]
      :returns: The combined data from the files as a single DataFrame.
      :rtype: pandas.DataFrame

   .. py:method:: _list_directory_files(files: list[paramiko.SFTPAttributes]) -> pandas.DataFrame

      List the files in the directory as a DataFrame.

      :param files: List of SFTPAttributes for the files to list.
      :type files: list[paramiko.SFTPAttributes]
      :returns: A DataFrame containing the file information.
      :rtype: pandas.DataFrame

.. py:class:: SftpDatasetSettings

   Bases: :py:obj:`ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings`

   Settings for the SFTP dataset.

   .. py:attribute:: folder_path
      :type: str

      Path to the folder containing the file(s) to read/write on the SFTP server.

   .. py:attribute:: file_name
      :type: str

      Name of the file to read/write on the SFTP server.

   .. py:attribute:: list
      :type: ListSettings

      Settings for listing the SFTP dataset.