ds_protocol_sftp_py_lib.dataset.sftp¶
File: sftp.py
Region: ds_protocol_sftp_py_lib/dataset/sftp
SFTP Dataset
This module implements a dataset for SFTP connections.
Example
>>> from ds_protocol_sftp_py_lib.dataset import SftpDataset, SftpDatasetSettings
>>> from ds_protocol_sftp_py_lib.linked_service import SftpLinkedService, SftpLinkedServiceSettings
>>> from ds_resource_plugin_py_lib.common.serde.deserialize import PandasDeserializer
>>> from ds_resource_plugin_py_lib.common.serde.serialize import PandasSerializer
>>> from ds_resource_plugin_py_lib.common.resource.dataset import DatasetStorageFormatType
>>> dataset = SftpDataset(
...     deserializer=PandasDeserializer(format=DatasetStorageFormatType.JSON),
...     serializer=PandasSerializer(format=DatasetStorageFormatType.JSON),
...     settings=SftpDatasetSettings(
...         folder_path="/path/to/dataset",
...         file_name="dataset_*.json",
...     ),
...     linked_service=SftpLinkedService(
...         settings=SftpLinkedServiceSettings(
...             host="sftp.example.com",
...             username="user",
...             password="password123",
...             encrypted_credential="encrypted_cred",
...             private_key=None,
...             passphrase=None,
...             timeout=30.0,
...             host_key_fingerprint=None,
...             host_key_validation=True,
...             port=22,
...         ),
...     ),
... )
>>> dataset.read()
>>> data = dataset.output
Attributes¶
Classes¶
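| ListSettings | Settings for listing the SFTP dataset. |
| SftpDatasetSettings | Settings for the SFTP dataset. |
| SftpDataset | Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets, and other documents. |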
Module Contents¶
- ds_protocol_sftp_py_lib.dataset.sftp.logger¶
- class ds_protocol_sftp_py_lib.dataset.sftp.ListSettings[source]¶
Bases:
ds_common_serde_py_lib.Serializable
Settings for listing the SFTP dataset.
- download: bool = False¶
Whether to download the file contents when listing the SFTP dataset, so that the resulting DataFrame includes the content of each file.
- class ds_protocol_sftp_py_lib.dataset.sftp.SftpDatasetSettings[source]¶
Bases:
ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings
Settings for the SFTP dataset.
- folder_path: str¶
Path to the folder containing the file(s) to read/write on the SFTP server.
- file_name: str¶
Name of the file to read/write on the SFTP server. The module example uses a wildcard pattern here (dataset_*.json).
- list: ListSettings¶
Settings for listing the SFTP dataset.
- ds_protocol_sftp_py_lib.dataset.sftp.SftpDatasetSettingsType¶
- ds_protocol_sftp_py_lib.dataset.sftp.SftpLinkedServiceType¶
- class ds_protocol_sftp_py_lib.dataset.sftp.SftpDataset[source]¶
Bases:
ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[SftpLinkedServiceType,SftpDatasetSettingsType,ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer,ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer],Generic[SftpLinkedServiceType,SftpDatasetSettingsType]
Tabular dataset object which identifies data within a data store, such as tables, CSV, JSON, Parquet, Parquet datasets, and other documents.
Both the input and the output of the dataset are pandas DataFrames.
- linked_service: SftpLinkedServiceType¶
- settings: SftpDatasetSettingsType¶
- serializer: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer¶
- deserializer: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer¶
- property type: ds_protocol_sftp_py_lib.enums.ResourceType¶
Get the type of the dataset.
- read() None[source]¶
Read files from the SFTP server.
- Returns:
The output is stored in the output attribute as a DataFrame containing the contents of the matched files.
- Return type:
None
- Raises:
ReadError – If there is an error reading from the SFTP dataset.
- create() None[source]¶
Create data on the SFTP server.
Note
This method is not idempotent. If called multiple times with the same parameters, it will raise a CreateError if the file already exists. If a network or server error occurs after the file is created but before the method returns, retrying may result in a CreateError due to the file’s existence. Orchestration and retry policies should account for this non-idempotent behavior.
- Returns:
None
- Raises:
CreateError – If there is an error creating the dataset on the SFTP server, or if the file already exists.
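The note above can be illustrated with a minimal retry sketch. Everything in it is hypothetical: CreateError is redefined locally as a stand-in for the library's exception, FakeDataset simulates a server that writes the file and then drops the connection, and the use of ConnectionError for that first failure is an assumption, since this page does not say which exception a network error surfaces as.

```python
class CreateError(Exception):
    """Local stand-in for the library's CreateError (assumption)."""


class FakeDataset:
    """Hypothetical dataset: the first create() writes the file, then fails."""

    def __init__(self):
        self.files = set()
        self.calls = 0

    def create(self):
        self.calls += 1
        if "data.json" in self.files:
            raise CreateError("file already exists")
        self.files.add("data.json")
        if self.calls == 1:
            # Simulated network failure after the file was written.
            raise ConnectionError("connection dropped after write")


def create_with_retry(dataset, attempts=3):
    """Retry create(), flagging the case where an earlier attempt may have succeeded."""
    failed_before = False
    for _ in range(attempts):
        try:
            dataset.create()
            return "created"
        except ConnectionError:
            failed_before = True
        except CreateError:
            if failed_before:
                # The file may come from our own earlier attempt.
                return "ambiguous"
            raise
    return "failed"


print(create_with_retry(FakeDataset()))
```

An "ambiguous" result means the caller should inspect the remote file rather than retry blindly. If overwriting is acceptable, upsert() avoids the problem because it replaces an existing file.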
- update() None[source]¶
The update operation is not supported by this provider.
- Returns:
None
- Raises:
NotSupportedError – Always raised since update is not supported for SftpDataset.
- upsert() None[source]¶
Upsert a file on the SFTP server. If the file already exists, it will be overwritten.
- Returns:
None
- Raises:
UpsertError – If there is an error upserting the dataset on the SFTP server.
- delete() None[source]¶
The delete operation is not supported by this provider.
- Returns:
None
- Raises:
NotSupportedError – Always raised since delete is not supported for SftpDataset.
- purge() None[source]¶
Purge the dataset, deleting all files matching the pattern from the SFTP server.
- Returns:
None
- Raises:
PurgeError – If there is an error purging files from the SFTP server.
- list() None[source]¶
List the files in the directory on the SFTP server based on the specified pattern and settings.
- Returns:
The output is stored in the output attribute as a DataFrame containing the file information.
- Return type:
None
- Raises:
ListError – If there is an error listing the files in the SFTP dataset.
- rename() None[source]¶
The rename operation is not supported by this provider.
- Returns:
None
- Raises:
NotSupportedError – Always raised since rename is not supported for SftpDataset.
- _get_folder_and_file_path() str[source]¶
Get the combined path of folder_path and file_name, using forward slashes. Any Windows-style backslashes are replaced with forward slashes, so the path is formatted the same way on Windows, Linux, and macOS.
- Returns:
The full file path as a POSIX-style string.
- Return type:
str
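A minimal sketch of this kind of path joining, assuming the standard-library posixpath module is used. The function name join_remote_path is hypothetical and is not the library's implementation.

```python
import posixpath


def join_remote_path(folder_path: str, file_name: str) -> str:
    """Hypothetical stand-in for _get_folder_and_file_path()."""
    # Normalize Windows-style separators so both parts are POSIX-style.
    folder = folder_path.replace("\\", "/")
    name = file_name.replace("\\", "/")
    # posixpath.join always uses "/" regardless of the host operating system.
    # Leading slashes are stripped from the name so it cannot reset the path.
    return posixpath.join(folder, name.lstrip("/"))


print(join_remote_path("\\path\\to\\dataset", "dataset_01.json"))
```

Using posixpath rather than os.path keeps the result independent of the platform the client runs on, which matters because SFTP paths use forward slashes.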
- _ensure_file_does_not_exist(remote_path: str) None[source]¶
Ensure the target file does not already exist on the SFTP server.
- Parameters:
remote_path (str) – Full target file path on the SFTP server.
- Raises:
FileExistsError – If the target file already exists.
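The existence check can be sketched as a stat call that is expected to fail. This is an illustration under assumptions: FakeSftp is a hypothetical in-memory client that raises FileNotFoundError for missing paths, and the real method may detect existence differently.

```python
class FakeSftp:
    """Hypothetical in-memory stand-in for an SFTP client."""

    def __init__(self, files):
        self.files = set(files)

    def stat(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        return object()


def ensure_file_does_not_exist(sftp, remote_path: str) -> None:
    try:
        sftp.stat(remote_path)
    except FileNotFoundError:
        # Nothing at that path, so it is safe to create the file.
        return
    raise FileExistsError(f"File already exists: {remote_path}")


sftp = FakeSftp({"/data/existing.json"})
ensure_file_does_not_exist(sftp, "/data/new.json")  # returns without error
```

The check and the later write are two separate round trips, so another client could still create the file in between.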
- _list_directory(path: str) list[paramiko.SFTPAttributes][source]¶
List the files in the specified directory on the SFTP server.
- Parameters:
path (str) – The directory path to list files from.
- Returns:
A list of SFTPAttributes for the files in the directory.
- Return type:
list[SFTPAttributes]
- _get_files_by_pattern(path: str, fnmatch_pattern: str) list[paramiko.SFTPAttributes][source]¶
Get files from the SFTP server that match the specified pattern.
- Parameters:
path (str) – The directory path to search for files.
fnmatch_pattern (str) – The pattern to match file names against.
- Returns:
A list of SFTPAttributes for the matching files.
- Return type:
list[SFTPAttributes]
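The parameter name suggests matching in the style of the standard-library fnmatch module. The sketch below shows that kind of filtering on a directory listing. FakeAttributes is a hypothetical stand-in for paramiko.SFTPAttributes, and the choice of fnmatchcase (case-sensitive on every platform) is an assumption.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class FakeAttributes:
    """Stand-in for paramiko.SFTPAttributes; only filename is used here."""

    filename: str


def filter_by_pattern(entries, fnmatch_pattern):
    # Keep only the entries whose file name matches the shell-style pattern.
    return [e for e in entries if fnmatch.fnmatchcase(e.filename, fnmatch_pattern)]


entries = [
    FakeAttributes("dataset_01.json"),
    FakeAttributes("dataset_02.json"),
    FakeAttributes("notes.txt"),
]
matched = filter_by_pattern(entries, "dataset_*.json")
print([e.filename for e in matched])  # ['dataset_01.json', 'dataset_02.json']
```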
- _ensure_sftp_directory(remote_directory: str, max_depth: int = 20) None[source]¶
Ensure that the specified directory exists on the SFTP server. If it does not exist, create it.
- Parameters:
remote_directory (str) – The directory path to ensure on the SFTP server.
max_depth (int) – The maximum directory depth to traverse when ensuring the directory exists. Default is 20.
- Returns:
None
- Raises:
CreateError – If the maximum directory depth is exceeded while ensuring the SFTP directory.
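One way such a depth-limited directory check could work is to walk up from the target until an existing directory is found, then create the missing levels from the top down. The sketch below assumes that approach. FakeSftp and DepthExceededError are hypothetical (the latter stands in for CreateError), and the real method may traverse the path differently.

```python
import posixpath


class DepthExceededError(Exception):
    """Local stand-in for the library's CreateError (assumption)."""


class FakeSftp:
    """Hypothetical in-memory stand-in for an SFTP client."""

    def __init__(self, existing):
        self.dirs = set(existing)

    def stat(self, path):
        if path not in self.dirs:
            raise FileNotFoundError(path)

    def mkdir(self, path):
        self.dirs.add(path)


def ensure_directory(sftp, remote_directory: str, max_depth: int = 20) -> None:
    missing = []
    current = posixpath.normpath(remote_directory)
    # Walk up until an existing directory (or the root) is reached.
    while current not in ("/", ".", ""):
        try:
            sftp.stat(current)
            break
        except FileNotFoundError:
            if len(missing) >= max_depth:
                raise DepthExceededError(
                    f"Maximum directory depth {max_depth} exceeded"
                ) from None
            missing.append(current)
            current = posixpath.dirname(current)
    # Create the missing directories from the outermost one inwards.
    for path in reversed(missing):
        sftp.mkdir(path)


sftp = FakeSftp({"/a"})
ensure_directory(sftp, "/a/b/c")
print(sorted(sftp.dirs))
```

The depth limit bounds the number of round trips to the server and guards against malformed or unexpectedly deep paths.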
- _read_files_as_dataframe(files: list[paramiko.SFTPAttributes]) pandas.DataFrame[source]¶
Read the dataset from the SFTP server as a dataframe.
- Parameters:
files (list[SFTPAttributes]) – List of SFTPAttributes for the files to read.
- Returns:
The combined data from the files as a single DataFrame.
- Return type:
pd.DataFrame
- _list_directory_files(files: list[paramiko.SFTPAttributes]) pandas.DataFrame[source]¶
List the files in the directory as a dataframe.
- Parameters:
files (list[SFTPAttributes]) – List of SFTPAttributes for the files to list.
- Returns:
A dataframe containing the file information.
- Return type:
pd.DataFrame