ds_protocol_sftp_py_lib

File: __init__.py Region: ds-protocol-sftp-py-lib

Description

A Python package from the ds-protocol-sftp-py-lib library that provides an SFTP dataset and an SFTP linked service, together with their settings classes.

Example

from ds_protocol_sftp_py_lib import __version__

print(f"Package version: {__version__}")

Attributes

__version__

Classes

SftpDataset

Tabular dataset object that identifies data within a data store, such as table, CSV, JSON, Parquet, Parquet dataset, and other documents.

SftpDatasetSettings

Settings for the SFTP dataset.

SftpLinkedService

SFTP Linked Service implementation.

SftpLinkedServiceSettings

Settings for SFTP Linked Service connections.

Package Contents

class ds_protocol_sftp_py_lib.SftpDataset

Bases: ds_resource_plugin_py_lib.common.resource.dataset.TabularDataset[SftpLinkedServiceType, SftpDatasetSettingsType, ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer, ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer], Generic[SftpLinkedServiceType, SftpDatasetSettingsType]

Tabular dataset object that identifies data within a data store, such as table, CSV, JSON, Parquet, Parquet dataset, and other documents.

The input of the dataset is a pandas DataFrame. The output of the dataset is a pandas DataFrame.

linked_service: SftpLinkedServiceType
settings: SftpDatasetSettingsType
serializer: ds_resource_plugin_py_lib.common.serde.serialize.PandasSerializer
deserializer: ds_resource_plugin_py_lib.common.serde.deserialize.PandasDeserializer
property type: ds_protocol_sftp_py_lib.enums.ResourceType

Get the type of the dataset.

read() -> None

Read files from the SFTP server.

Returns:

The output is stored in the output attribute as a DataFrame containing the contents of the matched files.

Return type:

None

Raises:

ReadError – If there is an error reading from the SFTP dataset.

create() -> None

Create data on the SFTP server.

Note

This method is not idempotent. If called multiple times with the same parameters, it will raise a CreateError if the file already exists. If a network or server error occurs after the file is created but before the method returns, retrying may result in a CreateError due to the file’s existence. Orchestration and retry policies should account for this non-idempotent behavior.
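The retry concern in the note above can be sketched as follows. This is a minimal illustration, not the library's code: CreateError and FakeDataset are hypothetical in-memory stand-ins defined here so the snippet runs without an SFTP server, and create_once is an illustrative helper name.

```python
class CreateError(Exception):
    """Hypothetical stand-in for the library's CreateError."""


class FakeDataset:
    """In-memory stand-in that mimics the documented create() behavior."""

    def __init__(self):
        self.remote_files = set()
        self.path = "/inbound/report.csv"

    def create(self):
        # Fail if the file already exists, as the note describes.
        if self.path in self.remote_files:
            raise CreateError(f"{self.path} already exists")
        self.remote_files.add(self.path)


def create_once(dataset):
    """Return True if this call created the file, False if create() failed."""
    try:
        dataset.create()
        return True
    except CreateError:
        # Assumption: the failure means "already exists". Real code should
        # inspect the cause before treating a CreateError as a completed write.
        return False


dataset = FakeDataset()
first = create_once(dataset)   # creates the file
second = create_once(dataset)  # simulated retry: the file is already there
print(first, second)
```

A retry policy built this way treats the second call as a no-op instead of a hard failure. When overwriting is acceptable, upsert() may be the simpler choice.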

Returns:

None

Raises:

CreateError – If there is an error creating the dataset on the SFTP server, or if the file already exists.

update() -> None

The update operation is not supported by this provider.

Returns:

None

Raises:

NotSupportedError – Always raised since update is not supported for SftpDataset.

upsert() -> None

Upsert a file on the SFTP server. If the file already exists, it will be overwritten.

Returns:

None

Raises:

UpsertError – If there is an error upserting the dataset on the SFTP server.

delete() -> None

The delete operation is not supported by this provider.

Returns:

None

Raises:

NotSupportedError – Always raised since delete is not supported for SftpDataset.

purge() -> None

Purge the dataset, deleting all files matching the pattern from the SFTP server.

Returns:

None

Raises:

PurgeError – If there is an error purging files from the SFTP server.

list() -> None

List the files in the directory on the SFTP server based on the specified pattern and settings.

Returns:

The output is stored in the output attribute as a DataFrame containing the file information.

Return type:

None

Raises:

ListError – If there is an error listing the files in the SFTP dataset.

rename() -> None

The rename operation is not supported by this provider.

Returns:

None

Raises:

NotSupportedError – Always raised since rename is not supported for SftpDataset.

close() -> None

Close any open connections or resources.

Returns:

None

_get_folder_and_file_path() -> str

Get combined path of folder_path and file_name, using forward slashes. This ensures consistent path formatting across Windows, Linux, and macOS. It also replaces any Windows-style backslashes with forward slashes.

Returns:

The full file path as a POSIX-style string.

Return type:

str
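A minimal sketch of the path handling described above, assuming the join behaves like posixpath.join after backslashes are replaced. The function name join_remote_path is hypothetical and this is not the library's implementation.

```python
import posixpath


def join_remote_path(folder_path: str, file_name: str) -> str:
    """Join folder and file name into a POSIX-style remote path."""
    # Replace Windows-style separators so the result is the same on every OS.
    folder = folder_path.replace("\\", "/")
    # Strip leading slashes from the name; otherwise posixpath.join would
    # discard the folder part entirely.
    name = file_name.replace("\\", "/").lstrip("/")
    return posixpath.join(folder, name)


windows_style = join_remote_path("exports\\daily", "report.csv")
posix_style = join_remote_path("/exports/daily/", "report.csv")
print(windows_style)
print(posix_style)
```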

_ensure_file_does_not_exist(remote_path: str) -> None

Ensure the target file does not already exist on the SFTP server.

Parameters:

remote_path (str) – Full target file path on the SFTP server.

Raises:

FileExistsError – If the target file already exists.

_list_directory(path: str) -> list[paramiko.SFTPAttributes]

List the files in the specified directory on the SFTP server.

Parameters:

path (str) – The directory path to list files from.

Returns:

A list of SFTPAttributes for the files in the directory.

Return type:

list[SFTPAttributes]

_get_files_by_pattern(path: str, fnmatch_pattern: str) -> list[paramiko.SFTPAttributes]

Get files from the SFTP server that match the specified pattern.

Parameters:
  • path (str) – The directory path to search for files.

  • fnmatch_pattern (str) – The pattern to match file names against.

Returns:

A list of SFTPAttributes for the matching files.

Return type:

list[SFTPAttributes]
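The pattern filtering can be illustrated with the standard library's fnmatch module. This sketch assumes shell-style wildcards applied to each entry's filename attribute. The entries are simple stand-in objects rather than real SFTPAttributes, filter_by_pattern is a hypothetical name, and fnmatchcase is used here only to keep matching case-sensitive on every platform; the library's exact matching rules may differ.

```python
from fnmatch import fnmatchcase
from types import SimpleNamespace

# Stand-ins for directory entries; only the filename attribute is modeled.
entries = [
    SimpleNamespace(filename="sales_2024.csv"),
    SimpleNamespace(filename="sales_2025.csv"),
    SimpleNamespace(filename="readme.txt"),
]


def filter_by_pattern(entries, fnmatch_pattern):
    """Keep only the entries whose filename matches the wildcard pattern."""
    return [e for e in entries if fnmatchcase(e.filename, fnmatch_pattern)]


matched = [e.filename for e in filter_by_pattern(entries, "sales_*.csv")]
print(matched)
```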

_ensure_sftp_directory(remote_directory: str, max_depth: int = 20) -> None

Ensure that the specified directory exists on the SFTP server. If it does not exist, create it.

Parameters:
  • remote_directory (str) – The directory path to ensure on the SFTP server.

  • max_depth (int) – The maximum directory depth to traverse when ensuring the directory exists. Default is 20.

Returns:

None

Raises:

CreateError – If the maximum directory depth is exceeded while ensuring the SFTP directory.
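One way to picture the depth-limited directory creation is the sketch below. It is an assumption-laden illustration: the set named existing plays the role of the server's directory tree, CreateError is a hypothetical stand-in class, ensure_directory is an illustrative name, and all paths are treated as absolute. A real client would issue stat and mkdir calls instead of set operations.

```python
class CreateError(Exception):
    """Hypothetical stand-in for the library's CreateError."""


def ensure_directory(existing: set, remote_directory: str, max_depth: int = 20) -> list:
    """Create each missing level of remote_directory; return the paths created."""
    parts = [p for p in remote_directory.replace("\\", "/").split("/") if p]
    if len(parts) > max_depth:
        raise CreateError(f"Directory depth {len(parts)} exceeds max_depth={max_depth}")
    created = []
    current = ""
    for part in parts:
        current = f"{current}/{part}"
        if current not in existing:
            existing.add(current)  # a real client would call mkdir here
            created.append(current)
    return created


existing = {"/data"}
created = ensure_directory(existing, "/data/2025/01")
print(created)
```

The depth limit guards against malformed or unexpectedly long paths producing an unbounded number of remote calls.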

_read_files_as_dataframe(files: list[paramiko.SFTPAttributes]) -> pandas.DataFrame

Read the dataset from the SFTP server as a dataframe.

Parameters:

files (list[SFTPAttributes]) – List of SFTPAttributes for the files to read.

Returns:

The combined data from the files as a single DataFrame.

Return type:

pd.DataFrame

_list_directory_files(files: list[paramiko.SFTPAttributes]) -> pandas.DataFrame

List the files in the directory as a dataframe.

Parameters:

files (list[SFTPAttributes]) – List of SFTPAttributes for the files to list.

Returns:

A dataframe containing the file information.

Return type:

pd.DataFrame

class ds_protocol_sftp_py_lib.SftpDatasetSettings

Bases: ds_resource_plugin_py_lib.common.resource.dataset.DatasetSettings

Settings for the SFTP dataset.

folder_path: str

Path to the folder containing the file(s) to read/write on the SFTP server.

file_name: str

Name of the file to read/write on the SFTP server.

list: ListSettings

Settings for listing the SFTP dataset.

class ds_protocol_sftp_py_lib.SftpLinkedService

Bases: ds_resource_plugin_py_lib.common.resource.linked_service.LinkedService[SftpLinkedServiceSettingsType], Generic[SftpLinkedServiceSettingsType]

SFTP Linked Service implementation.

settings

Linked service settings.

Type:

SftpLinkedServiceSettingsType

_connection

Underlying SFTP client connection.

Type:

SFTPClient | None

_sftp

Sftp provider instance.

Type:

Sftp | None

settings: SftpLinkedServiceSettingsType
_sftp: ds_protocol_sftp_py_lib.utils.sftp.provider.Sftp | None = None
property type: ds_protocol_sftp_py_lib.enums.ResourceType

Get the type of linked service.

Returns:

The type of the linked service.

Return type:

ResourceType

property connection: ds_protocol_sftp_py_lib.utils.sftp.provider.Sftp

Get the SFTP client connection.

Returns:

The active SFTP client connection.

Return type:

Sftp

Raises:

ConnectionError – If the connection is not initialized.

_init_sftp() -> ds_protocol_sftp_py_lib.utils.sftp.provider.Sftp

Initialize the Sftp client.

Returns:

An initialized Sftp provider instance.

Return type:

Sftp

connect() -> None

Initialize the Sftp client instance if not already initialized.

Raises:
  • ConnectionError – If connection fails.

  • AuthenticationError – If authentication fails.

test_connection() -> tuple[bool, str]

Perform a lightweight health check against the SFTP backend.

Uses the SFTP client’s listdir method to check connectivity and authentication.

Returns:

  • (True, message) if successful.

  • (False, error message) otherwise.

Return type:

tuple[bool, str]
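The health-check contract (a success flag plus a message, with no exception escaping) can be sketched with stub clients. HealthyClient, BrokenClient, check_connection, and the message strings are all hypothetical; only the use of a listdir call and the tuple[bool, str] return shape come from the description above.

```python
def check_connection(client) -> tuple[bool, str]:
    """Probe the client with a cheap listdir call and report the outcome."""
    try:
        client.listdir(".")
    except Exception as exc:
        # Failures are reported through the return value, not raised.
        return False, f"Connection failed: {exc}"
    return True, "Connection successful"


class HealthyClient:
    """Stub whose listdir succeeds."""

    def listdir(self, path):
        return []


class BrokenClient:
    """Stub whose listdir fails as a dropped connection might."""

    def listdir(self, path):
        raise OSError("connection reset")


ok = check_connection(HealthyClient())
bad = check_connection(BrokenClient())
print(ok)
print(bad)
```

Returning a tuple rather than raising lets callers such as monitoring jobs or setup wizards display the message directly.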

close() -> None

Close the linked service.

Sets the _sftp attribute to None to indicate the connection is closed.

Raises:

ConnectionError – If closing the SFTP connection fails.

class ds_protocol_sftp_py_lib.SftpLinkedServiceSettings

Bases: ds_resource_plugin_py_lib.common.resource.linked_service.LinkedServiceSettings

Settings for SFTP Linked Service connections.

host

SFTP server hostname.

Type:

str

username

Username for authentication.

Type:

str

password

Password for authentication.

Type:

str | None

private_key

Private key for authentication.

Type:

str | None

passphrase

Passphrase for private key.

Type:

str | None

timeout

Connection timeout in seconds.

Type:

float | None

host_key_fingerprint

Expected host key fingerprint.

Type:

str | None

host_key_validation

Whether to validate host key.

Type:

bool

port

SFTP server port.

Type:

int

host: str

Hostname or IP address of the SFTP server.

username: str

Username for authentication.

password: str | None = None

Password for authentication.

private_key: str | None = None

Private key for authentication.

passphrase: str | None = None

Passphrase for private key.

timeout: float | None = None

Connection timeout in seconds.

host_key_fingerprint: str | None = None

Expected host key fingerprint (base64-encoded MD5, as produced by Paramiko’s get_fingerprint(); e.g., ‘AbCdEfGhIjKlMnOpQrStUvWxYz0123456789abcdEf==’).
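As a rough sketch of how such a fingerprint could be derived, assuming it is the base64 encoding of the raw MD5 digest of the server's public key bytes: the helper name md5_fingerprint_b64 and the sample key bytes are placeholders, not real key material, and the library's own comparison logic is not shown.

```python
import base64
import hashlib


def md5_fingerprint_b64(public_key_blob: bytes) -> str:
    """Base64-encode the 16-byte MD5 digest of a public key blob."""
    # MD5 is used here for fingerprint formatting only, not for security.
    digest = hashlib.md5(public_key_blob, usedforsecurity=False).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder bytes standing in for a server's public key blob.
fingerprint = md5_fingerprint_b64(b"example-public-key-bytes")
print(len(fingerprint))
```

A base64-encoded 16-byte MD5 digest is always 24 characters long and ends in "==", which gives a quick sanity check for a configured value.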

host_key_validation: bool = True

Whether to validate host key.

port: int = 22

SFTP server port.

ds_protocol_sftp_py_lib.__version__