ds_stoa.fetch

This module serves as the entry point for the data fetching functionality from the GraspDP datalake.

It exposes the fetch function, which is designed to retrieve data from the datalake using pre-signed URLs. This function is capable of fetching data in parallel, significantly improving performance for large datasets.

The fetch function returns the data as a Pandas DataFrame, making it immediately useful for data analysis and manipulation tasks.

Example usage:

from ds_stoa.fetch import fetch

# Example pre-signed URLs (these would be provided by your data provider)
pre_signed_urls = {
    "dataset1": "http://example.com/path/to/dataset1.parquet",
    "dataset2": "http://example.com/path/to/dataset2.parquet",
}

# Fetching data and loading it into a DataFrame
dataframe = fetch(pre_signed_urls)
print(dataframe)

Submodules

Functions

fetch(→ pandas.DataFrame)

Fetch data from a collection of pre-signed URLs in

Package Contents

ds_stoa.fetch.fetch(pre_signed_urls: Dict) pandas.DataFrame

Fetch data from a collection of pre-signed URLs in parallel and consolidate into a single DataFrame.

Parameters:

pre_signed_urls (Dict[str, str]) – A dictionary where keys are identifiers and values are pre-signed URLs.

Returns:

A consolidated Pandas DataFrame containing data from all fetched URLs.

Return type:

pd.DataFrame

Example:

pre_signed_urls = {
    "file1": "http://example.com/data1.parquet",
    "file2": "http://example.com/data2.parquet",
}
dataframe = fetch(pre_signed_urls)