ds_stoa.fetch._fetch

Module for fetching data from GraspDP datalake.

This module provides functions for fetching data from the GraspDP datalake through pre-signed URLs provided by the Stoa API. The main entry point is the fetch function, which retrieves data from multiple URLs in parallel and returns a single consolidated pandas DataFrame.

Dependencies:

- pandas: For handling and consolidating data into DataFrames.
- requests: For making HTTP requests to fetch data from URLs.
- concurrent.futures: For parallel execution of data fetching.
- utils.logger: For logging errors and information.

Example usage:

pre_signed_urls = {
    "file1": "http://example.com/data1.parquet",
    "file2": "http://example.com/data2.parquet",
}
dataframe = fetch(pre_signed_urls)
print(dataframe)

Attributes

LOGGER

Functions

fetch_url(→ io.BytesIO)

Fetch data from a given URL and return it as a BytesIO object.

fetch(→ pandas.DataFrame)

Fetch data from a collection of pre-signed URLs in parallel and consolidate into a single DataFrame.

Module Contents

ds_stoa.fetch._fetch.LOGGER
ds_stoa.fetch._fetch.fetch_url(url: str) → io.BytesIO

Fetch data from a given URL and return it as a BytesIO object.

Parameters:

url (str) – The URL to fetch the data from.

Returns:

A BytesIO object containing the fetched data.

Return type:

BytesIO

Example:

>>> fetch_url("http://example.com/data.parquet")
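
Because fetch_url returns a standard io.BytesIO, the result behaves like any in-memory binary file. The sketch below illustrates the stream-position behaviour that tends to matter when a buffer is read more than once. The payload bytes are made up and stand in for a real download.

```python
import io

# Stand-in for the value returned by fetch_url; the bytes are invented.
buffer = io.BytesIO(b"example payload")

first_read = buffer.read()     # reads from the current position to the end
second_read = buffer.read()    # position is now at the end, so this is empty
buffer.seek(0)                 # rewind before handing the buffer to another reader
raw_bytes = buffer.getvalue()  # full contents, regardless of the current position
```

Rewinding with seek(0) is worth doing whenever the same buffer is passed to a second consumer.
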
ds_stoa.fetch._fetch.fetch(pre_signed_urls: Dict) → pandas.DataFrame

Fetch data from a collection of pre-signed URLs in parallel and consolidate into a single DataFrame.

Parameters:

pre_signed_urls (Dict[str, str]) – A dictionary where keys are identifiers and values are pre-signed URLs.

Returns:

A consolidated Pandas DataFrame containing data from all fetched URLs.

Return type:

pandas.DataFrame

Example:

pre_signed_urls = {
    "file1": "http://example.com/data1.parquet",
    "file2": "http://example.com/data2.parquet",
}
dataframe = fetch(pre_signed_urls)
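
The parallel step described for fetch could be sketched with concurrent.futures roughly as follows. This is an illustration under stated assumptions, not the module's actual implementation: fetch_all and fake_fetcher are hypothetical names, the thread pool and its worker count are guesses, and the downloader is stubbed so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO
from typing import Callable, Dict


def fetch_all(
    urls: Dict[str, str],
    fetcher: Callable[[str], BytesIO],
    max_workers: int = 8,
) -> Dict[str, BytesIO]:
    """Hypothetical helper: call `fetcher` on every URL in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per URL and remember which key it belongs to.
        futures = {key: pool.submit(fetcher, url) for key, url in urls.items()}
        # result() blocks until the task finishes and re-raises any exception.
        return {key: future.result() for key, future in futures.items()}


def fake_fetcher(url: str) -> BytesIO:
    """Stand-in for a real HTTP download so the sketch runs offline."""
    return BytesIO(url.encode("utf-8"))


buffers = fetch_all(
    {
        "file1": "http://example.com/data1.parquet",
        "file2": "http://example.com/data2.parquet",
    },
    fake_fetcher,
)
```

In practice fetch_url would presumably take the place of fake_fetcher, and each buffer could then be parsed (for example with pandas.read_parquet) and combined with pandas.concat. Whether fetch does exactly this is an assumption.
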