ds_stoa.fetch._fetch
Module for fetching data from the GraspDP datalake.
This module provides functions for retrieving data from the GraspDP datalake through pre-signed URLs issued by the Stoa API. The primary entry point is the fetch function, which downloads data from multiple URLs in parallel and returns a single consolidated Pandas DataFrame.
Dependencies:
- pandas: For handling and consolidating data into DataFrames.
- requests: For making HTTP requests to fetch data from URLs.
- concurrent.futures: For parallel execution of data fetching.
- utils.logger: For logging errors and information.
Example usage:
pre_signed_urls = {
    "file1": "http://example.com/data1.parquet",
    "file2": "http://example.com/data2.parquet",
}
dataframe = fetch(pre_signed_urls)
print(dataframe)
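Attributes
| LOGGER |
Functions
| fetch_url(url) | Fetch data from a given URL and return it as a BytesIO object. |
| fetch(pre_signed_urls) | Fetch data from a collection of pre-signed URLs in parallel and consolidate into a single DataFrame. |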
Module Contents
- ds_stoa.fetch._fetch.LOGGER
- ds_stoa.fetch._fetch.fetch_url(url: str) → io.BytesIO
Fetch data from a given URL and return it as a BytesIO object.
- Parameters:
url (str) – The URL to fetch the data from.
- Returns:
A BytesIO object containing the fetched data.
- Return type:
BytesIO
Example:
>>> fetch_url("http://example.com/data.parquet")
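The behaviour described for fetch_url can be approximated as follows. This is a minimal sketch, not the module's actual source: the name fetch_url_sketch and the timeout parameter are assumptions introduced for illustration, and the error handling shown (raising on HTTP error statuses) may differ from what the real function does.

```python
from io import BytesIO

import requests


def fetch_url_sketch(url: str, timeout: float = 30.0) -> BytesIO:
    """Hypothetical stand-in for fetch_url: download a URL into memory."""
    # `timeout` is an assumed parameter; the documented signature takes only `url`.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx rather than buffering an error page.
    response.raise_for_status()
    # Wrap the raw bytes in a file-like object that readers such as
    # pandas.read_parquet can consume directly.
    return BytesIO(response.content)
```

Returning a BytesIO keeps the download in memory, so the caller can pass it straight to a reader without writing a temporary file.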
- ds_stoa.fetch._fetch.fetch(pre_signed_urls: Dict) → pandas.DataFrame
Fetch data from a collection of pre-signed URLs in parallel and consolidate into a single DataFrame.
- Parameters:
pre_signed_urls (Dict[str, str]) – A dictionary where keys are identifiers and values are pre-signed URLs.
- Returns:
A consolidated Pandas DataFrame containing data from all fetched URLs.
- Return type:
pd.DataFrame
Example:
pre_signed_urls = {
    "file1": "http://example.com/data1.parquet",
    "file2": "http://example.com/data2.parquet",
}
dataframe = fetch(pre_signed_urls)
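The parallel download and consolidation step can be illustrated with a thread pool and pandas.concat. This is a sketch under stated assumptions rather than the module's implementation: fetch_sketch, fetch_one, read_frame, and max_workers are hypothetical names, and the real fetch may order, log, or handle failures differently.

```python
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO
from typing import Callable, Dict

import pandas as pd


def fetch_sketch(
    pre_signed_urls: Dict[str, str],
    fetch_one: Callable[[str], BytesIO],
    read_frame: Callable[[BytesIO], pd.DataFrame] = pd.read_parquet,
    max_workers: int = 8,
) -> pd.DataFrame:
    """Hypothetical stand-in for fetch: download in parallel, then concatenate."""
    if not pre_signed_urls:
        # Nothing to download: return an empty frame instead of failing in concat.
        return pd.DataFrame()
    urls = list(pre_signed_urls.values())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map runs the downloads concurrently but yields results
        # in the same order as the input URLs.
        buffers = list(pool.map(fetch_one, urls))
    # Parse each in-memory file into a DataFrame (assumed Parquet by default).
    frames = [read_frame(buf) for buf in buffers]
    # Stack all frames row-wise and renumber the index from 0.
    return pd.concat(frames, ignore_index=True)
```

A call such as fetch_sketch(pre_signed_urls, fetch_one=fetch_url) would mirror the documented usage. Threads suit this workload because the time is dominated by waiting on network I/O, not by CPU work.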