Library utilities
Connection to metadata services
DatasetAPIProvider
- class algoseek_connector.dataset_api.DatasetAPIProvider(config: DatasetAPIConfiguration | None = None)
Provide access to metadata API v2.
By default, a timeout of 5 s is applied to every request.
- Parameters:
- token : AuthToken
The API authentication token.
Methods
get:
Request data from an endpoint using the GET method.
list_datagroups:
List available data groups.
list_datasets:
List available datasets.
get_dataset:
Get the metadata of a dataset.
get_data_group:
Get the metadata of a data group.
get_dataset_details:
Get extended information of a dataset.
get_dataset_name:
Get the dataset name used when creating dataset instances.
get_dataset_destination_id:
Get a dataset destination id using a dataset name.
- get(endpoint: str, **kwargs) -> Response
Request data using the GET method.
- Parameters:
- endpoint : str
The endpoint to request data from.
- kwargs : dict
Keyword arguments passed to requests.Session.get().
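The interplay between the default 5 s timeout and the keyword arguments forwarded to the session can be sketched as follows. This is only an illustration, not the library's implementation: `TimeoutGetter`, `RecordingSession`, and the base URL are hypothetical names, and the recording session stands in for `requests.Session` so the snippet runs offline.

```python
from typing import Any

DEFAULT_TIMEOUT = 5  # seconds, matching the default described above


class TimeoutGetter:
    """Hypothetical helper that forwards GET requests to a session-like object."""

    def __init__(self, session: Any, base_url: str, timeout: float = DEFAULT_TIMEOUT):
        self._session = session
        self._base_url = base_url.rstrip("/")
        self._timeout = timeout

    def get(self, endpoint: str, **kwargs: Any) -> Any:
        # Apply the default timeout unless the caller supplied one.
        kwargs.setdefault("timeout", self._timeout)
        url = f"{self._base_url}/{endpoint.lstrip('/')}"
        return self._session.get(url, **kwargs)


class RecordingSession:
    """Stand-in for requests.Session (assumption) that records each call."""

    def __init__(self) -> None:
        self.calls: list = []

    def get(self, url: str, **kwargs: Any) -> dict:
        self.calls.append((url, kwargs))
        return {"url": url, **kwargs}


session = RecordingSession()
api = TimeoutGetter(session, "https://example.invalid/api/v2/")
api.get("datasets")                                          # default timeout
api.get("/datasets/1", timeout=30, params={"verbose": "1"})  # caller override
```

With a real `requests.Session` in place of the stand-in, the same keyword arguments (`timeout`, `params`, and so on) would reach `requests.Session.get()` unchanged.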
- get_data_group(internal_name: str) -> DataGroupApiInfo
Retrieve data group information.
- Parameters:
- internal_name : str
The data group internal name as registered in the dataset API.
- get_dataset(destination_id: int) -> DatasetVersionApiInfo
Retrieve dataset destination information.
- Parameters:
- destination_id : int
The dataset destination id as registered in the dataset API.
Client Protocols
Implementations of the ClientProtocol for the different data sources. These classes should not be instantiated directly.
ClickHouseClient
- class algoseek_connector.clickhouse.ClickHouseClient(client: Client)
Manage dataset retrieval from ClickHouse DB.
- Parameters:
- client : clickhouse_connect.Client
Methods
create_function_handle:
Create a FunctionHandle instance.
execute:
Execute raw SQL queries.
download:
Not Implemented.
fetch:
Retrieve data in Python native format using sqlalchemy.sql.selectable.Select.
fetch_iter:
Retrieve data in Python native format using sqlalchemy.sql.selectable.Select. Stream results.
fetch_dataframe:
Retrieve data as a Pandas DataFrame using sqlalchemy.sql.selectable.Select.
fetch_iter_dataframe:
Retrieve data as a Pandas DataFrame using sqlalchemy.sql.selectable.Select. Stream results.
list_datagroups:
List available data groups.
list_datasets:
List available datasets.
get_dataset_columns:
Create a list of sqlalchemy.Column for a dataset.
compile:
Convert a sqlalchemy.sql.selectable.Select into a CompiledQuery.
store_to_s3:
Store query results into an S3 object.
- compile(stmt: Select, **kwargs) -> CompiledQuery
Convert stmt into an SQL string.
- create_function_handle() -> FunctionHandle
Get a FunctionHandle instance.
- download(dataset: str, download_path: Path, date: date | str | tuple[date | str, date | str], symbols: str | list[str], expiration_date: date | str | tuple[date | str, date | str])
Not implemented.
- execute(sql: str, parameters: dict | None = None, output: str = 'python', **kwargs)
Execute raw SQL queries.
- Parameters:
- sql : str
Parametrized SQL query.
- parameters : dict or None, default=None
Query parameters.
- output : {"python", "dataframe"}
Whether to output data as a dictionary or as a Pandas DataFrame.
- kwargs
Optional parameters passed to the clickhouse-connect Client.query method.
- Returns:
- dict or pandas.DataFrame
- fetch(query: CompiledQuery, **kwargs) -> dict[str, tuple]
Retrieve data using a select statement.
- Parameters:
- query : CompiledQuery
The query statement to fetch.
- kwargs
Optional parameters passed to the clickhouse-connect Client.query method.
- Returns:
- dict[str, tuple]
A mapping from column names to values retrieved.
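Because the result is column-oriented (one tuple of values per column name), row-wise processing needs a small transposition step. The sketch below shows one way to do it; `iter_rows` is a hypothetical helper and the sample mapping is made-up data with the documented `dict[str, tuple]` shape, not output from a real query.

```python
from typing import Any, Iterator


def iter_rows(columns: dict[str, tuple]) -> Iterator[dict[str, Any]]:
    """Yield one dict per row from a column-name -> values mapping."""
    names = list(columns)
    # zip(*...) walks all column tuples in parallel, producing one row at a time.
    for values in zip(*columns.values()):
        yield dict(zip(names, values))


# Made-up sample with the same shape as the documented return value.
sample = {"Ticker": ("AAA", "BBB"), "Price": (10.0, 20.5)}
rows = list(iter_rows(sample))
```

An empty mapping yields no rows, so the helper is safe to call on queries that return nothing.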
- fetch_dataframe(query: CompiledQuery, **kwargs) -> DataFrame
Execute a Select statement and output data as a Pandas DataFrame.
- Parameters:
- query : CompiledQuery
The query statement to fetch.
- kwargs
Optional parameters passed to the clickhouse-connect Client.query_df method.
- Returns:
- pandas.DataFrame
- fetch_iter(query: CompiledQuery, size: int, **kwargs) -> Generator[dict[str, tuple], None, None]
Retrieve data with result streaming using a select statement.
- Parameters:
- query : CompiledQuery
The query statement to fetch.
- size : int
Sets the max_block_size parameter of the ClickHouse database. Values lower than 8912 are ignored. Overwrites values passed using settings as an optional parameter.
- kwargs
Optional parameters passed to the clickhouse-connect Client.query_column_block_stream method.
- Yields:
- dict[str, tuple]
A mapping from column names to values retrieved.
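Streaming is useful when a result does not fit in memory: each yielded chunk can be reduced and then discarded. The following sketch aggregates one column across chunks. `column_sum` and `fake_stream` are hypothetical names, and the generator merely imitates the documented chunk shape so the example runs without a database.

```python
from typing import Iterable, Iterator


def column_sum(chunks: Iterable[dict[str, tuple]], column: str) -> tuple[int, float]:
    """Return (row count, sum of `column`) while holding one chunk at a time."""
    rows = 0
    total = 0.0
    for chunk in chunks:
        values = chunk[column]
        rows += len(values)
        total += sum(values)
    return rows, total


def fake_stream() -> Iterator[dict[str, tuple]]:
    """Stand-in for a chunked result stream (made-up data)."""
    yield {"Ticker": ("AAA", "BBB"), "Volume": (10, 20)}
    yield {"Ticker": ("CCC",), "Volume": (30,)}


n_rows, volume = column_sum(fake_stream(), "Volume")
```

The same function would accept the generator returned by fetch_iter in place of `fake_stream()`, assuming the chunks have the shape documented above.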
- fetch_iter_dataframe(query: CompiledQuery, size: int, **kwargs) -> Generator[DataFrame, None, None]
Yield pandas DataFrames in chunks.
- Parameters:
- query : CompiledQuery
The query statement to fetch.
- size : int
Sets the max_block_size parameter of the ClickHouse database. Values lower than 8912 are ignored. Overwrites values passed using settings as an optional parameter.
- kwargs
Optional parameters passed to the clickhouse-connect Client.query_df_stream method.
- Yields:
- pandas.DataFrame
- get_dataset_columns(group: str, dataset: str) -> list[Column]
Create SQLAlchemy columns for the dataset.
- Parameters:
- group : str
Data group name.
- dataset : str
Dataset name.
- Returns:
- list[Column]
- Raises:
- ValueError
If an invalid data group or dataset name is provided.
- store_to_s3(query: CompiledQuery, bucket: str, key: str, profile_name: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, **kwargs)
Execute a query and store results into an S3 object.
- Parameters:
- query : CompiledQuery
The query statement to fetch.
- bucket : str
The name of the bucket where the query results are stored.
- key : str
The name of the object where the query results are stored.
- profile_name : str or None, default=None
If a profile name is specified, the access key and secret key are retrieved from ~/.aws/credentials and the parameters aws_access_key_id and aws_secret_access_key are ignored. If None, this field is ignored.
- aws_access_key_id : str or None, default=None
The AWS access key associated with an IAM user or role.
- aws_secret_access_key : str or None, default=None
The secret key associated with the access key.
- kwargs
Key-value arguments passed to the clickhouse-connect Client.query method.
- Raises:
- ValueError
If a non-existing bucket name is passed or if trying to overwrite an existing object.
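The precedence rule for credentials (a profile name wins over explicit keys) can be sketched as a small function that builds keyword arguments for a `boto3.Session`. `session_kwargs` is a hypothetical helper, not part of the library, and the behavior when nothing is supplied (an empty dict, leaving credential lookup to boto3's defaults) is an assumption of this sketch.

```python
from typing import Optional


def session_kwargs(
    profile_name: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
) -> dict[str, str]:
    """Build boto3.Session keyword arguments following the documented precedence."""
    if profile_name is not None:
        # A profile takes precedence; explicit keys are ignored.
        return {"profile_name": profile_name}
    explicit = {
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
    }
    # Assumption: drop unset values and let boto3 fall back to its defaults.
    return {name: value for name, value in explicit.items() if value is not None}


with_profile = session_kwargs("dev", "ignored-id", "ignored-secret")
with_keys = session_kwargs(None, "my-id", "my-secret")
```

The resulting dict could be unpacked as `boto3.Session(**with_profile)`; the placeholder strings above are not real credentials.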
S3DownloaderClient
- class algoseek_connector.s3.S3DownloaderClient(session: Session, api: DatasetAPIProvider)
ClientProtocol for downloading files from S3.
- Parameters:
- session : boto3.Session
- api : algoseek_connector.metadata_api.BaseAPIConsumer
Methods
create_function_handle:
Not Implemented.
execute:
Not Implemented.
download:
Download dataset files using filters.
fetch:
Not Implemented.
fetch_iter:
Not Implemented.
fetch_dataframe:
Not Implemented.
fetch_iter_dataframe:
Not Implemented.
list_datagroups:
List available data groups.
list_datasets:
List available datasets.
get_dataset_columns:
Not Implemented.
compile:
Not Implemented.
store_to_s3:
Not Implemented.
- compile(stmt)
Compile a SQLAlchemy Select statement into a CompiledQuery.
- create_function_handle()
Create a FunctionHandle instance.
- download(dataset_text_id: str, download_path: Path, date: date | str | tuple[date | str, date | str], symbols: str | list[str], expiration_date: date | str | tuple[date | str, date | str] | None = None)
Download data from the dataset.
- Parameters:
- dataset_text_id : str
The dataset text id.
- download_path : pathlib.Path
Path to a directory to download dataset files.
- date : str, datetime.date or tuple
Download data in this date range. Dates can be passed as a str in yyyymmdd format or as date objects. If a tuple is passed, it is interpreted as a date range and all dates in the closed interval between the two dates are generated. If a single date is passed, data is downloaded for that date only.
- symbols : str or list[str]
Download data associated with these symbols.
- expiration_date : str, datetime.date, tuple or None, default=None
Download data with expiration dates in this date range. Dates must be passed using the same format as the date parameter.
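The date handling described for the date and expiration_date parameters (a single value, or a tuple expanded into a closed interval) can be illustrated with the standard library alone. `parse_date` and `expand_dates` are hypothetical helpers written for this sketch; the library's own parsing may differ, for example in how it treats a reversed range (here it simply yields an empty list).

```python
from datetime import date, datetime, timedelta
from typing import Union

DateLike = Union[date, str]


def parse_date(value: DateLike) -> date:
    """Accept a date object or a 'yyyymmdd' string."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    return datetime.strptime(value, "%Y%m%d").date()


def expand_dates(value: Union[DateLike, tuple]) -> list[date]:
    """Return every date in the closed interval, or a one-element list."""
    if isinstance(value, tuple):
        start, end = (parse_date(v) for v in value)
        n_days = (end - start).days
        # range(n_days + 1) includes the end date; a reversed range gives [].
        return [start + timedelta(days=i) for i in range(n_days + 1)]
    return [parse_date(value)]


single = expand_dates("20230103")
interval = expand_dates(("20230130", date(2023, 2, 2)))
```

Mixing strings and date objects inside the tuple works because each endpoint is parsed independently.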
- fetch(query, **kwargs)
Fetch a select query.
- fetch_dataframe(query, **kwargs)
Fetch a select query and output results as a Pandas DataFrame.
- fetch_iter_dataframe(query, size: int, **kwargs)
Yield a select query in chunks, using pandas DataFrames.
- get_dataset_columns(group: str, dataset: str) -> DataSetDescription
Create a dataset.
- store_to_s3(query: CompiledQuery, path: str, aws_key_id: str, aws_secret_access_key: str)
Download query to S3.