Library utilities

Connection to metadata services

DatasetAPIProvider

class algoseek_connector.dataset_api.DatasetAPIProvider(config: DatasetAPIConfiguration | None = None)

Provide access to metadata API v2.

By default, a timeout of 5 s is applied to all requests.

Parameters:
token : AuthToken

The API authentication token.

Methods

get:

Request data from an endpoint using the GET method.

list_datagroups:

List available data groups.

list_datasets:

List available datasets.

get_dataset:

Get the metadata of a dataset.

get_data_group:

Get the metadata of a data group.

get_dataset_details:

Get extended information of a dataset.

get_dataset_name:

Get the dataset name used when creating dataset instances.

get_dataset_destination_id:

Get a dataset destination id using a dataset name.

get(endpoint: str, **kwargs) -> Response

Request data using the GET method.

Parameters:
endpoint : str

The endpoint to request data from.

kwargs : dict

Keyword arguments passed to requests.Session.get().

get_data_group(internal_name: str) -> DataGroupApiInfo

Retrieve data group information.

Parameters:
internal_name : str

The data group internal name as registered in the dataset API.

get_dataset(destination_id: int) -> DatasetVersionApiInfo

Retrieve dataset destination information.

Parameters:
destination_id : int

The dataset destination id as registered in the dataset API.

get_dataset_destination_id(name: str) -> int

Get the dataset destination id from its name.

get_dataset_details(destination_id: int) -> DatasetDetails

Retrieve dataset schema and long description.

Parameters:
destination_id : int

The dataset destination id as registered in the dataset API.

get_dataset_name(destination_id: int) -> str

Create a unique display name for a dataset.

list_data_groups() -> list[str]

List all available data groups.

list_dataset_destinations() -> list[int]

List all available dataset destinations.
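The lookup methods above can be combined to build a name-to-id index. The following is a minimal sketch, not the library's own code: InMemoryProvider is a hypothetical stand-in that only mirrors the documented method names, and the dataset names and ids are made up. With a real DatasetAPIProvider instance, build_name_index would presumably be called the same way.

```python
from __future__ import annotations

from typing import Protocol


class MetadataProvider(Protocol):
    """Subset of the documented DatasetAPIProvider methods used below."""

    def list_dataset_destinations(self) -> list[int]: ...

    def get_dataset_name(self, destination_id: int) -> str: ...

    def get_dataset_destination_id(self, name: str) -> int: ...


class InMemoryProvider:
    """Hypothetical stand-in for the real provider; holds made-up data."""

    def __init__(self, names_by_id: dict[int, str]) -> None:
        self._names_by_id = dict(names_by_id)

    def list_dataset_destinations(self) -> list[int]:
        return sorted(self._names_by_id)

    def get_dataset_name(self, destination_id: int) -> str:
        return self._names_by_id[destination_id]

    def get_dataset_destination_id(self, name: str) -> int:
        for destination_id, known_name in self._names_by_id.items():
            if known_name == name:
                return destination_id
        raise ValueError(f"unknown dataset name: {name!r}")


def build_name_index(provider: MetadataProvider) -> dict[str, int]:
    """Map every dataset name to its destination id."""
    return {
        provider.get_dataset_name(destination_id): destination_id
        for destination_id in provider.list_dataset_destinations()
    }


# Made-up ids and names, for illustration only.
provider = InMemoryProvider({101: "example_trades", 102: "example_quotes"})
name_index = build_name_index(provider)
```

Building the index once avoids repeated calls to get_dataset_destination_id when many names have to be resolved.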

Client Protocols

Implementations of ClientProtocol for the different data sources. These classes should not be instantiated directly.

ClickHouseClient

class algoseek_connector.clickhouse.ClickHouseClient(client: Client)

Manage dataset retrieval from ClickHouse DB.

Parameters:
client : clickhouse_connect.Client

Methods

create_function_handle:

Create a FunctionHandle instance.

execute:

Execute raw SQL queries.

download:

Not Implemented.

fetch:

Retrieve data in Python native format using sqlalchemy.sql.selectable.Select.

fetch_iter:

Retrieve data in Python native format using sqlalchemy.sql.selectable.Select. Stream results.

fetch_dataframe:

Retrieve data as a Pandas DataFrame using sqlalchemy.sql.selectable.Select.

fetch_iter_dataframe:

Retrieve data as a Pandas DataFrame using sqlalchemy.sql.selectable.Select. Stream results.

list_datagroups:

List available data groups.

list_datasets:

List available datasets.

get_dataset_columns:

Create a list of sqlalchemy.Column for a dataset.

compile:

Convert a sqlalchemy.sql.selectable.Select into a CompiledQuery.

store_to_s3:

Store query results into an S3 object.

compile(stmt: Select, **kwargs) -> CompiledQuery

Convert a stmt into an SQL string.

create_function_handle() -> FunctionHandle

Get a FunctionHandle instance.

download(dataset: str, download_path: Path, date: date | str | tuple[date | str, date | str], symbols: str | list[str], expiration_date: date | str | tuple[date | str, date | str])

Not implemented.

execute(sql: str, parameters: dict | None = None, output: str = 'python', **kwargs)

Execute raw SQL queries.

Parameters:
sql : str

Parametrized SQL query.

parameters : dict or None, default=None

Query parameters.

output : {"python", "dataframe"}

Whether to output data using a dictionary or a Pandas DataFrame.

kwargs

Optional parameters passed to clickhouse-connect Client.query method.

Returns:
dict or pandas.DataFrame

fetch(query: CompiledQuery, **kwargs) -> dict[str, tuple]

Retrieve data using a select statement.

Parameters:
query : CompiledQuery

The query statement to fetch.

kwargs

Optional parameters passed to clickhouse-connect Client.query method.

Returns:
dict[str, tuple]

A mapping from column names to values retrieved.
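Because the result is column-oriented, row-wise processing needs a small transposition step. The sketch below assumes every column tuple has the same length; columns_to_rows is a hypothetical helper, not part of the library, and the sample values are made up.

```python
from __future__ import annotations


def columns_to_rows(columns: dict[str, tuple]) -> list[dict[str, object]]:
    """Turn a column-name -> values mapping into a list of row dictionaries."""
    names = list(columns)
    # zip(*values) walks all columns in parallel, producing one tuple per row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# Made-up data shaped like the documented dict[str, tuple] return value.
result = {"symbol": ("AAA", "BBB"), "price": (10.5, 20.0)}
rows = columns_to_rows(result)
```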

fetch_dataframe(query: CompiledQuery, **kwargs) -> DataFrame

Execute a Select statement and output data as a Pandas DataFrame.

Parameters:
query : CompiledQuery

The query statement to fetch.

kwargs

Optional parameters passed to clickhouse-connect Client.query_df method.

Returns:
pandas.DataFrame

fetch_iter(query: CompiledQuery, size: int, **kwargs) -> Generator[dict[str, tuple], None, None]

Retrieve data with result streaming using a select statement.

Parameters:
query : CompiledQuery

The query statement to fetch.

size : int

Sets the max_block_size parameter of the ClickHouse database. Values lower than 8912 are ignored. Overrides any value passed through the optional settings parameter.

kwargs

Optional parameters passed to clickhouse-connect Client.query_column_block_stream method.

Yields:
dict[str, tuple]

A mapping from column names to values retrieved.
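Streaming is useful when only an aggregate is needed and the full result should not be held in memory. The following sketch consumes column-oriented chunks one at a time. fake_fetch_iter is a hypothetical stand-in that only imitates the shape of the yielded values; it does not reproduce how the real method treats the size argument, and the data is made up.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def fake_fetch_iter(columns: dict[str, tuple], size: int) -> Iterator[dict[str, tuple]]:
    """Hypothetical stand-in for fetch_iter: yield column-oriented chunks."""
    n_rows = len(next(iter(columns.values()), ()))
    for start in range(0, n_rows, size):
        yield {name: values[start:start + size] for name, values in columns.items()}


def running_sum(chunks: Iterable[dict[str, tuple]], column: str) -> tuple[int, float]:
    """Return (number of chunks seen, sum of one column) without storing chunks."""
    n_chunks = 0
    total = 0.0
    for chunk in chunks:
        n_chunks += 1
        total += sum(chunk[column])
    return n_chunks, total


# Made-up data; five rows split into chunks of at most two rows.
data = {"symbol": ("AAA", "BBB", "AAA", "CCC", "BBB"), "volume": (1, 2, 3, 4, 5)}
n_chunks, total_volume = running_sum(fake_fetch_iter(data, size=2), "volume")
```

Each chunk goes out of scope once its contribution has been added, so memory use is bounded by the chunk size rather than by the size of the result.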

fetch_iter_dataframe(query: CompiledQuery, size: int, **kwargs) -> Generator[DataFrame, None, None]

Yield pandas DataFrame in chunks.

Parameters:
query : CompiledQuery

The query statement to fetch.

size : int

Sets the max_block_size parameter of the ClickHouse database. Values lower than 8912 are ignored. Overrides any value passed through the optional settings parameter.

kwargs

Optional parameters passed to clickhouse-connect Client.query_df_stream method.

Yields:
pandas.DataFrame

get_dataset_columns(group: str, dataset: str) -> list[Column]

Create SQLAlchemy columns for the dataset.

Parameters:
group : str

Data group name.

dataset : str

Dataset name.

Returns:
list[Column]

Raises:
ValueError

If an invalid data group or dataset name is provided.

list_datagroups() -> list[str]

List available groups.

list_datasets(group: str) -> list[str]

List available datasets in the data group.

store_to_s3(query: CompiledQuery, bucket: str, key: str, profile_name: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, **kwargs)

Execute a query and store results into an S3 object.

Parameters:
query : CompiledQuery

The query statement to fetch.

bucket : str

The bucket name used to store the query.

key : str

The name of the object where the query is going to be stored.

profile_name : str or None, default=None

If a profile name is specified, the access key and secret key are retrieved from ~/.aws/credentials and the parameters aws_access_key_id and aws_secret_access_key are ignored. If None, this field is ignored.

aws_access_key_id : str or None, default=None

The AWS access key associated with an IAM user or role.

aws_secret_access_key : str or None, default=None

The secret key associated with the access key.

kwargs

Key-value arguments passed to clickhouse-connect Client.query method.

Raises:
ValueError

If a non-existing bucket name is passed or if trying to overwrite an existing object.
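The precedence between profile_name and the explicit key parameters can be illustrated as follows. This is only a sketch of the documented rule, not the library's implementation, which may well delegate to boto3. resolve_credentials is a hypothetical helper, and the credentials file content is passed as a string so that the example does not depend on a real ~/.aws/credentials file.

```python
from __future__ import annotations

import configparser


def resolve_credentials(
    profile_name: str | None,
    aws_access_key_id: str | None,
    aws_secret_access_key: str | None,
    credentials_text: str,
) -> tuple[str | None, str | None]:
    """Pick credentials following the documented precedence.

    If profile_name is given, the keys come from that profile and the
    explicit key arguments are ignored. Otherwise the explicit keys are used.
    """
    if profile_name is None:
        return aws_access_key_id, aws_secret_access_key
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    section = parser[profile_name]  # raises KeyError for an unknown profile
    return section["aws_access_key_id"], section["aws_secret_access_key"]


# Made-up file content in the usual INI layout of an AWS credentials file.
SAMPLE_CREDENTIALS = """
[research]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = example-secret
"""

from_profile = resolve_credentials("research", "ignored-id", "ignored-secret", SAMPLE_CREDENTIALS)
from_arguments = resolve_credentials(None, "explicit-id", "explicit-secret", SAMPLE_CREDENTIALS)
```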

S3DownloaderClient

class algoseek_connector.s3.S3DownloaderClient(session: Session, api: DatasetAPIProvider)

ClientProtocol for downloading files from S3.

Parameters:
session : boto3.Session

api : algoseek_connector.metadata_api.BaseAPIConsumer

Methods

create_function_handle:

Not Implemented.

execute:

Not Implemented.

download:

Download dataset files using filters.

fetch:

Not Implemented.

fetch_iter:

Not Implemented.

fetch_dataframe:

Not Implemented.

fetch_iter_dataframe:

Not Implemented.

list_datagroups:

List available data groups.

list_datasets:

List available datasets.

get_dataset_columns:

Not Implemented.

compile:

Not Implemented.

store_to_s3:

Not Implemented.

compile(stmt)

Compile a SQLAlchemy Select statement into a CompiledQuery.

create_function_handle()

Create a FunctionHandle instance.

download(dataset_text_id: str, download_path: Path, date: date | str | tuple[date | str, date | str], symbols: str | list[str], expiration_date: date | str | tuple[date | str, date | str] | None = None)

Download data from the dataset.

Parameters:
dataset_text_id : str

The dataset text id.

download_path : pathlib.Path

Path to a directory to download dataset files.

date : str, datetime.date or tuple

Download data in this date range. Dates can be passed as a str in yyyymmdd format or as date objects. If a tuple is passed, it is interpreted as a date range and all dates in the closed interval between the two dates are generated. If a single date is passed, only data for that date is downloaded.

symbols : str or list[str]

Download data associated with these symbols.

expiration_date : str, datetime.date or tuple

Download data with expiration dates in this date range. Dates must be passed using the same format as the date parameter.
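The interpretation of the date parameter described above can be sketched as follows. expand_dates and to_date are hypothetical helpers written for illustration; the library's own parsing may differ in details such as validation and error handling.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta
from typing import Union

DateLike = Union[date, str]


def to_date(value: DateLike) -> date:
    """Accept a date object or a yyyymmdd string and return a date."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    return datetime.strptime(value, "%Y%m%d").date()


def expand_dates(spec: DateLike | tuple[DateLike, DateLike]) -> list[date]:
    """Expand a single date or a (start, end) tuple into a list of dates.

    A tuple is treated as a closed interval, so both endpoints are included.
    """
    if isinstance(spec, tuple):
        start, end = (to_date(item) for item in spec)
    else:
        start = end = to_date(spec)
    n_days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(n_days + 1)]


single_day = expand_dates("20230103")
date_range = expand_dates(("20230130", date(2023, 2, 2)))
```

In this sketch a range whose end precedes its start yields an empty list; whether the library raises an error in that case is not stated here.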

execute(sql: str, parameters: dict | None, output: str, **kwargs)

Execute raw SQL query.

fetch(query, **kwargs)

Fetch a select query.

fetch_dataframe(query, **kwargs)

Fetch a select query and output results as a Pandas DataFrame.

fetch_iter(query, size: int, **kwargs)

Yield a select query in chunks.

fetch_iter_dataframe(query, size: int, **kwargs)

Yield a select query in chunks, using pandas DataFrames.

get_dataset_columns(group: str, dataset: str) -> DataSetDescription

Create a dataset.

list_datagroups() -> list[str]

List available data groups.

list_datasets(group_text_id: str) -> list[str]

List available datasets in the data group.

store_to_s3(query: CompiledQuery, path: str, aws_key_id: str, aws_secret_access_key: str)

Download query to S3.