Library utilities

Connection to metadata services

DatasetAPIProvider

class algoseek_connector.dataset_api.DatasetAPIProvider(config: DatasetAPIConfiguration | None = None)

Provide access to the metadata API v2.

By default, all requests use a 5 s timeout.

Parameters:

token (AuthToken): The API authentication token.

Methods

get:	Request data from an endpoint using the GET method.
list_data_groups:	List available data groups.
list_dataset_destinations:	List available dataset destinations.
get_dataset:	Get the metadata of a dataset.
get_data_group:	Get the metadata of a data group.
get_dataset_details:	Get extended information of a dataset.
get_dataset_name:	Get the dataset name used when creating dataset instances.
get_dataset_destination_id:	Get a dataset destination id using a dataset name.

get(endpoint: str, **kwargs) → Response

Request data using the GET method.

Parameters:

endpoint (str): The endpoint to request data from.
kwargs (dict): Keyword arguments passed to requests.Session.get().
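Because keyword arguments are forwarded to requests.Session.get(), the returned object behaves like a requests.Response. The following is a minimal sketch of a helper that requests an endpoint and decodes its JSON body. It assumes `provider` is a DatasetAPIProvider instance (or any object with the same `get` method); the helper name `get_json` is hypothetical and not part of the library.

```python
from typing import Any


def get_json(provider: Any, endpoint: str, **kwargs: Any) -> Any:
    """Request `endpoint` and return the decoded JSON body.

    Hypothetical helper: `provider` is assumed to expose
    `get(endpoint, **kwargs)` returning a requests.Response.
    """
    # Extra keyword arguments (e.g. `params`) are passed through unchanged.
    response = provider.get(endpoint, **kwargs)
    # Raise requests.HTTPError on 4xx/5xx instead of decoding an error body.
    response.raise_for_status()
    return response.json()
```

For example, `get_json(provider, endpoint, params={"page": 1})` forwards `params` to requests.Session.get(); the `page` parameter is illustrative only.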

get_data_group(internal_name: str) → DataGroupApiInfo

Retrieve data group information.

Parameters:

internal_name (str): The data group internal name as registered in the dataset API.

get_dataset(destination_id: int) → DatasetVersionApiInfo

Retrieve dataset destination information.

Parameters:

destination_id (int): The dataset destination id as registered in the dataset API.

get_dataset_destination_id(name: str) → int

Get the dataset destination id from its name.

get_dataset_details(destination_id: int) → DatasetDetails

Retrieve dataset schema and long description.

Parameters:

destination_id (int): The dataset destination id as registered in the dataset API.

get_dataset_name(destination_id: int) → str

Create a unique display name for a dataset.

list_data_groups() → list[str]

List all available data groups.

list_dataset_destinations() → list[int]

List all available dataset destinations.
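list_dataset_destinations and get_dataset_name can be combined to enumerate every dataset by its display name. The sketch below assumes `provider` is a DatasetAPIProvider instance (or any object exposing the same two methods); `index_datasets` is a hypothetical helper, not part of the library.

```python
from typing import Any


def index_datasets(provider: Any) -> dict[str, int]:
    """Map each dataset display name to its destination id."""
    index: dict[str, int] = {}
    for destination_id in provider.list_dataset_destinations():
        name = provider.get_dataset_name(destination_id)
        # Display names are documented as unique; fail loudly if one repeats.
        if name in index:
            raise ValueError(f"duplicate dataset name: {name!r}")
        index[name] = destination_id
    return index
```

For a single lookup, get_dataset_destination_id(name) returns the id directly; a local index can be convenient when many names must be resolved.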

Client Protocols

Implementations of the ClientProtocol for the different data sources. These classes should not be instantiated directly.

ClickHouseClient

class algoseek_connector.clickhouse.ClickHouseClient(client: Client)

Manage dataset retrieval from ClickHouse DB.

Parameters:

client (clickhouse_connect.Client): The clickhouse-connect client used to query the database.

Methods

create_function_handle:	Create a FunctionHandle instance.
execute:	Execute raw SQL queries.
download:	Not Implemented.
fetch:	Retrieve data in Python native format using `sqlalchemy.sql.selectable.Select`.
fetch_iter:	Retrieve data in Python native format using `sqlalchemy.sql.selectable.Select`. Stream results.
fetch_dataframe:	Retrieve data as a Pandas DataFrame using `sqlalchemy.sql.selectable.Select`.
fetch_iter_dataframe:	Retrieve data as a Pandas DataFrame using `sqlalchemy.sql.selectable.Select`. Stream results.
list_datagroups:	List available data groups.
list_datasets:	List available datasets.
get_dataset_columns:	Create a list of `sqlalchemy.Column` for a dataset.
compile:	Convert a `sqlalchemy.sql.selectable.Select` into a CompiledQuery.
store_to_s3:	Store query results in an S3 object.

compile(stmt: Select, **kwargs) → CompiledQuery

Convert a Select statement into an SQL string.

create_function_handle() → FunctionHandle

Get a FunctionHandle instance.

download(dataset: str, download_path: Path, date: date | str | tuple[date | str, date | str], symbols: str | list[str], expiration_date: date | str | tuple[date | str, date | str])

Not implemented.

execute(sql: str, parameters: dict | None = None, output: str = 'python', **kwargs)

Execute raw SQL queries.

Parameters:

sql (str): Parameterized SQL query.
parameters (dict or None, default=None): Query parameters.
output ({"python", "dataframe"}, default="python"): Whether to return data as a dictionary or as a Pandas DataFrame.
kwargs: Optional parameters passed to clickhouse-connect Client.query method.

Returns:

dict or pandas.DataFrame

fetch(query: CompiledQuery, **kwargs) → dict[str, tuple]

Retrieve data using a select statement.

Parameters:

query (CompiledQuery): The query statement to fetch.
kwargs: Optional parameters passed to clickhouse-connect Client.query method.

Returns:

dict[str, tuple]: A mapping from column names to values retrieved.
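fetch returns data column by column. When row-oriented records are more convenient, the mapping can be transposed with `zip`. A minimal sketch; `to_records` is a hypothetical helper, and its argument stands for the mapping returned by fetch.

```python
def to_records(data: dict[str, tuple]) -> list[dict]:
    """Transpose a column -> values mapping into a list of row dicts."""
    columns = list(data)
    # zip(*values) groups the i-th value of every column into one row tuple.
    return [dict(zip(columns, row)) for row in zip(*data.values())]
```

For example, `to_records({"symbol": ("AAPL", "MSFT"), "price": (1.0, 2.0)})` returns `[{"symbol": "AAPL", "price": 1.0}, {"symbol": "MSFT", "price": 2.0}]`.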

fetch_dataframe(query: CompiledQuery, **kwargs) → DataFrame

Execute a Select statement and output data as a Pandas DataFrame.

Parameters:

query (CompiledQuery): The query statement to fetch.
kwargs: Optional parameters passed to clickhouse-connect Client.query_df method.

Returns:

pandas.DataFrame

fetch_iter(query: CompiledQuery, size: int, **kwargs) → Generator[dict[str, tuple], None, None]

Retrieve data with result streaming using a select statement.

Parameters:

query (CompiledQuery): The query statement to fetch.
size (int): Sets the max_block_size parameter of the ClickHouse database. Values lower than 8912 are ignored. Overrides any max_block_size value passed in the optional settings parameter.
kwargs: Optional parameters passed to clickhouse-connect Client.query_column_block_stream method.

Yields:

dict[str, tuple]: A mapping from column names to values retrieved.
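Each item yielded by fetch_iter is a column-oriented block, so aggregates can be computed without holding the full result in memory. The sketch below counts rows across blocks. `count_rows` is a hypothetical helper, and the usage line assumes `client` is a ClickHouseClient and `query` a CompiledQuery.

```python
from collections.abc import Iterable


def count_rows(blocks: Iterable[dict[str, tuple]]) -> int:
    """Count rows across streamed column blocks, one block at a time."""
    total = 0
    for block in blocks:
        # All columns in a block have equal length, so any column gives the row count.
        first_column = next(iter(block.values()), ())
        total += len(first_column)
    return total


# Usage (hypothetical objects):
# n_rows = count_rows(client.fetch_iter(query, size=100_000))
```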

fetch_iter_dataframe(query: CompiledQuery, size: int, **kwargs) → Generator[DataFrame, None, None]

Retrieve data with result streaming, yielding pandas DataFrames in chunks.

Parameters:

query (CompiledQuery): The query statement to fetch.
size (int): Sets the max_block_size parameter of the ClickHouse database. Values lower than 8912 are ignored. Overrides any max_block_size value passed in the optional settings parameter.
kwargs: Optional parameters passed to clickhouse-connect Client.query_df_stream method.

Yields:

pandas.DataFrame

get_dataset_columns(group: str, dataset: str) → list[Column]

Create SQLAlchemy columns for the dataset.

Parameters:

group (str): Data group name.
dataset (str): Dataset name.

Returns:

list[sqlalchemy.Column]: The dataset columns.

Raises:

ValueError: If an invalid data group or dataset name is provided.

list_datagroups() → list[str]

List available data groups.

list_datasets(group: str) → list[str]

List available datasets in the data group.
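list_datagroups and list_datasets can be combined to build a catalog of every dataset the client can reach. A minimal sketch, assuming `client` exposes both methods as documented; `build_catalog` is a hypothetical helper.

```python
from typing import Any


def build_catalog(client: Any) -> dict[str, list[str]]:
    """Map each data group name to the dataset names it contains."""
    # One list_datasets call per data group.
    return {group: client.list_datasets(group) for group in client.list_datagroups()}
```

S3DownloaderClient documents the same two methods, so the helper applies to it as well; its list_datasets argument is named group_text_id, which is why the group is passed positionally.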

store_to_s3(query: CompiledQuery, bucket: str, key: str, profile_name: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, **kwargs)

Execute a query and store results into an S3 object.

Parameters:

query (CompiledQuery): The query statement to execute.
bucket (str): The name of the bucket where query results are stored.
key (str): The key of the S3 object where query results are stored.
profile_name (str or None, default=None): If specified, the access key and secret key are read from the matching profile in ~/.aws/credentials, and aws_access_key_id and aws_secret_access_key are ignored. If None, this parameter is ignored.
aws_access_key_id (str or None, default=None): The AWS access key associated with an IAM user or role.
aws_secret_access_key (str or None, default=None): The secret key associated with the access key.
kwargs: Keyword arguments passed to the clickhouse-connect Client.query method.

Raises:

ValueError: If the bucket does not exist, or if the target object already exists (existing objects are never overwritten).
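The credential precedence described for profile_name can be summarized in a few lines. This is a sketch of the documented rule only, not the library's implementation. It reads the INI-formatted ~/.aws/credentials file with `configparser`; `resolve_credentials` and its `credentials_path` parameter are hypothetical.

```python
import configparser
from pathlib import Path
from typing import Optional, Tuple


def resolve_credentials(
    profile_name: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    credentials_path: Optional[Path] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (access key, secret key) following the store_to_s3 precedence rule."""
    if profile_name is None:
        # No profile given: the explicit keys are used as-is.
        return aws_access_key_id, aws_secret_access_key
    if credentials_path is None:
        credentials_path = Path.home() / ".aws" / "credentials"
    # A profile is given: explicit keys are ignored and the profile section wins.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)
    section = parser[profile_name]  # raises KeyError if the profile is missing
    return section["aws_access_key_id"], section["aws_secret_access_key"]
```

In application code, boto3 performs this lookup itself when a session is created with `boto3.Session(profile_name=...)`.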

S3DownloaderClient

class algoseek_connector.s3.S3DownloaderClient(session: Session, api: DatasetAPIProvider)

ClientProtocol for downloading files from S3.

Parameters:

session (boto3.Session): The boto3 session used to access S3.
api (DatasetAPIProvider): The metadata API provider.

Methods

create_function_handle:	Not Implemented.
execute:	Not Implemented.
download:	Download dataset files using filters.
fetch:	Not Implemented.
fetch_iter:	Not Implemented.
fetch_dataframe:	Not Implemented.
fetch_iter_dataframe:	Not Implemented.
list_datagroups:	List available data groups.
list_datasets:	List available datasets.
get_dataset_columns:	Not Implemented.
compile:	Not Implemented.
store_to_s3:	Not Implemented.

compile(stmt)

Compile a SQLAlchemy Select statement into a CompiledQuery. Not implemented for this client.

create_function_handle()

Create a FunctionHandle instance. Not implemented for this client.

download(dataset_text_id, download_path, date, symbols, expiration_date)

Download data from the dataset.

Parameters:

dataset_text_id (str): The dataset text id.
download_path (pathlib.Path): Path to the directory where dataset files are downloaded.
date (str, datetime.date or tuple): Date or date range of the data to download. Dates can be passed as str in yyyymmdd format or as date objects. If a tuple is passed, it is interpreted as a date range, and all dates in the closed interval between the two dates are included. If a single date is passed, only data from that date are downloaded.
symbols (str or list[str]): Download data associated with these symbols.
expiration_date (str, datetime.date or tuple): Download data with expiration dates in this date range. Dates must use the same format as the date parameter.
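The closed-interval rule for date and expiration_date can be illustrated with the standard library. This is a sketch of the documented semantics, not the library's code; `expand_dates` is a hypothetical helper.

```python
import datetime
from typing import Tuple, Union

DateLike = Union[str, datetime.date]


def _to_date(value: DateLike) -> datetime.date:
    # Strings are expected in yyyymmdd format, e.g. "20230105".
    if isinstance(value, str):
        return datetime.datetime.strptime(value, "%Y%m%d").date()
    return value


def expand_dates(value: Union[DateLike, Tuple[DateLike, DateLike]]) -> list:
    """Return every date in the closed interval [start, end], or the single date."""
    if isinstance(value, tuple):
        start, end = (_to_date(v) for v in value)
    else:
        start = end = _to_date(value)
    n_days = (end - start).days
    # Both endpoints are included; an end date before the start yields an empty list.
    return [start + datetime.timedelta(days=i) for i in range(n_days + 1)]
```

For example, `expand_dates(("20230130", datetime.date(2023, 2, 2)))` produces four dates, January 30 through February 2, since both endpoints belong to the range.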

execute(sql: str, parameters: dict | None, output: str, **kwargs)

Execute a raw SQL query. Not implemented for this client.

fetch(query, **kwargs)

Fetch a select query. Not implemented for this client.

fetch_dataframe(query, **kwargs)

Fetch a select query and output results as a Pandas DataFrame. Not implemented for this client.

fetch_iter(query, size: int, **kwargs)

Yield a select query in chunks. Not implemented for this client.

fetch_iter_dataframe(query, size: int, **kwargs)

Yield a select query in chunks, using pandas DataFrames. Not implemented for this client.

get_dataset_columns(group: str, dataset: str) → DataSetDescription

Create a dataset. Not implemented for this client.

list_datagroups() → list[str]

List available data groups.

list_datasets(group_text_id: str) → list[str]

List available datasets in the data group.

store_to_s3(query: CompiledQuery, path: str, aws_key_id: str, aws_secret_access_key: str)

Store query results in S3. Not implemented for this client.
