Dataset API

ResourceManager

class algoseek_connector.manager.ResourceManager

Manage data sources available to a user.

Methods

create_data_source:

Create a new DataSource instance.

list_data_sources:

List available data sources.

create_data_source(name: str, **kwargs) -> DataSource

Create a connection to a data source.

Parameters:
name: str

Name of an available data source.

kwargs: dict

Key-value parameters passed to the ClientProtocol used by the data source.

Returns:
DataSource

See also

list_data_sources()

Provides a list of text IDs of the available data sources.

list_data_sources() -> list[str]

List available data sources.

DataSource

class algoseek_connector.base.DataSource(client: ClientProtocol, description_provider: DescriptionProvider)

Manage the connection to a data source.

See here for a guide on how to work with data sources.

Attributes:
client: ClientProtocol

Provide the connection to the actual data source.

description_provider: DescriptionProvider

Provide descriptions and metadata for data groups and datasets.

groups: DataGroupMapping

Maintain the collection of available DataGroups.

Methods

fetch_datagroup:

Retrieve a data group from the data source.

list_datagroups:

List available data groups.

fetch_datagroup(name: str) -> DataGroup

Retrieve a data group.

list_datagroups() -> list[str]

List available data groups.

DataGroup

class algoseek_connector.base.DataGroup(source: DataSource, description: DataGroupDescription)

Manage a collection of related datasets.

Parameters:
source: DataSource

The data source to which the data group belongs.

description: DataGroupDescription

The data group description.

Methods

fetch_dataset:

Retrieve a dataset from the data source.

list_datasets:

List available datasets.

property description: DataGroupDescription

Get the data group description.

fetch_dataset(name: str) -> DataSet

Load a dataset from a data source.

Parameters:
name: str

The dataset name.

Raises:
InvalidDataSetName

If an invalid dataset name is provided.

list_datasets() -> list[str]

List available datasets.

property source: DataSource

Get the data source.
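
The classes above form a chain: a ResourceManager creates a DataSource, a DataSource retrieves a DataGroup, and a DataGroup loads a dataset. The helper below is a minimal sketch of that chain using only the methods documented in this section. The function name `open_dataset` is hypothetical, and how a ResourceManager instance is constructed and configured is not covered here, so the manager is passed in as an argument.

```python
def open_dataset(manager, source_name, group_name, dataset_name, **client_kwargs):
    """Walk from a ResourceManager-like object down to a single dataset.

    `manager` is duck-typed: any object exposing the documented methods
    `list_data_sources` and `create_data_source` works here.
    """
    # Check the name first so that a typo produces a readable error.
    available = manager.list_data_sources()
    if source_name not in available:
        raise ValueError(
            f"Unknown data source {source_name!r}; available: {available}"
        )

    # Extra keyword arguments are forwarded to the data source client.
    source = manager.create_data_source(source_name, **client_kwargs)

    # DataSource -> DataGroup -> dataset, as documented above.
    group = source.fetch_datagroup(group_name)
    return group.fetch_dataset(dataset_name)
```

Calling `source.list_datagroups()` and `group.list_datasets()` first is a convenient way to discover valid names before passing them to this helper.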

DataSetFetcher

class algoseek_connector.base.DataSetFetcher(group: DataGroup, name: str)

Lightweight representation of a dataset.

Manages the creation of DataSet instances for querying data with SQL, and the download of data files.

Methods

download:

Download data files.

fetch:

Create a DataSet instance.

property description: DataSetDescription

Get the dataset description.

download(download_path: Path, date: date_like | tuple[date_like, date_like], symbols: str | list[str], expiration_date: date_like | tuple[date_like, date_like] | None = None)

Download data from the dataset.

Parameters:
download_path: pathlib.Path

Path to the directory where dataset files are downloaded.

date: str, datetime.date or tuple

Download data in this date range. Dates can be passed as a str in yyyymmdd format or as date objects. If a tuple is passed, it is interpreted as a date range, and all dates in the closed interval between the two dates are generated. If a single date is passed, data is downloaded for that specific date.

symbols: str or list[str]

Download data associated with these symbols.

expiration_date: str, datetime.date, tuple or None, default=None

Download data with expiration dates in this date range. Dates must be passed using the same format used for the date parameter.
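
The date semantics described above (a single date, or a tuple read as a closed interval) can be illustrated with a short standalone sketch. The helpers `to_date` and `expand_dates` are hypothetical names introduced for illustration; they are not part of the library and only mirror the documented behavior.

```python
from datetime import date, datetime, timedelta


def to_date(value):
    """Accept a datetime.date or a 'yyyymmdd' string and return a date."""
    if isinstance(value, datetime):
        # datetime is a subclass of date, so handle it first.
        return value.date()
    if isinstance(value, date):
        return value
    return datetime.strptime(value, "%Y%m%d").date()


def expand_dates(spec):
    """Return every date described by a single date or a (start, end) tuple."""
    if isinstance(spec, tuple):
        start, end = (to_date(v) for v in spec)
        if start > end:
            raise ValueError("start date must not be after end date")
        # Closed interval: both endpoints are included.
        n_days = (end - start).days
        return [start + timedelta(days=i) for i in range(n_days + 1)]
    return [to_date(spec)]
```

For example, `expand_dates(("20230130", "20230202"))` yields four dates, from January 30 through February 2 inclusive.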

fetch() -> DataSet

Create a dataset instance.

A DataSet allows fetching data using SQL-like queries. See here for a detailed description of how to work with datasets.

property group: DataGroup

Get the dataset group.

property source: DataSource

Get the data source.

DataSet

class algoseek_connector.base.DataSet(group: DataGroup, description: DataSetDescription)

Retrieve data from a data source using SQL queries.

See here for a detailed description of how to work with datasets.

Attributes:
c: ColumnHandle

A handle object for dataset columns.

description: DataSetDescription

Get the dataset description.

group: DataGroup

Get the dataset group.

source: DataSource

Get the data source.

Methods

compile:

Convert a sqlalchemy.Select statement into a CompiledQuery.

fetch:

Retrieve data from the data source.

fetch_dataframe:

Retrieve data from the data source as a pandas DataFrame.

fetch_iter:

Retrieve data in chunks from the data source.

get_function_handle:

Create a FunctionHandle object.

get_column_handle:

Create a column handle object.

select:

Build a sqlalchemy.Select statement using method chaining.

compile(stmt: Select) -> CompiledQuery

Compile the statement into a dialect-specific SQL string.

property description: DataSetDescription

Get the dataset description.

execute(sql: str, parameters: dict | None = None, output: str = 'python', size: int | None = None, **kwargs) -> dict | DataFrame

Execute raw SQL queries.

Parameters:
sql: str

Parametrized SQL statement.

parameters: dict or None

Query parameters.

output: {"python", "dataframe"}

Output format for query results.

size: int or None

If a size is specified, split the results into chunks of the specified size.

kwargs: dict

Extra keyword arguments passed to the underlying client.

fetch(stmt: Select, **kwargs) -> dict[str, tuple]

Fetch data using a select statement.

Parameters:
stmt: Select

A SQLAlchemy Select statement created using the select method.

kwargs

Optional parameters passed to the underlying ClientProtocol.fetch method.

fetch_dataframe(stmt: Select, **kwargs) -> DataFrame

Fetch data using a select statement. Output data as a pandas DataFrame.

Parameters:
stmt: Select

A SQLAlchemy Select statement created using the select method.

kwargs

Optional parameters passed to the underlying client fetch_dataframe method.

Returns:
pandas.DataFrame

fetch_iter(stmt: Select, size: int, **kwargs) -> Generator[dict[str, tuple], None, None]

Stream data using a select statement.

Parameters:
stmt: Select

A SQLAlchemy Select statement created using the select method.

size: int

The size of each data chunk.

kwargs

Optional parameters passed to the underlying client fetch_iter method.

Yields:
dict[str, tuple]

A dictionary with column name/column data pairs.
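
To make the shape of the yielded values concrete, the sketch below splits an in-memory column dictionary into chunks of at most `size` rows. It is an illustration only: `iter_column_chunks` is a hypothetical helper, the real method streams from the data source rather than slicing local data, and the assumption that the final chunk may be shorter than `size` is not confirmed by this section.

```python
def iter_column_chunks(columns, size):
    """Yield dicts of column name -> tuple, each holding at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # All columns are assumed to have the same length; an empty dict has 0 rows.
    n_rows = len(next(iter(columns.values()), ()))
    for start in range(0, n_rows, size):
        yield {
            name: values[start:start + size]
            for name, values in columns.items()
        }
```

Each yielded dictionary has the same keys as the input, so code that consumes one chunk works unchanged on all of them.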

fetch_iter_dataframe(stmt: Select, size: int, **kwargs) -> Generator[DataFrame, None, None]

Stream data using a select statement. Output data as pandas DataFrames.

Parameters:
stmt: Select

A SQLAlchemy Select statement created using the select method.

size: int

The size of each data chunk.

kwargs

Optional parameters passed to the underlying client fetch_iter_dataframe method.

Yields:
pandas.DataFrame

get_column_handle() -> ColumnHandle

Get a handle object for fast access to dataset columns.

get_function_handle() -> FunctionHandle

Get a handle for fast access to supported functions.

property group: DataGroup

Get the dataset group.

head(n: int = 10) -> DataFrame

Retrieve the first n rows of a dataset.

Parameters:
n: int, default=10

The number of rows to retrieve.

Returns:
pandas.DataFrame

select(*args: Column, exclude: Sequence[Column] | None = None) -> Select

Create a select statement using chained methods with SQL-like syntax.

See here for a detailed guide on how to create select statements.

Parameters:
args: tuple of Columns

Sequence of columns included in the select statement. If no columns are provided, use all columns in the dataset.

exclude: sequence of Columns or None, default=None

List of columns to exclude from the select statement.

Returns:
sqlalchemy.sql.selectable.Select

property source: DataSource

Get the data source.

store_to_s3(stmt: Select, bucket: str, key: str, profile_name: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None)

Execute a query and store results into an S3 object.

Parameters:
stmt: Select

A SQLAlchemy Select statement created using the select method.

bucket: str

The name of the bucket where the query results are stored.

key: str

The name of the object where the query results are stored.

profile_name: str or None, default=None

If a profile name is specified, the access key and secret key are retrieved from ~/.aws/credentials and the parameters aws_access_key_id and aws_secret_access_key are ignored. If None, this field is ignored.

aws_access_key_id: str or None, default=None

The AWS access key associated with an IAM user or role.

aws_secret_access_key: str or None, default=None

The secret key associated with the access key.

kwargs

Key-value arguments passed to clickhouse-connect Client.query method.
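
The precedence between profile_name and the explicit key pair can be sketched as a small standalone function. The name `choose_credentials` is hypothetical and this is not the library's implementation. In particular, raising an error when no credentials are given is a choice made for this sketch; the behavior of store_to_s3 in that case is not documented in this section.

```python
def choose_credentials(profile_name=None, aws_access_key_id=None,
                       aws_secret_access_key=None):
    """Return the credential settings that take effect under the documented rule."""
    if profile_name is not None:
        # A profile wins: the explicit key pair is ignored when it is set.
        return {"profile_name": profile_name}
    if aws_access_key_id is None or aws_secret_access_key is None:
        # Assumption of this sketch: treat a missing or partial key pair as an error.
        raise ValueError(
            "provide either profile_name or both aws_access_key_id "
            "and aws_secret_access_key"
        )
    return {
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
    }
```

Using a named profile keeps secrets out of scripts and notebooks, which is usually preferable to passing the key pair directly.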

ColumnHandle

class algoseek_connector.base.ColumnHandle(table: Table)

Handle for fast access to dataset columns.

Supports access to dataset columns by attribute or by key.

See here for a guide on how to use column handles.

FunctionHandle

class algoseek_connector.base.FunctionHandle(function_names: list[str])

Handle for SQL functions.

See here for a guide on how to use function handles.

CompiledQuery

class algoseek_connector.base.CompiledQuery(sql: str, parameters: dict)

Container class for compiled queries.

Attributes:
sql: str

Parametrized SQL statement.

parameters: dict

Query parameters.

DataGroupDescription

class algoseek_connector.base.DataGroupDescription(name: str, description: str | None = None, display_name: str | None = None)

Container class for data group metadata.

Attributes:
name: str

The data group name.

display_name: str or None, default=None

Name used for pretty printing.

description: str or None, default=None

The data group description.

Methods

html:

Get an HTML representation of the data group.

html() -> str

Create an HTML description of the data group.

DataSetDescription

class algoseek_connector.base.DataSetDescription(name: str, group: str, columns: list[ColumnDescription], display_name: str | None = None, description: str | None = None, granularity: str | None = None, pdf_url: str | None = None, sample_data_url: str | None = None)

Store data used to create dataset instances.

Attributes:
name: str

The dataset name.

group: str

The data group name.

description: str or None, default=None

The dataset description.

columns: list[ColumnDescription]

The dataset columns.

display_name: str or None, default=None

The display name of the dataset.

granularity: str or None, default=None

The time granularity of the dataset.

pdf_url: str or None, default=None

URL to PDF documentation.

sample_data_url: str or None, default=None

URL to sample data.

Methods

get_table_name:

Get the table name of the dataset using the notation group.dataset.

html:

Get an HTML representation of the dataset.

get_table_name() -> str

Get the table name in the format group.name.

html() -> str

Create an HTML description of the dataset.

ColumnDescription

class algoseek_connector.base.ColumnDescription(name: str, type: str, description: str | None = None)

Store column metadata from a dataset.

Attributes:
name: str

The column name.

type: str

The column type.

description: str or None, default=None

The column description.

Methods

get_type_name:

Get the type name of the column.

get_type_args:

Get a list of type arguments.

html:

Get an HTML representation of the column.

get_type_args() -> list[str]

Get the type arguments.

get_type_name() -> str

Get the type name.

html() -> str

Create a description of the column as an HTML row.
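
The split between a type name and its type arguments can be illustrated with a standalone sketch. It assumes that column types are written as ClickHouse-style strings such as `Decimal(18, 4)`; that format, and the helper name `split_type`, are assumptions made for this example rather than something this section states about get_type_name and get_type_args.

```python
def split_type(type_str):
    """Split a type string such as 'Decimal(18, 4)' into (name, list of arguments)."""
    type_str = type_str.strip()
    name, sep, rest = type_str.partition("(")
    if not sep:
        # No parentheses: a plain type without arguments.
        return name, []
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced parentheses in {type_str!r}")

    inner = rest[:-1]
    args, depth, current = [], 0, ""
    for ch in inner:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        # Only commas at the top nesting level separate arguments.
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        args.append(current.strip())
    return name.strip(), args
```

Tracking the nesting depth keeps wrapped types intact, so `split_type("Nullable(Decimal(18, 4))")` returns the name `Nullable` with the single argument `Decimal(18, 4)`.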