Dataset API

ResourceManager

class algoseek_connector.manager.ResourceManager

Manage the data sources available to a user.

Methods

create_data_source:	Create a new DataSource instance.
list_data_source:	List available data sources.

create_data_source(name: str, **kwargs) → DataSource

Create a connection to a data source.

Parameters:

name: str: Name of an available data source.
kwargs: dict: Key-value parameters passed to the ClientProtocol used by the data source.

Returns:

DataSource
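
A minimal sketch of opening a data source by name, written against any object that behaves like a ResourceManager. It assumes that `list_data_source()` returns an iterable of source names, which this reference does not state explicitly. The helper name `open_data_source` is hypothetical and not part of algoseek_connector.

```python
def open_data_source(manager, name, **kwargs):
    """Create a data source after checking that `name` is available.

    `manager` is expected to behave like a ResourceManager.
    This helper is hypothetical; it is not part of algoseek_connector.
    """
    # Assumption: list_data_source() returns an iterable of source names.
    available = list(manager.list_data_source())
    if name not in available:
        raise ValueError(f"Unknown data source {name!r}; available: {available}")
    # Extra keyword arguments are forwarded unchanged to create_data_source.
    return manager.create_data_source(name, **kwargs)
```

Checking the name first gives a readable error message that lists the valid choices instead of relying on whatever the underlying client raises.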

DataSource

class algoseek_connector.base.DataSource(client: ClientProtocol, description_provider: DescriptionProvider)

Manage the connection to a data source.

See here for a guide on how to work with data sources.

Attributes:

client: ClientProtocol: Provide the connection to the actual data source.
description_provider: DescriptionProvider: Provide descriptions and metadata for data groups and datasets.
groups: DataGroupMapping: Maintain the collection of available DataGroups.

Methods

fetch_datagroup:	Retrieve a data group from the data source.
list_datagroups:	List available data groups.

fetch_datagroup(name: str) → DataGroup

Retrieve a data group.

list_datagroups() → list[str]

List available data groups.
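
The two methods above combine naturally with `DataGroup.list_datasets` to enumerate everything a data source exposes. The following is a sketch only: `build_catalog` is a hypothetical helper, and `source` is any object that behaves like a DataSource.

```python
def build_catalog(source):
    """Return {group name: [dataset names]} for every data group in `source`.

    Hypothetical helper; `source` is expected to behave like a DataSource.
    """
    catalog = {}
    for group_name in source.list_datagroups():
        # fetch_datagroup returns a DataGroup, which can list its datasets.
        group = source.fetch_datagroup(group_name)
        catalog[group_name] = list(group.list_datasets())
    return catalog
```

Depending on the data source, each `fetch_datagroup` call may involve a round trip to the backend, so it can be worth caching the result.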

DataGroup

class algoseek_connector.base.DataGroup(source: DataSource, description: DataGroupDescription)

Manage a collection of related datasets.

Parameters:

source: DataSource: The data source to which the data group belongs.
description: DataGroupDescription: The data group description.

Methods

fetch_dataset:	Retrieve a dataset from the data source.
list_datasets:	List available datasets.

property description: DataGroupDescription

Get the data group description.

fetch_dataset(name: str) → DataSet

Load a dataset from a data source.

Parameters:

name: str: The dataset name.

Raises:

InvalidDataSetName: If an invalid dataset name is provided.

list_datasets() → list[str]

List available datasets.

property source: DataSource

Get the data source.
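
`fetch_dataset` raises InvalidDataSetName for unknown names, but this reference does not say where that exception is importable from. The sketch below therefore validates the name against `list_datasets()` before fetching and suggests close matches. `fetch_dataset_checked` is a hypothetical helper, and `group` is any object that behaves like a DataGroup.

```python
import difflib


def fetch_dataset_checked(group, name):
    """Fetch a dataset, suggesting similar names when `name` is unknown.

    Hypothetical helper; `group` is expected to behave like a DataGroup.
    """
    available = list(group.list_datasets())
    if name not in available:
        # get_close_matches ranks candidates by string similarity.
        hints = difflib.get_close_matches(name, available, n=3)
        suffix = f" Did you mean: {', '.join(hints)}?" if hints else ""
        raise ValueError(f"No dataset named {name!r}.{suffix}")
    return group.fetch_dataset(name)
```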

DataSetFetcher

class algoseek_connector.base.DataSetFetcher(group: DataGroup, name: str)

Lightweight representation of a dataset.

Manages creation of DataSet instances for querying data using SQL and data downloading.

Methods

download:	Download data files.
fetch:	Create a DataSet instance.

property description: DataSetDescription

Get the dataset description.

download(download_path: Path, date: date_like | tuple[date_like, date_like], symbols: str | list[str], expiration_date: date_like | tuple[date_like, date_like] | None = None)

Download data from the dataset.

Parameters:

download_path: pathlib.Path: Path to a directory to download dataset files.
date: str, datetime.date or tuple: Download data in this date range. Dates can be passed as a str with yyyymmdd format or as date objects. If a tuple is passed, it is interpreted as a date range and all dates in the closed interval between the two dates are generated. If a single date is passed, download data for that date only.
symbols: str or list[str]: Download data associated with these symbols.
expiration_date: str, datetime.date, tuple or None, default=None: Download data with expiration dates in this date range. Dates must be passed using the same format used for the date parameter.
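
The interpretation of the `date` parameter can be illustrated with a small standalone sketch. It is not the library's implementation: `expand_dates` and `_to_date` are hypothetical names, and the sketch generates every calendar day in the closed interval because this reference does not say whether non-trading days are skipped.

```python
from datetime import date, datetime, timedelta


def _to_date(value):
    """Convert a yyyymmdd string or a date object into a datetime.date."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    return datetime.strptime(value, "%Y%m%d").date()


def expand_dates(value):
    """Expand a single date or a (start, end) tuple into a list of dates.

    The interval is closed: both endpoints are included.
    """
    if isinstance(value, tuple):
        start, end = (_to_date(v) for v in value)
    else:
        start = end = _to_date(value)
    if end < start:
        raise ValueError("end date precedes start date")
    n_days = (end - start).days
    return [start + timedelta(days=i) for i in range(n_days + 1)]
```

A single date yields a one-element list, and a tuple may mix strings and date objects, for example `expand_dates(("20230130", date(2023, 2, 1)))`.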

fetch() → DataSet

Create a dataset instance.

A DataSet allows fetching data using SQL-like queries. See here for a detailed description of how to work with datasets.

property group: DataGroup

Get the dataset group.

property source: DataSource

Get the data source.

DataSet

class algoseek_connector.base.DataSet(group: DataGroup, description: DataSetDescription)

Retrieve data from a data source using SQL queries.

See here for a detailed description of how to work with datasets.

Attributes:

c: ColumnHandle: A handle object for dataset columns.
description: DataSetDescription: Get the dataset description.
group: DataGroup: Get the dataset group.
source: DataSource: Get the data source.

Methods

compile:	Convert a sqlalchemy.Select statement into a CompiledQuery.
fetch:	Retrieve data from the data source.
fetch_dataframe:	Retrieve data from the data source as a pandas DataFrame.
fetch_iter:	Retrieve data in chunks from the data source.
get_function_handle:	Create a FunctionHandle object.
get_column_handle:	Create a column handle object.
select:	Build a sqlalchemy.Select statement using method chaining.

compile(stmt: Select) → CompiledQuery

Compile the statement into a dialect-specific SQL string.

property description: DataSetDescription

Get the dataset description.

execute(sql: str, parameters: dict | None = None, output: str = 'python', size: int | None = None, **kwargs) → dict | DataFrame

Execute raw SQL queries.

Parameters:

sql: str: Parametrized SQL statement.
parameters: dict or None: Query parameters.
output: {"python", "dataframe"}: Output format for query results.
size: int or None: If specified, split the results into chunks of this size.
kwargs: dict: Extra keyword arguments passed to the underlying client.

fetch(stmt: Select, **kwargs) → dict[str, tuple]

Fetch data using a select statement.

Parameters:

stmt: Select: A SQLAlchemy Select statement created using the select method.
kwargs: Optional parameters passed to the underlying ClientProtocol.fetch method.
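
`fetch` returns column-oriented data: a dictionary that maps each column name to a tuple of values. When row-oriented records are more convenient, the result can be transposed with plain Python. The sketch below assumes all tuples have the same length; `columns_to_rows` is a hypothetical helper, not part of the library.

```python
def columns_to_rows(data):
    """Convert {column name: tuple of values} into a list of row dicts."""
    names = list(data)
    # zip(*columns) walks the columns in parallel, one row at a time.
    return [dict(zip(names, row)) for row in zip(*data.values())]
```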

fetch_dataframe(stmt: Select, **kwargs) → DataFrame

Fetch data using a select statement and return the result as a pandas DataFrame.

Parameters:

stmt: Select: A SQLAlchemy Select statement created using the select method.
kwargs: Optional parameters passed to the underlying client fetch_dataframe method.

Returns:

pandas.DataFrame

fetch_iter(stmt: Select, size: int, **kwargs) → Generator[dict[str, tuple], None, None]

Stream data using a select statement.

Parameters:

stmt: Select: A SQLAlchemy Select statement created using the select method.
size: int: The size of each data chunk.
kwargs: Optional parameters passed to the underlying client fetch_iter method.

Yields:

dict[str, tuple]: A dictionary with column name/column data pairs.
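
Because `fetch_iter` yields one dictionary per chunk, results can be aggregated without holding the full result set in memory. The sketch below counts rows chunk by chunk. `count_rows` is a hypothetical helper, and `dataset` is any object that behaves like a DataSet.

```python
def count_rows(dataset, stmt, size):
    """Count the rows returned by `stmt`, reading `size` rows at a time.

    Hypothetical helper; `dataset` is expected to behave like a DataSet.
    """
    total = 0
    for chunk in dataset.fetch_iter(stmt, size):
        if chunk:
            # Every column in a chunk holds the same number of values,
            # so the length of any one column is the chunk's row count.
            first_column = next(iter(chunk.values()))
            total += len(first_column)
    return total
```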

fetch_iter_dataframe(stmt: Select, size: int, **kwargs) → Generator[DataFrame, None, None]

Stream data using a select statement, yielding each chunk as a pandas DataFrame.

Parameters:

stmt: Select: A SQLAlchemy Select statement created using the select method.
size: int: The size of each data chunk.
kwargs: Optional parameters passed to the underlying client fetch_iter_dataframe method.

Yields:

pandas.DataFrame
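
One possible use of the chunked DataFrame stream is to write a large query result to a single CSV file without loading it all at once. In this sketch, `stream_to_csv` is a hypothetical helper and `dataset` is any object that behaves like a DataSet. It relies only on the standard pandas `DataFrame.to_csv` parameters `mode`, `header` and `index`.

```python
def stream_to_csv(dataset, stmt, size, path):
    """Write the result of `stmt` to `path`, one chunk at a time.

    Returns the number of chunks written. Hypothetical helper; `dataset`
    is expected to behave like a DataSet.
    """
    n_chunks = 0
    for chunk in dataset.fetch_iter_dataframe(stmt, size):
        first = n_chunks == 0
        # The first chunk creates the file and writes the header row;
        # later chunks are appended without repeating the header.
        chunk.to_csv(path, mode="w" if first else "a", header=first, index=False)
        n_chunks += 1
    return n_chunks
```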

get_column_handle() → ColumnHandle

Get a handle object for fast access to dataset columns.

get_function_handle() → FunctionHandle

Get a handle for fast access to supported functions.

property group: DataGroup

Get the dataset group.

head(n: int = 10) → DataFrame

Retrieve the first n rows of a dataset.

Parameters:

n: int, default=10: The number of rows to retrieve.

Returns:

pandas.DataFrame

select(*args: Column, exclude: Sequence[Column] | None = None) → Select

Create a select statement using chained methods with SQL-like syntax.

See here for a detailed guide on how to create select statements.

Parameters:

args: tuple of Columns: Columns to include in the select statement. If no columns are provided, use all columns in the dataset.
exclude: sequence of Columns or None, default=None: Columns to exclude from the select statement.

Returns:

sqlalchemy.sql.selectable.Select
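
A short sketch of building a statement from column names. It looks columns up by key on the `c` attribute, passes them to `select`, and optionally chains SQLAlchemy's `Select.limit`. `build_select` is a hypothetical helper, `dataset` is any object that behaves like a DataSet, and the column names are supplied by the caller.

```python
def build_select(dataset, column_names, limit=None):
    """Build a select statement over the named columns.

    Hypothetical helper; `dataset` is expected to behave like a DataSet.
    """
    # The column handle supports access by key, e.g. dataset.c["SomeColumn"].
    columns = [dataset.c[name] for name in column_names]
    # With an empty column list, select() falls back to all columns.
    stmt = dataset.select(*columns)
    if limit is not None:
        # Select.limit returns a new statement; the original is unchanged.
        stmt = stmt.limit(limit)
    return stmt
```

The returned statement can then be passed to `fetch`, `fetch_dataframe` or `compile`.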

property source: DataSource

Get the data source.

store_to_s3(stmt: Select, bucket: str, key: str, profile_name: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None)

Execute a query and store results into an S3 object.

Parameters:

stmt: Select: A SQLAlchemy Select statement created using the select method.
bucket: str: The bucket name used to store the query.
key: str: The name of the object where the query is going to be stored.
profile_name: str or None, default=None: If a profile name is specified, the access key and secret key are retrieved from ~/.aws/credentials and the parameters aws_access_key_id and aws_secret_access_key are ignored. If None, this field is ignored.
aws_access_key_id: str or None, default=None: The AWS access key associated with an IAM user or role.
aws_secret_access_key: str or None, default=None: The secret key associated with the access key.
kwargs: Key-value arguments passed to the clickhouse-connect Client.query method.

ColumnHandle

class algoseek_connector.base.ColumnHandle(table: Table)

Handle for fast access to dataset columns.

Supports access to dataset columns by attribute or by key.

See here for a guide on how to use column handles.

FunctionHandle

class algoseek_connector.base.FunctionHandle(function_names: list[str])

Handle for SQL functions.

See here for a guide on how to use function handles.

CompiledQuery

class algoseek_connector.base.CompiledQuery(sql: str, parameters: dict)

Container class for compiled queries.

Attributes:

sql: str: Parametrized SQL statement.
parameters: dict: Query parameters.
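
The `sql` and `parameters` attributes mirror the first two arguments of `DataSet.execute`. A plausible workflow is to compile a statement, inspect or log the SQL, and then run it as a raw query. This pairing is an assumption drawn from the matching descriptions, not something this reference confirms. `run_compiled` is a hypothetical helper, and `dataset` is any object that behaves like a DataSet.

```python
def run_compiled(dataset, stmt, output="python"):
    """Compile `stmt`, then execute the resulting SQL as a raw query.

    Hypothetical helper. Assumes the compiled SQL and parameters can be
    passed to execute() unchanged.
    """
    compiled = dataset.compile(stmt)
    # compiled.sql is the parametrized SQL string and compiled.parameters
    # is the dict of values bound to its placeholders.
    return dataset.execute(compiled.sql, compiled.parameters, output=output)
```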

DataGroupDescription

class algoseek_connector.base.DataGroupDescription(name: str, description: str | None = None, display_name: str | None = None)

Container class for data group metadata.

Attributes:

name: str: The data group name.
display_name: str or None, default=None: Name used for pretty printing.
description: str or None, default=None: The data group description.

Methods

html:	Get an HTML representation of the data group.

html() → str

Create an HTML description of the data group.

DataSetDescription

class algoseek_connector.base.DataSetDescription(name: str, group: str, columns: list[ColumnDescription], display_name: str | None = None, description: str | None = None, granularity: str | None = None, pdf_url: str | None = None, sample_data_url: str | None = None)

Store data used to create dataset instances.

Attributes:

name: str: The dataset name.
group: str: The datagroup name.
description: str or None, default=None: The dataset description.
columns: list[ColumnDescription]: The dataset columns.
display_name: str or None, default=None: The display name of the dataset.
granularity: str or None, default=None: The time granularity of the dataset.
pdf_url: str or None, default=None: URL to PDF documentation.
sample_data_url: str or None, default=None: URL to sample data.

Methods

get_table_name:	Get the table name of the dataset using the notation `group.dataset`.
html:	Get an HTML representation of the dataset.

get_table_name() → str

Get the table name in the format group.name.

html() → str

Create an HTML description of the dataset.

ColumnDescription

class algoseek_connector.base.ColumnDescription(name: str, type: str, description: str | None = None)

Store column metadata from a dataset.

Attributes:

name: str: The column name.
type: str: The column type.
description: str or None, default=None: The column description.

Methods

get_type_name:	Get the type name of the column.
get_type_args:	Get a list of type arguments.
html:	Get an HTML representation of the column.

get_type_args() → list[str]

Get the type arguments.

get_type_name() → str

Get the type name.

html() → str

Create a description of the column as an HTML row.
