Dataset API¶
ResourceManager¶
- class algoseek_connector.manager.ResourceManager¶
Manage the data sources available to a user.
Methods
create_data_source:
Create a new DataSource instance.
list_data_sources:
List available data sources.
- create_data_source(name: str, **kwargs) DataSource¶
Create a connection to a data source.
- Parameters:
- name: str
Name of an available data source.
- kwargs: dict
Key-value parameters passed to the ClientProtocol used by the data source.
- Returns:
- DataSource
See also
list_data_sources(): Provides a list of text ids of the available data sources.
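A minimal sketch of how the two methods above might be combined, assuming the package is installed and credentials are already configured. The helper name `connect_first_source` is hypothetical, and the sketch assumes that `list_data_sources()` returns a list of string ids, as its description suggests.

```python
def connect_first_source(**kwargs):
    """Open the first data source reported by the manager (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the package.
    from algoseek_connector.manager import ResourceManager

    manager = ResourceManager()
    # Assumption: list_data_sources() returns a list of text ids.
    names = manager.list_data_sources()
    if not names:
        raise RuntimeError("No data sources are available.")
    # Extra keyword arguments are forwarded to the ClientProtocol of the source.
    return manager.create_data_source(names[0], **kwargs)
```

In practice a specific name taken from `list_data_sources()` would be passed instead of the first entry.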
DataSource¶
- class algoseek_connector.base.DataSource(client: ClientProtocol, description_provider: DescriptionProvider)¶
Manage the connection to a data source.
See here for a guide on how to work with data sources.
- Attributes:
- client: ClientProtocol
Provide the connection to the actual data source.
- description_provider: DescriptionProvider
Provide descriptions and metadata for data groups and datasets.
- groups: DataGroupMapping
Maintain the collection of available DataGroups.
Methods
fetch_datagroup:
Retrieve a data group from the data source.
list_datagroups:
List available data groups.
DataGroup¶
- class algoseek_connector.base.DataGroup(source: DataSource, description: DataGroupDescription)¶
Manage a collection of related datasets.
- Parameters:
- source: DataSource
The data source to which the data group belongs.
- description: DataGroupDescription
The data group description.
Methods
fetch_dataset:
Retrieve a dataset from the data source.
list_datasets:
List available datasets.
- property description: DataGroupDescription¶
Get the data group description.
- fetch_dataset(name: str) DataSet¶
Load a dataset from a data source.
- Parameters:
- name: str
The dataset name.
- Raises:
- InvalidDataSetName
If an invalid dataset name is provided.
- property source: DataSource¶
Get the data source.
DataSetFetcher¶
- class algoseek_connector.base.DataSetFetcher(group: DataGroup, name: str)¶
Lightweight representation of a dataset.
Manages creation of DataSet instances for querying data using SQL and data downloading.
Methods
download:
Download data files.
fetch:
Create a DataSet instance.
- property description: DataSetDescription¶
Get the dataset description.
- download(download_path: Path, date: date_like | tuple[date_like, date_like], symbols: str | list[str], expiration_date: date_like | tuple[date_like, date_like] | None = None)¶
Download data from the dataset.
- Parameters:
- download_path: pathlib.Path
Path to the directory where dataset files are downloaded.
- date: str, datetime.date or tuple
Download data in this date range. Dates can be passed as a str in yyyymmdd format or as date objects. If a tuple is passed, it is interpreted as a date range and all dates in the closed interval between the two dates are generated. If a single date is passed, only data for that date is downloaded.
- symbols: str or list[str]
Download data associated with these symbols.
- expiration_date: str, datetime.date, tuple or None, default=None
Download data with expiration dates in this date range. Dates must be passed using the same format used for the date parameter.
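The date handling described above can be illustrated with a small standalone sketch. The helpers `parse_date` and `expand_dates` are hypothetical names and are not part of the library; they only mirror the documented rules (yyyymmdd strings or date objects, and tuples read as closed intervals).

```python
from datetime import date, datetime, timedelta


def parse_date(value):
    """Accept a yyyymmdd string or a date object and return a date."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    return datetime.strptime(value, "%Y%m%d").date()


def expand_dates(value):
    """Return every date selected by a single date or a (start, end) tuple."""
    if isinstance(value, tuple):
        start, end = (parse_date(v) for v in value)
        if end < start:
            raise ValueError("The end date precedes the start date.")
        n_days = (end - start).days
        # Closed interval: both endpoints are included.
        return [start + timedelta(days=i) for i in range(n_days + 1)]
    return [parse_date(value)]


days = expand_dates(("20230130", date(2023, 2, 2)))
print([d.isoformat() for d in days])
# ['2023-01-30', '2023-01-31', '2023-02-01', '2023-02-02']
```

Strings and date objects can be mixed within a tuple, as in the call above.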
- fetch() DataSet¶
Create a dataset instance.
A DataSet allows fetching data using SQL-like queries. See here for a detailed description of how to work with datasets.
- property source: DataSource¶
Get the data source.
DataSet¶
- class algoseek_connector.base.DataSet(group: DataGroup, description: DataSetDescription)¶
Retrieve data from a data source using SQL queries.
See here for a detailed description of how to work with datasets.
- Attributes:
- c: ColumnHandle
A handle object for dataset columns.
- description: DataSetDescription
Get the dataset description.
- group: DataGroup
Get the dataset group.
- source: DataSource
Get the data source client.
Methods
compile:
Convert a sqlalchemy.Select statement into a CompiledQuery.
fetch:
Retrieve data from the data source.
fetch_dataframe:
Retrieve data from the data source as a pandas DataFrame.
fetch_iter:
Retrieve data in chunks from the data source.
get_function_handle:
Create a FunctionHandle object.
get_column_handle:
Create a column handle object.
select:
Build a sqlalchemy.Select statement using method chaining.
- compile(stmt: Select) CompiledQuery¶
Compiles the statement into a dialect-specific SQL string.
- property description: DataSetDescription¶
Get the dataset description.
- execute(sql: str, parameters: dict | None = None, output: str = 'python', size: int | None = None, **kwargs) dict | DataFrame¶
Execute raw SQL queries.
- Parameters:
- sql: str
Parametrized SQL statement.
- parameters: dict or None
Query parameters.
- output: {"python", "dataframe"}
Output format for query results.
- size: int or None
If a size is specified, split the results into chunks of the specified size.
- kwargs: dict
Extra keyword arguments passed to the underlying client.
- fetch(stmt: Select, **kwargs) dict[str, tuple]¶
Fetch data using a select statement.
- Parameters:
- stmt: Select
A SQLAlchemy Select statement created using the select method.
- kwargs
Optional parameters passed to the underlying ClientProtocol.fetch method.
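The return type `dict[str, tuple]` is column-oriented. The sketch below shows one way to turn such a result into row records, assuming each key is a column name and each tuple holds that column's values. The helper `columns_to_rows` and the sample data are made up for illustration.

```python
def columns_to_rows(data):
    """Convert {column: values} into a list of {column: value} records."""
    names = list(data)
    # zip(*columns) walks the columns in parallel, one row at a time.
    return [dict(zip(names, values)) for values in zip(*data.values())]


# Made-up sample standing in for the result of DataSet.fetch.
result = {"Ticker": ("AAPL", "MSFT"), "Price": (189.5, 410.2)}
rows = columns_to_rows(result)
print(rows[0])
# {'Ticker': 'AAPL', 'Price': 189.5}
```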
- fetch_dataframe(stmt: Select, **kwargs) DataFrame¶
Fetch data using a select statement and return it as a pandas DataFrame.
- Parameters:
- stmt: Select
A SQLAlchemy Select statement created using the select method.
- kwargs
Optional parameters passed to the underlying client fetch_dataframe method.
- Returns:
- pandas.DataFrame
- fetch_iter(stmt: Select, size: int, **kwargs) Generator[dict[str, tuple], None, None]¶
Stream data using a select statement.
- Parameters:
- stmt: Select
A SQLAlchemy Select statement created using the select method.
- size: int
The size of each data chunk.
- kwargs
Optional parameters passed to the underlying client fetch_iter method.
- Yields:
- dict[str, tuple]
A dictionary with column name/column data pairs.
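A sketch of how chunked results of this shape might be consumed without holding the full result in memory. The generator `iter_chunks` is a hypothetical stand-in for `fetch_iter`, operating on in-memory sample data rather than a real data source.

```python
def iter_chunks(data, size):
    """Stand-in for fetch_iter: yield column chunks of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be a positive integer.")
    # All columns are assumed to have the same number of rows.
    n_rows = len(next(iter(data.values()), ()))
    for start in range(0, n_rows, size):
        yield {name: values[start:start + size] for name, values in data.items()}


# Made-up sample data with five rows.
sample = {"Ticker": ("A", "B", "C", "D", "E"), "Volume": (10, 20, 30, 40, 50)}
chunks = list(iter_chunks(sample, 2))

# Aggregate chunk by chunk instead of loading every row at once.
total_volume = sum(sum(chunk["Volume"]) for chunk in iter_chunks(sample, 2))
print(len(chunks), total_volume)
# 3 150
```

The last chunk may hold fewer than `size` rows, as happens here with five rows and a chunk size of two.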
- fetch_iter_dataframe(stmt: Select, size: int, **kwargs) Generator[DataFrame, None, None]¶
Stream data using a select statement, yielding each chunk as a pandas DataFrame.
- Parameters:
- stmt: Select
A SQLAlchemy Select statement created using the select method.
- size: int
The size of each data chunk.
- kwargs
Optional parameters passed to the underlying client fetch_iter_dataframe method.
- Yields:
- pandas.DataFrame
- get_column_handle() ColumnHandle¶
Get a handle object for fast access to dataset columns.
- get_function_handle() FunctionHandle¶
Get a handle for fast access to supported functions.
- head(n: int = 10) DataFrame¶
Retrieve the first n rows of a dataset.
- Parameters:
- n: int, default=10
The number of rows to retrieve.
- Returns:
- pandas.DataFrame
- select(*args: Column, exclude: Sequence[Column] | None = None) Select¶
Create a select statement using chained methods with SQL-like syntax.
See here for a detailed guide on how to create select statements.
- Parameters:
- args: tuple of Columns
Sequence of columns included in the select statement. If no columns are provided, use all columns in the dataset.
- exclude: sequence of Columns or None, default=None
List of columns to exclude from the select statement.
- Returns:
- Select
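The column-selection rule for `select` can be sketched with plain strings in place of Column objects. The function `resolve_columns` is hypothetical, and applying `exclude` after the explicit selection is an assumption about how the two parameters interact.

```python
def resolve_columns(all_columns, selected=(), exclude=None):
    """Return the column names a select statement would include."""
    # No explicit selection means every column in the dataset.
    chosen = list(selected) if selected else list(all_columns)
    excluded = set(exclude or ())
    return [name for name in chosen if name not in excluded]


columns = ["TradeDate", "Ticker", "Price", "Volume"]
print(resolve_columns(columns, exclude=["Volume"]))
# ['TradeDate', 'Ticker', 'Price']
```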
- property source: DataSource¶
Get the data source client.
- store_to_s3(stmt: Select, bucket: str, key: str, profile_name: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None)¶
Execute a query and store results into an S3 object.
- Parameters:
- stmt: Select
A SQLAlchemy Select statement created using the select method.
- bucket: str
The name of the bucket where the query results are stored.
- key: str
The name of the object where the query results are stored.
- profile_name: str or None, default=None
If a profile name is specified, the access key and secret key are retrieved from ~/.aws/credentials and the parameters aws_access_key_id and aws_secret_access_key are ignored. If None, this field is ignored.
- aws_access_key_id: str or None, default=None
The AWS access key associated with an IAM user or role.
- aws_secret_access_key: str or None, default=None
The secret key associated with the access key.
- kwargs
Key-value arguments passed to clickhouse-connect Client.query method.
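The precedence between `profile_name` and the explicit keys can be summarized with a short sketch. The function `resolve_credentials` is a hypothetical illustration of the documented rule, not the library's implementation.

```python
def resolve_credentials(profile_name=None, aws_access_key_id=None,
                        aws_secret_access_key=None):
    """Return the credential settings that would take effect."""
    if profile_name is not None:
        # A profile takes precedence: explicit keys are ignored.
        return {"profile_name": profile_name}
    return {
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
    }


# Placeholder values only; never hard-code real credentials.
print(resolve_credentials(profile_name="dev", aws_access_key_id="KEY",
                          aws_secret_access_key="SECRET"))
# {'profile_name': 'dev'}
```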
ColumnHandle¶
FunctionHandle¶
CompiledQuery¶
DataGroupDescription¶
- class algoseek_connector.base.DataGroupDescription(name: str, description: str | None = None, display_name: str | None = None)¶
Container class for datagroup metadata.
- Attributes:
- name: str
The data group name.
- display_name: str or None, default=None
Name used for pretty printing.
- description: str or None, default=None
The data group description.
Methods
html:
Get an HTML representation of the data group.
DataSetDescription¶
- class algoseek_connector.base.DataSetDescription(name: str, group: str, columns: list[ColumnDescription], display_name: str | None = None, description: str | None = None, granularity: str | None = None, pdf_url: str | None = None, sample_data_url: str | None = None)¶
Store data used to create dataset instances.
- Attributes:
- name: str
The dataset name.
- group: str
The datagroup name.
- description: str
The dataset description.
- columns: list[ColumnDescription] or None, default=None
The dataset columns.
- display_name: str or None, default=None
The display name of the dataset.
- granularity: str or None, default=None
The time granularity of the dataset.
- pdf_url: str or None, default=None
URL to PDF documentation.
- sample_data_url: str or None, default=None
Methods
get_table_name:
Get the table name of the dataset using the notation group.dataset.
html:
Get an HTML representation of the dataset.
ColumnDescription¶
- class algoseek_connector.base.ColumnDescription(name: str, type: str, description: str | None = None)¶
Store column metadata from a dataset.
- Attributes:
- name: str
The column name.
- type: str
The column type.
- description: str, default=""
The column description.
Methods
get_type_name:
Get the type name of the column.
get_type_args:
Get a list of type arguments.
html:
Get an HTML representation of the column.