optrade.data
optrade.data.contracts
- class Contract(root, start_date, exp, strike, interval_min, right)
Bases: object
A class representing an options contract with methods for optimal contract selection.
The Contract class defines the structure of an options contract including the underlying security, dates, strike price, and other key parameters.
- Parameters:
root (str)
start_date (str)
exp (str)
strike (float)
interval_min (int)
right (str)
- __init__(root, start_date, exp, strike, interval_min, right)
Initialize a Contract instance.
- Parameters:
root (str) – Root symbol of the underlying security (e.g., “AAPL” representing Apple Inc.)
start_date (str) – Start date in YYYYMMDD format (e.g., “20241107” representing November 7, 2024)
exp (str) – Expiration date in YYYYMMDD format (e.g., “20241206” representing December 6, 2024)
strike (float) – Strike price (e.g., 225 representing $225)
interval_min (int) – Interval in minutes (e.g., 1 representing 1 minute)
right (str) – Option type (‘C’ for call, ‘P’ for put)
- Returns:
None
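All date parameters use the YYYYMMDD convention, and right is a single letter. As a minimal sketch (the helpers `tte_days` and `check_right` are hypothetical and not part of optrade), this computes the calendar-day time to expiration for the example values above and normalizes the option type using only the standard library:

```python
from datetime import datetime

VALID_RIGHTS = {"C", "P"}


def tte_days(start_date: str, exp: str) -> int:
    """Calendar days from start_date to exp, both in YYYYMMDD format (hypothetical helper)."""
    start = datetime.strptime(start_date, "%Y%m%d")
    expiry = datetime.strptime(exp, "%Y%m%d")
    if expiry < start:
        raise ValueError(f"exp {exp} precedes start_date {start_date}")
    return (expiry - start).days


def check_right(right: str) -> str:
    """Normalize and validate the option type ('C' for call, 'P' for put)."""
    normalized = right.upper()
    if normalized not in VALID_RIGHTS:
        raise ValueError(f"right must be 'C' or 'P', got {right!r}")
    return normalized


print(tte_days("20241107", "20241206"))  # 29
print(check_right("c"))  # C
```

These calendar days are not trading days. Weekends and holidays are counted.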
- classmethod find_optimal(root, start_date, interval_min, right, target_tte, tte_tolerance, moneyness, strike_band=0.05, hist_vol=None, volatility_scaled=False, volatility_scalar=1.0, verbose=True, warning=False, dev_mode=False)
Find the optimal contract for a given security, start date, and approximate TTE.
- Parameters:
root (str) – Underlying stock symbol
start_date (str) – Start date for the contract in YYYYMMDD format
interval_min (int) – Interval in minutes
right (str) – Option type (C for call, P for put)
target_tte (int) – Target time to expiration in days
tte_tolerance (Tuple[int, int]) – Acceptable range for TTE as (min_days, max_days)
moneyness (str) – Contract moneyness (OTM, ATM, ITM)
strike_band (float | None) – Target percentage band for strike selection
hist_vol (float | None) – Historical volatility for dynamic strike selection
volatility_scaled (bool) – Whether to select strike by volatility
volatility_scalar (float | None) – Scaling factor for volatility-based strike selection
verbose (bool) – Whether to print verbose output
warning (bool) – Whether to display warnings
dev_mode (bool) – Whether to use development mode
- Return type:
- load_data(clean_up=False, offline=False, save_dir=None, warning=False, dev_mode=False)
Load data for the selected contract.
- Parameters:
clean_up (bool) – Whether to clean up the data after use
offline (bool) – Whether to load saved data from disk
save_dir (str | None) – Directory to save/load data
warning (bool) – Whether to display warnings
dev_mode (bool) – Whether to use development mode
- Returns:
pd.DataFrame – The loaded data containing NBBO quotes and OHLCVC data for the contract and the underlying
- Return type:
DataFrame
- class ContractDataset(root, total_start_date, total_end_date, contract_stride, interval_min, right, target_tte, tte_tolerance, moneyness, strike_band=0.05, volatility_scaled=False, volatility_scalar=1.0, hist_vol=None, verbose=False, save_dir=None, warning=True, dev_mode=False, contract_dir=None)
Bases: object
A dataset containing options contracts generated with consistent parameters.
- Parameters:
root (str)
total_start_date (str)
total_end_date (str)
contract_stride (int)
interval_min (int)
right (str)
target_tte (int)
tte_tolerance (Tuple[int, int])
moneyness (str)
strike_band (float)
volatility_scaled (bool)
volatility_scalar (float)
hist_vol (float | None)
verbose (bool)
save_dir (str | None)
warning (bool)
dev_mode (bool)
contract_dir (Path | None)
- __init__(root, total_start_date, total_end_date, contract_stride, interval_min, right, target_tte, tte_tolerance, moneyness, strike_band=0.05, volatility_scaled=False, volatility_scalar=1.0, hist_vol=None, verbose=False, save_dir=None, warning=True, dev_mode=False, contract_dir=None)
Initialize the ContractDataset with the specified parameters.
- Parameters:
root (str) – The security root symbol
total_start_date (str) – Start date for the dataset (YYYYMMDD)
total_end_date (str) – End date for the dataset (YYYYMMDD)
contract_stride (int) – Days between consecutive contracts
interval_min (int) – Data interval in minutes
right (str) – Option type (C/P)
target_tte (int) – Target time to expiration in days
tte_tolerance (Tuple[int, int]) – Acceptable range for TTE as (min_days, max_days)
moneyness (str) – Contract moneyness (OTM/ATM/ITM)
strike_band (float) – Target percentage band for strike selection
volatility_scaled (bool) – Whether to scale by volatility
volatility_scalar (float) – Scaling factor for volatility
hist_vol (float | None) – Historical volatility for dynamic strike selection
verbose (bool) – Whether to print verbose output
save_dir (str | None) – Directory to save/load data
warning (bool) – Whether to display warnings
dev_mode (bool) – Whether to use development mode
contract_dir (Path | None)
- Return type:
None
- generate()
Generate all contracts in the dataset based on configuration parameters. Contracts are generated by starting from total_start_date and advancing by contract_stride days until reaching the last valid date that allows for contracts within the specified time-to-expiration tolerance.
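The stepping rule can be sketched as follows. This is a simplification under a stated assumption: a start date is kept only if the shortest acceptable expiration (start date plus the minimum of tte_tolerance) still falls on or before total_end_date. The helper `candidate_start_dates` is hypothetical. It ignores trading-day adjustment and the actual expiration and strike lookup that generate() performs.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def candidate_start_dates(total_start_date: str, total_end_date: str,
                          contract_stride: int,
                          tte_tolerance: Tuple[int, int]) -> List[str]:
    """Start dates spaced contract_stride days apart (hypothetical sketch)."""
    if contract_stride <= 0:
        raise ValueError("contract_stride must be positive")
    fmt = "%Y%m%d"
    current = datetime.strptime(total_start_date, fmt)
    end = datetime.strptime(total_end_date, fmt)
    min_tte = timedelta(days=tte_tolerance[0])
    dates: List[str] = []
    # Assumption: a start date is usable only if the shortest acceptable
    # expiration (start + min TTE) still falls on or before total_end_date.
    while current + min_tte <= end:
        dates.append(current.strftime(fmt))
        current += timedelta(days=contract_stride)
    return dates


print(candidate_start_dates("20240101", "20240131", 5, (10, 20)))
# ['20240101', '20240106', '20240111', '20240116', '20240121']
```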
- Returns:
ContractDataset – The dataset with all generated contracts
- Return type:
- save(filename=None, clean_file=False)
Save the dataset to a pickle file.
- Parameters:
filename (str | None) – Optional custom filename. If None, a default name is generated
clean_file (bool) – Whether to delete the existing file if it exists
- Returns:
str – Path where the pickle file was saved
- Return type:
None
- get_contract_datasets(root, start_date, end_date, contract_stride, interval_min, right, target_tte, tte_tolerance, moneyness, strike_band=0.05, volatility_type='period', volatility_scaled=False, volatility_scalar=1.0, train_split=0.7, val_split=0.1, clean_up=False, offline=False, save_dir=None, verbose=False, dev_mode=False)
Returns the training, validation, and test contract datasets. These contain mutually exclusive contracts drawn from non-overlapping time periods, which prevents information leakage during training and evaluation.
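A chronological split by proportion of total days might look like the sketch below. The function name `split_date_range` and the exact boundary rule (each period starts the day after the previous one ends, and the test period takes the remainder) are assumptions for illustration, not optrade's confirmed logic.

```python
from datetime import datetime, timedelta
from typing import Tuple

DateRange = Tuple[str, str]


def split_date_range(start_date: str, end_date: str,
                     train_split: float = 0.7,
                     val_split: float = 0.1) -> Tuple[DateRange, DateRange, DateRange]:
    """Split [start_date, end_date] into consecutive, non-overlapping periods."""
    if train_split <= 0 or val_split <= 0 or train_split + val_split >= 1:
        raise ValueError("need train_split > 0, val_split > 0 and their sum < 1")
    fmt = "%Y%m%d"
    start = datetime.strptime(start_date, fmt)
    end = datetime.strptime(end_date, fmt)
    total_days = (end - start).days
    train_end = start + timedelta(days=int(total_days * train_split))
    val_end = train_end + timedelta(days=int(total_days * val_split))
    one_day = timedelta(days=1)
    return (
        (start.strftime(fmt), train_end.strftime(fmt)),
        ((train_end + one_day).strftime(fmt), val_end.strftime(fmt)),
        ((val_end + one_day).strftime(fmt), end.strftime(fmt)),  # remainder is test
    )


train, val, test = split_date_range("20240101", "20241231")
print(train, val, test)
# ('20240101', '20240912') ('20240913', '20241018') ('20241019', '20241231')
```

Splitting by time instead of shuffling contracts is what keeps later market conditions out of the training period.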
- Parameters:
root (str) – Underlying stock symbol
start_date (str) – Start date for the total dataset in YYYYMMDD format
end_date (str) – End date for the total dataset in YYYYMMDD format
contract_stride (int) – Number of days between each contract
interval_min (int) – Interval in minutes for the underlying stock data
right (str) – Option type (C for call, P for put)
target_tte (int) – Target time to expiration in days
tte_tolerance (Tuple[int, int]) – Tuple of (min, max) time to expiration tolerance in days
moneyness (str) – Moneyness of the option contract (OTM, ATM, ITM)
strike_band (float | None) – Target band for moneyness selection, proportion of current underlying price
volatility_type (str | None) – Type of historical volatility to use
volatility_scaled (bool | None) – Whether to scale strikes based on historical volatility
volatility_scalar (float | None) – Scalar to adjust historical volatility-based strike selection
train_split (float) – Proportion of total days to use for training
val_split (float) – Proportion of total days to use for validation
clean_up (bool) – Whether to clean up the data after use
offline (bool) – Whether to load saved contracts from disk
save_dir (str | None) – Directory to save/load contracts
verbose (bool) – Whether to print verbose output
dev_mode (bool) – Whether to use development mode
- Returns:
Training, validation, and test contract datasets.
- Return type:
optrade.data.features
- dt_features(df, feats, dt_col='datetime', market_open_time='09:30:00', market_close_time='16:00:00')
Generates datetime features for options.
- Parameters:
df (DataFrame) – DataFrame containing a datetime column.
feats (List[str]) – List of datetime features to generate. Options include:
  - minute_of_day: Minute of trading day (0-389 for standard session)
  - sin_minute_of_day: Sine transformation of time of day (continuous circular feature)
  - cos_minute_of_day: Cosine transformation of time of day (continuous circular feature)
  - day_of_week: Day of week (0=Monday, 4=Friday)
  - hour_of_week: Hour position in trading week as proportion (0.0-1.0)
  - sin_hour_of_week: Sine transformation of hour of week (continuous circular feature)
  - cos_hour_of_week: Cosine transformation of hour of week (continuous circular feature)
dt_col (str | None) – Name of datetime column. If None, will attempt to detect it. Defaults to datetime.
market_open_time (str | None) – Market open time in HH:MM:SS format. Defaults to 09:30:00.
market_close_time (str | None) – Market close time in HH:MM:SS format. Defaults to 16:00:00.
- Returns:
Original DataFrame with additional datetime feature columns, prefixed with dt_.
- Return type:
DataFrame
Examples
Basic usage:
>>> import pandas as pd
>>> from optrade.data.features import dt_features
>>> data = pd.DataFrame({
...     "datetime": pd.date_range("2023-01-02 09:30:00", periods=5, freq="1min")
... })
>>> feats = ["minute_of_day", "day_of_week"]
>>> result = dt_features(data, feats)
>>> result.columns
Index(['datetime', 'dt_minute_of_day', 'dt_day_of_week'], dtype='object')
Using custom datetime column name:
>>> data = pd.DataFrame({
...     "timestamp": pd.date_range("2023-01-02 09:30:00", periods=5, freq="1min")
... })
>>> result = dt_features(data, feats, dt_col="timestamp")
>>> result.columns
Index(['timestamp', 'dt_minute_of_day', 'dt_day_of_week'], dtype='object')
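The arithmetic behind these features can be sketched for a single timestamp. The helper `dt_feature_values` is hypothetical. The cyclic encodings map a position onto a unit circle with period equal to one session (or one five-day week), so the last bar of a period sits next to the first one. The hour_of_week formula is an assumption: it takes the fraction of elapsed regular-session minutes in a five-day week.

```python
import math
from datetime import datetime


def _minutes(hhmmss: str) -> int:
    t = datetime.strptime(hhmmss, "%H:%M:%S")
    return t.hour * 60 + t.minute


def dt_feature_values(ts: datetime,
                      market_open_time: str = "09:30:00",
                      market_close_time: str = "16:00:00") -> dict:
    """Datetime features for one regular-session timestamp (illustrative only)."""
    open_min = _minutes(market_open_time)
    session_len = _minutes(market_close_time) - open_min  # 390 for 09:30-16:00
    minute_of_day = ts.hour * 60 + ts.minute - open_min   # 0..389
    day_of_week = ts.weekday()                            # 0=Monday .. 4=Friday
    # Assumption: position in the 5-day trading week as a fraction in [0, 1).
    hour_of_week = (day_of_week * session_len + minute_of_day) / (5 * session_len)
    day_angle = 2 * math.pi * minute_of_day / session_len
    week_angle = 2 * math.pi * hour_of_week
    return {
        "dt_minute_of_day": minute_of_day,
        "dt_sin_minute_of_day": math.sin(day_angle),
        "dt_cos_minute_of_day": math.cos(day_angle),
        "dt_day_of_week": day_of_week,
        "dt_hour_of_week": hour_of_week,
        "dt_sin_hour_of_week": math.sin(week_angle),
        "dt_cos_hour_of_week": math.cos(week_angle),
    }


print(dt_feature_values(datetime(2023, 1, 6, 15, 59))["dt_minute_of_day"])  # 389
```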
- tte_features(df, feats, exp)
Generate Time to Expiration (TTE) features for a given DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing datetime column in format “YYYY-MM-DD HH:MM:SS”. The function will try to identify a datetime column if not explicitly named “datetime”.
feats (List) – List of features to generate. Options include:
  - “linear”: raw TTE in minutes
  - “inverse”: 1/TTE (in minutes)
  - “sqrt”: √(TTE minutes)
  - “inverse_sqrt”: 1/√(TTE minutes)
  - “exp_decay”: exp(-TTE/contract_length)
exp (str) – The expiration date of the option in YYYYMMDD format. The expiration time is assumed to be 16:30 (4:30 PM) on the expiration date.
- Returns:
pd.DataFrame – The original DataFrame with additional TTE feature columns. Each requested feature is added with the prefix “tte_” (e.g., “tte_inverse”). All TTE features are guaranteed to be of type float64.
- Return type:
DataFrame
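For one timestamp, the documented transforms reduce to the sketch below. It uses the 16:30 expiration time stated above. The `contract_length_min` argument is a hypothetical stand-in for the contract_length in the exp_decay formula, which this page does not define. Here it is assumed to be a length in minutes that the caller supplies.

```python
import math
from datetime import datetime


def tte_feature_values(ts: datetime, exp: str, contract_length_min: float) -> dict:
    """TTE transforms for one timestamp; contract_length_min is a hypothetical input."""
    expiry = datetime.strptime(exp, "%Y%m%d").replace(hour=16, minute=30)
    tte = (expiry - ts).total_seconds() / 60.0  # minutes until expiration
    if tte <= 0:
        raise ValueError("timestamp is at or after expiration")
    return {
        "tte_linear": tte,
        "tte_inverse": 1.0 / tte,
        "tte_sqrt": math.sqrt(tte),
        "tte_inverse_sqrt": 1.0 / math.sqrt(tte),
        "tte_exp_decay": math.exp(-tte / contract_length_min),
    }


print(tte_feature_values(datetime(2024, 12, 6, 16, 26), "20241206", 4.0)["tte_sqrt"])  # 2.0
```

The inverse transforms grow without bound as expiration approaches, which is why the sketch refuses timestamps at or after expiry.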
- get_volatility_features(df, feats, root, right, risk_free_rate=0.045, rolling_volatility_range=None)
Computes volatility features from stock and option data.
- Parameters:
df (DataFrame) – DataFrame with required columns
feats (List[str]) – List of feature names to compute
root (str) – Stock symbol (e.g., “AAPL”)
right (str) – Option type (“C” for call, “P” for put)
risk_free_rate (float) – Risk-free rate (default: 0.045)
rolling_volatility_range (List[int] | None) – List of intervals in minutes for rolling volatility features
- Returns:
DataFrame with new volatility features
- Return type:
DataFrame
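Rolling realized volatility and the vol_ratio feature can be sketched as follows. The assumptions are these. Each entry of rolling_volatility_range is treated as a window length in bars, which equals minutes at interval_min=1. Volatility is the sample standard deviation of log returns with no annualization. Implied-volatility features, which would use risk_free_rate, are not covered. The helper names are hypothetical.

```python
import math
import statistics
from typing import List, Optional


def rolling_volatility(prices: List[float], window: int) -> List[Optional[float]]:
    """Sample std of the last `window` log returns at each bar (None until enough data)."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a sample std")
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    out: List[Optional[float]] = [None] * len(prices)
    for t in range(window, len(prices)):
        # log_returns[i] is the return from bar i to bar i + 1,
        # so the returns ending at bar t are log_returns[t - window:t].
        out[t] = statistics.stdev(log_returns[t - window:t])
    return out


def vol_ratio(short: List[Optional[float]],
              long: List[Optional[float]]) -> List[Optional[float]]:
    """Short-term over long-term volatility, None where undefined or zero."""
    return [s / lv if s is not None and lv else None for s, lv in zip(short, long)]


prices = [100.0, 110.0, 100.0, 110.0, 100.0]
ratio = vol_ratio(rolling_volatility(prices, 2), rolling_volatility(prices, 4))
print(ratio[4])  # approximately 1.2247 (= sqrt(6) / 2)
```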
- transform_features(df, core_feats, tte_feats=None, datetime_feats=None, vol_feats=None, rolling_volatility_range=None, root=None, right=None, strike=None, exp=None, keep_datetime=False)
Selects and transforms features from a DataFrame based on specified feature lists.
This function allows the selection of core features from NBBO and OHLCVC data, as well as the generation of time-to-expiration features and datetime-based features. It can also calculate derived features such as returns, moneyness, and LOB imbalance.
- Parameters:
df (DataFrame) – The DataFrame containing the raw features.
core_feats (List[str]) – List of core features to select.
tte_feats (List[str] | None) – List of Time to Expiration (TTE) features to generate.
datetime_feats (List[str] | None) – List of datetime features to generate.
strike (float | None) – Strike price of the option, required for moneyness and distance_to_strike calculations.
exp (str | None) – Expiration date string in YYYYMMDD format, required for TTE feature generation.
vol_feats (List[str] | None) – List of volatility features to generate.
root (str | None) – Stock symbol (e.g., “AAPL”), required for volatility feature generation.
right (str | None) – Option type (“C” for call, “P” for put), required for volatility feature generation.
rolling_volatility_range (List[int] | None) – List of intervals in minutes for rolling volatility features.
keep_datetime (bool) – If True, keep the datetime column in the output DataFrame. Otherwise, drop it.
- Returns:
DataFrame containing only the requested features.
- Return type:
DataFrame
- Core feature options (subset of NBBO and OHLCVC):
datetime: Timestamp of the data point
{asset}_mid_price: Mid price of the asset
{asset}_bid_size: Size of the bid
{asset}_bid_exchange: Exchange of the bid
{asset}_bid: Bid price
{asset}_bid_condition: Condition of the bid
{asset}_ask_size: Size of the ask
{asset}_ask_exchange: Exchange of the ask
{asset}_ask: Ask price
{asset}_ask_condition: Condition of the ask
{asset}_open: Opening price
{asset}_high: High price
{asset}_low: Low price
{asset}_close: Closing price
{asset}_volume: Volume
{asset}_count: Count
where “{asset}” is either “option” or “stock”.
- Advanced core feature options:
{asset}_returns: Mid-price returns
log_{asset}_returns: Log mid-price returns
{asset}_lob_imbalance: Limit order book imbalance
{asset}_quote_spread: Quote spread normalized by mid-price
moneyness: Log(S/K)
distance_to_strike: Linear distance to strike price
where “{asset}” is either “option” or “stock”.
- TTE features options:
tte: Time to expiration
inverse: Inverse time to expiration
sqrt: Square root of time to expiration
inverse_sqrt: Inverse square root of time to expiration
exp_decay: Exponential decay of time to expiration
- Datetime features options:
minute_of_day: Minute of the day
sin_minute_of_day: Sine of minute of the day
cos_minute_of_day: Cosine of minute of the day
day_of_week: Day of the week
sin_day_of_week: Sine of day of the week
cos_day_of_week: Cosine of day of the week
hour_of_week: Hour of the week
sin_hour_of_week: Sine of hour of the week
cos_hour_of_week: Cosine of hour of the week
- Volatility feature options:
rolling_volatility: Rolling volatility over specified interval in minutes, set by rolling_volatility_range parameter.
vol_ratio: Ratio of short-term to long-term volatility
Examples
Basic usage:
from optrade.data.contracts import Contract
from optrade.data.features import transform_features

# Define a contract (values from the Contract parameter examples above)
contract = Contract(
    root="AAPL",
    start_date="20241107",
    exp="20241206",
    strike=225,
    interval_min=1,
    right="C",
)
df = contract.load_data()

# TTE features
tte_feats = ["sqrt", "exp_decay"]

# Datetime features
datetime_feats = ["sin_minute_of_day", "cos_minute_of_day", "sin_hour_of_week", "cos_hour_of_week"]

# Select features
core_feats = [
    "option_returns", "stock_returns", "distance_to_strike", "moneyness",
    "option_lob_imbalance", "option_quote_spread", "stock_lob_imbalance", "stock_quote_spread",
    "option_mid_price", "option_bid_size", "option_bid", "option_ask_size", "option_close",
    "option_volume", "option_count", "stock_mid_price", "stock_bid_size", "stock_bid",
    "stock_ask_size", "stock_ask", "stock_volume", "stock_count",
]

df = transform_features(
    df=df,
    core_feats=core_feats,
    tte_feats=tte_feats,
    datetime_feats=datetime_feats,
    strike=contract.strike,
    exp=contract.exp,
)
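The advanced core features can be illustrated for a single quote. Moneyness follows the documented Log(S/K). The other formulas are common conventions assumed for illustration. Mid price is (bid + ask) / 2. LOB imbalance is (bid_size - ask_size) / (bid_size + ask_size). The spread is (ask - bid) / mid. Distance to strike is S - K. transform_features may define signs or normalizations differently. `core_row_features` is a hypothetical helper.

```python
import math


def core_row_features(bid: float, ask: float, bid_size: float, ask_size: float,
                      prev_mid: float, stock_price: float, strike: float) -> dict:
    """Per-row derived features under the conventions stated above (illustrative)."""
    mid = (bid + ask) / 2.0
    return {
        "mid_price": mid,
        "returns": mid / prev_mid - 1.0,           # simple mid-price return
        "log_returns": math.log(mid / prev_mid),   # log mid-price return
        "lob_imbalance": (bid_size - ask_size) / (bid_size + ask_size),  # in [-1, 1]
        "quote_spread": (ask - bid) / mid,         # spread normalized by mid-price
        "moneyness": math.log(stock_price / strike),  # Log(S/K)
        "distance_to_strike": stock_price - strike,   # assumed sign: S - K
    }


print(core_row_features(9.5, 10.5, 30, 10, 8.0, 225.0, 225.0)["lob_imbalance"])  # 0.5
```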
optrade.data.forecasting
- class ForecastingDataset(data, seq_len, pred_len, target_channels=None, target_type='multistep', dtype='float32', normalize_target=False)
Bases: Dataset
- Parameters:
data (DataFrame)
seq_len (int)
pred_len (int)
target_channels (List[str] | None)
target_type (str)
dtype (str)
normalize_target (bool)
- __init__(data, seq_len, pred_len, target_channels=None, target_type='multistep', dtype='float32', normalize_target=False)
Initializes the ForecastingDataset class.
- Parameters:
data (pd.DataFrame) – Input DataFrame containing the time series data.
seq_len (int) – Length of the lookback window for each sample.
pred_len (int) – Length of the forecast window (number of steps ahead to predict).
target_channels (Optional[List[str]]) – List of column names to include as target channels. If None, all columns are used.
target_type (str) – Type of target to predict. Must be one of:
  - “multistep”: Predicts the full future sequence (regression).
  - “average”: Predicts the average value over the forecast window (regression).
  - “average_direction”: Predicts the sign of the average change (binary classification).
dtype (str) – Data type for the internal PyTorch tensors (e.g., “float32”, “float64”). Default is “float32”.
normalize_target (bool) – Whether to apply normalization to the target variable(s).
- Returns:
None
- Return type:
None
- to_numpy()
Converts the dataset into a set of NumPy arrays for scikit-learn model training.
- Returns:
Tuple[np.ndarray, np.ndarray] –
- A tuple containing:
inputs: NumPy array of shape (num_samples, seq_len, num_features).
targets: NumPy array of shape (num_samples, pred_len, num_target_features).
- If datetime is available, the tuple additionally contains:
input_datetimes: NumPy array of shape (num_samples, seq_len).
target_datetimes: NumPy array of shape (num_samples, pred_len).
- Return type:
Tuple[ndarray, ndarray] | Tuple[ndarray, ndarray, ndarray, ndarray]
- get_item(idx)
Get a sample from the dataset. This method retrieves an input-target pair at the specified index, with the input being the lookback window and the target being the forecast window based on target_type.
- Parameters:
idx (int) – Index of the starting point of the lookback window.
- Returns:
If datetime is available –
- tuple: A tuple containing (input_tensor, target_tensor, input_datetime, target_datetime)
input_tensor: Lookback window of shape (num_features, seq_len).
target_tensor: Target window with shape depending on target_type:
  - “multistep”: (num_target_features, pred_len)
  - “average”: (num_target_features, 1)
  - “average_direction”: (num_target_features, 1)
input_datetime: Datetime values for input window of shape (seq_len,).
target_datetime: Datetime values for target window of shape (pred_len,).
- Otherwise:
- tuple: A tuple containing (input_tensor, target_tensor)
input_tensor: Lookback window of shape (num_features, seq_len).
target_tensor: Target window with shape as described above.
- Return type:
Tuple[Tensor, Tensor] | Tuple[Tensor, Tensor, ndarray, ndarray]
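The windowing and target shapes described for get_item can be sketched with plain lists in the same channel-first layout. The helpers `num_samples` and `window_sample` are hypothetical. Every column is treated as a target channel, with no target_channels selection. For "average_direction" the sketch assumes the target channels hold returns, so a positive mean maps to 1.0.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def num_samples(num_rows: int, seq_len: int, pred_len: int) -> int:
    """Number of valid starting indices for a lookback + forecast window."""
    return max(num_rows - seq_len - pred_len + 1, 0)


def window_sample(rows: Sequence[Sequence[float]], idx: int, seq_len: int,
                  pred_len: int, target_type: str = "multistep") -> Tuple[Matrix, Matrix]:
    """Return (input, target) in channel-first layout, as documented for get_item."""
    if not 0 <= idx < num_samples(len(rows), seq_len, pred_len):
        raise IndexError(idx)
    lookback = rows[idx:idx + seq_len]
    future = rows[idx + seq_len:idx + seq_len + pred_len]
    x = [list(channel) for channel in zip(*lookback)]  # (num_features, seq_len)
    y = [list(channel) for channel in zip(*future)]    # (num_features, pred_len)
    if target_type == "multistep":
        return x, y
    means = [sum(channel) / len(channel) for channel in y]
    if target_type == "average":
        return x, [[m] for m in means]                 # (num_features, 1)
    if target_type == "average_direction":
        # Assumption: targets are returns, so a positive mean means "up".
        return x, [[1.0 if m > 0 else 0.0] for m in means]
    raise ValueError(f"unknown target_type: {target_type!r}")


rows = [[1.0, -1.0], [2.0, -2.0], [3.0, -3.0], [4.0, -4.0], [5.0, -5.0]]
print(window_sample(rows, 1, 2, 2, "average"))  # ([[2.0, 3.0], [-2.0, -3.0]], [[4.5], [-4.5]])
```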
- normalize_concat_dataset(concat_dataset, scaler)
Modifies the data in a ConcatDataset in-place by normalizing it using a fitted StandardScaler.
- Parameters:
concat_dataset (ConcatDataset) – ConcatDataset object containing ForecastingDatasets
scaler (StandardScaler) – Fitted StandardScaler from scikit-learn.
- Returns:
None
- Return type:
None
- normalize_datasets(train_dataset, val_dataset, test_dataset)
Normalizes financial time series datasets using StandardScaler. Fits scaler only on training data to prevent look-ahead bias.
- Parameters:
train_dataset (ConcatDataset) – Training dataset (ConcatDataset of ForecastingDatasets)
val_dataset (ConcatDataset) – Validation dataset
test_dataset (ConcatDataset) – Test dataset
- Returns:
Tuple[ConcatDataset, ConcatDataset, ConcatDataset, StandardScaler] – Normalized training, validation, and test datasets, and the fitted StandardScaler.
- Return type:
Tuple[ConcatDataset, ConcatDataset, ConcatDataset, StandardScaler]
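The look-ahead-bias rule is simple: statistics come from the training split only, and the same transform is then applied to all three splits. The class `TrainOnlyScaler` below is a hypothetical, dependency-free stand-in for scikit-learn's StandardScaler. It mirrors that scaler's population standard deviation (ddof=0) and its unit scale for zero-variance features. It does not reproduce how normalize_datasets reaches into each ForecastingDataset.

```python
import math
from typing import List, Sequence

Rows = Sequence[Sequence[float]]


class TrainOnlyScaler:
    """Per-feature z-scoring fitted on training rows only (hypothetical stand-in)."""

    def fit(self, rows: Rows) -> "TrainOnlyScaler":
        n = len(rows)
        columns = list(zip(*rows))
        self.mean_ = [sum(col) / n for col in columns]
        # Population std (ddof=0), like scikit-learn's StandardScaler;
        # zero-variance features get a scale of 1.0 to avoid division by zero.
        self.scale_ = [
            math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, self.mean_)
        ]
        return self

    def transform(self, rows: Rows) -> List[List[float]]:
        return [[(v - m) / s for v, m, s in zip(row, self.mean_, self.scale_)] for row in rows]


train = [[1.0, 10.0], [3.0, 10.0]]
val = [[5.0, 12.0]]
scaler = TrainOnlyScaler().fit(train)  # statistics come from train only
print(scaler.transform(train))  # [[-1.0, 0.0], [1.0, 0.0]]
print(scaler.transform(val))    # [[3.0, 2.0]]
```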
- get_forecasting_dataset(contract_dataset, tte_tolerance, seq_len=None, pred_len=None, core_feats=['option_returns'], tte_feats=None, datetime_feats=None, vol_feats=None, rolling_volatility_range=None, keep_datetime=False, target_type='multistep', clean_up=False, offline=False, intraday=False, target_channels=None, dtype='float32', normalize_target=False, save_dir=None, download_only=False, validate_contracts=False, modify_contracts=False, verbose=False, warning=True, dev_mode=False)
Creates a PyTorch dataset object composed of multiple ForecastingDatasets, each representing different option contracts.
- Parameters:
contract_dataset (ContractDataset) – ContractDataset object containing option contract parameters
tte_tolerance (Tuple[int, int]) – Tuple of (min, max) time to expiration tolerance in days
core_feats (List[str]) – List of core features to include
tte_feats (List[str] | None) – List of time-to-expiration features to include
datetime_feats (List[str] | None) – List of datetime features to include
vol_feats (List[str] | None) – List of volatility features to include
rolling_volatility_range (List[int] | None) – List of rolling volatility ranges to include
keep_datetime (bool) – Whether to keep the datetime column in the dataset
target_type (str) – Type of forecasting target. Options: “multistep” (float), “average” (float), or “average_direction” (binary).
clean_up (bool) – Whether to clean up the data after use
offline (bool) – Whether to load saved contracts from disk
intraday (bool) – Whether to use intraday data
target_channels (List[str] | None) – List of target channels to include in the target tensor. If None, all channels will be included.
seq_len (int | None) – Sequence length of lookback window (input)
pred_len (int | None) – Prediction length of forecast window (target)
dtype (str) – Data type for the PyTorch tensors
normalize_target (bool) – Whether to normalize the target variable(s)
save_dir (str | None) – Save directory
download_only (bool) – Whether to download data only (used mainly for Universe class)
validate_contracts (bool) – Whether to validate contracts by requesting data from the ThetaData API and adjusting start and end dates if necessary.
modify_contracts (bool) – Whether to delete the old contracts .pkl file and save the new, validated contracts at the same path. Warning: this overwrites the old contracts.
verbose (bool) – Whether to print verbose output
warning (bool) – Whether to print verbose DataValidationError statements as warnings or errors.
dev_mode (bool) – Whether to run in development mode.
- Returns:
ContractDataset – The updated ContractDataset object if download_only=True or validate_contracts=True. Otherwise, Tuple[ConcatDataset, ContractDataset] – A tuple containing the concatenated PyTorch dataset and the updated ContractDataset.
- Return type:
ContractDataset | Tuple[ConcatDataset, ContractDataset]
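Each contract becomes its own ForecastingDataset, so every sample's windows come from a single contract. PyTorch's ConcatDataset routes a global index to one member dataset with a binary search over cumulative sizes. The stdlib sketch below reproduces that index arithmetic. The `locate` helper is ours and is not part of optrade or PyTorch.

```python
import bisect
from itertools import accumulate
from typing import List, Tuple


def locate(global_idx: int, sizes: List[int]) -> Tuple[int, int]:
    """Map a global sample index to (dataset index, local index)."""
    cumulative = list(accumulate(sizes))  # e.g. [3, 5, 9] for sizes [3, 2, 4]
    if not 0 <= global_idx < cumulative[-1]:
        raise IndexError(global_idx)
    dataset_idx = bisect.bisect_right(cumulative, global_idx)
    offset = cumulative[dataset_idx - 1] if dataset_idx > 0 else 0
    return dataset_idx, global_idx - offset


# Three contracts contributing 3, 2 and 4 samples respectively.
print([locate(i, [3, 2, 4]) for i in (0, 3, 4, 8)])  # [(0, 0), (1, 0), (1, 1), (2, 3)]
```

bisect_right also skips over contracts that contribute zero samples. For example, sizes [3, 0, 2] send index 3 to dataset 2.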
- calibrate_new_contract(contract_dataset, original_contract, candidate_start_date, candidate_exp, tte_tolerance, expirations_exist=False, save_dir=None, verbose=False, dev_mode=False)
- Parameters:
contract_dataset (ContractDataset)
original_contract (Contract)
candidate_start_date (str)
candidate_exp (str)
tte_tolerance (Tuple[int, int])
expirations_exist (bool)
save_dir (str | None)
verbose (bool)
dev_mode (bool)
- Return type:
Tuple[bool, Contract | None]
- get_valid_start_date(candidate_start_date)
Return the next valid NYSE trading day given a candidate date in YYYYMMDD format.
This function checks whether the provided date falls on a weekend or a NYSE holiday. If so, it advances the date forward to the next valid trading day.
- Parameters:
candidate_start_date (str) – The date to validate, in ‘YYYYMMDD’ format.
- Returns:
str – The next valid NYSE trading day in ‘YYYYMMDD’ format.
- Raises:
ValueError – If no valid trading day is found within the search buffer.
- Return type:
str
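The forward search can be sketched as follows. Unlike get_valid_start_date, the hypothetical `next_trading_day` has no built-in NYSE calendar. The caller passes holidays explicitly as YYYYMMDD strings, and `max_days` plays the role of the search buffer.

```python
from datetime import datetime, timedelta
from typing import Iterable


def next_trading_day(candidate: str, holidays: Iterable[str] = (), max_days: int = 10) -> str:
    """First weekday on or after `candidate` (YYYYMMDD) that is not in `holidays`."""
    holiday_set = set(holidays)
    day = datetime.strptime(candidate, "%Y%m%d")
    for _ in range(max_days + 1):
        date_str = day.strftime("%Y%m%d")
        if day.weekday() < 5 and date_str not in holiday_set:  # Mon-Fri only
            return date_str
        day += timedelta(days=1)
    raise ValueError(f"no trading day within {max_days} days of {candidate}")


print(next_trading_day("20241228"))  # 20241230 (Saturday -> Monday)
print(next_trading_day("20241225", holidays={"20241225"}))  # 20241226
```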
- get_forecasting_loaders(train_contract_dataset, val_contract_dataset, test_contract_dataset, seq_len, pred_len, tte_tolerance, core_feats=['option_returns'], tte_feats=None, datetime_feats=None, vol_feats=None, rolling_volatility_range=None, keep_datetime=False, target_channels=None, target_type='multistep', batch_size=32, shuffle=True, drop_last=False, num_workers=4, prefetch_factor=None, pin_memory=False, persistent_workers=True, clean_up=False, offline=False, save_dir=None, verbose=False, scaling=False, intraday=False, dtype='float32', normalize_target=False, modify_contracts=False, warning=True, dev_mode=False)
Forms training, validation, and test dataloaders for option contract data.
- Parameters:
train_contract_dataset (ContractDataset) – Contract dataset for training
val_contract_dataset (ContractDataset) – Contract dataset for validation
test_contract_dataset (ContractDataset) – Contract dataset for testing
seq_len (int) – Sequence length for input data
pred_len (int) – Prediction length for forecasting
tte_tolerance (Tuple[int, int]) – Tuple of (min, max) time to expiration tolerance in days
core_feats (List[str]) – List of core features to include
tte_feats (List[str] | None) – List of time-to-expiration features to include
datetime_feats (List[str] | None) – List of datetime features to include
keep_datetime (bool) – Whether to keep the datetime column in the dataset
target_type (str) – Type of forecasting target. Options: “multistep” (float), “average” (float), or “average_direction” (binary).
batch_size (int) – Number of samples per batch
shuffle (bool) – Whether to shuffle the data
drop_last (bool) – Whether to drop the last incomplete batch
num_workers (int) – Number of subprocesses to use for data loading
prefetch_factor (int | None) – Number of batches to prefetch
pin_memory (bool) – Whether to pin memory for faster GPU transfer
clean_up (bool) – Whether to clean up the data after use
offline (bool) – Whether to load saved contracts from disk
save_dir (str | None) – Directory to save/load processed datasets
modify_contracts (bool) – Whether to modify contracts if they are invalid in get_forecasting_dataset function calls.
verbose (bool) – Whether to print verbose output
scaling (bool) – Whether to normalize the datasets
intraday (bool) – Whether to use intraday data
target_channels (List[str] | None) – List of target channels for forecasting
dtype (str) – Data type for tensors
normalize_target (bool) – Whether to normalize the target variable(s)
warning (bool) – Whether to show warnings
dev_mode (bool) – Whether to run in development mode
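vol_feats (List[str] | None) – List of volatility features to include
rolling_volatility_range (List[int] | None) – List of rolling volatility ranges to include
persistent_workers (bool) – Whether to keep DataLoader worker processes alive between epochs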
- Returns:
Tuple[DataLoader, DataLoader, DataLoader, None] – Train, validation, and test data loaders, with None in place of the scaler, if scaling=False. Tuple[DataLoader, DataLoader, DataLoader, StandardScaler] – Train, validation, and test data loaders, and the fitted scaler, if scaling=True.
- Return type:
Tuple[DataLoader, DataLoader, DataLoader, None] | Tuple[DataLoader, DataLoader, DataLoader, StandardScaler]
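The defaults here combine num_workers=4 with persistent_workers=True. Recent PyTorch releases of torch.utils.data.DataLoader reject persistent_workers=True, and a non-None prefetch_factor, when num_workers=0. The hypothetical `loader_kwargs` helper below shows one way to keep these options consistent before passing them on. It only builds a keyword dictionary and does not reflect how get_forecasting_loaders handles them internally.

```python
from typing import Any, Dict, Optional


def loader_kwargs(batch_size: int = 32, shuffle: bool = True, drop_last: bool = False,
                  num_workers: int = 4, prefetch_factor: Optional[int] = None,
                  pin_memory: bool = False, persistent_workers: bool = True) -> Dict[str, Any]:
    """Build DataLoader keyword arguments that stay valid when num_workers == 0."""
    kwargs: Dict[str, Any] = {
        "batch_size": batch_size,
        "shuffle": shuffle,
        "drop_last": drop_last,
        "num_workers": num_workers,
        "pin_memory": pin_memory,
    }
    if num_workers > 0:
        kwargs["persistent_workers"] = persistent_workers
        if prefetch_factor is not None:
            kwargs["prefetch_factor"] = prefetch_factor
    else:
        # Single-process loading: these options only apply to worker processes.
        kwargs["persistent_workers"] = False
    return kwargs


print(loader_kwargs(num_workers=0, prefetch_factor=2))
```

Usage would then be `torch.utils.data.DataLoader(dataset, **loader_kwargs(num_workers=0))`.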
- create_windows(df, seq_len, pred_len, window_stride, intraday=False)
Generates rolling windows of data for a given DataFrame. It is intended primarily for scikit-learn models and/or intraday modeling. Otherwise, default to optrade.data.forecasting.get_forecasting_loaders or optrade.data.forecasting.get_forecasting_dataset.
- Parameters:
df (pd.DataFrame) – DataFrame containing the data.
seq_len (int) – Length of the input sequence.
pred_len (int) – Length of the prediction sequence.
window_stride (int) – Number of steps to move the window forward.
intraday (bool) – Whether the data is intraday or not. If True, the function will first split the data into separate trading days before creating individual windows that cannot crossover between days. Otherwise, the function will create windows that can span multiple days.
- Returns:
input (np.ndarray) – Array of input windows of shape (num_windows, seq_len, num_features), where num_features is the number of columns in the DataFrame (removing datetime but adding returns).
target (np.ndarray) – Array of target windows of shape (num_windows, pred_len, 1). The target contains only returns for ‘option_mid_price’.
- Return type:
Tuple[ndarray, ndarray]
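The intraday option mainly changes which rows a window may span. In the sketch below, the hypothetical `make_windows` works on one scalar series rather than a DataFrame and skips the returns computation. With intraday=True, rows are first grouped by calendar date, so no window crosses an overnight gap. Timestamps are assumed to be sorted.

```python
from datetime import date, datetime
from typing import Dict, List, Sequence, Tuple


def make_windows(timestamps: Sequence[datetime], values: Sequence[float],
                 seq_len: int, pred_len: int, window_stride: int,
                 intraday: bool = False) -> Tuple[List[List[float]], List[List[float]]]:
    """Rolling (input, target) windows; with intraday=True no window crosses a day."""
    if intraday:
        # Group row indices by calendar date (timestamps assumed sorted).
        by_day: Dict[date, List[int]] = {}
        for i, ts in enumerate(timestamps):
            by_day.setdefault(ts.date(), []).append(i)
        segments = list(by_day.values())
    else:
        segments = [list(range(len(values)))]
    span = seq_len + pred_len
    inputs, targets = [], []
    for seg in segments:
        for start in range(0, len(seg) - span + 1, window_stride):
            idx = seg[start:start + span]
            inputs.append([values[i] for i in idx[:seq_len]])
            targets.append([values[i] for i in idx[seq_len:]])
    return inputs, targets


ts = [datetime(2024, 11, 7, 9, 30 + i) for i in range(4)] + \
     [datetime(2024, 11, 8, 9, 30 + i) for i in range(4)]
vals = [float(i) for i in range(8)]
print(len(make_windows(ts, vals, 2, 1, 1, intraday=True)[0]))   # 4
print(len(make_windows(ts, vals, 2, 1, 1, intraday=False)[0]))  # 6
```

The two windows lost in intraday mode are exactly the ones that would have used the last bars of one day to predict the first bar of the next.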
optrade.data.thetadata
- get_roots(sec='option', save_dir=None, clean_up=False, offline=False, dev_mode=False)
Fetches all root symbols for a given security type.
- Parameters:
sec (str) – The security type. Options: ‘option’, ‘stock’, ‘index’.
save_dir (str) – Directory to save the CSV file (default: current directory)
clean_up (bool) – Whether to clean up the CSV files after merging. If True, the CSV files are saved in a temp folder and then subsequently deleted before returning the df.
offline (bool) – Whether to work in offline mode, using previously saved data.
dev_mode (bool) – Whether to run in development mode.
- Returns:
pd.DataFrame – The DataFrame containing the root symbols for the given security type.
- Return type:
DataFrame
- get_expirations(root, save_dir='.', clean_up=False, offline=False, dev_mode=False)
Fetch option expiration dates for a given root symbol and save to CSV.
- Parameters:
root (str) – The root symbol to get expirations for.
save_dir (str) – Directory to save the CSV file (default: current directory)
clean_up (bool) – Whether to clean up the CSV files after merging. If True, the CSV files are saved in a temp folder and then subsequently deleted before returning the df.
offline (bool) – Whether to work in offline mode, using previously saved data.
dev_mode (bool) – Whether to run in development mode.
- Returns:
pd.DataFrame – The DataFrame containing the expiration dates for the given root symbol.
- Return type:
DataFrame
- get_strikes(root, exp, save_dir='.', clean_up=False, offline=False, dev_mode=False)
Fetch option strike prices for a given root symbol and expiration, saving to CSV.
- Parameters:
root (str) – The root symbol to get strikes for.
exp (str) – The expiration date to get strikes for.
save_dir (str) – Directory to save the CSV file (default: current directory)
clean_up (bool) – Whether to clean up the CSV files after merging. If True, the CSV files are saved in a temp folder and then subsequently deleted before returning the df.
offline (bool) – Whether to work in offline mode, using previously saved data.
dev_mode (bool) – Whether to run in development mode.
- Returns:
pd.DataFrame – The DataFrame containing the strike prices for the given root and expiration.
- Return type:
DataFrame
- find_optimal_exp(root, start_date, target_tte, tte_tolerance, clean_up=False, dev_mode=False)
Returns the closest valid TTE to target_tte within tolerance range and its expiration date.
- Parameters:
root (str) – The root symbol of the underlying security
start_date (str) – The start date in YYYYMMDD format
target_tte (int) – Desired days to expiry (e.g., 30)
tte_tolerance (Tuple[int, int]) – (min_tte, max_tte) acceptable range
clean_up (bool) – Whether to clean up the CSV files after merging. If True, the CSV files are saved in a temp folder and then subsequently deleted before returning the df.
dev_mode (bool) – Whether to run in development mode.
- Returns:
Tuple[str, int] – A tuple containing the optimal expiration date (in YYYYMMDD format) and the corresponding time-to-expiration in days.
- Return type:
Tuple[str | None, int | None]
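The selection rule can be sketched once the list of listed expirations is known, for example from get_expirations. The hypothetical `closest_expiration` measures TTE in calendar days. On a tie it keeps the first candidate in iteration order, which is the earlier date when the list is sorted. When nothing falls inside tte_tolerance it returns (None, None), which matches the Optional return type above.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def closest_expiration(start_date: str, expirations: Iterable[str], target_tte: int,
                       tte_tolerance: Tuple[int, int]) -> Tuple[Optional[str], Optional[int]]:
    """Expiration whose calendar-day TTE is closest to target_tte within tolerance."""
    start = datetime.strptime(start_date, "%Y%m%d")
    min_tte, max_tte = tte_tolerance
    best: Tuple[Optional[str], Optional[int]] = (None, None)
    best_gap: Optional[int] = None
    for exp in expirations:
        tte = (datetime.strptime(exp, "%Y%m%d") - start).days
        if min_tte <= tte <= max_tte:
            gap = abs(tte - target_tte)
            # Strict '<' keeps the first of two equally close candidates.
            if best_gap is None or gap < best_gap:
                best, best_gap = (exp, tte), gap
    return best


exps = ["20241115", "20241206", "20241213", "20250117"]
print(closest_expiration("20241107", exps, 30, (25, 45)))  # ('20241206', 29)
```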
- load_stock_data(root, start_date, end_date, interval_min=1, save_dir=None, clean_up=False, offline=False, dev_mode=False)
Gets historical quote-level data (NBBO) and OHLC (Open High Low Close) data from the ThetaData API for stocks across multiple exchanges, aggregated by interval_min (minimum interval: 1 minute).
Note
Data from OHLC ends at 15:59:00, while quote data ends at 16:00:00, so for simplicity we remove all rows with 16:00:00 in datetime from quote data, before merging quote and OHLC data.
- Parameters:
root (str) – The root symbol of the underlying security.
start_date (str) – The start date of the data in YYYYMMDD format.
end_date (str) – The end date of the data in YYYYMMDD format.
interval_min (int) – The interval in minutes between data points.
save_dir (str) – The directory to save the data.
clean_up (bool) – Whether to clean up the CSV files after merging. If True, the CSV files are saved in a temp folder and then subsequently deleted before returning the df.
offline (bool) – Whether to work in offline mode, using previously saved data.
dev_mode (bool) – Whether to run in development mode.
- Returns:
pd.DataFrame – The merged NBBO quote and OHLCVC data.
- Return type:
DataFrame
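The alignment step described in the Note can be sketched with plain dictionaries keyed by "YYYY-MM-DD HH:MM:SS" timestamps. Quote rows stamped 16:00:00 are dropped, and the remaining rows are joined with OHLC rows on matching timestamps. The helper `merge_quotes_ohlc` and the inner-join choice are assumptions. The real function works on DataFrames.

```python
from typing import Dict

Row = Dict[str, float]


def merge_quotes_ohlc(quotes: Dict[str, Row], ohlc: Dict[str, Row]) -> Dict[str, Row]:
    """Inner-join quote and OHLC rows on their 'YYYY-MM-DD HH:MM:SS' timestamp keys."""
    # OHLC bars end at 15:59:00, so drop the 16:00:00 quote rows first.
    trimmed = {ts: row for ts, row in quotes.items() if not ts.endswith("16:00:00")}
    return {ts: {**trimmed[ts], **ohlc[ts]} for ts in sorted(trimmed.keys() & ohlc.keys())}


quotes = {"2024-11-07 15:59:00": {"bid": 1.0}, "2024-11-07 16:00:00": {"bid": 2.0}}
ohlc = {"2024-11-07 15:59:00": {"close": 3.0}}
print(merge_quotes_ohlc(quotes, ohlc))  # {'2024-11-07 15:59:00': {'bid': 1.0, 'close': 3.0}}
```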
- load_stock_data_eod(root, start_date, end_date, save_dir=None, clean_up=False, offline=False, dev_mode=False)
Gets historical End of Day (EOD) reports from the ThetaData API for stocks across multiple exchanges. Each report is generated around 17:15:00 ET and contains NBBO and OHLCVC data.
- Parameters:
root (str) – The root symbol of the underlying security.
start_date (str) – The start date of the data in YYYYMMDD format.
end_date (str) – The end date of the data in YYYYMMDD format.
save_dir (str) – The directory to save the data.
clean_up (bool) – Whether to clean up the CSV files after merging. If True, the CSV files are saved in a temp folder and then subsequently deleted before returning the df.
offline (bool) – Whether to work in offline mode, using previously saved data.
dev_mode (bool) – Whether to run in development mode.
- Returns:
pd.DataFrame – The merged quote-level and OHLC data.
- Return type:
DataFrame
- find_optimal_strike(root, start_date, exp, right, interval_min, moneyness, strike_band=0.05, volatility_scaled=False, hist_vol=None, volatility_scalar=1.0, clean_up=False, offline=False, deterministic=True, dev_mode=False)[source]
Finds the optimal strike price for option return forecasting, prioritizing strikes that are likely to provide meaningful price movement data.
- Parameters:
root (str) – The root symbol of the option
start_date (str) – The start date in YYYYMMDD format
exp (str) – The expiration date in YYYYMMDD format
right (str) – Option type - “C” for call or “P” for put
interval_min (int) – The interval in minutes between data points (the resolution of the data).
moneyness (str) – Desired moneyness - “OTM”, “ITM”, or “ATM”
strike_band (float | None) – Base distance from the current underlying price used for strike selection, expressed as a fraction (e.g., 0.05 for 5%).
volatility_scaled (bool) – Whether to adjust strike_band based on historical volatility
hist_vol (float | None) – Historical volatility to use for scaling strike_band (required if volatility_scaled=True).
volatility_scalar (float | None) – The number of standard deviations to scale the strike_band by.
clean_up (bool) – Whether to clean up the CSV files after merging. If True, the CSV files are saved in a temp folder and then subsequently deleted before returning the df.
offline (bool) – Whether to work in offline mode, using previously saved data.
deterministic (bool | None) – Use deterministic algorithm for strike selection (True by default, stochastic mode not yet implemented).
dev_mode (bool) – Whether to run in development mode (True) or production mode (False).
- Returns:
float – The optimal strike price for option return forecasting based on the specified criteria.
- Return type:
Tuple[float, str]
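A deterministic version of the strike selection described above might look like the sketch below: compute a band (either `strike_band`, or `hist_vol * volatility_scalar` when `volatility_scaled=True`), shift the spot price up or down depending on `right` and `moneyness`, and take the nearest listed strike. `pick_strike`, the `spot` argument, and the explicit strike list are hypothetical; the direction rules are standard option conventions, not confirmed details of optrade. The sketch returns only the strike, whereas the documented return type also includes a string.

```python
from typing import List, Optional


def pick_strike(
    spot: float,
    strikes: List[float],
    right: str,
    moneyness: str,
    strike_band: float = 0.05,
    volatility_scaled: bool = False,
    hist_vol: Optional[float] = None,
    volatility_scalar: float = 1.0,
) -> float:
    """Hypothetical sketch of deterministic strike selection."""
    if right not in ("C", "P") or moneyness not in ("ATM", "ITM", "OTM"):
        raise ValueError("right must be 'C'/'P' and moneyness 'ATM'/'ITM'/'OTM'")
    if volatility_scaled:
        if hist_vol is None:
            raise ValueError("hist_vol is required when volatility_scaled=True")
        band = hist_vol * volatility_scalar  # band = volatility_scalar standard deviations
    else:
        band = strike_band
    if moneyness == "ATM":
        target = spot
    else:
        # OTM calls and ITM puts sit above spot; ITM calls and OTM puts sit below it.
        above = (right == "C") == (moneyness == "OTM")
        target = spot * (1 + band) if above else spot * (1 - band)
    # Choose the listed strike nearest to the target price.
    return min(strikes, key=lambda k: abs(k - target))


strikes = [90.0, 95.0, 100.0, 105.0, 110.0]
print(pick_strike(100.0, strikes, "C", "OTM"))  # target 105 -> 105.0
```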
- load_option_data(root, start_date, end_date, exp, strike, interval_min, right, save_dir=None, clean_up=False, offline=False, count_ohlc_zeros=False, dev_mode=False)[source]
Gets historical quote-level data (NBBO) and OHLC (Open High Low Close) data from the ThetaData API for options across multiple exchanges, aggregated by interval_min (finest resolution: 1 min).
Note
OHLC data ends at 15:59:00, while quote data ends at 16:00:00. For simplicity, all rows timestamped 16:00:00 are removed from the quote data before the quote and OHLC data are merged.
- Parameters:
root (str) – The root symbol of the underlying security.
start_date (str) – The start date of the data in YYYYMMDD format.
end_date (str) – The end date of the data in YYYYMMDD format.
exp (Optional[str]) – The expiration date of the option in YYYYMMDD format.
strike (int) – The strike price of the option in dollars.
interval_min (int) – The interval in minutes between data points.
right (str) – The type of option, either ‘C’ for call or ‘P’ for put.
save_dir (str) – The directory to save the data.
clean_up (bool) – Whether to clean up the CSV files after merging. If True, the CSV files are saved in a temp folder and then subsequently deleted before returning the df.
offline (bool) – Whether to work in offline mode, using previously saved data.
count_ohlc_zeros (bool) – Whether to count the proportion of zero values in OHLC transactions data.
dev_mode (bool) – Whether to run in development mode.
- Returns:
pd.DataFrame – Merged DataFrame containing quote-level (NBBO) and OHLC data for the specified option.
- Return type:
DataFrame
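The `count_ohlc_zeros` option can be pictured with a small sketch that reports the share of bars with no recorded prices. It assumes a "zero" bar is one where open, high, low, and close are all 0 (typical of intervals with no trades in thinly traded contracts); `ohlc_zero_proportion` is hypothetical, and the library may count zeros per column instead.

```python
from typing import Dict, List

OHLC_FIELDS = ("open", "high", "low", "close")


def ohlc_zero_proportion(bars: List[Dict[str, float]]) -> float:
    """Share of bars whose OHLC prices are all zero (assumed: no trades in that interval)."""
    if not bars:
        return 0.0
    zero_bars = sum(1 for bar in bars if all(bar[f] == 0 for f in OHLC_FIELDS))
    return zero_bars / len(bars)


bars = [
    {"open": 0.0, "high": 0.0, "low": 0.0, "close": 0.0},
    {"open": 5.1, "high": 5.3, "low": 5.0, "close": 5.2},
]
print(ohlc_zero_proportion(bars))  # 0.5
```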
- load_all_data(root, start_date, exp, interval_min, right, strike, save_dir=None, clean_up=False, offline=False, warning=False, dev_mode=False)[source]
Gets historical quote-level data (NBBO) and OHLC (Open High Low Close) data from the ThetaData API for an option and its underlying stock across multiple exchanges, aggregated by interval_min (finest resolution: 1 min).
Note
OHLC data ends at 15:59:00, while quote data ends at 16:00:00. For simplicity, all rows timestamped 16:00:00 are removed from the quote data before the quote and OHLC data are merged.
- Parameters:
root (str) – The root symbol of the underlying security.
start_date (str) – The start date of the data in YYYYMMDD format.
exp (str) – The expiration date of the option in YYYYMMDD format.
interval_min (int) – The interval in minutes between data points.
right (str) – The type of option, either ‘C’ for call or ‘P’ for put.
strike (float) – The strike price of the option in dollars.
save_dir (str) – The directory to save the data.
clean_up (bool) – Whether to clean up the CSV files after merging. If True, the CSV files are saved in a temp folder and then subsequently deleted before returning the df.
offline (bool) – Whether to use offline (already saved) data instead of calling ThetaData API directly (default: False).
dev_mode (bool) – Whether to run in development mode.
warning (bool)
- Returns:
DataFrame – The combined quote-level and OHLC data for an option and its underlying.
- Return type:
DataFrame
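Combining option and underlying data amounts to aligning two time series on their timestamps. The sketch below keeps only timestamps present for both instruments and prefixes columns with `option_` and `stock_` so names do not collide. The helper name, the prefixes, and the inner-join behavior are assumptions for illustration, not optrade's actual column layout.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def combine_option_stock(option_rows: List[Row], stock_rows: List[Row]) -> List[Row]:
    """Align option and underlying rows on 'datetime', prefixing column names."""
    stock_by_ts = {row["datetime"]: row for row in stock_rows}
    combined: List[Row] = []
    for opt in option_rows:
        stk = stock_by_ts.get(opt["datetime"])
        if stk is None:
            continue  # keep only timestamps present for both instruments
        row: Row = {"datetime": opt["datetime"]}
        row.update({f"option_{k}": v for k, v in opt.items() if k != "datetime"})
        row.update({f"stock_{k}": v for k, v in stk.items() if k != "datetime"})
        combined.append(row)
    return combined


opt = [{"datetime": "2024-11-07 09:30", "close": 5.1},
       {"datetime": "2024-11-07 09:31", "close": 5.2}]
stk = [{"datetime": "2024-11-07 09:30", "close": 227.0}]
print(combine_option_stock(opt, stk))
```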
optrade.data.universe
- class Universe(start_date, end_date, sp_500=False, nasdaq_100=False, dow_jones=False, candidate_roots=None, volatility=None, pe_ratio=None, debt_to_equity=None, beta=None, market_cap=None, sector=None, industry=None, dividend_yield=None, earnings_volatility=None, market_beta=None, size_beta=None, value_beta=None, profitability_beta=None, investment_beta=None, momentum_beta=None, all_metrics=False, save_dir=None, verbose=False, dev_mode=False)[source]
Bases: object
- Parameters:
start_date (str)
end_date (str)
sp_500 (bool)
nasdaq_100 (bool)
dow_jones (bool)
candidate_roots (List[str] | None)
volatility (str | None)
pe_ratio (str | None)
debt_to_equity (str | None)
beta (str | None)
market_cap (str | None)
sector (str | None)
industry (str | None)
dividend_yield (str | None)
earnings_volatility (str | None)
market_beta (str | None)
size_beta (str | None)
value_beta (str | None)
profitability_beta (str | None)
investment_beta (str | None)
momentum_beta (str | None)
all_metrics (bool)
save_dir (str | None)
verbose (bool)
dev_mode (bool)
- __init__(start_date, end_date, sp_500=False, nasdaq_100=False, dow_jones=False, candidate_roots=None, volatility=None, pe_ratio=None, debt_to_equity=None, beta=None, market_cap=None, sector=None, industry=None, dividend_yield=None, earnings_volatility=None, market_beta=None, size_beta=None, value_beta=None, profitability_beta=None, investment_beta=None, momentum_beta=None, all_metrics=False, save_dir=None, verbose=False, dev_mode=False)[source]
A class for defining the universe of stocks and options for data retrieval and analysis.
This class contains parameters for filtering stocks based on various factors and selecting options contracts based on specific criteria.
- Return type:
None
- start_date
Start date for data retrieval in YYYYMMDD format.
- Type:
str
- end_date
End date for data retrieval in YYYYMMDD format.
- Type:
str
- sp_500
If True, use S&P 500 stocks as the candidate universe. Default is False.
- Type:
bool
- nasdaq_100
If True, use NASDAQ 100 stocks as the candidate universe. Default is False.
- Type:
bool
- dow_jones
If True, use Dow Jones Industrial Average stocks as the candidate universe. Default is False.
- Type:
bool
- candidate_roots
Candidate root symbols to be filtered by the other parameters. Used only if no index collection (sp_500, nasdaq_100, or dow_jones) is selected.
- Type:
List[str], optional
- volatility
The volatility of the stock. Options: ‘low’, ‘medium’, ‘high’. Based on the terciles of volatility from the candidate universe.
- Type:
str, optional
- pe_ratio
The P/E ratio of the stock. Options: ‘low’, ‘medium’, ‘high’. Based on the terciles of P/E ratio from the candidate universe.
- Type:
str, optional
- debt_to_equity
The debt to equity ratio of the stock. Options: ‘low’, ‘medium’, ‘high’. Based on the terciles of debt to equity from the candidate universe.
- Type:
str, optional
- beta
The beta of the stock. Options: ‘low’, ‘medium’, ‘high’. Based on the terciles of beta from the candidate universe.
- Type:
str, optional
- market_cap
The market cap of the stock. Options: ‘low’, ‘medium’, ‘high’. Based on the terciles of market cap from the candidate universe.
- Type:
str, optional
- sector
The sector of the stock. Options: ‘tech’, ‘healthcare’, ‘financial’, ‘consumer_cyclical’, ‘consumer_defensive’, ‘industrial’, ‘energy’, ‘materials’, ‘utilities’, ‘real_estate’, ‘communication’.
- Type:
str, optional
- industry
The industry of the stock matching Yahoo Finance classifications.
- Type:
str, optional
- dividend_yield
The dividend yield of the stock. Options: ‘low’, ‘medium’, ‘high’. Based on the terciles of dividend yield from the candidate universe.
- Type:
str, optional
- earnings_volatility
The earnings volatility of the stock. Options: ‘low’, ‘medium’, ‘high’. Based on the terciles of earnings volatility from the candidate universe.
- Type:
str, optional
- market_beta
The market beta of the stock. Options: ‘high’, ‘low’, ‘neutral’. Based on absolute thresholds: ‘low’ if beta < 0.9, ‘high’ if beta > 1.1, and ‘neutral’ otherwise.
- Type:
str, optional
- size_beta
The size beta of the stock. Options: ‘small_cap’, ‘large_cap’, ‘neutral’. Based on 30th and 70th percentiles of beta from the candidate universe.
- Type:
str, optional
- value_beta
The value beta of the stock. Options: ‘value’, ‘growth’, ‘neutral’. Based on 30th and 70th percentiles of beta from the candidate universe.
- Type:
str, optional
- profitability_beta
The profitability beta of the stock. Options: ‘robust’, ‘weak’, ‘neutral’. Based on 30th and 70th percentiles of beta from the candidate universe.
- Type:
str, optional
- investment_beta
The investment beta of the stock. Options: ‘conservative’, ‘aggressive’, ‘neutral’. Based on 30th and 70th percentiles of beta from the candidate universe.
- Type:
str, optional
- momentum_beta
The momentum beta of the stock, used in the Carhart four-factor model. Options: ‘high’, ‘low’, ‘neutral’. Based on the 30th and 70th percentiles of beta from the candidate universe.
- Type:
str, optional
- all_metrics
If True, computes all metrics for the candidate universe. Default is False.
- Type:
bool
- save_dir
Directory to save the contract datasets and raw data.
- Type:
str, optional
- verbose
Whether to print verbose output. Default is False.
- Type:
bool
- dev_mode
If True, enables development-mode-specific data directory management. Default is False.
- Type:
bool
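The two labeling schemes described above for factor betas (absolute thresholds for market_beta, 30th/70th percentiles for the other factors) can be sketched with the standard library. `label_market_beta` and `label_by_percentiles` are hypothetical helpers. The sketch assumes that a higher loading maps to the first-named tilt (e.g., a high SMB loading to ‘small_cap’, a high HML loading to ‘value’), which is the usual Fama-French reading but is not confirmed by the source.

```python
import statistics
from typing import Dict


def label_market_beta(beta: float) -> str:
    """Absolute thresholds from the market_beta description."""
    if beta < 0.9:
        return "low"
    if beta > 1.1:
        return "high"
    return "neutral"


def label_by_percentiles(
    betas: Dict[str, float], low_label: str, high_label: str
) -> Dict[str, str]:
    """Loadings at or below the 30th percentile get low_label, at or above the 70th high_label."""
    # statistics.quantiles(n=10) returns the nine decile cut points (10%, ..., 90%).
    deciles = statistics.quantiles(list(betas.values()), n=10)
    p30, p70 = deciles[2], deciles[6]
    labels: Dict[str, str] = {}
    for root, beta in betas.items():
        if beta <= p30:
            labels[root] = low_label
        elif beta >= p70:
            labels[root] = high_label
        else:
            labels[root] = "neutral"
    return labels


# Toy SMB loadings for ten roots; a high SMB loading is read as a small-cap tilt.
smb_betas = {root: float(i) for i, root in enumerate("ABCDEFGHIJ", start=1)}
print(label_by_percentiles(smb_betas, low_label="large_cap", high_label="small_cap"))
```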
- set_roots()[source]
Fetches constituents of a specified index using public data on Wikipedia and updates candidate_roots.
- Return type:
None
- get_market_metrics(remove_roots=False)[source]
Retrieves market metrics data for each stock in candidate_roots from various sources. Only includes metrics that are specified in the filter criteria.
- Parameters:
remove_roots (bool)
- Return type:
None
- get_factor_exposures(remove_roots=False)[source]
Computes and categorizes Fama-French factor exposures for each stock in the universe, using Kenneth French’s data library and fitting the specified factor mode (ff3, c4, or ff5) with linear regression.
- Parameters:
remove_roots (bool)
- Return type:
None
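The regression behind get_factor_exposures is ordinary least squares of a stock's excess returns on factor returns (Mkt-RF, SMB, HML for ff3; plus momentum for c4; plus RMW and CMA for ff5). The sketch below solves the normal equations in pure Python so it stays self-contained; `fit_factor_betas` is hypothetical, and a real pipeline would more likely use a numerical library. The toy data are constructed to have known betas.

```python
from typing import Dict, List


def fit_factor_betas(excess_returns: List[float], factors: Dict[str, List[float]]) -> Dict[str, float]:
    """OLS fit of r_t = alpha + sum_k beta_k * f_{k,t} + e_t via the normal equations."""
    names = list(factors)
    n = len(excess_returns)
    # Design matrix: a constant column for alpha, then one column per factor.
    X = [[1.0] + [factors[name][t] for name in names] for t in range(n)]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(X[t][i] * X[t][j] for t in range(n)) for j in range(p)]
        + [sum(X[t][i] * excess_returns[t] for t in range(n))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                scale = A[r][col] / A[col][col]
                A[r] = [a - scale * b for a, b in zip(A[r], A[col])]
    coefs = [A[i][p] / A[i][i] for i in range(p)]
    return {"alpha": coefs[0], **dict(zip(names, coefs[1:]))}


# Toy data with known exposures: alpha = 0.01, Mkt-RF beta = 1.2, SMB beta = 0.5.
mkt = [1.0, -2.0, 3.0, 0.0, 1.5, -1.0]
smb = [0.2, 1.0, -0.5, 0.7, -0.3, 0.4]
y = [0.01 + 1.2 * m + 0.5 * s for m, s in zip(mkt, smb)]
print(fit_factor_betas(y, {"Mkt-RF": mkt, "SMB": smb}))
```

The fitted loadings would then be bucketed into labels such as ‘small_cap’ or ‘value’ as described in the Universe attributes.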
- filter_three_level(filtered_roots, metric, level_value)[source]
- Parameters:
filtered_roots (List[str])
metric (str)
level_value (str | None)
- Return type:
List[str]
- filter_five_level(filtered_roots, metric, level_value)[source]
- Parameters:
filtered_roots (List[str])
metric (str)
level_value (str | None)
- Return type:
List[str]
- filter_categorical(filtered_roots, metric, category_value)[source]
- Parameters:
filtered_roots (List[str])
metric (str)
category_value (str | None)
- Return type:
List[str]
- filter()[source]
Filters the universe of stocks based on the specified criteria.
- For ThreeFactorLevel: ‘low’ (0-33%), ‘medium’ (33-66%), ‘high’ (66-100%)
- For FiveFactorLevel: ‘very_low’ (0-20%), ‘low’ (20-40%), ‘medium’ (40-60%), ‘high’ (60-80%), ‘very_high’ (80-100%)
- Return type:
None
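The percentile bands used by filter() can be approximated by ranking roots on a metric and splitting the ranking into equal-sized groups (thirds for ThreeFactorLevel, fifths for FiveFactorLevel). This rank-based bucketing is an assumption; the library may compute quantile cut points instead, which can differ at the boundaries. `assign_levels` and `filter_level` are hypothetical names.

```python
from typing import Dict, List

THREE_LEVELS = ["low", "medium", "high"]
FIVE_LEVELS = ["very_low", "low", "medium", "high", "very_high"]


def assign_levels(values: Dict[str, float], levels: List[str]) -> Dict[str, str]:
    """Bucket each root into equal-sized percentile bands by its rank on the metric."""
    ranked = sorted(values, key=lambda root: values[root])
    k, n = len(levels), len(ranked)
    # Rank i (0-based) falls into band floor(i * k / n).
    return {root: levels[i * k // n] for i, root in enumerate(ranked)}


def filter_level(values: Dict[str, float], levels: List[str], wanted: str) -> List[str]:
    """Return the roots whose band matches the requested level."""
    labels = assign_levels(values, levels)
    return [root for root, label in labels.items() if label == wanted]


volatility = {"A": 0.15, "B": 0.22, "C": 0.31, "D": 0.18, "E": 0.40, "F": 0.27}
print(filter_level(volatility, THREE_LEVELS, "high"))  # ['C', 'E']
```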
- download(contract_stride, interval_min, right, target_tte, tte_tolerance, moneyness, train_split, val_split, strike_band=0.05, volatility_type='period', volatility_scaled=False, volatility_scalar=None)[source]
Downloads options contract datasets and market data for the filtered universe of stocks. To be used in conjunction with offline=True when calling get_forecasting_loaders() for higher efficiency during model training.
- Parameters:
contract_stride (int) – Number of days between consecutive contracts.
interval_min (int) – Interval in minutes for the options data.
right (str) – Type of contract (‘C’ for call or ‘P’ for put).
target_tte (int) – Target time to expiration in days.
tte_tolerance (Tuple[int, int]) – Lower and upper bounds for the time to expiration.
moneyness (str) – Moneyness of the option. Options: “ATM”, “ITM”, or “OTM”.
strike_band (float) – Base distance from the underlying price used for strike selection, expressed as a fraction (default: 0.05).
train_split (float) – Proportion of contracts to use for training.
val_split (float) – Proportion of contracts to use for validation.
volatility_type (str, optional) – Type of volatility to use for scaling. Options: “daily”, “period”, or “annualized”.
volatility_scaled (bool, optional) – Whether to scale the strike band by historical volatility.
volatility_scalar (float, optional) – Number of standard deviations of volatility by which to scale the strike band.
dev_mode (bool, optional) – Whether to use development mode.
- Returns:
None
- Return type:
None
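Two steps implied by download()'s parameters are generating contract start dates spaced `contract_stride` days apart and splitting the resulting contracts into training, validation, and test sets. The sketch below assumes calendar-day strides and a chronological (unshuffled) split with the remainder going to the test set; both are assumptions, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def contract_start_dates(start_date: str, end_date: str, contract_stride: int) -> List[str]:
    """Start dates (YYYYMMDD) every contract_stride calendar days, inclusive of end_date."""
    current = datetime.strptime(start_date, "%Y%m%d")
    end = datetime.strptime(end_date, "%Y%m%d")
    dates: List[str] = []
    while current <= end:
        dates.append(current.strftime("%Y%m%d"))
        current += timedelta(days=contract_stride)
    return dates


def split_contracts(
    items: Sequence[str], train_split: float, val_split: float
) -> Tuple[List[str], List[str], List[str]]:
    """Chronological split: first train_split, next val_split, remainder as test."""
    n = len(items)
    n_train = int(n * train_split)
    n_val = int(n * val_split)
    return (
        list(items[:n_train]),
        list(items[n_train:n_train + n_val]),
        list(items[n_train + n_val:]),
    )


dates = contract_start_dates("20240101", "20240131", contract_stride=7)
print(split_contracts(dates, train_split=0.6, val_split=0.2))
```

Splitting chronologically rather than randomly keeps later contracts out of the training set, which avoids look-ahead leakage in forecasting.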
- get_forecasting_loaders(root, tte_tolerance, seq_len, pred_len, scaling=False, dtype='float32', core_feats=['option_returns'], tte_feats=None, datetime_feats=None, keep_datetime=False, target_channels=None, target_type='multistep', offline=False, batch_size=32, shuffle=True, drop_last=False, num_workers=4, prefetch_factor=2, pin_memory=False, persistent_workers=True)[source]
- Parameters:
root (str) – Root symbol of the stock.
contract_stride (int) – Number of days between consecutive contracts.
interval_min (int) – Interval in minutes for the options data.
right (str) – Type of contract (‘C’ for call or ‘P’ for put).
target_tte (int) – Target time to expiration in days.
tte_tolerance (Tuple[int, int]) – Lower and upper bounds for the time to expiration.
moneyness (str) – Moneyness of the option. Options: “ATM”, “ITM”, or “OTM”.
seq_len (int) – Sequence length for the input data.
pred_len (int) – Prediction length for the target data.
dtype_str (str) – Data type for the input and target data.
train_split (float) – Proportion of contracts to use for training.
val_split (float) – Proportion of contracts to use for validation.
scaling (bool) – Whether to scale the data.
dtype (str) – Data type for the input and target data.
core_feats (List[str]) – Core features to include in the input data.
tte_feats (List[str], optional) – Time-to-expiration features to include in the input data.
datetime_feats (List[str], optional) – Datetime features to include in the input data.
keep_datetime (bool, optional) – Whether to keep the datetime features in the input data.
target_channels (List[str], optional) – Target channels to include in the target data.
target_type (str, optional) – Type of forecasting target. Options: “multistep” (float), “average” (float), or “average_direction” (binary).
strike_band (float, optional) – Base distance from the underlying price used for strike selection, expressed as a fraction.
volatility_type (str, optional) – Type of volatility to use for scaling. Options: “daily”, “period”, or “annualized”.
volatility_scaled (bool, optional) – Whether to scale the strike band by historical volatility.
volatility_scalar (float, optional) – Number of standard deviations of volatility by which to scale the strike band.
offline (bool, optional) – Whether to use offline data for faster training.
batch_size (int, optional) – Batch size for the data loader.
shuffle (bool, optional) – Whether to shuffle the data.
drop_last (bool, optional) – Whether to drop the last incomplete batch.
num_workers (int, optional) – Number of workers for the data loader.
prefetch_factor (int, optional) – Prefetch factor for the data loader.
pin_memory (bool, optional) – Whether to pin memory for the data loader.
persistent_workers (bool, optional) – Whether to use persistent workers for the data loader.
dev_mode (bool, optional) – Whether to use development mode.
- Returns:
Tuple[DataLoader, DataLoader, DataLoader] – Train, validation, and test data loaders if scaling=False.
Tuple[DataLoader, DataLoader, DataLoader, StandardScaler] – Train, validation, and test data loaders, and the scaler, if scaling=True.
- Return type:
Tuple[DataLoader, DataLoader, DataLoader] | Tuple[DataLoader, DataLoader, DataLoader, StandardScaler]
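The `seq_len`, `pred_len`, and `target_type` parameters suggest a sliding-window construction: each sample takes `seq_len` past values as input and derives a target from the next `pred_len` values. The sketch below works on a single channel of plain floats. `make_windows` is hypothetical, and the rule that ‘average_direction’ is 1 when the mean future return is strictly positive is an assumption; the real loaders produce batched tensors.

```python
from statistics import mean
from typing import List, Tuple, Union

Target = Union[List[float], float, int]


def make_windows(
    series: List[float], seq_len: int, pred_len: int, target_type: str = "multistep"
) -> List[Tuple[List[float], Target]]:
    """Build (input window, target) pairs from a single return series."""
    samples: List[Tuple[List[float], Target]] = []
    for start in range(len(series) - seq_len - pred_len + 1):
        x = series[start:start + seq_len]
        future = series[start + seq_len:start + seq_len + pred_len]
        if target_type == "multistep":
            y: Target = future  # one float per future step
        elif target_type == "average":
            y = mean(future)  # a single float
        elif target_type == "average_direction":
            y = 1 if mean(future) > 0 else 0  # binary label (assumed threshold at 0)
        else:
            raise ValueError(f"unknown target_type: {target_type}")
        samples.append((x, y))
    return samples


option_returns = [0.01, -0.02, 0.03, 0.01, -0.01]
print(make_windows(option_returns, seq_len=2, pred_len=2, target_type="average_direction"))
```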