# Alpha Lab

Quantitative research experiments for qshare library.

This repository contains Jupyter notebooks and analysis scripts for exploring trading strategies and machine learning models.

## Philosophy

- **Notebook-centric**: Experiments are interactive notebooks, not rigid scripts
- **Minimal abstraction**: Simple functions over complex class hierarchies
- **Self-contained**: Each task directory is independent
- **Ad-hoc friendly**: Easy to modify for exploration

## Structure

```
alpha_lab/
├── common/                      # Shared utilities (keep minimal!)
│   ├── __init__.py
│   ├── paths.py                 # Path management
│   └── plotting.py              # Common plotting functions
│
├── cta_1d/                      # CTA 1-day return prediction
│   ├── __init__.py              # Re-exports from src/
│   ├── config.yaml              # Task configuration
│   ├── src/                     # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py            # CTA1DLoader
│   │   ├── train.py             # Training functions
│   │   ├── backtest.py          # Backtest functions
│   │   └── labels.py            # Label blending utilities
│   ├── 01_data_check.ipynb
│   ├── 02_label_analysis.ipynb
│   ├── 03_baseline_xgb.ipynb
│   └── 04_blend_comparison.ipynb
│
├── stock_15m/                   # Stock 15-minute return prediction
│   ├── __init__.py              # Re-exports from src/
│   ├── config.yaml              # Task configuration
│   ├── src/                     # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py            # Stock15mLoader
│   │   └── train.py             # Training functions
│   ├── 01_data_exploration.ipynb
│   └── 02_baseline_model.ipynb
│
└── results/                     # Output directory (gitignored)
    ├── cta_1d/
    └── stock_15m/
```

## Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Create environment file
cp .env.template .env
# Edit .env with your settings
```

## Usage

### Interactive (Notebooks)

Start Jupyter and run notebooks interactively:

```bash
jupyter notebook
```

Each task directory contains numbered notebooks:

- `01_*.ipynb` - Data loading and exploration
- `02_*.ipynb` - Analysis and baseline models
- `03_*.ipynb` - Advanced experiments
- `04_*.ipynb` - Comparisons and ablations

### Command Line

Train models from config files:

```bash
# CTA 1D
python -m cta_1d.train --config cta_1d/config.yaml --output results/cta_1d/exp01

# Stock 15m
python -m stock_15m.train --config stock_15m/config.yaml --output results/stock_15m/exp01

# CTA Backtest
python -m cta_1d.backtest \
    --model results/cta_1d/exp01/model.json \
    --dt-range 2023-01-01 2023-12-31 \
    --output results/cta_1d/backtest_01
```

### Python API

```python
# Import from task root (re-exports from src/)
from cta_1d import CTA1DLoader, train_model, TrainConfig
from stock_15m import Stock15mLoader, train_model, TrainConfig
from common import create_experiment_dir
```

## Experiment Tracking

Experiments are tracked manually in `results/{task}/README.md`:

```markdown
## 2025-01-15: Baseline XGB
- Notebook: `cta_1d/03_baseline_xgb.ipynb` (cells 1-50)
- Config: eta=0.5, lambda=0.1
- Train IC: 0.042
- Test IC: 0.038
- Notes: Dual normalization, 4 trades/day
```

## Adding a New Task

1. Create directory: `mkdir my_task`
2. Add `src/` subdirectory with:
   - `__init__.py` - Export public APIs
   - `loader.py` - Dataset loader class
   - Other modules as needed
3. Add root `__init__.py` that re-exports from `src/`
4. Create numbered notebooks
5. Add entry to `results/my_task/README.md`

## Git Worktrees

This repository uses git worktrees for parallel experiment development:

| Worktree | Branch | Purpose |
|----------|--------|---------|
| `alpha_lab` | `master` | Main repo (reference) |
| `alpha_lab_cta_1d` | `cta_1d_exp` | CTA 1-day experiments |
| `alpha_lab_stock_1d` | `stock_1d_exp` | Stock 1-day experiments |
| `alpha_lab_stock_15m` | `stock_15m_exp` | Stock 15-min experiments |
| `alpha_lab_data_ops` | `data_ops_exp` | Data ops research |

```bash
# Create a new worktree
git worktree add ../alpha_lab_new_exp -b new_exp

# List all worktrees
git worktree list

# Remove a worktree when done
git worktree remove ../alpha_lab_new_exp
```

## Best Practices

1. **Keep it simple**: Only add to `common/` after 3+ copies
2. **Module organization**: Place implementation in `src/`, re-export from root `__init__.py`
3. **Notebook configs**: Define CONFIG dict in first cell for easy modification
4. **Document results**: Update results README after significant runs
5. **Git discipline**: Don't commit large files, results, or credentials
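
The notebook-config and result-documentation practices above could be combined in a notebook's first cell, roughly as sketched below. This is a minimal illustration, not code from the repository: the `CONFIG` keys (`task`, `experiment`, `model`, `results_root`) and the helpers `format_results_entry` and `append_results_entry` are hypothetical names chosen for this example, and the entry layout simply mirrors the sample shown under Experiment Tracking.

```python
from datetime import date
from pathlib import Path

# Hypothetical first-cell CONFIG; the keys and values are illustrative only.
CONFIG = {
    "task": "cta_1d",
    "experiment": "baseline_xgb",
    "model": {"eta": 0.5, "lambda": 0.1},
    "results_root": "results",
}


def format_results_entry(config, train_ic, test_ic, notebook, notes="", run_date=None):
    """Render one markdown entry in the style of results/{task}/README.md."""
    run_date = run_date or date.today()
    # Flatten the model parameters into "key=value" pairs.
    params = ", ".join(f"{key}={value}" for key, value in config["model"].items())
    title = config["experiment"].replace("_", " ")
    lines = [
        f"## {run_date.isoformat()}: {title}",
        f"- Notebook: `{notebook}`",
        f"- Config: {params}",
        f"- Train IC: {train_ic:.3f}",
        f"- Test IC: {test_ic:.3f}",
    ]
    if notes:
        lines.append(f"- Notes: {notes}")
    return "\n".join(lines) + "\n"


def append_results_entry(config, entry):
    """Append an entry to results/{task}/README.md, creating the file if needed."""
    readme = Path(config["results_root"]) / config["task"] / "README.md"
    readme.parent.mkdir(parents=True, exist_ok=True)
    with readme.open("a", encoding="utf-8") as handle:
        handle.write("\n" + entry)
    return readme
```

Keeping the formatting step separate from the file write makes it possible to preview an entry in the notebook before it is appended to the results README.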