CTA 1-Day Return Prediction

Experiments for predicting 1-day returns on CTA (Commodity Trading Advisor) futures.

Data

  • Features: alpha158, hffactor
  • Labels: Return indicators (o2c_twap1min, o2o_twap1min, etc.)
  • Normalization: dual (blend of zscore, cs_zscore, rolling_20, rolling_60)

Notebooks

  • 01_data_check.ipynb: Load and validate CTA data
  • 02_label_analysis.ipynb: Explore label distributions and blending
  • 03_baseline_xgb.ipynb: Train baseline XGBoost model
  • 04_blend_comparison.ipynb: Compare different normalization blends

Blend Configurations

Label blending combines four normalization methods:

  • zscore: Fit-time mean/std normalization
  • cs_zscore: Cross-sectional z-score per datetime
  • rolling_20: 20-day rolling window normalization
  • rolling_60: 60-day rolling window normalization
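
A minimal pandas sketch of what these four methods compute on a long (datetime, instrument) label frame; the helper name and column names are illustrative, not the pipeline's actual implementation:

```python
import numpy as np
import pandas as pd

def normalize_label(df: pd.DataFrame, label: str = "label") -> pd.DataFrame:
    """Add the four normalized variants of `label` as new columns.
    Expects a long frame with 'datetime' and 'instrument' columns."""
    out = df.copy()

    # zscore: mean/std fitted once over the whole (training) sample
    out["zscore"] = (out[label] - out[label].mean()) / out[label].std()

    # cs_zscore: z-score across instruments at each datetime
    grp = out.groupby("datetime")[label]
    out["cs_zscore"] = (out[label] - grp.transform("mean")) / grp.transform("std")

    # rolling_20 / rolling_60: per-instrument rolling-window z-scores
    for win in (20, 60):
        g = out.groupby("instrument")[label]
        mean = g.transform(lambda s: s.rolling(win, min_periods=2).mean())
        std = g.transform(lambda s: s.rolling(win, min_periods=2).std())
        out[f"rolling_{win}"] = (out[label] - mean) / std
    return out
```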

Predefined weights (from qshare.config.research.cta.labels):

  • equal: [0.25, 0.25, 0.25, 0.25]
  • zscore_heavy: [0.5, 0.2, 0.15, 0.15]
  • rolling_heavy: [0.1, 0.1, 0.3, 0.5]
  • cs_heavy: [0.2, 0.5, 0.15, 0.15]
  • short_term: [0.1, 0.1, 0.4, 0.4]
  • long_term: [0.4, 0.2, 0.2, 0.2]

Default: [0.2, 0.1, 0.3, 0.4]
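
A blended label is then just a weighted sum of the four normalized columns. A sketch using the presets above (the helper and column names are illustrative; the real presets live in qshare.config.research.cta.labels):

```python
import numpy as np
import pandas as pd

# Weight order follows the method list: zscore, cs_zscore, rolling_20, rolling_60
BLEND_WEIGHTS = {
    "equal":         [0.25, 0.25, 0.25, 0.25],
    "zscore_heavy":  [0.50, 0.20, 0.15, 0.15],
    "rolling_heavy": [0.10, 0.10, 0.30, 0.50],
    "cs_heavy":      [0.20, 0.50, 0.15, 0.15],
    "short_term":    [0.10, 0.10, 0.40, 0.40],
    "long_term":     [0.40, 0.20, 0.20, 0.20],
    "default":       [0.20, 0.10, 0.30, 0.40],
}
NORM_COLS = ["zscore", "cs_zscore", "rolling_20", "rolling_60"]

def blend_label(df: pd.DataFrame, preset: str = "default") -> pd.Series:
    """Weighted sum of the four normalized label columns."""
    w = np.asarray(BLEND_WEIGHTS[preset])
    assert np.isclose(w.sum(), 1.0), "blend weights should sum to 1"
    return pd.Series(df[NORM_COLS].to_numpy() @ w, index=df.index, name=preset)
```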

Processors Module

The cta_1d.src.processors module provides Polars-based data processors that replicate Qlib's preprocessing pipeline:

Available Processors

  • DiffProcessor: Adds diff features with a configurable period
  • FlagMarketInjector: Adds market_0 and market_1 columns based on instrument codes
  • FlagSTInjector: Creates an IsST column from ST flags
  • ColumnRemover: Removes specified columns
  • FlagToOnehot: Converts one-hot industry flags to a single index column
  • IndusNtrlInjector: Applies industry neutralization per datetime
  • RobustZScoreNorm: Robust z-score normalization using median/MAD
  • Fillna: Fills NaN values with a specified value

RobustZScoreNorm with Pre-fitted Parameters

The RobustZScoreNorm processor supports loading pre-fitted parameters from Qlib's proc_list.proc:

from cta_1d.src.processors import RobustZScoreNorm

# Method 1: Load from saved version (recommended)
processor = RobustZScoreNorm.from_version("csiallx_feature2_ntrla_flag_pnlnorm")

# Method 2: Load with direct parameters
processor = RobustZScoreNorm(
    feature_cols=['KMID', 'KLEN', ...],
    use_qlib_params=True,
    qlib_mean=mean_array,
    qlib_std=std_array
)

# Apply normalization
df = processor.process(df)
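
For reference, robust z-scoring centers by the median and scales by MAD times 1.4826, the factor that makes MAD a consistent estimator of sigma for Gaussian data. A standalone NumPy sketch of the computation; the clipping behavior shown here is an assumption, not necessarily what the module does:

```python
import numpy as np

def robust_zscore(x, clip=3.0):
    """Robust z-score per column: (x - median) / (1.4826 * MAD)."""
    med = np.nanmedian(x, axis=0)
    mad = np.nanmedian(np.abs(x - med), axis=0)
    z = (x - med) / (1.4826 * mad + 1e-12)
    # Optionally clip outliers to +/- clip robust standard deviations
    return np.clip(z, -clip, clip) if clip is not None else z
```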

Parameter Extraction

Extract parameters from Qlib's proc_list.proc:

python stock_1d/d033/alpha158_beta/scripts/extract_qlib_params.py \
    --proc-list /path/to/proc_list.proc \
    --version my_version

Output structure:

data/robust_zscore_params/{version}/
├── mean_train.npy    # Pre-fitted mean (330,)
├── std_train.npy     # Pre-fitted std (330,)
└── metadata.json     # Feature columns and metadata
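
The saved artifacts can be read back with plain NumPy and json. A hypothetical loader sketch; RobustZScoreNorm.from_version presumably does something similar, but its actual implementation is not shown here:

```python
import json
from pathlib import Path

import numpy as np

def load_robust_zscore_params(version: str, root: str = "data/robust_zscore_params"):
    """Load pre-fitted mean/std arrays and metadata for a saved version."""
    base = Path(root) / version
    mean = np.load(base / "mean_train.npy")
    std = np.load(base / "std_train.npy")
    metadata = json.loads((base / "metadata.json").read_text())
    assert mean.shape == std.shape, "mean/std must align feature-wise"
    return mean, std, metadata
```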

Pipeline Helper Functions

from cta_1d.src.processors import create_processor_pipeline, get_final_feature_columns

# Create pipeline from processor configs
pipeline = create_processor_pipeline([
    {'type': 'Diff', 'columns': ['turnover', 'free_turnover']},
    {'type': 'RobustZScoreNorm', 'feature_cols': feature_cols},
    {'type': 'Fillna', 'value': 0},
])

# Get final feature columns after industry neutralization
final_cols = get_final_feature_columns(
    alpha158_cols=ALPHA158_COLS,
    market_ext_cols=MARKET_EXT_COLS,
)
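
The configured processors run in order, each consuming the previous one's output; an illustrative fold, assuming every processor exposes the process(df) interface used above:

```python
def apply_pipeline(df, pipeline):
    """Run each processor's process(df) in order, threading the frame through."""
    for processor in pipeline:
        df = processor.process(df)
    return df
```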