# CTA 1D Parquet Dataset
This directory contains requirements for CTA (Commodity Trading Advisor) futures
Parquet datasets used by alpha_lab.
## Tables
### cta_alpha158_1d
Alpha158 features for CTA futures.
- **Source**: `dfs://daily_stock_run.stg_1day_tinysoft_cta_alpha159_0_7_beta`
- **Output**: `/data/parquet/dataset/cta_alpha158_1d/`
- **Columns**: ~163 feature columns + code, m_nDate
### cta_hffactor_1d
High-frequency factor features (8 columns).
- **Source**: `dfs://daily_stock_run.stg_1day_tinysoft_cta_hffactor`
- **Output**: `/data/parquet/dataset/cta_hffactor_1d/`
- **Transformation**: Pivot from long to wide format
  - Input columns: code, m_nDate, factor_name, value
  - Output columns: code, m_nDate, vol_1min, skew_1min, ... (8 features)
- **Filter**: Only include factor_name in [vol_1min, skew_1min, volp_1min,
  volp_ratio_1min, voln_ratio_1min, trend_strength_1min, pv_corr_1min,
  flowin_ratio_1min]
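
The filter and pivot described above could be sketched in pandas roughly as follows. This is a minimal illustration, not the actual export job: the helper name `pivot_hffactor`, the sample rows, and the integer encoding of `m_nDate` are assumptions, as is the choice to keep the first value when a (code, m_nDate, factor_name) key appears more than once.

```python
import pandas as pd

# The eight factor names listed in the filter above.
HF_FACTORS = [
    "vol_1min", "skew_1min", "volp_1min", "volp_ratio_1min",
    "voln_ratio_1min", "trend_strength_1min", "pv_corr_1min",
    "flowin_ratio_1min",
]


def pivot_hffactor(long_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: filter to the eight factors, then pivot long -> wide."""
    kept = long_df[long_df["factor_name"].isin(HF_FACTORS)]
    wide = kept.pivot_table(
        index=["code", "m_nDate"],
        columns="factor_name",
        values="value",
        aggfunc="first",  # assumption: duplicates, if any, resolve to the first value
    )
    # Fix the column order; factors missing from the input become all-NaN columns.
    wide = wide.reindex(columns=HF_FACTORS)
    wide.columns.name = None
    return wide.reset_index()


# Made-up sample rows; real codes and date encoding may differ.
sample = pd.DataFrame({
    "code": ["RB", "RB", "RB"],
    "m_nDate": [20240102, 20240102, 20240102],
    "factor_name": ["vol_1min", "skew_1min", "other_factor"],
    "value": [0.12, -0.3, 9.9],
})
wide = pivot_hffactor(sample)
```

Reindexing to `HF_FACTORS` keeps the output schema stable even on days when some factor is absent from the source, which is usually easier for a downstream loader to handle than a varying column set.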
### cta_dom_1d
Dominant contract mapping for continuous contracts.
- **Source**: `dfs://daily_stock_run.dwm_1day_cta_dom`
- **Output**: `/data/parquet/dataset/cta_dom_1d/`
- **Filter**: version = 'vp_csmax_roll2_cummax'
- **Aggregation**: GROUP BY m_nDate, code_init; SELECT first(code) as code
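
As a rough pandas equivalent of the filter and aggregation above, the following sketch may help. The helper name `build_dom` and the sample rows are hypothetical. Note also that pandas `first()` skips nulls and depends on input row order, which may not match DolphinDB's `first` exactly.

```python
import pandas as pd


def build_dom(df: pd.DataFrame,
              version: str = "vp_csmax_roll2_cummax") -> pd.DataFrame:
    """Hypothetical helper: one dominant contract code per (m_nDate, code_init)."""
    filtered = df[df["version"] == version]
    # GROUP BY m_nDate, code_init; SELECT first(code) as code
    return filtered.groupby(["m_nDate", "code_init"], as_index=False)["code"].first()


# Made-up sample rows; real contract codes and versions may differ.
dom_sample = pd.DataFrame({
    "m_nDate": [20240102, 20240102, 20240102, 20240102],
    "code_init": ["RB", "RB", "RB", "CU"],
    "code": ["RB2405", "RB2410", "RB2401", "CU2403"],
    "version": [
        "vp_csmax_roll2_cummax",
        "vp_csmax_roll2_cummax",
        "some_other_version",
        "vp_csmax_roll2_cummax",
    ],
})
dom = build_dom(dom_sample)
```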
### cta_labels_1d
Return labels for different return types.
- **Source**: `dfs://daily_stock_run.stg_1day_tinysoft_cta_hfvalue`
- **Output**: `/data/parquet/dataset/cta_labels_1d/`
- **Filter**: indicator in [twap_open1m@1_twap_close1m@1, twap_open1m@1_twap_open1m@2]
- **Columns**: code, m_nDate, indicator, value
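
The label filter above keeps the data in long format. A minimal pandas sketch, assuming a hypothetical helper `select_labels` and made-up sample rows, might look like this:

```python
import pandas as pd

# The two return-type indicators listed in the filter above.
LABEL_INDICATORS = [
    "twap_open1m@1_twap_close1m@1",
    "twap_open1m@1_twap_open1m@2",
]


def select_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the two label indicators, long format."""
    kept = df[df["indicator"].isin(LABEL_INDICATORS)]
    return kept[["code", "m_nDate", "indicator", "value"]].reset_index(drop=True)


# Made-up sample rows; real values and date encoding may differ.
label_sample = pd.DataFrame({
    "code": ["RB", "RB", "RB"],
    "m_nDate": [20240102, 20240102, 20240102],
    "indicator": [
        "twap_open1m@1_twap_close1m@1",
        "twap_open1m@1_twap_open1m@2",
        "some_other_indicator",
    ],
    "value": [0.004, 0.006, 1.0],
})
labels = select_labels(label_sample)
```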
## Consumer
Used by: `alpha_lab/cta_1d/src/loader_parquet.py`
The alpha_lab project will create a parallel loader that reads from these
Parquet tables instead of DolphinDB.