Major changes:
- Fix FixedFlagMarketInjector to add market_0, market_1 columns based on instrument codes
- Fix FixedFlagSTInjector to create IsST column from ST_S, ST_Y flags
- Update generate_beta_embedding.py to handle IsST creation conditionally
- Add dump_polars_dataset.py for generating raw and processed datasets
- Add debug_data_divergence.py for comparing gold-standard vs polars output
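
The IsST derivation mentioned above can be sketched in plain Python. This is only an illustration under an assumption not confirmed by this commit: that a row counts as ST when either ST_S or ST_Y is set. The helper names `_flag_is_set` and `add_is_st` are hypothetical and are not taken from FixedFlagSTInjector.

```python
import math


def _flag_is_set(value):
    """Treat None and NaN as 'not set'; anything else by truthiness."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return bool(value)


def add_is_st(rows):
    """Return copies of the rows with an IsST column added.

    Assumption: IsST is 1 when ST_S or ST_Y is set, otherwise 0.
    """
    out = []
    for row in rows:
        is_st = _flag_is_set(row.get("ST_S")) or _flag_is_set(row.get("ST_Y"))
        out.append({**row, "IsST": int(is_st)})
    return out


# Made-up rows, for illustration only.
rows = [
    {"instrument": "A", "ST_S": 0, "ST_Y": 0},
    {"instrument": "B", "ST_S": 1, "ST_Y": 0},
    {"instrument": "C", "ST_S": None, "ST_Y": 1},
    {"instrument": "D", "ST_S": float("nan"), "ST_Y": 0},
]
flagged = add_is_st(rows)
```

Treating missing flags as "not ST" is a design choice of this sketch; the actual injector may handle missing values differently.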
Documentation:
- Update BUG_ANALYSIS_FINAL.md with IsST column issue discovery
- Update README.md with polars dataset generation instructions
Key discovery:
- The FlagSTInjector in the gold-standard qlib code fails silently
- The VAE was trained without IsST column (341 features, not 342)
- The polars pipeline correctly skips FlagSTInjector to match gold-standard
Generated dataset structure (2026-02-23 to 2026-02-27):
- Raw data: 18,291 rows × 204 columns
- Processed data: 18,291 rows × 342 columns (341 for VAE input)
- market_0, market_1 columns correctly added to feature_flag group
- Add .claudeignore and .clauderc for Claude Code setup
- Add config.yaml for cta_1d, stock_15m, and alpha158_beta tasks
- Add alpha158_beta pipeline.py with documentation
- Add utility scripts for embedding generation and prediction
- Add executed baseline notebook for cta_1d
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## New Files
- src/qlib_loader.py - Qlib data loader utility with:
  - load_data_from_handler() - Load data with configurable start/end dates
  - load_data_with_proc_list() - Full pipeline with preprocessing
  - load_and_dump_data() - Dump raw and processed data to pickle files
  - Fixed processor implementations (FixedDiff, FixedColumnRemover, etc.)
    that handle :: separator column format correctly
  - NaN filling workaround for con_rating_strength column
- config/handler.yaml - Modified handler config with <LOAD_START> and
  <LOAD_END> placeholders instead of hardcoded <SINCE_DATE> and <TODAY>
- data/.gitignore - Ignore pickle and parquet data files
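
A minimal sketch of how the <LOAD_START> and <LOAD_END> placeholders could be filled in before the config is parsed. The function name `render_handler_config` and the YAML keys `start_time` / `end_time` are hypothetical; the real handler.yaml layout and the substitution code in src/qlib_loader.py may differ.

```python
from datetime import date


def render_handler_config(template: str, load_start: date, load_end: date) -> str:
    """Substitute the date placeholders in a handler config template."""
    if load_start > load_end:
        raise ValueError("load_start must not be after load_end")
    return (
        template
        .replace("<LOAD_START>", load_start.isoformat())
        .replace("<LOAD_END>", load_end.isoformat())
    )


# Hypothetical template fragment; only the placeholders come from this commit.
template = "start_time: <LOAD_START>\nend_time: <LOAD_END>\n"
rendered = render_handler_config(template, date(2019, 1, 1), date(2019, 1, 31))
```

Plain string substitution keeps the template a valid text file; the rendered string can then be passed to whatever YAML loader the project already uses.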
## Updated
- README.md - Documentation for data loading with configurable date range
## Key Changes
1. Fixed Diff processor bug: Column names now correctly use :: separator
format (e.g., 'feature_ext::log_size_diff') instead of malformed
string representations of tuples
2. Preserved trained parameters: Fixed processors use mean_train/std_train
from original proc_list pickle for RobustZScoreNorm
3. Configurable end date: handler.yaml now respects user-specified end
dates instead of always loading until today
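
The column-naming bug in item 1 can be illustrated with a short sketch. It assumes columns are (group, field) tuples that must be joined with the :: separator before the suffix is appended; `diff_column_name` is a hypothetical helper, not the actual FixedDiff code.

```python
SEP = "::"


def diff_column_name(column, suffix="_diff"):
    """Build the name of a diff column.

    Accepts either a (group, field) tuple or an already-joined
    'group::field' string, and always returns a '::'-joined name.
    """
    if isinstance(column, tuple):
        column = SEP.join(str(part) for part in column)
    return f"{column}{suffix}"


col = ("feature_ext", "log_size")

# Calling str() on the tuple directly yields the malformed kind of name:
malformed = str(col) + "_diff"

# Joining the parts first yields the expected name:
fixed = diff_column_name(col)
```

Here `malformed` is `"('feature_ext', 'log_size')_diff"`, while `fixed` is `'feature_ext::log_size_diff'`, matching the example in item 1.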
## Tested
- Successfully dumps raw data (before proc_list) to pickle files
- Successfully applies fixed proc_list and dumps processed data
- Both 2019-01 and 2025-01 data processed without errors