DEV Community

MangoQuant


Qlib - Model Training & Prediction

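In the quick start we already got Qlib running and trained a model by editing the YAML configuration file. For more flexibility, though, you need to write the code yourself, which is what this post covers.
I based it on the official documentation, but the examples did not run as-is on my setup (macOS 11, qlib 0.9.7, Python 3.11), so I adapted them and added some code for metric analysis. Let's walk through it.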

1. Initialization

# Combined full script: train the model + compute IC metrics
# reference: https://qlib.readthedocs.io/en/latest/component/model.html

import qlib
import pandas as pd
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.data.handler import Alpha158
from qlib.utils import init_instance_by_config, flatten_dict
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
from qlib.contrib.eva.alpha import calc_ic

# Initialize Qlib with the local data path
qlib.init(provider_uri="~/Documents/code/my_develop/qlib_data/cn_data_snapshot", region="cn")

market = "csi300"
benchmark = "SH000300"

# Data handler configuration
data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": market,
}


First we import the required Python packages and initialize Qlib. Change provider_uri to the path of your own Qlib data directory.
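If provider_uri is wrong, the failure may only show up later during data loading, so it can help to check the directory before calling qlib.init. The helper below is a hypothetical sketch (check_qlib_data_dir is my own name, not a Qlib API) and assumes the usual Qlib binary data layout: calendars/day.txt, instruments/<market>.txt and a features/ folder.

```python
from pathlib import Path


def check_qlib_data_dir(provider_uri, market="csi300"):
    """Return the expected Qlib data items that are missing under provider_uri.

    Assumes the usual Qlib binary layout: calendars/day.txt,
    instruments/<market>.txt and a features/ directory.
    """
    root = Path(provider_uri).expanduser()  # expand "~" like in the post's path
    expected = [
        Path("calendars") / "day.txt",
        Path("instruments") / f"{market}.txt",
        Path("features"),
    ]
    # Report paths relative to root so the message stays short
    return [str(rel) for rel in expected if not (root / rel).exists()]


# Example (path taken from the post; adjust to your own data directory):
# missing = check_qlib_data_dir("~/Documents/code/my_develop/qlib_data/cn_data_snapshot")
# if missing:
#     raise FileNotFoundError(f"Qlib data incomplete, missing: {missing}")
```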

2. Configuration


# Task configuration: model + dataset
task = {
    "model": {
        "class": "LGBModel",
        "module_path": "qlib.contrib.model.gbdt",
        "kwargs": {
            "loss": "mse",
            "colsample_bytree": 0.8879,
            "learning_rate": 0.0421,
            "subsample": 0.8789,
            "lambda_l1": 205.6999,
            "lambda_l2": 580.9768,
            "max_depth": 8,
            "num_leaves": 210,
            "num_threads": 20,
        },
    },
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
                "kwargs": data_handler_config,
            },
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-31"),
                "test": ("2017-01-01", "2020-08-01"),
            },
        },
    },
}

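For what each configuration field means, see my earlier article Qlib - 工作流workflow配置详解 (Qlib - Workflow Configuration Explained).
The two are essentially the same thing: one is written in YAML, the other as a Python dict.
LightGBM is used as the example model here; I covered its paper earlier as well: LightGBM: A Highly Efficient Gradient Boosting Decision Tree.
The features are the Alpha158 factors (also covered earlier), i.e. 158 constructed indicators, which LightGBM then uses for training and prediction.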
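The dict and YAML forms are interchangeable because each config entry ends up in init_instance_by_config, which, as I understand it, imports module_path, looks up class in it, and instantiates it with kwargs. The sketch below is a simplified stand-in written for illustration (simple_init_instance_by_config is a hypothetical name); the real Qlib function accepts more input forms. Nested configs such as the handler are, as far as I can tell, instantiated later by DatasetH itself rather than by this call.

```python
import importlib


def simple_init_instance_by_config(config):
    """Simplified, hypothetical stand-in for qlib.utils.init_instance_by_config.

    Handles only the {"class", "module_path", "kwargs"} dict form used in the post.
    """
    module = importlib.import_module(config["module_path"])  # e.g. "qlib.contrib.model.gbdt"
    cls = getattr(module, config["class"])  # e.g. LGBModel
    return cls(**config.get("kwargs", {}))  # call the class with its kwargs


# Same shape as task["model"], but pointing at a standard-library class
# so that the sketch runs without Qlib installed.
demo_config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 1, "denominator": 3},
}
print(simple_init_instance_by_config(demo_config))
```

Swapping demo_config for task["model"] would, under the assumptions above, produce an LGBModel with the listed hyperparameters.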

3. Training the Model


def main():
    print("[Step 1] Initializing the model and dataset...")
    model = init_instance_by_config(task["model"])
    dataset = init_instance_by_config(task["dataset"])

    print("[Step 2] Starting the experiment and training the model...")
    with R.start(experiment_name="workflow"):
        R.log_params(**flatten_dict(task))
        model.fit(dataset)


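The code is wrapped in a main() function that only runs under if __name__ == '__main__':, otherwise it fails on macOS. As far as I can tell, the reason is that Qlib loads data in parallel worker processes, and on macOS Python starts them with the "spawn" method, which re-imports the script in every worker (that is where the repeated "qlib successfully initialized" lines from different PIDs in the output come from). Without the guard, each worker would try to run the whole pipeline again.
We first build the model and the dataset from the task config,
then train the model with model.fit(dataset) inside R.start(...), so the run is recorded under the experiment "workflow".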
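Inside R.start the script also calls R.log_params(**flatten_dict(task)). Logged parameters are flat key/value pairs, so the nested task dict is flattened first. A minimal sketch of that flattening, assuming dotted keys (flatten_config is a hypothetical name, not Qlib's implementation):

```python
def flatten_config(d, parent_key="", sep="."):
    """Flatten a nested dict into dotted keys.

    {"model": {"class": "X"}} becomes {"model.class": "X"}.
    Hypothetical stand-in for qlib.utils.flatten_dict, assuming a "." separator.
    """
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the key prefix
            items.update(flatten_config(value, new_key, sep))
        else:
            items[new_key] = value
    return items


# A small excerpt of the task dict from the post
task_excerpt = {
    "model": {"class": "LGBModel", "kwargs": {"learning_rate": 0.0421, "max_depth": 8}},
    "dataset": {"kwargs": {"segments": {"test": ("2017-01-01", "2020-08-01")}}},
}
for k, v in flatten_config(task_excerpt).items():
    print(k, "=", v)
```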

4. Model Prediction

        print("[Step 3] Generating prediction signals...")
        recorder = R.get_recorder()
        sr = SignalRecord(model, dataset, recorder)
        sr.generate()

        print("[Step 4] Getting the recorder_id of this run for reading the results later...")
        recorder_id = recorder.id
        print(f"recorder_id of the current run: {recorder_id}")

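This step runs the trained model on the test segment and saves the predictions (pred.pkl) and the labels (label.pkl) as artifacts of the current recorder on local disk, for the analysis that follows.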
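Since the predictions are made on the test segment, a quick sanity check is that every date in pred.pkl falls inside ("2017-01-01", "2020-08-01"). Below is a sketch on a hand-built toy frame with the (datetime, instrument) MultiIndex and score column that the analysis code reads; pred_date_range is a hypothetical helper and the values are made up.

```python
import pandas as pd


def pred_date_range(pred):
    """Return (first_date, last_date, n_days) of a Qlib-style prediction frame.

    Assumes a MultiIndex with a "datetime" level, as in pred.pkl.
    """
    dates = pred.index.get_level_values("datetime").unique().sort_values()
    return dates[0], dates[-1], len(dates)


# Toy stand-in for pred.pkl: two trading days x two instruments (made-up scores)
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-01-03", "2017-01-04"]), ["SH600000", "SZ000001"]],
    names=["datetime", "instrument"],
)
toy_pred = pd.DataFrame({"score": [0.01, -0.02, 0.03, 0.00]}, index=index)

start, end, n_days = pred_date_range(toy_pred)
test_start, test_end = pd.Timestamp("2017-01-01"), pd.Timestamp("2020-08-01")
# Fail loudly if any prediction lies outside the configured test segment
assert test_start <= start and end <= test_end, "predictions fall outside the test segment"
print(start.date(), end.date(), n_days)  # 2017-01-03 2017-01-04 2
```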

5. Performance Analysis: IC Calculation

    # Use recorder_id to load the saved predictions
    print("[Step 5] Loading the predictions and computing IC...")
    recorder = R.get_recorder(experiment_name="workflow", recorder_id=recorder_id)
    print("Saved artifacts:", recorder.list_artifacts())

    # Resolve the local artifact directory (artifact_uri is a file:// URI for a local mlruns store)
    artifact_path = recorder.artifact_uri.replace("file://", "")
    pred = pd.read_pickle(f"{artifact_path}/pred.pkl")
    label = pd.read_pickle(f"{artifact_path}/label.pkl")

    print("Predictions (first 5 rows):")
    print(pred.head())
    print("Predictions (last 5 rows):")
    print(pred.tail())
    print("Prediction dates:", pred.index.get_level_values('datetime').unique())

    print("Labels (first 5 rows):")
    print(label.head())
    print("Labels (last 5 rows):")
    print(label.tail())

    # Compute IC: calc_ic returns a tuple (daily IC series, daily Rank IC series)
    ic, ric = calc_ic(pred['score'], label['LABEL0'])

    print("[Step 6] IC statistics:")
    print("IC mean:", ic.mean())
    print("IC std:", ic.std())
    print("Mean |IC|:", ic.abs().mean())
    print("Rank IC mean:", ric.mean())
    print("Rank IC std:", ric.std())
    print("Mean |Rank IC|:", ric.abs().mean())

if __name__ == '__main__':
    main()


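This code reads the locally saved predictions and labels and evaluates them.
I first print the head and tail of each to check the data format and make sure the dates line up.
Then I compute IC (the daily cross-sectional correlation between predicted scores and labels) and Rank IC (the same correlation computed on ranks) to gauge model performance.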
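To make the two metrics concrete, the sketch below recomputes them with plain pandas on a made-up toy panel: IC is the Pearson correlation between score and label across instruments on each date, and Rank IC is the Spearman correlation, computed here as Pearson on ranks so that SciPy is not required. daily_ic is a hypothetical helper; my understanding is that Qlib's calc_ic does essentially this per-date grouping, though details such as NaN handling may differ.

```python
import pandas as pd


def daily_ic(score, label):
    """Per-date IC (Pearson) and Rank IC (Spearman) of predictions vs. labels.

    Both inputs are Series indexed by (datetime, instrument), like pred["score"]
    and label["LABEL0"] in the post. Hypothetical helper, not Qlib's calc_ic.
    """
    df = pd.DataFrame({"score": score, "label": label})  # aligns on the index
    by_date = df.groupby(level="datetime")
    # IC: Pearson correlation across instruments, computed separately for each day
    ic = by_date.apply(lambda g: g["score"].corr(g["label"]))
    # Rank IC: Spearman = Pearson on ranks (rank() first, so SciPy is not needed)
    ric = by_date.apply(lambda g: g["score"].rank().corr(g["label"].rank()))
    return ic, ric


# Made-up toy panel: 2 days x 4 instruments
dates = pd.to_datetime(["2017-01-03", "2017-01-04"])
index = pd.MultiIndex.from_product([dates, ["A", "B", "C", "D"]], names=["datetime", "instrument"])
score = pd.Series([1, 2, 3, 4, 1, 2, 3, 4], index=index, dtype=float)
# Day 1: label is linear in score; day 2: monotone but non-linear (cubes)
label = pd.Series([0.1, 0.2, 0.3, 0.4, 1, 8, 27, 64], index=index, dtype=float)

ic, ric = daily_ic(score, label)
print("IC per day:", ic.round(4).tolist())
print("Rank IC per day:", ric.round(4).tolist())
print("IC mean / std (ICIR):", ic.mean() / ic.std())
```

On the second toy day the label is a monotone but non-linear function of the score, so Rank IC stays at 1 while IC drops below 1, which is why it is worth looking at both. The ratio of the IC mean to the IC std (often called ICIR) is a common way to combine the two statistics the script prints.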

6. Results

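Run the script with python qlib_model_demo.py; it takes about 2 minutes and produces the output below. The ModuleNotFoundError, Gym and git messages do not stop the run: optional models (CatBoost, XGBoost, PyTorch) are simply skipped, and the recorder cannot log uncommitted code because the working directory is not a git repository.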

(freq) test1@budas-MacBook-Pro user % python qlib_model_demo.py
ModuleNotFoundError. CatBoostModel are skipped. (optional: maybe installing CatBoostModel can fix it.)
ModuleNotFoundError. XGBModel is skipped(optional: maybe installing xgboost can fix it).
ModuleNotFoundError.  PyTorch models are skipped (optional: maybe installing pytorch can fix it).
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
[56122:MainThread](2025-10-18 11:31:21,785) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56122:MainThread](2025-10-18 11:31:21,795) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56122:MainThread](2025-10-18 11:31:21,796) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[Step 1] Initializing the model and dataset...
[56133:MainThread](2025-10-18 11:31:28,381) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56131:MainThread](2025-10-18 11:31:28,382) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56136:MainThread](2025-10-18 11:31:28,382) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56135:MainThread](2025-10-18 11:31:28,383) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56132:MainThread](2025-10-18 11:31:28,384) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56134:MainThread](2025-10-18 11:31:28,385) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56133:MainThread](2025-10-18 11:31:28,388) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56136:MainThread](2025-10-18 11:31:28,388) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56133:MainThread](2025-10-18 11:31:28,388) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56136:MainThread](2025-10-18 11:31:28,388) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56135:MainThread](2025-10-18 11:31:28,389) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56131:MainThread](2025-10-18 11:31:28,389) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56135:MainThread](2025-10-18 11:31:28,389) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56131:MainThread](2025-10-18 11:31:28,389) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56134:MainThread](2025-10-18 11:31:28,390) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56132:MainThread](2025-10-18 11:31:28,390) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56134:MainThread](2025-10-18 11:31:28,390) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56132:MainThread](2025-10-18 11:31:28,390) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56222:MainThread](2025-10-18 11:32:31,653) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56223:MainThread](2025-10-18 11:32:31,653) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56226:MainThread](2025-10-18 11:32:31,653) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56225:MainThread](2025-10-18 11:32:31,653) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56224:MainThread](2025-10-18 11:32:31,654) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56227:MainThread](2025-10-18 11:32:31,657) INFO - qlib.Initialization - [config.py:452] - default_conf: client.
[56223:MainThread](2025-10-18 11:32:31,658) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56226:MainThread](2025-10-18 11:32:31,658) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56225:MainThread](2025-10-18 11:32:31,659) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56222:MainThread](2025-10-18 11:32:31,659) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56223:MainThread](2025-10-18 11:32:31,659) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56225:MainThread](2025-10-18 11:32:31,659) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56226:MainThread](2025-10-18 11:32:31,659) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56222:MainThread](2025-10-18 11:32:31,659) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56224:MainThread](2025-10-18 11:32:31,659) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56224:MainThread](2025-10-18 11:32:31,660) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56227:MainThread](2025-10-18 11:32:31,661) INFO - qlib.Initialization - [__init__.py:75] - qlib successfully initialized based on client settings.
[56227:MainThread](2025-10-18 11:32:31,662) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56122:MainThread](2025-10-18 11:32:34,882) INFO - qlib.timer - [log.py:127] - Time cost: 73.083s | Loading data Done
[56122:MainThread](2025-10-18 11:32:36,646) INFO - qlib.timer - [log.py:127] - Time cost: 0.462s | DropnaLabel Done
[56122:MainThread](2025-10-18 11:32:39,763) INFO - qlib.timer - [log.py:127] - Time cost: 3.116s | CSZScoreNorm Done
[56122:MainThread](2025-10-18 11:32:39,826) INFO - qlib.timer - [log.py:127] - Time cost: 4.943s | fit & process data Done
[56122:MainThread](2025-10-18 11:32:39,827) INFO - qlib.timer - [log.py:127] - Time cost: 78.029s | Init data Done
[Step 2] Starting the experiment and training the model...
[56122:MainThread](2025-10-18 11:32:39,849) INFO - qlib.workflow - [exp.py:258] - Experiment 376922499957687719 starts running ...
[56122:MainThread](2025-10-18 11:32:40,631) INFO - qlib.workflow - [recorder.py:345] - Recorder 09c2896631e24499baebacb64603256c starts running under Experiment 376922499957687719 ...
warning: Not a git repository. Use --no-index to compare two paths outside a working tree
usage: git diff --no-index [<options>] <path> <path>

[56122:MainThread](2025-10-18 11:32:40,664) INFO - qlib.workflow - [recorder.py:378] - Fail to log the uncommitted code of $CWD(/Users/test1/Documents/code/my_develop/qlib_data/user) when run git diff.
fatal: not a git repository (or any of the parent directories): .git
[56122:MainThread](2025-10-18 11:32:40,693) INFO - qlib.workflow - [recorder.py:378] - Fail to log the uncommitted code of $CWD(/Users/test1/Documents/code/my_develop/qlib_data/user) when run git status.
error: unknown option `cached'
    --indent-heuristic    heuristic to shift diff hunk boundaries for easy reading
    --patience            generate diff using the "patience diff" algorithm
    --histogram           generate diff using the "histogram diff" algorithm
    --diff-algorithm <algorithm>
                          choose a diff algorithm
    --anchored <text>     generate diff using the "anchored diff" algorithm
    --word-diff[=<mode>]  show word diff, using <mode> to delimit changed words
    --word-diff-regex <regex>
                          use <regex> to decide what a word is
    --color-words[=<regex>]
                          equivalent to --word-diff=color --word-diff-regex=<regex>
    --color-moved[=<mode>]
                          moved lines of code are colored differently
    --color-moved-ws <mode>
                          how white spaces are ignored in --color-moved

Other diff options
    --relative[=<prefix>]
                          when run from subdir, exclude changes outside and show relative paths
    -a, --text            treat all files as text
    -R                    swap two inputs, reverse the diff
    --exit-code           exit with 1 if there were differences, 0 otherwise
    --quiet               disable all output of the program
    --ext-diff            allow an external diff helper to be executed
    --textconv            run external text conversion filters when comparing binary files
    --ignore-submodules[=<when>]
                          ignore changes to submodules in the diff generation
    --submodule[=<format>]
                          specify how differences in submodules are shown
    --ita-invisible-in-index
                          hide 'git add -N' entries from the index
    --ita-visible-in-index
                          treat 'git add -N' entries as real in the index
    -S <string>           look for differences that change the number of occurrences of the specified string
    -G <regex>            look for differences that change the number of occurrences of the specified regex
    --pickaxe-all         show all changes in the changeset with -S or -G
    --pickaxe-regex       treat <string> in -S as extended POSIX regular expression
    -O <file>             control the order in which files appear in the output
    --rotate-to <path>    show the change in the specified path first
    --skip-to <path>      skip the output to the specified path
    --find-object <object-id>
                          look for differences that change the number of occurrences of the specified object
    --diff-filter [(A|C|D|M|R|T|U|X|B)...[*]]
                          select files by diff type
    --output <file>       Output to a specific file

[56122:MainThread](2025-10-18 11:32:40,721) INFO - qlib.workflow - [recorder.py:378] - Fail to log the uncommitted code of $CWD(/Users/test1/Documents/code/my_develop/qlib_data/user) when run git diff --cached.
Training until validation scores don't improve for 50 rounds
[20]    train's l2: 0.990585    valid's l2: 0.99431
[40]    train's l2: 0.986931    valid's l2: 0.993693
[60]    train's l2: 0.984352    valid's l2: 0.99349
[80]    train's l2: 0.982319    valid's l2: 0.993382
[100]   train's l2: 0.980442    valid's l2: 0.99331
[120]   train's l2: 0.97871 valid's l2: 0.993247
[140]   train's l2: 0.976987    valid's l2: 0.993334
[160]   train's l2: 0.97536 valid's l2: 0.993338
Early stopping, best iteration is:
[122]   train's l2: 0.978519    valid's l2: 0.993238
[Step 3] Generating prediction signals...
[56122:MainThread](2025-10-18 11:33:25,686) INFO - qlib.workflow - [record_temp.py:198] - Signal record 'pred.pkl' has been saved as the artifact of the Experiment 376922499957687719
'The following are prediction results of the LGBModel model.'
                          score
datetime   instrument
2017-01-03 SH600000   -0.042865
           SH600008    0.005925
           SH600009    0.030596
           SH600010   -0.013973
           SH600015   -0.141758
[Step 4] Getting the recorder_id of the current experiment, used later to read the results...
recorder_id of the current experiment: 09c2896631e24499baebacb64603256c
[56122:MainThread](2025-10-18 11:33:25,736) INFO - qlib.timer - [log.py:127] - Time cost: 0.000s | waiting `async_log` Done
[Step 5] Loading predictions and computing IC...
Saved artifacts: ['label.pkl', 'pred.pkl']
Predictions (first 5 rows):
                          score
datetime   instrument
2017-01-03 SH600000   -0.042865
           SH600008    0.005925
           SH600009    0.030596
           SH600010   -0.013973
           SH600015   -0.141758
Predictions (last 5 rows):
                          score
datetime   instrument
2020-07-31 SZ300413   -0.078162
           SZ300433   -0.101778
           SZ300498   -0.054418
           SZ300601   -0.147531
           SZ300628    0.030925
Prediction date range: DatetimeIndex(['2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06',
               '2017-01-09', '2017-01-10', '2017-01-11', '2017-01-12',
               '2017-01-13', '2017-01-16',
               ...
               '2020-07-20', '2020-07-21', '2020-07-22', '2020-07-23',
               '2020-07-24', '2020-07-27', '2020-07-28', '2020-07-29',
               '2020-07-30', '2020-07-31'],
              dtype='datetime64[ns]', name='datetime', length=871, freq=None)
Labels (first 5 rows):
                         LABEL0
datetime   instrument
2017-01-03 SH600000   -0.001831
           SH600008   -0.002398
           SH600009    0.001493
           SH600010    0.003520
           SH600015   -0.007142
Labels (last 5 rows):
                         LABEL0
datetime   instrument
2020-07-31 SZ300413   -0.037566
           SZ300433   -0.031677
           SZ300498   -0.006531
           SZ300601    0.090264
           SZ300628    0.004142
[Step 6] IC statistics:
IC mean: 0.04993267859655785
IC std: 0.12446777545374337
IC mean absolute value: 0.10684702003092757
Rank IC mean: 0.051507508261451146
Rank IC std: 0.12273969195405407
Rank IC mean absolute value: 0.10527763281820797

7. Result Analysis

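Some of the output is unimportant warnings that can be ignored, such as the long git diff help text, which is printed when Qlib's recorder fails to log the uncommitted code in the current directory (the "Fail to log the uncommitted code" line).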

[56227:MainThread](2025-10-18 11:32:31,662) INFO - qlib.Initialization - [__init__.py:77] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/test1/Documents/code/my_develop/qlib_data/cn_data_snapshot')}
[56122:MainThread](2025-10-18 11:32:34,882) INFO - qlib.timer - [log.py:127] - Time cost: 73.083s | Loading data Done
[56122:MainThread](2025-10-18 11:32:36,646) INFO - qlib.timer - [log.py:127] - Time cost: 0.462s | DropnaLabel Done
[56122:MainThread](2025-10-18 11:32:39,763) INFO - qlib.timer - [log.py:127] - Time cost: 3.116s | CSZScoreNorm Done
[56122:MainThread](2025-10-18 11:32:39,826) INFO - qlib.timer - [log.py:127] - Time cost: 4.943s | fit & process data Done
[56122:MainThread](2025-10-18 11:32:39,827) INFO - qlib.timer - [log.py:127] - Time cost: 78.029s | Init data Done

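The log above shows how the data was loaded and preprocessed, and how long each step took: loading the raw data took about 73 s, the two preprocessing steps DropnaLabel (drop samples whose label is missing) and CSZScoreNorm (z-score the label across all stocks on each trading day) took about 0.5 s and 3.1 s, and the whole data initialization took about 78 s.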
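To make the CSZScoreNorm step concrete, here is a minimal plain-Python sketch of a cross-sectional z-score: each trading day's values are standardized using only that day's mean and standard deviation. The function cs_zscore and the nested-dict data layout are hypothetical; Qlib itself works on a pandas DataFrame indexed by (datetime, instrument), and details such as how it treats degenerate cross-sections may differ.

```python
from statistics import mean, stdev


def cs_zscore(values_by_date):
    """Standardize each date's cross-section independently.

    values_by_date: {date: {instrument: value}} (hypothetical layout).
    Returns the same shape; on each date the values have mean 0 and
    sample standard deviation 1.
    """
    result = {}
    for date, cross_section in values_by_date.items():
        vals = list(cross_section.values())
        if len(vals) < 2:
            # A single stock has no cross-sectional spread to normalize by.
            result[date] = {inst: 0.0 for inst in cross_section}
            continue
        mu = mean(vals)
        sigma = stdev(vals)  # sample std (ddof=1), like pandas' default .std()
        result[date] = {
            inst: (v - mu) / sigma if sigma > 0 else 0.0
            for inst, v in cross_section.items()
        }
    return result


# Toy labels for two trading days (made-up numbers, not from the log).
labels = {
    "2017-01-03": {"A": 0.01, "B": 0.02, "C": 0.03},
    "2017-01-04": {"A": -0.02, "B": 0.04},
}
print(cs_zscore(labels))
```

Normalizing per day, rather than over the whole history, removes market-wide moves from the label, so the model only has to learn which stocks do better than others on the same day.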


[Step 2] Starting the experiment and training the model...
[56122:MainThread](2025-10-18 11:32:39,849) INFO - qlib.workflow - [exp.py:258] - Experiment 376922499957687719 starts running ...
[56122:MainThread](2025-10-18 11:32:40,631) INFO - qlib.workflow - [recorder.py:345] - Recorder 09c2896631e24499baebacb64603256c starts running under Experiment 376922499957687719 ...

...
[56122:MainThread](2025-10-18 11:32:40,721) INFO - qlib.workflow - [recorder.py:378] - Fail to log the uncommitted code of $CWD(/Users/test1/Documents/code/my_develop/qlib_data/user) when run git diff --cached.
Training until validation scores don't improve for 50 rounds
[20]    train's l2: 0.990585    valid's l2: 0.99431
[40]    train's l2: 0.986931    valid's l2: 0.993693
[60]    train's l2: 0.984352    valid's l2: 0.99349
[80]    train's l2: 0.982319    valid's l2: 0.993382
[100]   train's l2: 0.980442    valid's l2: 0.99331
[120]   train's l2: 0.97871 valid's l2: 0.993247
[140]   train's l2: 0.976987    valid's l2: 0.993334
[160]   train's l2: 0.97536 valid's l2: 0.993338
Early stopping, best iteration is:
[122]   train's l2: 0.978519    valid's l2: 0.993238

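This is the training process. Early stopping picked iteration 122 (122 trees): the validation L2 stopped improving after about 120 rounds, and training ended once 50 more rounds brought no improvement.
At the best iteration, training L2 = 0.9785 and validation L2 = 0.9932. The training error is below the validation error, so there is mild overfitting, but it is not severe; early stopping did its job. For scale: the labels are z-scored per trading day (the CSZScoreNorm step in the data log), so always predicting 0 would give an L2 of roughly 1, which means the model explains only a small part of the label variance.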
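The rule behind "Training until validation scores don't improve for 50 rounds" can be sketched in a few lines of plain Python. This is a simplified illustration of patience-based early stopping, not LightGBM's actual code; the function name early_stopping is hypothetical. Under this rule, a best round of 122 means training stops at round 172.

```python
def early_stopping(valid_losses, patience):
    """Patience-based early stopping on a validation loss curve.

    valid_losses: validation loss after each boosting round (round 1 first).
    Returns (best_round, stop_round), both 1-indexed like LightGBM's log.
    Training halts once `patience` rounds pass without a new lowest loss,
    and the model from best_round is the one that is kept.
    """
    best_round, best_loss = 0, float("inf")
    for rnd, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            best_round, best_loss = rnd, loss
        elif rnd - best_round >= patience:
            return best_round, rnd
    # Ran out of rounds before the patience was exhausted.
    return best_round, len(valid_losses)


# Synthetic curve (not the real run): loss falls for 122 rounds, then drifts up.
curve = [1.0 - 0.0001 * r for r in range(1, 123)] + [0.99 + 0.0001 * r for r in range(100)]
print(early_stopping(curve, patience=50))  # → (122, 172)
```

Keeping the best round rather than the last one is what limits overfitting here: the extra 50 rounds only serve to confirm that the validation loss has stopped improving.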


[Step 3] Generating prediction signals...
[56122:MainThread](2025-10-18 11:33:25,686) INFO - qlib.workflow - [record_temp.py:198] - Signal record 'pred.pkl' has been saved as the artifact of the Experiment 376922499957687719
'The following are prediction results of the LGBModel model.'
                          score
datetime   instrument
2017-01-03 SH600000   -0.042865
           SH600008    0.005925
           SH600009    0.030596
           SH600010   -0.013973
           SH600015   -0.141758
[Step 4] Getting the recorder_id of the current experiment, used later to read the results...
recorder_id of the current experiment: 09c2896631e24499baebacb64603256c
[56122:MainThread](2025-10-18 11:33:25,736) INFO - qlib.timer - [log.py:127] - Time cost: 0.000s | waiting `async_log` Done
[Step 5] Loading predictions and computing IC...
Saved artifacts: ['label.pkl', 'pred.pkl']
Predictions (first 5 rows):
                          score
datetime   instrument
2017-01-03 SH600000   -0.042865
           SH600008    0.005925
           SH600009    0.030596
           SH600010   -0.013973
           SH600015   -0.141758
Predictions (last 5 rows):
                          score
datetime   instrument
2020-07-31 SZ300413   -0.078162
           SZ300433   -0.101778
           SZ300498   -0.054418
           SZ300601   -0.147531
           SZ300628    0.030925
Prediction date range: DatetimeIndex(['2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06',
               '2017-01-09', '2017-01-10', '2017-01-11', '2017-01-12',
               '2017-01-13', '2017-01-16',
               ...
               '2020-07-20', '2020-07-21', '2020-07-22', '2020-07-23',
               '2020-07-24', '2020-07-27', '2020-07-28', '2020-07-29',
               '2020-07-30', '2020-07-31'],
              dtype='datetime64[ns]', name='datetime', length=871, freq=None)
Labels (first 5 rows):
                         LABEL0
datetime   instrument
2017-01-03 SH600000   -0.001831
           SH600008   -0.002398
           SH600009    0.001493
           SH600010    0.003520
           SH600015   -0.007142
Labels (last 5 rows):
                         LABEL0
datetime   instrument
2020-07-31 SZ300413   -0.037566
           SZ300433   -0.031677
           SZ300498   -0.006531
           SZ300601    0.090264
           SZ300628    0.004142
[Step 6] IC statistics:
IC mean: 0.04993267859655785
IC std: 0.12446777545374337
IC mean absolute value: 0.10684702003092757
Rank IC mean: 0.051507508261451146
Rank IC std: 0.12273969195405407
Rank IC mean absolute value: 0.10527763281820797

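This part shows the predictions for the test period (2017-01-03 to 2020-07-31, 871 trading days), the matching labels, and the IC statistics computed from them.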
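How are the IC numbers computed? For each trading day, IC is the Pearson correlation between that day's predicted scores and realized labels across all stocks, and Rank IC is the Spearman correlation, i.e. the Pearson correlation of their ranks. The script itself imports calc_ic from qlib.contrib.eva.alpha for this job. The plain-Python sketch below only illustrates the idea: the helper names are hypothetical, and it skips the missing-value handling a real implementation needs.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; NaN if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def daily_ic(records):
    """records: iterable of (date, instrument, score, label) tuples.

    Returns {date: (ic, rank_ic)}: Pearson and Spearman correlation between
    score and label across the instruments of each date.
    """
    by_date = defaultdict(lambda: ([], []))
    for date, _instrument, score, label in records:
        scores, labels = by_date[date]
        scores.append(score)
        labels.append(label)
    return {
        date: (pearson(scores, labels), pearson(average_ranks(scores), average_ranks(labels)))
        for date, (scores, labels) in sorted(by_date.items())
    }


# Toy data (made-up): on d1 the scores rank the labels perfectly but not linearly.
records = [
    ("d1", "A", 1.0, 1.0), ("d1", "B", 2.0, 4.0), ("d1", "C", 3.0, 9.0),
    ("d2", "A", 1.0, 0.3), ("d2", "B", 2.0, 0.2), ("d2", "C", 3.0, 0.1),
]
for date, (ic, ric) in daily_ic(records).items():
    print(date, round(ic, 4), round(ric, 4))
```

On d1 the Rank IC is a perfect 1.0 while the IC is slightly lower, because ranks only care about order; this is the kind of monotonic but nonlinear relationship that would make Rank IC exceed IC.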

Let's put the metrics in a table:

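| Metric | Value | Rule of thumb |
| --- | --- | --- |
| IC mean | 0.050 | Below 0.03: roughly ineffective; about 0.05: usable; 0.1 and above: excellent |
| IC mean absolute value | 0.107 | Not comparable with the thresholds above: it is always at least abs(IC mean), and the gap to the mean (0.107 vs 0.050) comes from days with negative IC |
| IC std | 0.124 | Large fluctuations (about 2.5 times the mean), so the direction of the signal is unstable from day to day |
| Rank IC mean | 0.052 (std 0.123, mean absolute value 0.105) | Almost the same as IC: the rank-based measure, which also captures nonlinear monotonic relationships, adds no extra information |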
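All of these rows can be derived from the daily IC series. A common extra statistic is ICIR, the IC mean divided by the IC std, which measures how consistently the signal points the right way. Below is a minimal sketch with a hypothetical helper summarize_ic; the toy series (made-up numbers) also shows why the mean absolute IC is always at least as large as abs(IC mean) once negative days are mixed in.

```python
from statistics import mean, stdev


def summarize_ic(ic_series):
    """Summary statistics for a daily IC series (needs at least 2 values)."""
    m = mean(ic_series)
    s = stdev(ic_series)  # sample std, like pandas' default .std()
    return {
        "mean": m,
        "std": s,
        "mean_abs": mean(abs(x) for x in ic_series),
        "icir": m / s,  # mean / std: consistency of the signal's direction
        "positive_ratio": sum(x > 0 for x in ic_series) / len(ic_series),
    }


# Made-up daily ICs: positive on average, but with negative days mixed in.
toy = [0.20, -0.10, 0.15, -0.05]
print(summarize_ic(toy))

# ICIR from the logged IC mean and IC std of this run.
print(round(0.04993267859655785 / 0.12446777545374337, 2))  # → 0.4
```

Applied the same way to the Rank IC numbers, 0.0515 / 0.1227 gives about 0.42.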

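So this setup is only just usable, but at least the whole pipeline runs end to end.
The remaining work is to optimize the model and raise the IC.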

Keep going 💪

8. Full Code

The complete demo is on GitHub: https://github.com/JizhiXiang/Quant-Strategy
