Overview
The workflow is one of the core parts of qlib. It can be divided into the following three parts, and the source code can be studied layer by layer along them.
Config
Class diagram
Default configuration
```python
# Excerpt from qlib/config.py; the supporting imports and constants are shown
# here for context.
import logging
import multiprocessing
import os
from pathlib import Path

PROTOCOL_VERSION = 4  # pickle.dump protocol version
NUM_USABLE_CPU = max(multiprocessing.cpu_count() - 2, 1)

_default_config = {
    # data provider config
    "calendar_provider": "LocalCalendarProvider",
    "instrument_provider": "LocalInstrumentProvider",
    "feature_provider": "LocalFeatureProvider",
    "pit_provider": "LocalPITProvider",
    "expression_provider": "LocalExpressionProvider",
    "dataset_provider": "LocalDatasetProvider",
    "provider": "LocalProvider",
    # config it in qlib.init()
    # "provider_uri" str or dict:
    #   # str
    #   "~/.qlib/stock_data/cn_data"
    #   # dict
    #   {"day": "~/.qlib/stock_data/cn_data", "1min": "~/.qlib/stock_data/cn_data_1min"}
    # NOTE: provider_uri priority:
    #   1. backend_config: backend_obj["kwargs"]["provider_uri"]
    #   2. backend_config: backend_obj["kwargs"]["provider_uri_map"]
    #   3. qlib.init: provider_uri
    "provider_uri": "",
    # cache
    "expression_cache": None,
    "calendar_cache": None,
    # for simple dataset cache
    "local_cache_path": None,
    # kernels can be a fixed value or a callable function like `def (freq: str) -> int`
    # If the kernels are arctic_kernels, `min(NUM_USABLE_CPU, 30)` may be a good value
    "kernels": NUM_USABLE_CPU,
    # pickle.dump protocol version
    "dump_protocol_version": PROTOCOL_VERSION,
    # How many tasks belong to one process. Recommend 1 for high-frequency data and None for daily data.
    "maxtasksperchild": None,
    # If joblib_backend is None, use loky
    "joblib_backend": "multiprocessing",
    "default_disk_cache": 1,  # 0:skip/1:use
    "mem_cache_size_limit": 500,
    "mem_cache_limit_type": "length",
    # memory cache expiration time in seconds; only used by 'DatasetURICache' and 'client D.calendar'
    # default 1 hour
    "mem_cache_expire": 60 * 60,
    # cache dir name
    "dataset_cache_dir_name": "dataset_cache",
    "features_cache_dir_name": "features_cache",
    # redis
    # in order to use cache
    "redis_host": "127.0.0.1",
    "redis_port": 6379,
    "redis_task_db": 1,
    # This value can be reset via qlib.init
    "logging_level": logging.INFO,
    # Global configuration of qlib log
    # logging_level can control the logging level more finely
    "logging_config": {
        "version": 1,
        "formatters": {
            "logger_format": {
                "format": "[%(process)s:%(threadName)s](%(asctime)s) %(levelname)s - %(name)s - [%(filename)s:%(lineno)d] - %(message)s"
            }
        },
        "filters": {
            "field_not_found": {
                "()": "qlib.log.LogFilter",
                "param": [".*?WARN: data not found for.*?"],
            }
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "level": logging.DEBUG,
                "formatter": "logger_format",
                "filters": ["field_not_found"],
            }
        },
        "loggers": {"qlib": {"level": logging.DEBUG, "handlers": ["console"]}},
        # To let qlib work with other packages, we shouldn't disable existing loggers.
        # Note that this param defaults to True according to the documentation of logging.
        "disable_existing_loggers": False,
    },
    # Default config for experiment manager
    "exp_manager": {
        "class": "MLflowExpManager",
        "module_path": "qlib.workflow.expm",
        "kwargs": {
            "uri": "file:" + str(Path(os.getcwd()).resolve() / "mlruns"),
            "default_exp_name": "Experiment",
        },
    },
    "pit_record_type": {
        "date": "I",  # uint32
        "period": "I",  # uint32
        "value": "d",  # float64
        "index": "I",  # uint32
    },
    "pit_record_nan": {
        "date": 0,
        "period": 0,
        "value": float("NAN"),
        "index": 0xFFFFFFFF,
    },
    # Default config for MongoDB
    "mongo": {
        "task_url": "mongodb://localhost:27017/",
        "task_db_name": "default_task_db",
    },
    # Shift minute for highfreq minute data, used in backtest
    # if min_data_shift == 0, use default market time [9:30, 11:29, 1:00, 2:59]
    # if min_data_shift != 0, use shifted market time [9:30, 11:29, 1:00, 2:59] - shift*minute
    "min_data_shift": 0,
}
```
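Every key in this dict can be overridden at initialization time. A minimal sketch, assuming the public cn_data bundle has been downloaded to the default location:

```python
# A minimal sketch: overriding some of the defaults above via qlib.init().
import qlib
from qlib.config import REG_CN

qlib.init(
    provider_uri="~/.qlib/qlib_data/cn_data",  # overrides "provider_uri"
    region=REG_CN,           # region preset; adjusts trading-related defaults
    redis_host="127.0.0.1",  # any other key from _default_config can be passed too
    redis_port=6379,
)
```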
Example configuration
```yaml
qlib_init:
    provider_uri: "~/.qlib/qlib_data/cn_data"
    region: cn
market: &market csi300
benchmark: &benchmark SH000300
data_handler_config: &data_handler_config
    start_time: 2008-01-01
    end_time: 2020-08-01
    fit_start_time: 2008-01-01
    fit_end_time: 2014-12-31
    instruments: *market
port_analysis_config: &port_analysis_config
    strategy:
        class: TopkDropoutStrategy
        module_path: qlib.contrib.strategy
        kwargs:
            model: <MODEL>
            dataset: <DATASET>
            topk: 50
            n_drop: 5
    backtest:
        start_time: 2017-01-01
        end_time: 2020-08-01
        account: 100000000
        benchmark: *benchmark
        exchange_kwargs:
            limit_threshold: 0.095
            deal_price: close
            open_cost: 0.0005
            close_cost: 0.0015
            min_cost: 5
task:
    model:
        class: LGBModel
        module_path: qlib.contrib.model.gbdt
        kwargs:
            loss: mse
            colsample_bytree: 0.8879
            learning_rate: 0.2
            subsample: 0.8789
            lambda_l1: 205.6999
            lambda_l2: 580.9768
            max_depth: 8
            num_leaves: 210
            num_threads: 20
    dataset:
        class: DatasetH
        module_path: qlib.data.dataset
        kwargs:
            handler:
                class: Alpha158
                module_path: qlib.contrib.data.handler
                kwargs: *data_handler_config
            segments:
                train: [2008-01-01, 2014-12-31]
                valid: [2015-01-01, 2016-12-31]
                test: [2017-01-01, 2020-08-01]
    record:
        - class: SignalRecord
          module_path: qlib.workflow.record_temp
          kwargs:
            model: <MODEL>
            dataset: <DATASET>
        - class: SigAnaRecord
          module_path: qlib.workflow.record_temp
          kwargs:
            ana_long_short: False
            ann_scaler: 252
        - class: PortAnaRecord
          module_path: qlib.workflow.record_temp
          kwargs:
            config: *port_analysis_config
```
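This is the standard LightGBM-on-Alpha158 benchmark config and is normally run end to end with qlib's `qrun` command. The sketch below shows, in simplified form, roughly what the runner does with the `task` section; the filename is a placeholder, and the real entry point lives in `qlib.workflow.cli`:

```python
# A simplified sketch of running the `task` section above by hand.
import qlib
import yaml
from qlib.utils import init_instance_by_config

with open("workflow_config.yaml") as f:  # placeholder filename
    config = yaml.safe_load(f)

qlib.init(**config["qlib_init"])

task = config["task"]
model = init_instance_by_config(task["model"])      # -> LGBModel
dataset = init_instance_by_config(task["dataset"])  # -> DatasetH over Alpha158
model.fit(dataset)
pred = model.predict(dataset)  # scores for the `test` segment
```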
The key global objects in qlib's source are summarized below:

Name | Meaning | Role
---|---|---
C | Configuration | The config that drives the whole workflow
Cal | Calendar | Loads calendar data; involves file reads
Inst | Instrument universe | Loads the instrument (stock pool) data; involves file reads
FeatureD | Features | Loads raw feature data; involves file reads
PITD | Point-in-time data | Loads point-in-time (PIT) data
ExpressionD | Expression engine | Evaluates expressions
DatasetD | Dataset | Loads the data needed for training and inference
D | BaseProvider | Any provider functionality that has not been split out is exposed through this provider
R | QlibRecorder | The most important manager in the whole workflow; it manages experiments and experiment records
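The sketch below touches several of these objects through the `D` entry point, assuming qlib has been initialized against the cn_data bundle:

```python
# A minimal sketch of the D (BaseProvider) facade; Cal, Inst, FeatureD and
# ExpressionD sit behind these calls.
import qlib
from qlib.data import D

qlib.init(provider_uri="~/.qlib/qlib_data/cn_data")

cal = D.calendar(start_time="2010-01-01", end_time="2010-01-31", freq="day")  # Cal
instruments = D.instruments(market="csi300")                                  # Inst
df = D.features(  # FeatureD + ExpressionD: raw fields and derived expressions
    instruments,
    ["$close", "Ref($close, 1)/$close - 1"],
    start_time="2010-01-01",
    end_time="2010-01-31",
)
```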
Data
DatasetProvider
Class diagram
Analysis
Storage
Class diagram
Analysis
Dataset
Class diagram
Analysis
Alpha158 is a multi-factor model containing 158 factors; by analyzing a stock's fundamentals, technicals, and market-level factors, it produces an expected-return score for each stock. Alpha158's goal is to use these factors to beat the market's average return.
Alpha360 is another multi-factor model; it contains 360 factors and is more complex than Alpha158. Its goal is likewise to predict stock returns through multi-factor analysis in order to earn excess returns in the market.
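A minimal sketch of instantiating the Alpha158 handler directly to inspect the factor matrix it produces (the date ranges here are illustrative):

```python
# A minimal sketch: Alpha158 on CSI300; columns of the fetched frame are the
# factor names (KLEN, RSQR5, ...), rows a (datetime, instrument) MultiIndex.
import qlib
from qlib.contrib.data.handler import Alpha158

qlib.init(provider_uri="~/.qlib/qlib_data/cn_data")

handler = Alpha158(
    instruments="csi300",
    start_time="2010-01-01", end_time="2010-12-31",
    fit_start_time="2010-01-01", fit_end_time="2010-06-30",
)
df = handler.fetch()
print(df.shape)
```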
```yaml
data_handler_config: &data_handler_config
    start_time: 2008-01-01
    end_time: 2020-08-01
    instruments: *market
    data_loader:
        class: QlibDataLoader
        kwargs:
            config:
                feature:
                    - ["Resi($close, 15)/$close", "Std(Abs($close/Ref($close, 1)-1)*$volume, 5)/(Mean(Abs($close/Ref($close, 1)-1)*$volume, 5)+1e-12)", "Rsquare($close, 5)", "($high-$low)/$open", "Rsquare($close, 10)", "Corr($close, Log($volume+1), 5)", "Corr($close/Ref($close,1), Log($volume/Ref($volume, 1)+1), 5)", "Corr($close, Log($volume+1), 10)", "Rsquare($close, 20)", "Corr($close/Ref($close,1), Log($volume/Ref($volume, 1)+1), 60)", "Corr($close/Ref($close,1), Log($volume/Ref($volume, 1)+1), 10)", "Corr($close, Log($volume+1), 20)", "(Less($open, $close)-$low)/$open"]
                    - ["RESI5", "WVMA5", "RSQR5", "KLEN", "RSQR10", "CORR5", "CORD5", "CORR10", "RSQR20", "CORD60", "CORD10", "CORR20", "KLOW"]
                label:
                    - ["Ref($close, -2)/Ref($close, -1) - 1"]
                    - ["LABEL0"]
            freq: day
```
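Each entry under `feature`/`label` pairs a list of expressions with a list of column names. A minimal sketch of driving QlibDataLoader directly with such a pair, using two expressions borrowed from the config above:

```python
# A minimal sketch of QlibDataLoader: each group maps to (expressions, names).
import qlib
from qlib.data.dataset.loader import QlibDataLoader

qlib.init(provider_uri="~/.qlib/qlib_data/cn_data")

loader = QlibDataLoader(config={
    "feature": (["($high-$low)/$open", "Rsquare($close, 5)"], ["KLEN", "RSQR5"]),
    "label": (["Ref($close, -2)/Ref($close, -1) - 1"], ["LABEL0"]),
})
df = loader.load(instruments="csi300",
                 start_time="2010-01-01", end_time="2010-01-31")
# df's columns form a MultiIndex: ("feature", "KLEN"), ..., ("label", "LABEL0")
```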
```yaml
data_loader_config: &data_loader_config
    class: StaticDataLoader
    module_path: qlib.data.dataset.loader
    kwargs:
        config:
            feature: data/feature.pkl
            label: data/label.pkl
```
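Unlike QlibDataLoader, StaticDataLoader performs no expression evaluation; it simply reads pre-computed DataFrames from disk. A minimal sketch (the pickle paths come from the YAML above and must exist beforehand):

```python
# A minimal sketch of StaticDataLoader: each group points to a pickled DataFrame
# indexed by (datetime, instrument); load() joins the groups under a column
# MultiIndex.
from qlib.data.dataset.loader import StaticDataLoader

loader = StaticDataLoader(config={
    "feature": "data/feature.pkl",
    "label": "data/label.pkl",
})
df = loader.load()  # optionally: loader.load(instruments=..., start_time=...)
```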
The core of `DatasetProvider.inst_calculator` (qlib/data/data.py), which evaluates every expression for one instrument and assembles the result into a DataFrame:

```python
for field in column_names:
    # The client does not have an expression provider; the data will be loaded
    # from cache using a static method.
    obj[field] = ExpressionD.expression(inst, field, start_time, end_time, freq)

data = pd.DataFrame(obj)
data.index = Cal.calendar(freq=freq)[data.index.values.astype(int)]
data.index.names = ["datetime"]

for _processor in inst_processors:
    data = init_instance_by_config(_processor, accept_types=InstProcessor)(data, instrument=inst)

return data
```
Expression & Feature
Class diagram
Analysis
Processor
Class diagram
Analysis
Model
BaseModel
Class diagram
Analysis
Evaluation
Backtest
Class diagram
Analysis
The backtest module implements backtesting. It contains several classes, each corresponding to an important concept:
Class | Concept | Description
---|---|---
Exchange | Exchange | Provides the information needed for trading, such as the open/close cost rates, minimum trading cost, and trade amount/volume limits; it also provides backtest-related information such as the backtest frequency, start/end time, instrument codes, and deal prices
Account | Account | Provides account-related information, such as the initial cash and the positions
Order | Order | A trade order for a single stock; carries the stock ID, order amount, direction, and the order's start/end time
Position | Position | The current stock holdings; part of the Account
BaseTradeDecision | Trade decision | Generated by a Strategy and executed by an Executor; the information it carries is mainly an order list
BaseExecutor | Trade executor | Executes trades and produces indicators; depends on basic components such as Account and Exchange
Signal | Trading signal | Helps the Strategy obtain the signal used to generate decisions, e.g. model predictions
Strategy | Trading strategy | The core of this module; it has many subclasses and is used to generate trade decisions (TradeDecision)
BaseInfrastructure | Infrastructure | Essentially a map that stores the basic dependencies the Executor needs, such as Account and Exchange
TradeCalendarManager | Trading calendar | Manages the trading calendar during the backtest (frequency, start/end time, current step); used by both Strategy and Executor
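A minimal sketch of wiring these classes together through `qlib.backtest.backtest()`; here `pred` is assumed to hold the model's prediction scores, and the executor config mirrors qlib's daily-frequency examples:

```python
# A minimal sketch, assuming `pred` holds prediction scores indexed by
# (datetime, instrument): the Strategy consumes the Signal, the Executor trades
# against the Exchange, and an Account/Position are maintained internally.
from qlib.backtest import backtest
from qlib.contrib.strategy import TopkDropoutStrategy

strategy = TopkDropoutStrategy(signal=pred, topk=50, n_drop=5)
executor_config = {
    "class": "SimulatorExecutor",
    "module_path": "qlib.backtest.executor",
    "kwargs": {"time_per_step": "day", "generate_portfolio_metrics": True},
}
portfolio_metrics, indicators = backtest(
    start_time="2017-01-01", end_time="2020-08-01",
    strategy=strategy, executor=executor_config,
    account=100000000, benchmark="SH000300",
    exchange_kwargs={"limit_threshold": 0.095, "deal_price": "close",
                     "open_cost": 0.0005, "close_cost": 0.0015, "min_cost": 5},
)
```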
In Alpha158, Qlib uses the label Ref($close, -2)/Ref($close, -1) - 1, which measures the price change from T+1 to T+2, rather than Ref($close, -1)/$close - 1. The reason is that after obtaining the day-T close price of a Chinese stock, the stock can only be bought on day T+1 and sold on day T+2.
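A toy pandas check of that definition (prices are illustrative):

```python
# Ref($close, -k) at day T is the close k days ahead, i.e. shift(-k) in pandas.
import pandas as pd

close = pd.Series([10.0, 10.5, 11.0, 10.8])  # days T, T+1, T+2, T+3
label = close.shift(-2) / close.shift(-1) - 1

# label[0] == close[2]/close[1] - 1: buy at the T+1 close, sell at the T+2 close.
print(label[0])  # (11.0 / 10.5) - 1 ≈ 0.0476
```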