abupy.CoreBu package¶
Submodules¶
abupy.CoreBu.ABu module¶
-
abupy.CoreBu.ABu.
gen_buy_from_chinese
(*args, **kwargs)[source]¶ Sorry! Because strategies generated from Chinese text must still follow certain grammar and sentence patterns, users entirely unfamiliar with programming could make mistakes that cause needless financial losses, so the Chinese-language automatic strategy-generation module does not currently expose an interface or source code.
-
abupy.CoreBu.ABu.
load_abu_result_tuple
(n_folds, store_type, custom_name=None)[source]¶ Reads a backtest result saved with store_abu_result_tuple. The file name to read is derived from the n_folds and store_type parameters; orders_pd, action_pd, capital, and benchmark are read in turn, and an AbuResultTuple object is constructed and returned. Delegates to ABuStore.load_abu_result_tuple with the passed-through parameters
Parameters: - n_folds – how many years the backtest covered; only affects the file name to read
- store_type – the backtest storage type, an EStoreAbu value; only affects the file name to read
- custom_name – the custom file name, required when store_type=EStoreAbu.E_STORE_CUSTOM_NAME
Returns: AbuResultTuple object
-
abupy.CoreBu.ABu.
run_kl_update
(n_folds=2, start=None, end=None, market=None, n_jobs=16, how='thread')[source]¶ Before running a whole-market backtest with abu.run_loop_back(), it is recommended to first update the data with abu.run_kl_update(). run_kl_update() initially forces updates from network data and, once the update completes, switches the data-fetch mode to the local cache. Based on the EMarketTargetType market type, it fetches financial time-series data for the whole market, dispatching the fetch function across multiple processes or threads to batch-download the series.
The benefit of abu.run_kl_update() is that it separates data updating from strategy backtesting, which improves both runtime efficiency and troubleshooting.
eg:

    from abupy import abu, EMarketTargetType

    # fetch the whole Hong Kong market
    abupy.env.g_market_target = EMarketTargetType.E_MARKET_TARGET_HK
    # update 6 years of data
    abu.run_kl_update(n_folds=6)

    # fetch the whole A-share market
    abupy.env.g_market_target = EMarketTargetType.E_MARKET_TARGET_CN
    # data from 2013-07-10 up to 2016-07-26
    abu.run_kl_update(start='2013-07-10', end='2016-07-26')

Parameters: - n_folds – int, how many years of historical data to request
- start – start date of the request, a str object, eg: '2013-07-10'
- end – end date of the request, a str object, eg: '2016-07-26'
- market – the market to query, eg: EMarketTargetType.E_MARKET_TARGET_US
- n_jobs – number of parallel tasks: the process count for processes, the thread count for threads
- how – process: multiprocess, thread: multithread, main: single process, single thread
-
abupy.CoreBu.ABu.
run_loop_back
(read_cash, buy_factors, sell_factors, stock_picks=None, choice_symbols=None, n_folds=2, start=None, end=None, commission_dict=None, n_process_kl=None, n_process_pick=None)[source]¶ Wraps the execution of a timing-and-stock-picking backtest.
Before running a whole-market backtest with abu.run_loop_back(), it is recommended to first update the data with abu.run_kl_update(), which initially forces updates from network data and then switches the data-fetch mode to the local cache. The benefit of abu.run_kl_update() is that it separates data updating from strategy backtesting, which improves both runtime efficiency and troubleshooting.
Parameters: - read_cash – initial capital, eg: 1000000
- buy_factors – sequence of buy-factor strategies used in the backtest, eg:

    buy_factors = [{'xd': 60, 'class': AbuFactorBuyBreak},
                   {'xd': 42, 'class': AbuFactorBuyBreak}]

- sell_factors – sequence of sell factors used in the backtest, eg:

    sell_factors = [{'stop_loss_n': 0.5, 'stop_win_n': 3.0, 'class': AbuFactorAtrNStop},
                    {'pre_atr_n': 1.0, 'class': AbuFactorPreAtrNStop},
                    {'close_atr_n': 1.5, 'class': AbuFactorCloseAtrNStop}]

- stock_picks – sequence of stock-picking factors used in the backtest, eg:

    stock_pickers = [{'class': AbuPickRegressAngMinMax,
                      'threshold_ang_min': 0.0, 'reversed': False},
                     {'class': AbuPickStockPriceMinMax,
                      'threshold_price_min': 50.0, 'reversed': False}]

- choice_symbols – candidate symbol pool. Defaults to None, which backtests the whole market given by the market type in abupy.env.g_market_target; when not None it is a sequence of symbols, eg:

    choice_symbols = ['usNOAH', 'usSFUN', 'usBIDU', 'usAAPL', 'usGOOG',
                      'usTSLA', 'usWUBA', 'usVIPS']

- n_folds – int, backtest over n_folds years of historical data
- start – backtest start date, a str object, eg: '2013-07-10'
- end – backtest end date, a str object, eg: '2016-07-26'
- commission_dict – passed through to AbuCapital when customizing trade commissions, eg:

    def free_commission(trade_cnt, price):
        # commission-free
        return 0

    commission_dict = {'buy_commission_func': free_commission,
                       'sell_commission_func': free_commission}
    AbuCapital(read_cash, benchmark, user_commission_dict=commission_dict)

- n_process_kl – number of parallel processes for collecting financial time-series data; default None, allocated internally based on cpu count
- n_process_pick – number of parallel processes for the timing and stock-picking operations; default None, allocated internally based on cpu count
Returns: (AbuResultTuple object, AbuKLManager object)
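The commission callables passed through commission_dict take (trade_cnt, price) and return the fee for that fill. A minimal runnable sketch of the signature; free_commission follows the docstring above, while per_share_commission is a hypothetical illustration, not an abupy function:

```python
def free_commission(trade_cnt, price):
    # commission-free: every trade costs 0, regardless of size or price
    return 0


def per_share_commission(trade_cnt, price):
    # hypothetical schedule: 0.5 cents per share, with a 1-dollar minimum per fill
    return max(0.005 * trade_cnt, 1.0)


# the dict maps buy/sell to the commission callables, as in the docstring above
commission_dict = {'buy_commission_func': free_commission,
                   'sell_commission_func': per_share_commission}

print(commission_dict['buy_commission_func'](100, 30.0))   # 0
print(commission_dict['sell_commission_func'](100, 30.0))  # 1.0
```

Any callable with this two-argument shape can be plugged in for either side of the trade.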
-
abupy.CoreBu.ABu.
store_abu_result_tuple
(abu_result_tuple, n_folds, store_type=None, custom_name=None)[source]¶ Saves the AbuResultTuple backtest result of abu.run_loop_back. The storage file name is derived from the n_folds and store_type parameters. Delegates to ABuStore.store_abu_result_tuple with the passed-through parameters
Parameters: - abu_result_tuple – an AbuResultTuple object
- n_folds – how many years the backtest covered; only affects the storage file name
- store_type – the backtest storage type, an EStoreAbu value; only affects the storage file name
- custom_name – the custom file name, required when store_type=EStoreAbu.E_STORE_CUSTOM_NAME
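How n_folds and store_type combine into a file name can be pictured with this sketch. The suffix-to-type mapping follows the member names, and the exact name format ("abu_result_n{n_folds}_...") is an assumption for illustration, not abupy's actual internal format:

```python
from enum import Enum


class EStoreAbu(Enum):
    E_STORE_NORMAL = 0
    E_STORE_TRAIN = 1
    E_STORE_TEST = 2
    E_STORE_TEST_UMP = 3
    E_STORE_TEST_UMP_WITH_EDGE = 4
    E_STORE_CUSTOM_NAME = 5


# suffix per storage type; the file-name scheme itself is illustrative only
_SUFFIX = {EStoreAbu.E_STORE_NORMAL: '',
           EStoreAbu.E_STORE_TRAIN: 'train',
           EStoreAbu.E_STORE_TEST: 'test',
           EStoreAbu.E_STORE_TEST_UMP: 'test_ump',
           EStoreAbu.E_STORE_TEST_UMP_WITH_EDGE: 'test_ump_with_edge'}


def result_file_name(n_folds, store_type, custom_name=None):
    # custom names bypass the suffix table entirely
    if store_type is EStoreAbu.E_STORE_CUSTOM_NAME:
        return 'abu_result_{}'.format(custom_name)
    suffix = _SUFFIX[store_type]
    base = 'abu_result_n{}'.format(n_folds)
    return '{}_{}'.format(base, suffix) if suffix else base


print(result_file_name(2, EStoreAbu.E_STORE_TRAIN))  # abu_result_n2_train
```

This is why store and load must be called with the same n_folds and store_type: both sides derive the same name from those two values.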
abupy.CoreBu.ABuBase module¶
Common base-class utilities module
abupy.CoreBu.ABuDeprecated module¶
Deprecation-warning module
abupy.CoreBu.ABuEnv module¶
Global environment configuration module
-
class
abupy.CoreBu.ABuEnv.
EDataCacheType
[source]¶ Bases:
enum.Enum
Financial time-series data cache type
-
E_DATA_CACHE_CSV
= 1¶ slowest reads and writes, though write speed is acceptable on non-SSD disks; small storage footprint
-
E_DATA_CACHE_HDF5
= 0¶
-
E_DATA_CACHE_MONGODB
= 2¶ suited to distributed scaling; large storage footprint
-
-
class
abupy.CoreBu.ABuEnv.
EMarketDataFetchMode
[source]¶ Bases:
enum.Enum
Financial time-series data fetch mode
-
E_DATA_FETCH_FORCE_LOCAL
= 1¶ force fetching data from the local cache; when the local data cannot satisfy the request, returns None
-
E_DATA_FETCH_FORCE_NET
= 2¶ force fetching data from the network, regardless of whether the local data would suffice
-
E_DATA_FETCH_NORMAL
= 0¶
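How the three fetch modes interact with a local cache can be sketched as follows, assuming each mode behaves as its name suggests; the cache and network-fetch arguments are hypothetical stand-ins, not abupy internals:

```python
from enum import Enum


class EMarketDataFetchMode(Enum):
    E_DATA_FETCH_NORMAL = 0
    E_DATA_FETCH_FORCE_LOCAL = 1
    E_DATA_FETCH_FORCE_NET = 2


def fetch_kl(symbol, mode, local_cache, net_fetch):
    # FORCE_LOCAL: local cache only; None when the cache cannot satisfy the request
    if mode is EMarketDataFetchMode.E_DATA_FETCH_FORCE_LOCAL:
        return local_cache.get(symbol)
    # FORCE_NET: always hit the network, ignoring any local data
    if mode is EMarketDataFetchMode.E_DATA_FETCH_FORCE_NET:
        return net_fetch(symbol)
    # NORMAL: prefer the local cache, fall back to the network
    return local_cache.get(symbol) or net_fetch(symbol)


cache = {'usAAPL': 'cached-kl'}
net = lambda symbol: 'net-kl'
print(fetch_kl('usTSLA', EMarketDataFetchMode.E_DATA_FETCH_FORCE_LOCAL, cache, net))  # None
```

This is the switch run_kl_update flips: FORCE_NET while updating, then back to the cached path for backtests.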
-
-
class
abupy.CoreBu.ABuEnv.
EMarketDataSplitMode
[source]¶ Bases:
enum.Enum
Request parameter in ABuSymbolPd controlling whether the data needs to be split in alignment with the benchmark data
-
E_DATA_SPLIT_SE
= 1¶ split the data internally according to start and end
-
E_DATA_SPLIT_UNDO
= 0¶
-
-
class
abupy.CoreBu.ABuEnv.
EMarketSourceType
[source]¶ Bases:
enum.Enum
Data source. When data fetching becomes unreliable, try switching the data source, or connect a private data source
-
E_MARKET_SOURCE_bd
= 0¶ Baidu: A-shares, US stocks, HK stocks
-
E_MARKET_SOURCE_hb_tc
= 200¶ Huobi: bitcoin, litecoin
-
E_MARKET_SOURCE_nt
= 2¶ NetEase: A-shares, US stocks, HK stocks
-
E_MARKET_SOURCE_sn_futures
= 100¶ Sina: domestic futures
-
E_MARKET_SOURCE_sn_futures_gb
= 101¶ Sina: international futures
-
E_MARKET_SOURCE_sn_us
= 3¶ Sina: US stocks
-
E_MARKET_SOURCE_tx
= 1¶ Tencent: A-shares, US stocks, HK stocks
-
-
class
abupy.CoreBu.ABuEnv.
EMarketSubType
[source]¶ Bases:
enum.Enum
Sub-market (exchange) type definitions
-
CBOT
= 'CBOT'¶ Chicago Board of Trade
-
COIN
= 'COIN'¶ coin sub-market (cryptocurrencies)
-
DCE
= 'DCE'¶ Dalian Commodity Exchange
-
HK
= 'hk'¶ Hong Kong market
-
LME
= 'LME'¶ London Metal Exchange
-
NYMEX
= 'NYMEX'¶ New York Mercantile Exchange
-
SH
= 'sh'¶ Shanghai Stock Exchange
-
SHFE
= 'SHFE'¶ Shanghai Futures Exchange
-
SZ
= 'sz'¶ Shenzhen Stock Exchange
-
US_N
= 'NYSE'¶ US NYSE
-
US_OQ
= 'NASDAQ'¶ US NASDAQ
-
US_OTC
= 'OTCMKTS'¶ US OTC markets (OTCMKTS)
-
US_PINK
= 'PINK'¶ US pink sheets
-
US_PREIPO
= 'PREIPO'¶ pre-IPO, not yet listed
-
ZZCE
= 'ZZCE'¶ Zhengzhou Commodity Exchange
-
-
class
abupy.CoreBu.ABuEnv.
EMarketTargetType
[source]¶ Bases:
enum.Enum
Traded-instrument type, i.e. market type, eg. US stock market, A-share market, HK stock market, domestic futures market, US options market, TC coin market (bitcoin etc.)
-
E_MARKET_TARGET_CN
= 'hs'¶ A-share market
-
E_MARKET_TARGET_FUTURES_CN
= 'futures_cn'¶ domestic futures market
-
E_MARKET_TARGET_FUTURES_GLOBAL
= 'futures_global'¶ international futures market
-
E_MARKET_TARGET_HK
= 'hk'¶ HK stock market
-
E_MARKET_TARGET_OPTIONS_US
= 'options_us'¶ US options market
-
E_MARKET_TARGET_TC
= 'tc'¶ TC coin market (bitcoin etc.)
-
E_MARKET_TARGET_US
= 'us'¶ US stock market
-
-
abupy.CoreBu.ABuEnv.
disable_example_env_ipython
()[source]¶ Disables the ipython example environment that runs with the same data as in the book, i.e. the data read from RomDataBu/df_kl.h5 :return:
-
abupy.CoreBu.ABuEnv.
enable_example_env_ipython
()[source]¶ Only for running in the ipython example environment with the same data as in the book, i.e. reading the data under RomDataBu/df_kl.h5
The data initially bundled under RomDataBu/df_kl.h5.zip is only the zip archive, because files on git should preferably stay under 50m. The bundled test data, covering US stocks, A-shares, futures, bitcoin, and HK stocks, is initialized in df_kl_ext.h5.zip; unzipping produces the test data as df_kl.h5 :return:
-
abupy.CoreBu.ABuEnv.
g_data_cache_type
= <EDataCacheType.E_DATA_CACHE_CSV: 1>¶ financial time-series data cache type, default EDataCacheType.E_DATA_CACHE_CSV
-
abupy.CoreBu.ABuEnv.
g_data_fetch_mode
= <EMarketDataFetchMode.E_DATA_FETCH_NORMAL: 0>¶ financial time-series data fetch mode, default EMarketDataFetchMode.E_DATA_FETCH_NORMAL
-
abupy.CoreBu.ABuEnv.
g_enable_last_split_test
= False¶ whether stock picking uses the test-set stocks from the last completed split, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_last_split_train
= False¶ whether stock picking uses the train-set stocks from the last completed split, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_ml_feature
= False¶ whether to enable machine-learning feature collection; enabling it slows execution, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_take_kl_snapshot
= False¶ whether to take a k-line chart snapshot before a buy order is generated, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_train_test_split
= False¶ whether to split the stock pool into train-set and test-set stocks for stock picking, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_ump_edge_deg_block
= False¶ whether to enable the ump interception mechanism: edge judge deg, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_ump_edge_price_block
= False¶ whether to enable the ump interception mechanism: edge judge price, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_ump_edge_wave_block
= False¶ whether to enable the ump interception mechanism: edge judge wave, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_ump_main_deg_block
= False¶ whether to enable the ump interception mechanism: main judge deg, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_ump_main_jump_block
= False¶ whether to enable the ump interception mechanism: main judge jump, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_ump_main_price_block
= False¶ whether to enable the ump interception mechanism: main judge price, default off False
-
abupy.CoreBu.ABuEnv.
g_enable_ump_main_wave_block
= False¶ whether to enable the ump interception mechanism: main judge wave, default off False
-
abupy.CoreBu.ABuEnv.
g_ignore_all_warnings
= False¶ ignore all warnings, default off False
-
abupy.CoreBu.ABuEnv.
g_is_ipython
= False¶ whether running inside an ipython environment
-
abupy.CoreBu.ABuEnv.
g_is_mac_os
= True¶ whether the operating system is macOS
-
abupy.CoreBu.ABuEnv.
g_is_py3
= True¶ python version environment: whether python3
-
abupy.CoreBu.ABuEnv.
g_market_source
= <EMarketSourceType.E_MARKET_SOURCE_bd: 0>¶ market data source, default EMarketSourceType.E_MARKET_SOURCE_bd
-
abupy.CoreBu.ABuEnv.
g_market_target
= <EMarketTargetType.E_MARKET_TARGET_US: 'us'>¶ market type used for backtests, default EMarketTargetType.E_MARKET_TARGET_US
-
abupy.CoreBu.ABuEnv.
g_project_cache_dir
= '/Users/tu/abu/data/cache'¶ abu cache folder ~/abu/cache
-
abupy.CoreBu.ABuEnv.
g_project_data_dir
= '/Users/tu/abu/data'¶ abu data folder ~/abu/data
-
abupy.CoreBu.ABuEnv.
g_project_db_dir
= '/Users/tu/abu/db'¶ abu database folder ~/abu/db
-
abupy.CoreBu.ABuEnv.
g_project_kl_df_data_csv
= '/Users/tu/abu/data/csv'¶ storage path used in csv cache mode
-
abupy.CoreBu.ABuEnv.
g_project_kl_df_data_example
= '/Users/tu/PycharmProjects/yabee_abu/abupy/RomDataBu/df_kl.h5'¶ path of the bundled example k-line data RomDataBu/df_kl.h5
-
abupy.CoreBu.ABuEnv.
g_project_log_dir
= '/Users/tu/abu/log'¶ abu log folder ~/abu/log
-
abupy.CoreBu.ABuEnv.
g_project_log_info
= '/Users/tu/abu/log/info.log'¶ abu log file ~/abu/log/info.log
-
abupy.CoreBu.ABuEnv.
g_project_rom_data_dir
= '/Users/tu/PycharmProjects/yabee_abu/abupy/CoreBu/../RomDataBu'¶ main project data file directory, i.e. the RomDataBu location inside the project
-
abupy.CoreBu.ABuEnv.
g_project_root
= '/Users/tu/abu'¶ main abu data directory folder
-
abupy.CoreBu.ABuEnv.
g_split_tt_n_folds
= 10¶ n_folds split parameter for splitting train-set and test-set stocks in stock picking, default 10
-
abupy.CoreBu.ABuEnv.
root_drive
= '/Users/tu'¶ root directory (user home) under which the abu data folders are created
abupy.CoreBu.ABuEnvProcess module¶
Module for copying the main process's settings into multi-task child processes
abupy.CoreBu.ABuFixes module¶
Module that unifies conventions across dependency-library versions and operating systems, and fixes related issues
-
class
abupy.CoreBu.ABuFixes.
KFold
(n, n_folds=3, shuffle=False, random_state=None)[source]¶ Bases:
object
sklearn moved KFold into model_selection and changed its usage; since that much complexity is not needed for now, the key sklearn code is reimplemented simply here rather than doing from sklearn.model_selection import KFold
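The idea behind such a minimal reimplementation, contiguous index-based folds with no sklearn import, can be sketched like this; this is an illustration of the technique, not the module's actual code:

```python
def kfold_indices(n, n_folds=3):
    # split range(n) into n_folds contiguous folds; the first n % n_folds
    # folds receive one extra sample, mirroring classic KFold behavior
    fold_sizes = [n // n_folds + (1 if i < n % n_folds else 0)
                  for i in range(n_folds)]
    indices, current = [], 0
    for size in fold_sizes:
        test = list(range(current, current + size))
        train = [i for i in range(n) if i < current or i >= current + size]
        indices.append((train, test))
        current += size
    return indices


for train, test in kfold_indices(7, n_folds=3):
    print(train, test)
```

Each of the n_folds iterations holds out one contiguous block as the test fold and trains on the rest.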
-
abupy.CoreBu.ABuFixes.
np_version
= (1, 12, 1)¶ numpy version tuple
-
abupy.CoreBu.ABuFixes.
pd_version
= (0, 20, 1)¶ pandas version tuple
-
abupy.CoreBu.ABuFixes.
skl_version
= (0, 18, 1)¶ sklearn version tuple
-
abupy.CoreBu.ABuFixes.
sp_version
= (0, 19, 0)¶ scipy version tuple
abupy.CoreBu.ABuParallel module¶
Parallelism wrapper module, mainly unifying the interface conventions across platforms:
on windows, using joblib for long-running multi-task jobs (e.g. over 10 hours) eventually hits system errors popping tasks near the end, so on windows ProcessPoolExecutor is used for multi-tasking instead, wrapped with Parallel and delayed to keep the interface generic and uniform
-
class
abupy.CoreBu.ABuParallel.
Parallel
(n_jobs=1, backend='multiprocessing', verbose=0, pre_dispatch='2 * n_jobs', batch_size='auto', temp_folder=None, max_nbytes='1M', mmap_mode='r')[source]¶ Bases:
object
Wraps ProcessPoolExecutor to execute tasks in parallel
-
abupy.CoreBu.ABuParallel.
delayed
(function)[source]¶ Preserves function via functools.wraps and delayed_function without executing it :param function: :return:
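The joblib-style call pattern these wrappers preserve, Parallel(n_jobs=...)(delayed(func)(arg) for ...), can be sketched minimally on the stdlib so it runs without abupy or joblib; both names here are simplified stand-ins for the wrappers documented above:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


def delayed(function):
    # capture the call instead of executing it: f(x) -> (f, (x,), {})
    @wraps(function)
    def delayed_function(*args, **kwargs):
        return function, args, kwargs
    return delayed_function


class Parallel(object):
    def __init__(self, n_jobs=1):
        self.n_jobs = n_jobs

    def __call__(self, iterable):
        # run every captured (func, args, kwargs) task, gathering results in order
        with ThreadPoolExecutor(max_workers=self.n_jobs) as pool:
            futures = [pool.submit(f, *a, **kw) for f, a, kw in iterable]
            return [ft.result() for ft in futures]


result = Parallel(n_jobs=4)(delayed(pow)(2, n) for n in range(5))
print(result)  # [1, 2, 4, 8, 16]
```

The point of delayed is that the work is described lazily and only executed inside Parallel, which is what lets the backend (threads here, processes in abupy) be swapped without changing call sites.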
abupy.CoreBu.ABuPdHelper module¶
Wraps pandas version-compatibility handling, keeping the interface uniform while avoiding warnings
abupy.CoreBu.ABuStore module¶
Module for storing and reading trade backtest results
-
class
abupy.CoreBu.ABuStore.
AbuResultTuple
[source]¶ Bases:
abupy.CoreBu.ABuStore.AbuResultTuple
The namedtuple object returned by abu.run_loop_back:
orders_pd: pd.DataFrame of the trade orders produced by the backtest
action_pd: pd.DataFrame of the trade actions produced by the backtest
capital: instance of the capital class AbuCapital
benchmark: the trading benchmark, an AbuBenchmark instance
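Since it is a namedtuple, the four fields are available both by name and by index. A stand-in with the same field layout, with the DataFrames and objects replaced by placeholder strings:

```python
from collections import namedtuple

# same four fields as the real AbuResultTuple; the values below are placeholders
AbuResultTuple = namedtuple('AbuResultTuple',
                            ['orders_pd', 'action_pd', 'capital', 'benchmark'])

result = AbuResultTuple(orders_pd='orders DataFrame',
                        action_pd='actions DataFrame',
                        capital='AbuCapital instance',
                        benchmark='AbuBenchmark instance')

# fields are accessible by attribute name as well as by position
print(result.orders_pd)  # orders DataFrame
print(result[3])         # AbuBenchmark instance
```

In practice you would unpack the real object the same way, e.g. orders_pd = result.orders_pd after a run_loop_back call.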
-
class
abupy.CoreBu.ABuStore.
EStoreAbu
[source]¶ Bases:
enum.Enum
Enum of backtest-result storage types
-
E_STORE_CUSTOM_NAME
= 5¶ store a backtest whose file suffix is a custom string
-
E_STORE_NORMAL
= 0¶
-
E_STORE_TEST
= 2¶ store the test-set trading backtest; file suffix test
-
E_STORE_TEST_UMP
= 3¶ store the test-set backtest traded with main-judge ump; file suffix test_ump
-
E_STORE_TEST_UMP_WITH_EDGE
= 4¶ store the test-set backtest traded with main-judge + edge-judge ump; file suffix test_ump_with_edge
-
E_STORE_TRAIN
= 1¶ store the training-set backtest; file suffix train
-
-
abupy.CoreBu.ABuStore.
load_abu_result_tuple
(n_folds, store_type, custom_name=None)[source]¶ Reads a backtest result saved with store_abu_result_tuple. The file name to read is derived from the n_folds and store_type parameters; orders_pd, action_pd, capital, and benchmark are read in turn, and an AbuResultTuple object is constructed and returned
Parameters: - n_folds – how many years the backtest covered; only affects the file name to read
- store_type – the backtest storage type, an EStoreAbu value; only affects the file name to read
- custom_name – the custom file name, required when store_type=EStoreAbu.E_STORE_CUSTOM_NAME
Returns: AbuResultTuple object
-
abupy.CoreBu.ABuStore.
store_abu_result_tuple
(abu_result_tuple, n_folds, store_type=None, custom_name=None)[source]¶ Saves the AbuResultTuple backtest result of abu.run_loop_back. The storage file name is derived from the n_folds and store_type parameters
Parameters: - abu_result_tuple – an AbuResultTuple object
- n_folds – how many years the backtest covered; only affects the storage file name
- store_type – the backtest storage type, an EStoreAbu value; only affects the storage file name
- custom_name – the custom file name, required when store_type=EStoreAbu.E_STORE_CUSTOM_NAME
Module contents¶
-
class
abupy.CoreBu.
AbuResultTuple
[source]¶ Bases:
abupy.CoreBu.ABuStore.AbuResultTuple
The namedtuple object returned by abu.run_loop_back:
orders_pd: pd.DataFrame of the trade orders produced by the backtest
action_pd: pd.DataFrame of the trade actions produced by the backtest
capital: instance of the capital class AbuCapital
benchmark: the trading benchmark, an AbuBenchmark instance
-
class
abupy.CoreBu.
EStoreAbu
[source]¶ Bases:
enum.Enum
Enum of backtest-result storage types
-
E_STORE_CUSTOM_NAME
= 5¶
-
E_STORE_NORMAL
= 0¶
-
E_STORE_TEST
= 2¶
-
E_STORE_TEST_UMP
= 3¶
-
E_STORE_TEST_UMP_WITH_EDGE
= 4¶
-
E_STORE_TRAIN
= 1¶
-
-
class
abupy.CoreBu.
Parallel
(n_jobs=1, backend='multiprocessing', verbose=0, pre_dispatch='2 * n_jobs', batch_size='auto', temp_folder=None, max_nbytes='1M', mmap_mode='r')[source]¶ Bases:
object
Wraps ProcessPoolExecutor to execute tasks in parallel
-
abupy.CoreBu.
delayed
(function)[source]¶ Preserves function via functools.wraps and delayed_function without executing it :param function: :return:
-
class
abupy.CoreBu.
EMarketSourceType
[source]¶ Bases:
enum.Enum
Data source. When data fetching becomes unreliable, try switching the data source, or connect a private data source
-
E_MARKET_SOURCE_bd
= 0¶
-
E_MARKET_SOURCE_hb_tc
= 200¶
-
E_MARKET_SOURCE_nt
= 2¶
-
E_MARKET_SOURCE_sn_futures
= 100¶
-
E_MARKET_SOURCE_sn_futures_gb
= 101¶
-
E_MARKET_SOURCE_sn_us
= 3¶
-
E_MARKET_SOURCE_tx
= 1¶
-
-
class
abupy.CoreBu.
EMarketTargetType
[source]¶ Bases:
enum.Enum
Traded-instrument type, i.e. market type, eg. US stock market, A-share market, HK stock market, domestic futures market, US options market, TC coin market (bitcoin etc.)
-
E_MARKET_TARGET_CN
= 'hs'¶
-
E_MARKET_TARGET_FUTURES_CN
= 'futures_cn'¶
-
E_MARKET_TARGET_FUTURES_GLOBAL
= 'futures_global'¶
-
E_MARKET_TARGET_HK
= 'hk'¶
-
E_MARKET_TARGET_OPTIONS_US
= 'options_us'¶
-
E_MARKET_TARGET_TC
= 'tc'¶
-
E_MARKET_TARGET_US
= 'us'¶
-
-
class
abupy.CoreBu.
EMarketSubType
[source]¶ Bases:
enum.Enum
Sub-market (exchange) type definitions
-
CBOT
= 'CBOT'¶
-
COIN
= 'COIN'¶
-
DCE
= 'DCE'¶
-
HK
= 'hk'¶
-
LME
= 'LME'¶
-
NYMEX
= 'NYMEX'¶
-
SH
= 'sh'¶
-
SHFE
= 'SHFE'¶
-
SZ
= 'sz'¶
-
US_N
= 'NYSE'¶
-
US_OQ
= 'NASDAQ'¶
-
US_OTC
= 'OTCMKTS'¶
-
US_PINK
= 'PINK'¶
-
US_PREIPO
= 'PREIPO'¶
-
ZZCE
= 'ZZCE'¶
-
-
class
abupy.CoreBu.
EMarketDataSplitMode
[source]¶ Bases:
enum.Enum
Request parameter in ABuSymbolPd controlling whether the data needs to be split in alignment with the benchmark data
-
E_DATA_SPLIT_SE
= 1¶
-
E_DATA_SPLIT_UNDO
= 0¶
-
-
class
abupy.CoreBu.
EMarketDataFetchMode
[source]¶ Bases:
enum.Enum
Financial time-series data fetch mode
-
E_DATA_FETCH_FORCE_LOCAL
= 1¶
-
E_DATA_FETCH_FORCE_NET
= 2¶
-
E_DATA_FETCH_NORMAL
= 0¶
-
-
class
abupy.CoreBu.
EDataCacheType
[source]¶ Bases:
enum.Enum
Financial time-series data cache type
-
E_DATA_CACHE_CSV
= 1¶
-
E_DATA_CACHE_HDF5
= 0¶
-
E_DATA_CACHE_MONGODB
= 2¶
-
-
abupy.CoreBu.
train_test_split
(*arrays, **options)[source]¶ Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and
next(ShuffleSplit().split(X, y))
and application to input data into a single call for splitting (and optionally subsampling) data in a oneliner. Read more in the User Guide.
- *arrays : sequence of indexables with same length / shape[0]
- Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
- test_size : float, int, or None (default is None)
- If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.
- train_size : float, int, or None (default is None)
- If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
- random_state : int or RandomState
- Pseudo-random number generator state used for random sampling.
- stratify : array-like or None (default is None)
- If not None, data is split in a stratified fashion, using this as the class labels.
- splitting : list, length=2 * len(arrays)
List containing train-test split of inputs.
0.16 新版功能: If the input is sparse, the output will be a
scipy.sparse.csr_matrix
. Else, output type is the same as the input type.
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]
-
class
abupy.CoreBu.
KFold
(n, n_folds=3, shuffle=False, random_state=None)[source]¶ Bases:
object
sklearn moved KFold into model_selection and changed its usage; since that much complexity is not needed for now, the key sklearn code is reimplemented simply here rather than doing from sklearn.model_selection import KFold
-
abupy.CoreBu.
learning_curve
(estimator, X, y, groups=None, train_sizes=array([ 0.1, 0.325, 0.55, 0.775, 1. ]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=1, pre_dispatch='all', verbose=0)[source]¶ Learning curve.
Determines cross-validated training and test scores for different training set sizes.
A cross-validation generator splits the whole dataset k times in training and test data. Subsets of the training set with varying sizes will be used to train the estimator and a score for each training subset size and the test set will be computed. Afterwards, the scores will be averaged over all k runs for each training subset size.
Read more in the User Guide.
- estimator : object type that implements the “fit” and “predict” methods
- An object of that type which is cloned for each validation.
- X : array-like, shape (n_samples, n_features)
- Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape (n_samples) or (n_samples, n_features), optional
- Target relative to X for classification or regression; None for unsupervised learning.
- groups : array-like, with shape (n_samples,), optional
- Group labels for the samples used while splitting the dataset into train/test set.
- train_sizes : array-like, shape (n_ticks,), dtype float or int
- Relative or absolute numbers of training examples that will be used to generate the learning curve. If the dtype is float, it is regarded as a fraction of the maximum size of the training set (that is determined by the selected validation method), i.e. it has to be within (0, 1]. Otherwise it is interpreted as absolute sizes of the training sets. Note that for classification the number of samples usually have to be big enough to contain at least one sample from each class. (default: np.linspace(0.1, 1.0, 5))
- cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
- None, to use the default 3-fold cross validation,
- integer, to specify the number of folds in a (Stratified)KFold,
- An object to be used as a cross-validation generator.
- An iterable yielding train, test splits.
For integer/None inputs, if the estimator is a classifier and
y
is either binary or multiclass,StratifiedKFold
is used. In all other cases,KFold
is used.Refer User Guide for the various cross-validation strategies that can be used here.
- scoring : string, callable or None, optional, default: None
- A string (see model evaluation documentation) or
a scorer callable object / function with signature
scorer(estimator, X, y)
. - exploit_incremental_learning : boolean, optional, default: False
- If the estimator supports incremental learning, this will be used to speed up fitting for different training set sizes.
- n_jobs : integer, optional
- Number of jobs to run in parallel (default 1).
- pre_dispatch : integer or string, optional
- Number of predispatched jobs for parallel execution (default is all). The option can reduce the allocated memory. The string can be an expression like ‘2*n_jobs’.
- verbose : integer, optional
- Controls the verbosity: the higher, the more messages.
- train_sizes_abs : array, shape = (n_unique_ticks,), dtype int
- Numbers of training examples that has been used to generate the learning curve. Note that the number of ticks might be less than n_ticks because duplicate entries will be removed.
- train_scores : array, shape (n_ticks, n_cv_folds)
- Scores on training sets.
- test_scores : array, shape (n_ticks, n_cv_folds)
- Scores on test set.
See examples/model_selection/plot_learning_curve.py
-
abupy.CoreBu.
cross_val_score
(estimator, X, y=None, groups=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs')[source]¶ Evaluate a score by cross-validation
Read more in the User Guide.
- estimator : estimator object implementing ‘fit’
- The object to use to fit the data.
- X : array-like
- The data to fit. Can be, for example a list, or an array at least 2d.
- y : array-like, optional, default: None
- The target variable to try to predict in the case of supervised learning.
- groups : array-like, with shape (n_samples,), optional
- Group labels for the samples used while splitting the dataset into train/test set.
- scoring : string, callable or None, optional, default: None
- A string (see model evaluation documentation) or
a scorer callable object / function with signature
scorer(estimator, X, y)
. - cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
- None, to use the default 3-fold cross validation,
- integer, to specify the number of folds in a (Stratified)KFold,
- An object to be used as a cross-validation generator.
- An iterable yielding train, test splits.
For integer/None inputs, if the estimator is a classifier and
y
is either binary or multiclass,StratifiedKFold
is used. In all other cases,KFold
is used.Refer User Guide for the various cross-validation strategies that can be used here.
- n_jobs : integer, optional
- The number of CPUs to use to do the computation. -1 means ‘all CPUs’.
- verbose : integer, optional
- The verbosity level.
- fit_params : dict, optional
- Parameters to pass to the fit method of the estimator.
- pre_dispatch : int, or string, optional
Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
- None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
- An int, giving the exact number of total jobs that are spawned
- A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
- scores : array of float, shape=(len(list(cv)),)
- Array of scores of the estimator for each run of the cross validation.
>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_val_score
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
>>> print(cross_val_score(lasso, X, y))
[ 0.33150734  0.08022311  0.03531764]
sklearn.metrics.make_scorer()
:- Make a scorer from a performance metric or loss function.
-
class
abupy.CoreBu.
GridSearchCV
(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)[source]¶ Bases:
sklearn.model_selection._search.BaseSearchCV
Exhaustive search over specified parameter values for an estimator.
Important members are fit, predict.
GridSearchCV implements a “fit” and a “score” method. It also implements “predict”, “predict_proba”, “decision_function”, “transform” and “inverse_transform” if they are implemented in the estimator used.
The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
Read more in the User Guide.
- estimator : estimator object.
- This is assumed to implement the scikit-learn estimator interface.
Either estimator needs to provide a
score
function, orscoring
must be passed. - param_grid : dict or list of dictionaries
- Dictionary with parameters names (string) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
- scoring : string, callable or None, default=None
- A string (see model evaluation documentation) or
a scorer callable object / function with signature
scorer(estimator, X, y)
. IfNone
, thescore
method of the estimator is used. - fit_params : dict, optional
- Parameters to pass to the fit method.
- n_jobs : int, default=1
- Number of jobs to run in parallel.
- pre_dispatch : int, or string, optional
Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
- None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
- An int, giving the exact number of total jobs that are spawned
- A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
- iid : boolean, default=True
- If True, the data is assumed to be identically distributed across the folds, and the loss minimized is the total loss per sample, and not the mean loss across the folds.
- cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
- None, to use the default 3-fold cross validation,
- integer, to specify the number of folds in a (Stratified)KFold,
- An object to be used as a cross-validation generator.
- An iterable yielding train, test splits.
For integer/None inputs, if the estimator is a classifier and
y
is either binary or multiclass,StratifiedKFold
is used. In all other cases,KFold
is used.Refer User Guide for the various cross-validation strategies that can be used here.
- refit : boolean, default=True
- Refit the best estimator with the entire dataset. If “False”, it is impossible to make predictions using this GridSearchCV instance after fitting.
- verbose : integer
- Controls the verbosity: the higher, the more messages.
- error_score : ‘raise’ (default) or numeric
- Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.
- return_train_score : boolean, default=True
- If
'False'
, thecv_results_
attribute will not include training scores.
>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svr = svm.SVC()
>>> clf = GridSearchCV(svr, parameters)
>>> clf.fit(iris.data, iris.target)
...
GridSearchCV(cv=None, error_score=...,
       estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                     decision_function_shape=None, degree=..., gamma=...,
                     kernel='rbf', max_iter=-1, probability=False,
                     random_state=None, shrinking=True, tol=...,
                     verbose=False),
       fit_params={}, iid=..., n_jobs=1,
       param_grid=..., pre_dispatch=..., refit=...,
       return_train_score=..., scoring=..., verbose=...)
>>> sorted(clf.cv_results_.keys())
...
['mean_fit_time', 'mean_score_time', 'mean_test_score',...
 'mean_train_score', 'param_C', 'param_kernel', 'params',...
 'rank_test_score', 'split0_test_score',...
 'split0_train_score', 'split1_test_score', 'split1_train_score',...
 'split2_test_score', 'split2_train_score',...
 'std_fit_time', 'std_score_time', 'std_test_score', 'std_train_score'...]
- cv_results_ : dict of numpy (masked) ndarrays
A dict with keys as column headers and values as columns, that can be imported into a pandas
DataFrame
. For instance the below given table

param_kernel  param_gamma  param_degree  split0_test_score  ...  rank_...
'poly'        --           2             0.8                ...  2
'poly'        --           3             0.7                ...  4
'rbf'         0.1          --            0.8                ...  3
'rbf'         0.2          --            0.9                ...  1

will be represented by a
cv_results_
dict of:

{
'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                             mask = [False False False False]...)
'param_gamma': masked_array(data = [-- -- 0.1 0.2],
                            mask = [ True True False False]...),
'param_degree': masked_array(data = [2.0 3.0 -- --],
                             mask = [False False True True]...),
'split0_test_score'  : [0.8, 0.7, 0.8, 0.9],
'split1_test_score'  : [0.82, 0.5, 0.7, 0.78],
'mean_test_score'    : [0.81, 0.60, 0.75, 0.82],
'std_test_score'     : [0.02, 0.01, 0.03, 0.03],
'rank_test_score'    : [2, 4, 3, 1],
'split0_train_score' : [0.8, 0.9, 0.7],
'split1_train_score' : [0.82, 0.5, 0.7],
'mean_train_score'   : [0.81, 0.7, 0.7],
'std_train_score'    : [0.03, 0.03, 0.04],
'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
'mean_score_time'    : [0.007, 0.06, 0.04, 0.04],
'std_score_time'     : [0.001, 0.002, 0.003, 0.005],
'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
}
NOTE that the key
'params'
is used to store a list of parameter settings dict for all the parameter candidates.The
mean_fit_time
,std_fit_time
,mean_score_time
andstd_score_time
are all in seconds.- best_estimator_ : estimator
- Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data. Not available if refit=False.
- best_score_ : float
- Score of best_estimator on the left out data.
- best_params_ : dict
- Parameter setting that gave the best results on the hold out data.
- best_index_ : int
The index (of the
cv_results_
arrays) which corresponds to the best candidate parameter setting.The dict at
search.cv_results_['params'][search.best_index_]
gives the parameter setting for the best model, that gives the highest mean score (search.best_score_
).- scorer_ : function
- Scorer function used on the held out data to choose the best parameters for the model.
- n_splits_ : int
- The number of cross-validation splits (folds/iterations).
The parameters selected are those that maximize the score of the left out data, unless an explicit score is passed in which case it is used instead.
If n_jobs was set to a value higher than one, the data is copied for each point in the grid (and not n_jobs times). This is done for efficiency reasons if individual jobs take very little time, but may raise errors if the dataset is large and not enough memory is available. A workaround in this case is to set pre_dispatch. Then, the memory is copied only pre_dispatch many times. A reasonable value for pre_dispatch is 2 * n_jobs.
ParameterGrid
:- generates all the combinations of a hyperparameter grid.
sklearn.model_selection.train_test_split()
:- utility function to split the data into a development set usable for fitting a GridSearchCV instance and an evaluation set for its final evaluation.
sklearn.metrics.make_scorer()
:- Make a scorer from a performance metric or loss function.
-
fit
(X, y=None, groups=None)[source]¶ Run fit with all sets of parameters.
- X : array-like, shape = [n_samples, n_features]
- Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape = [n_samples] or [n_samples, n_output], optional
- Target relative to X for classification or regression; None for unsupervised learning.
- groups : array-like, with shape (n_samples,), optional
- Group labels for the samples used while splitting the dataset into train/test set.
-
abupy.CoreBu.
signature
(obj, *, follow_wrapped=True)[source]¶ Get a signature object for the passed callable.
-
class
abupy.CoreBu.
Parameter
(name, kind, *, default, annotation)[source]¶ Bases:
object
Represents a parameter in a function signature.
Has the following public attributes:
- name : str
- The name of the parameter as a string.
- default : object
- The default value for the parameter if specified. If the parameter has no default value, this attribute is set to Parameter.empty.
- annotation
- The annotation for the parameter if specified. If the parameter has no annotation, this attribute is set to Parameter.empty.
- kind : str
- Describes how argument values are bound to the parameter. Possible values: Parameter.POSITIONAL_ONLY, Parameter.POSITIONAL_OR_KEYWORD, Parameter.VAR_POSITIONAL, Parameter.KEYWORD_ONLY, Parameter.VAR_KEYWORD.
-
KEYWORD_ONLY
= 3¶
-
POSITIONAL_ONLY
= 0¶
-
POSITIONAL_OR_KEYWORD
= 1¶
-
VAR_KEYWORD
= 4¶
-
VAR_POSITIONAL
= 2¶
-
annotation
¶
-
default
¶
-
empty
¶ alias of _empty
-
kind
¶
-
name
¶
-
class
abupy.CoreBu.
ThreadPoolExecutor
(max_workers=None, thread_name_prefix='')[source]¶ Bases:
concurrent.futures._base.Executor
-
shutdown
(wait=True)[source]¶ Clean-up the resources associated with the Executor.
It is safe to call this method several times. Otherwise, no other methods can be called after this one.
- Args:
- wait: If True then shutdown will not return until all running
- futures have finished executing and the resources used by the executor have been reclaimed.
-
-
class
abupy.CoreBu.
zip
¶ Bases:
object
zip(iter1 [,iter2 [...]]) --> zip object
Return a zip object whose .__next__() method returns a tuple where the i-th element comes from the i-th iterable argument. The .__next__() method continues until the shortest iterable in the argument sequence is exhausted and then it raises StopIteration.
-
class
abupy.CoreBu.
range
(stop) → range object¶ Bases:
object
range(start, stop[, step]) -> range object
Return an object that produces a sequence of integers from start (inclusive) to stop (exclusive) by step. range(i, j) produces i, i+1, i+2, ..., j-1. start defaults to 0, and stop is omitted! range(4) produces 0, 1, 2, 3. These are exactly the valid indices for a list of 4 elements. When step is given, it specifies the increment (or decrement).
-
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return index of value.¶ Raise ValueError if the value is not present.
-
start
¶
-
step
¶
-
stop
¶
-
-
abupy.CoreBu.
reduce
(function, sequence[, initial]) → value¶ Apply a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.
-
class
abupy.CoreBu.
map
¶ Bases:
object
map(func, *iterables) --> map object
Make an iterator that computes the function using arguments from each of the iterables. Stops when the shortest iterable is exhausted.
-
class
abupy.CoreBu.
filter
¶ Bases:
object
filter(function or None, iterable) --> filter object
Return an iterator yielding those items of iterable for which function(item) is true. If function is None, return the items that are true.
-
abupy.CoreBu.
Pickler
¶ alias of _Pickler
-
abupy.CoreBu.
Unpickler
¶ alias of _Unpickler
-
class
abupy.CoreBu.
partial
[source]¶ Bases:
object
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
-
args
¶ tuple of arguments to future partial calls
-
func
¶ function object to use in future partial calls
-
keywords
¶ dictionary of keyword arguments to future partial calls
-