abupy.CoreBu package

Submodules

abupy.CoreBu.ABu module

abupy.CoreBu.ABu.gen_buy_from_chinese(*args, **kwargs)[source]

Sorry! Because generating strategies from Chinese text also requires following certain grammar and sentence patterns, users entirely unfamiliar with programming could produce errors and suffer needless financial losses, so the interface and source code of the Chinese auto-generated trading strategy module are not open for now!
abupy.CoreBu.ABu.load_abu_result_tuple(n_folds, store_type, custom_name=None)[source]

Read a backtest result saved with store_abu_result_tuple. The file name to read is derived from the n_folds and store_type parameters. Reads orders_pd, action_pd, capital and benchmark in turn, constructs an AbuResultTuple object and returns it. The parameters are passed through to ABuStore.load_abu_result_tuple, which performs the operation.

Parameters:
  • n_folds – how many years the backtest covered; only affects the file name to read
  • store_type – backtest storage type, an EStoreAbu value; only affects the file name to read
  • custom_name – custom file name, required when store_type=EStoreAbu.E_STORE_CUSTOM_NAME
Returns:

AbuResultTuple object

abupy.CoreBu.ABu.run_kl_update(n_folds=2, start=None, end=None, market=None, n_jobs=16, how='thread')[source]

Before running a whole-market backtest with abu.run_loop_back(), it is recommended to first refresh the data with abu.run_kl_update(). run_kl_update() forces an update from network data, and once the update completes it switches the data-fetch mode to the local cache. Internally, run_kl_update fetches whole-market financial time-series data according to the EMarketTargetType market type, dispatching the fetch function across multiple processes or threads to batch-download the series.

The benefit of abu.run_kl_update() is that it separates data updating from strategy backtesting, which improves both runtime efficiency and troubleshooting.

eg:

from abupy import abu, EMarketTargetType
# whole-market fetch for HK stocks
abupy.env.g_market_target = EMarketTargetType.E_MARKET_TARGET_HK
# update 6 years of data
abu.run_kl_update(n_folds=6)

# whole-market fetch for A-shares
abupy.env.g_market_target = EMarketTargetType.E_MARKET_TARGET_CN
# data from 2013-07-10 through 2016-07-26
abu.run_kl_update(start='2013-07-10', end='2016-07-26')

Parameters:
  • n_folds – int, how many years of historical backtest data to request
  • start – requested start date, a str, eg: '2013-07-10'
  • end – requested end date, a str, eg: '2016-07-26'
  • market – market to query, eg: EMarketTargetType.E_MARKET_TARGET_US
  • n_jobs – number of parallel tasks: process count for processes, thread count for threads
  • how – process: multiprocess, thread: multithread, main: single process, single thread
abupy.CoreBu.ABu.run_loop_back(read_cash, buy_factors, sell_factors, stock_picks=None, choice_symbols=None, n_folds=2, start=None, end=None, commission_dict=None, n_process_kl=None, n_process_pick=None)[source]

Wraps the execution of a timing and stock-picking backtest.

Before running a whole-market backtest with abu.run_loop_back(), it is recommended to first refresh the data with abu.run_kl_update(), which forces an update from network data and, once done, switches the data-fetch mode to the local cache. The benefit of abu.run_kl_update() is that it separates data updating from strategy backtesting, which improves both runtime efficiency and troubleshooting.

Parameters:
  • read_cash – initial capital, eg: 1000000
  • buy_factors

    sequence of buy-factor strategies used in the backtest, eg:

    buy_factors = [{'xd': 60, 'class': AbuFactorBuyBreak},
                   {'xd': 42, 'class': AbuFactorBuyBreak}]
  • sell_factors

    sequence of sell factors used in the backtest, eg:

    sell_factors = [{'stop_loss_n': 0.5, 'stop_win_n': 3.0, 'class': AbuFactorAtrNStop},
                    {'pre_atr_n': 1.0, 'class': AbuFactorPreAtrNStop},
                    {'close_atr_n': 1.5, 'class': AbuFactorCloseAtrNStop}]
  • stock_picks

    sequence of stock-picking factors used in the backtest, eg:

    stock_picks = [{'class': AbuPickRegressAngMinMax,
                    'threshold_ang_min': 0.0, 'reversed': False},
                   {'class': AbuPickStockPriceMinMax,
                    'threshold_price_min': 50.0, 'reversed': False}]
  • choice_symbols
    candidate symbol pool. Defaults to None, which runs a whole-market backtest over the market type in abupy.env.g_market_target; when not None, it is a sequence of symbols,
    eg:
    choice_symbols = ['usNOAH', 'usSFUN', 'usBIDU', 'usAAPL', 'usGOOG',
                      'usTSLA', 'usWUBA', 'usVIPS']
  • n_folds – int, backtest over n_folds years of historical data
  • start – backtest start date, a str, eg: '2013-07-10'
  • end – backtest end date, a str, eg: '2016-07-26'
  • commission_dict

    passed through to AbuCapital when customizing trading commissions. eg:

    def free_commission(trade_cnt, price):
        # commission free
        return 0

    commission_dict = {'buy_commission_func': free_commission,
                       'sell_commission_func': free_commission}

    AbuCapital(read_cash, benchmark, user_commission_dict=commission_dict)

  • n_process_kl – number of parallel processes for collecting financial time-series data; default None, allocated internally from the cpu count
  • n_process_pick – number of parallel processes for timing and stock picking; default None, allocated internally from the cpu count
Returns:

(AbuResultTuple object, AbuKLManager object)
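The buy, sell and stock-pick factor sequences all follow the same convention: each entry is a dict holding a 'class' key plus the keyword arguments for that class. A minimal, self-contained sketch of how such dicts can be turned into factor instances (the factory and the demo factor below are illustrative, not abupy API):

```python
def build_factors(factor_dicts):
    """Instantiate each {'class': Cls, **kwargs}-style factor dict."""
    factors = []
    for fd in factor_dicts:
        fd = dict(fd)              # copy: don't mutate the caller's dict
        cls = fd.pop('class')      # the strategy class to instantiate
        factors.append(cls(**fd))  # remaining keys become constructor kwargs
    return factors


class DemoBuyBreak:
    """Stand-in for a buy factor such as AbuFactorBuyBreak."""
    def __init__(self, xd):
        self.xd = xd


buy_factors = [{'xd': 60, 'class': DemoBuyBreak},
               {'xd': 42, 'class': DemoBuyBreak}]
built = build_factors(buy_factors)
```

The same dict can be reused to rebuild factors with different parameters, which is why the convention separates the class from its kwargs.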

abupy.CoreBu.ABu.store_abu_result_tuple(abu_result_tuple, n_folds, store_type=None, custom_name=None)[source]

Save the AbuResultTuple backtest result returned by abu.run_loop_back. The stored file name is derived from the n_folds and store_type parameters. The parameters are passed through to ABuStore.store_abu_result_tuple, which performs the operation.

Parameters:
  • abu_result_tuple – an AbuResultTuple object
  • n_folds – how many years the backtest covered; only affects the stored file name
  • store_type – backtest storage type, an EStoreAbu value; only affects the stored file name
  • custom_name – custom file name, required when store_type=EStoreAbu.E_STORE_CUSTOM_NAME
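As the parameter notes say, n_folds and store_type affect only the file name. A sketch of how such a name could be derived (the suffix mapping and helper are hypothetical illustrations, not the actual ABuStore implementation):

```python
from enum import Enum


class EStoreAbu(Enum):
    """Mirrors the storage-type enum documented in this module."""
    E_STORE_NORMAL = 0
    E_STORE_TRAIN = 1
    E_STORE_TEST = 2
    E_STORE_TEST_UMP = 3
    E_STORE_TEST_UMP_WITH_EDGE = 4
    E_STORE_CUSTOM_NAME = 5


# file suffix per storage type, following the EStoreAbu docs
_SUFFIX = {EStoreAbu.E_STORE_NORMAL: '',
           EStoreAbu.E_STORE_TRAIN: 'train',
           EStoreAbu.E_STORE_TEST: 'test',
           EStoreAbu.E_STORE_TEST_UMP: 'test_ump',
           EStoreAbu.E_STORE_TEST_UMP_WITH_EDGE: 'test_ump_with_edge'}


def result_file_name(n_folds, store_type, custom_name=None):
    """Build a base file name from n_folds and store_type only."""
    if store_type is EStoreAbu.E_STORE_CUSTOM_NAME:
        suffix = custom_name            # caller-supplied suffix
    else:
        suffix = _SUFFIX[store_type]
    name = 'n{}_folds'.format(n_folds)
    return '{}_{}'.format(name, suffix) if suffix else name
```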

abupy.CoreBu.ABuBase module

Common base-class module

class abupy.CoreBu.ABuBase.AbuParamBase[source]

Bases: object

Base class for objects; implements basic object-info printing and debug-inspection interfaces

classmethod get_params()[source]
to_dict(user=True)[source]

for debug: show as dict

to_series(user=True)[source]

for notebook debug: show as series
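A minimal sketch of the kind of debug helper AbuParamBase describes: collecting an object's attributes into a dict for inspection (this re-implementation is illustrative, not abupy's actual code):

```python
class ParamBase:
    """Toy base class exposing its attributes for debugging."""

    def to_dict(self, user=True):
        # with user=True, keep only non-private attributes
        items = vars(self).items()
        if user:
            items = [(k, v) for k, v in items if not k.startswith('_')]
        return dict(items)


class Strategy(ParamBase):
    def __init__(self):
        self.xd = 42
        self._cache = None   # private, hidden when user=True


s = Strategy()
```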

class abupy.CoreBu.ABuBase.FreezeAttrMixin[source]

Bases: object

Mixin that freezes external attribute setting; setting an attribute raises an exception
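One common way to build such a mixin is to override __setattr__ after an explicit freeze step. The sketch below is an illustrative re-implementation under that assumption, not abupy's actual code:

```python
class FreezeAttrMixin:
    """After _freeze() is called, setting any attribute raises."""

    def _freeze(self):
        object.__setattr__(self, '_frozen', True)

    def __setattr__(self, name, value):
        if getattr(self, '_frozen', False):
            raise AttributeError('{} is frozen, cannot set {!r}'.format(
                type(self).__name__, name))
        object.__setattr__(self, name, value)


class Config(FreezeAttrMixin):
    def __init__(self):
        self.n_folds = 2
        self._freeze()   # no further assignments allowed
```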

class abupy.CoreBu.ABuBase.PickleStateMixin[source]

Bases: object

Mixin for classes that need local serialization

pick_extend_work()[source]

A class mixing this in may override pick_extend_work to do its own class-specific __getstate__ work

skip_abupy_version = True
unpick_extend_work(state)[source]

A class mixing this in may override unpick_extend_work to do its own class-specific __setstate__ work
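The hook pattern described here can be sketched as a mixin whose __getstate__/__setstate__ delegate to overridable extension methods (an illustrative re-implementation, not abupy's actual code):

```python
import pickle


class PickleStateMixin:
    """__getstate__/__setstate__ delegate to overridable hooks."""

    def __getstate__(self):
        self.pick_extend_work()          # subclass hook before pickling
        return dict(self.__dict__)

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.unpick_extend_work(state)   # subclass hook after unpickling

    def pick_extend_work(self):
        pass

    def unpick_extend_work(self, state):
        pass


class Order(PickleStateMixin):
    def __init__(self, symbol):
        self.symbol = symbol

    def unpick_extend_work(self, state):
        # class-specific restore step, e.g. rebuilding derived fields
        self.restored = True


order = pickle.loads(pickle.dumps(Order('usAAPL')))
```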

abupy.CoreBu.ABuDeprecated module

Deprecated-warning module

class abupy.CoreBu.ABuDeprecated.AbuDeprecated(tip_info='')[source]

Bases: object

Can decorate a class or a method; warns with Deprecated information when the class or method is used
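A minimal sketch of such a decorator for plain functions, using the standard warnings module (illustrative only; abupy's AbuDeprecated additionally supports decorating classes):

```python
import functools
import warnings


class Deprecated:
    """Decorator that emits a DeprecationWarning when the target is called."""

    def __init__(self, tip_info=''):
        self.tip_info = tip_info

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            tip = self.tip_info or '{} is deprecated'.format(func.__name__)
            warnings.warn(tip, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper


@Deprecated(tip_info='use new_api instead')
def old_api(x):
    return x * 2
```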

abupy.CoreBu.ABuEnv module

Global environment configuration module

class abupy.CoreBu.ABuEnv.EDataCacheType[source]

Bases: enum.Enum

Financial time-series data cache type

E_DATA_CACHE_CSV = 1

Slowest to read and write, though write speed on non-SSD disks is acceptable; small storage footprint

E_DATA_CACHE_HDF5 = 0

E_DATA_CACHE_MONGODB = 2

Suited to distributed scaling; needs more storage space
class abupy.CoreBu.ABuEnv.EMarketDataFetchMode[source]

Bases: enum.Enum

Financial time-series data fetch mode

E_DATA_FETCH_FORCE_LOCAL = 1

Force fetching from local data; when local data is insufficient, return None

E_DATA_FETCH_FORCE_NET = 2

Force fetching from the network, regardless of whether local data is sufficient

E_DATA_FETCH_NORMAL = 0

Normal mode: prefer local data, fall back to the network when local data is insufficient

class abupy.CoreBu.ABuEnv.EMarketDataSplitMode[source]

Bases: enum.Enum

Request parameter in ABuSymbolPd: whether the data needs to be aligned and sliced against the benchmark

E_DATA_SPLIT_SE = 1

Slice the data internally according to start and end

E_DATA_SPLIT_UNDO = 0

Do not slice the data

class abupy.CoreBu.ABuEnv.EMarketSourceType[source]

Bases: enum.Enum

Data source. When data fetching becomes unreliable, try switching sources; a private data source can also be connected

E_MARKET_SOURCE_bd = 0

Baidu: A-shares, US stocks, HK stocks

E_MARKET_SOURCE_hb_tc = 200

Huobi: Bitcoin, Litecoin

E_MARKET_SOURCE_nt = 2

NetEase: A-shares, US stocks, HK stocks

E_MARKET_SOURCE_sn_futures = 100

Sina: domestic futures

E_MARKET_SOURCE_sn_futures_gb = 101

Sina: international futures

E_MARKET_SOURCE_sn_us = 3

Sina: US stocks

E_MARKET_SOURCE_tx = 1

Tencent: A-shares, US stocks, HK stocks

class abupy.CoreBu.ABuEnv.EMarketSubType[source]

Bases: enum.Enum

Sub-market (exchange) type definitions

CBOT = 'CBOT'

Chicago Board of Trade CBOT

COIN = 'COIN'

Coin sub-market (cryptocurrencies) COIN

DCE = 'DCE'

Dalian Commodity Exchange DCE

HK = 'hk'

Hong Kong market hk

LME = 'LME'

London Metal Exchange LME

NYMEX = 'NYMEX'

New York Mercantile Exchange NYMEX

SH = 'sh'

Shanghai Stock Exchange sh

SHFE = 'SHFE'

Shanghai Futures Exchange SHFE

SZ = 'sz'

Shenzhen Stock Exchange sz

US_N = 'NYSE'

US New York Stock Exchange NYSE

US_OQ = 'NASDAQ'

US NASDAQ

US_OTC = 'OTCMKTS'

US OTC markets OTCMKTS

US_PINK = 'PINK'

US pink sheets market

US_PREIPO = 'PREIPO'

Not yet listed (pre-IPO)

ZZCE = 'ZZCE'

Zhengzhou Commodity Exchange ZZCE

class abupy.CoreBu.ABuEnv.EMarketTargetType[source]

Bases: enum.Enum

Traded instrument type, i.e. market type, eg. US stock market, A-share market, HK stock market, domestic futures market, US options market, TC coin market (Bitcoin etc.)

E_MARKET_TARGET_CN = 'hs'

A-share market

E_MARKET_TARGET_FUTURES_CN = 'futures_cn'

Domestic futures market

E_MARKET_TARGET_FUTURES_GLOBAL = 'futures_global'

International futures market

E_MARKET_TARGET_HK = 'hk'

HK stock market

E_MARKET_TARGET_OPTIONS_US = 'options_us'

US options market

E_MARKET_TARGET_TC = 'tc'

TC coin market (Bitcoin etc.)

E_MARKET_TARGET_US = 'us'

US stock market

abupy.CoreBu.ABuEnv.disable_example_env_ipython()[source]

Only for running with the same data as the book in an ipython example environment, i.e. reading the data under RomDataBu/df_kl.h5

abupy.CoreBu.ABuEnv.enable_example_env_ipython()[source]

Only for running with the same data as the book in an ipython example environment, i.e. reading the data under RomDataBu/df_kl.h5

The data built into RomDataBu/df_kl.h5.zip initially ships only as a zip archive, because files on git should preferably stay under 50m. The built-in test data, covering US stocks, A-shares, futures, Bitcoin and HK stocks, is initialized in df_kl_ext.h5.zip; after unzipping, the test data becomes df_kl.h5

abupy.CoreBu.ABuEnv.g_data_cache_type = <EDataCacheType.E_DATA_CACHE_CSV: 1>

Financial time-series data cache type, csv by default

abupy.CoreBu.ABuEnv.g_data_fetch_mode = <EMarketDataFetchMode.E_DATA_FETCH_NORMAL: 0>

Financial time-series data fetch mode, normal mode by default

abupy.CoreBu.ABuEnv.g_enable_last_split_test = False

Whether stock picking uses the test-set stocks from the last completed split, off by default False

abupy.CoreBu.ABuEnv.g_enable_last_split_train = False

Whether stock picking uses the training-set stocks from the last completed split, off by default False

abupy.CoreBu.ABuEnv.g_enable_ml_feature = False

Whether to enable machine-learning feature collection; slows execution when on, off by default False

abupy.CoreBu.ABuEnv.g_enable_take_kl_snapshot = False

Whether to take a k-line chart snapshot before a buy order is generated, off by default False

abupy.CoreBu.ABuEnv.g_enable_train_test_split = False

Whether to split stock-pick data into training-set and test-set stocks, off by default False

abupy.CoreBu.ABuEnv.g_enable_ump_edge_deg_block = False

Whether to enable the umpire interception mechanism: edge umpire deg, off by default False

abupy.CoreBu.ABuEnv.g_enable_ump_edge_price_block = False

Whether to enable the umpire interception mechanism: edge umpire price, off by default False

abupy.CoreBu.ABuEnv.g_enable_ump_edge_wave_block = False

Whether to enable the umpire interception mechanism: edge umpire wave, off by default False

abupy.CoreBu.ABuEnv.g_enable_ump_main_deg_block = False

Whether to enable the umpire interception mechanism: main umpire deg, off by default False

abupy.CoreBu.ABuEnv.g_enable_ump_main_jump_block = False

Whether to enable the umpire interception mechanism: main umpire jump, off by default False

abupy.CoreBu.ABuEnv.g_enable_ump_main_price_block = False

Whether to enable the umpire interception mechanism: main umpire price, off by default False

abupy.CoreBu.ABuEnv.g_enable_ump_main_wave_block = False

Whether to enable the umpire interception mechanism: main umpire wave, off by default False

abupy.CoreBu.ABuEnv.g_ignore_all_warnings = False

Ignore library warnings, off by default False

abupy.CoreBu.ABuEnv.g_is_ipython = False

Whether running in an ipython environment

abupy.CoreBu.ABuEnv.g_is_mac_os = True

Whether the operating system is macOS

abupy.CoreBu.ABuEnv.g_is_py3 = True

Python version environment: whether Python 3

abupy.CoreBu.ABuEnv.g_market_source = <EMarketSourceType.E_MARKET_SOURCE_bd: 0>

Market data source, Baidu by default

abupy.CoreBu.ABuEnv.g_market_target = <EMarketTargetType.E_MARKET_TARGET_US: 'us'>

Market type to operate on, the US market by default

abupy.CoreBu.ABuEnv.g_project_cache_dir = '/Users/tu/abu/data/cache'

abu cache folder ~/abu/data/cache

abupy.CoreBu.ABuEnv.g_project_data_dir = '/Users/tu/abu/data'

abu data folder ~/abu/data

abupy.CoreBu.ABuEnv.g_project_db_dir = '/Users/tu/abu/db'

abu database folder ~/abu/db

abupy.CoreBu.ABuEnv.g_project_kl_df_data_csv = '/Users/tu/abu/data/csv'

Storage path in csv cache mode

abupy.CoreBu.ABuEnv.g_project_kl_df_data_example = '/Users/tu/PycharmProjects/yabee_abu/abupy/RomDataBu/df_kl.h5'

Path of the built-in example financial time-series data RomDataBu/df_kl.h5

abupy.CoreBu.ABuEnv.g_project_log_dir = '/Users/tu/abu/log'

abu log folder ~/abu/log

abupy.CoreBu.ABuEnv.g_project_log_info = '/Users/tu/abu/log/info.log'

abu log file ~/abu/log/info.log

abupy.CoreBu.ABuEnv.g_project_rom_data_dir = '/Users/tu/PycharmProjects/yabee_abu/abupy/CoreBu/../RomDataBu'

Main data directory of the abu project, i.e. the RomDataBu location within the project

abupy.CoreBu.ABuEnv.g_project_root = '/Users/tu/abu'

abu project root folder ~/abu

abupy.CoreBu.ABuEnv.g_split_tt_n_folds = 10

n_folds parameter for splitting stock-pick data into training-set and test-set stocks, 10 by default

abupy.CoreBu.ABuEnv.init_logging()[source]

logging initialization: configures the log level, the default write path and the output format

abupy.CoreBu.ABuEnv.root_drive = '/Users/tu'

Main root folder under which abu data is cached

abupy.CoreBu.ABuEnvProcess module

Module for child processes of multi-task runs to copy and follow the main process's settings

class abupy.CoreBu.ABuEnvProcess.AbuEnvProcess[source]

Bases: object

Executor class that copies the main process's in-memory settings for multi-task runs

copy_process_env()[source]

Copies the main process's settings into the child process; called by the add_process_env_sig decorator and should not be used directly from outside

register_module()[source]

Registers the modules whose in-memory state needs copying. Do not register modules globally, since that creates many cross references; do not store them as class variables either, or pickling will fail when passing between processes

abupy.CoreBu.ABuEnvProcess.add_process_env_sig(func)[source]

When the decorator is initialized, it adds an env keyword argument to the decorated function; inside the wrapper the env object is copied into the child process. Since this changes the method signature, when stacking multiple decorators it must be placed at the bottom
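The decorator pattern described above can be sketched in plain Python: wrap the function, build an env snapshot of the main process's settings, and inject it as a keyword argument (an illustrative sketch with hypothetical names, not abupy's actual implementation):

```python
import functools


class EnvSnapshot:
    """Toy stand-in for AbuEnvProcess: captures settings to hand to a child."""

    def __init__(self, settings):
        self.settings = dict(settings)   # copy, so later edits don't leak

    def apply(self, target):
        target.update(self.settings)


g_settings = {'market': 'us', 'n_folds': 2}


def add_process_env_sig(func):
    """Inject an env keyword argument carrying a settings snapshot."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs.setdefault('env', EnvSnapshot(g_settings))
        return func(*args, **kwargs)
    return wrapper


@add_process_env_sig
def child_task(x, env=None):
    local = {}
    env.apply(local)                 # child adopts the main-process settings
    return x, local['market']
```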

abupy.CoreBu.ABuFixes module

Module that unifies conventions and fixes issues across different versions of the dependent libraries and across operating systems

class abupy.CoreBu.ABuFixes.KFold(n, n_folds=3, shuffle=False, random_state=None)[source]

Bases: object

sklearn moved KFold into model_selection and changed its usage; that much complexity is not needed here for now, so the key sklearn code is reimplemented simply, without from sklearn.model_selection import KFold

abupy.CoreBu.ABuFixes.as_bytes(s)[source]
abupy.CoreBu.ABuFixes.check_random_state(seed)[source]
abupy.CoreBu.ABuFixes.np_version = (1, 12, 1)

numpy version tuple

abupy.CoreBu.ABuFixes.pd_version = (0, 20, 1)

pandas version tuple

abupy.CoreBu.ABuFixes.skl_version = (0, 18, 1)

sklearn version tuple

abupy.CoreBu.ABuFixes.sp_version = (0, 19, 0)

scipy version tuple
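Version tuples like these are typically derived from a library's __version__ string so they can be compared numerically; a small sketch of that pattern (the parsing helper is illustrative, not the library's own code):

```python
import re


def version_tuple(version_str):
    """Turn a '1.12.1'-style version string into a comparable int tuple."""
    # keep only the leading numeric dot-separated part, e.g. '0.20.1rc1' -> '0.20.1'
    match = re.match(r'(\d+(?:\.\d+)*)', version_str)
    return tuple(int(part) for part in match.group(1).split('.'))


np_version = version_tuple('1.12.1')
```

Numeric comparison is the point: '0.9.0' > '0.20.1' as strings, but (0, 9, 0) < (0, 20, 1) as tuples.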

abupy.CoreBu.ABuParallel module

Parallelism wrapper module, mainly to unify the interface conventions across platforms:

On Windows, using joblib for long-running multi-task jobs (e.g. more than 10 hours) eventually fails with system task-pop errors, so on Windows ProcessPoolExecutor is used for multi-tasking instead, wrapped in Parallel and delayed to keep the interface generic and uniform

class abupy.CoreBu.ABuParallel.Parallel(n_jobs=1, backend='multiprocessing', verbose=0, pre_dispatch='2 * n_jobs', batch_size='auto', temp_folder=None, max_nbytes='1M', mmap_mode='r')[source]

Bases: object

Wraps ProcessPoolExecutor to execute parallel tasks

abupy.CoreBu.ABuParallel.delayed(function)[source]

Preserves function via functools.wraps and delayed_function without executing it
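The joblib-style interface this module preserves can be sketched as follows: delayed captures a call without running it, and a Parallel-like executor then runs the captured calls. This is a simplified sequential sketch of the interface, not the actual abupy or joblib implementation:

```python
import functools


def delayed(function):
    """Capture function plus its arguments as a (func, args, kwargs) triple."""
    @functools.wraps(function)
    def delayed_function(*args, **kwargs):
        return function, args, kwargs
    return delayed_function


class Parallel:
    """Minimal stand-in: executes captured calls, here sequentially."""

    def __init__(self, n_jobs=1):
        self.n_jobs = n_jobs

    def __call__(self, iterable):
        return [func(*args, **kwargs) for func, args, kwargs in iterable]


squares = Parallel(n_jobs=2)(delayed(pow)(i, 2) for i in range(5))
```

The caller-facing shape, Parallel(n_jobs=...)(delayed(f)(x) for x in xs), stays the same whichever backend actually runs the work.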

abupy.CoreBu.ABuParallel.run_in_subprocess(func, *args, **kwargs)[source]

Multiprocessing helper function, for use when return values and other details do not need handling. func is the function delegated to the process; returns the multiprocessing process object

abupy.CoreBu.ABuParallel.run_in_thread(func, *args, **kwargs)[source]

Multithreading helper function, for use when return values and other details do not need handling. func is the function delegated to the thread; returns the Thread object
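A minimal sketch of the kind of fire-and-forget thread helper run_in_thread describes (illustrative; whether abupy sets daemon mode is an assumption):

```python
import threading


def run_in_thread(func, *args, **kwargs):
    """Start func on a new thread and return the Thread object."""
    thread = threading.Thread(target=func, args=args, kwargs=kwargs)
    thread.daemon = True   # assumption: don't block interpreter exit
    thread.start()
    return thread


results = []
worker = run_in_thread(results.append, 'done')
worker.join()
```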

abupy.CoreBu.ABuPdHelper module

Module wrapping pandas version-compatibility issues, keeping the interface uniform while avoiding warnings

abupy.CoreBu.ABuPdHelper.pd_resample(pd_object, rule, *args, **kwargs)[source]

Performs the pandas resample operation, automatically choosing the call style according to the pandas version. pd_object is an iterable sequence: a pd.Series, a pd.DataFrame, or just an Iterable; rule is the parameter the resample function needs, eg. 21D, i.e. the resampling period
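The version-dispatch pattern such a helper uses can be sketched without pandas: compare the library's version tuple and pick the matching call style. Both "API" functions below are hypothetical stand-ins, not real pandas calls:

```python
PD_VERSION = (0, 20, 1)   # pretend pandas version tuple


def resample_old_style(obj, rule, how):
    # stand-in for an older resample(rule, how=...) call style
    return ('old', rule, how)


def resample_new_style(obj, rule, how):
    # stand-in for a modern resample(rule).agg(how) call style
    return ('new', rule, how)


def pd_resample(obj, rule, how='mean'):
    """Dispatch to the call style matching the installed version."""
    if PD_VERSION >= (0, 18, 0):
        return resample_new_style(obj, rule, how)
    return resample_old_style(obj, rule, how)
```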

abupy.CoreBu.ABuStore module

Module for storing and reading trading backtest results

class abupy.CoreBu.ABuStore.AbuResultTuple[source]

Bases: abupy.CoreBu.ABuStore.AbuResultTuple

The namedtuple object returned by abu.run_loop_back:

orders_pd: pd.DataFrame of the trade orders the backtest generated
action_pd: pd.DataFrame of the trade actions the backtest generated
capital: AbuCapital instance, the capital object
benchmark: AbuBenchmark instance, the trading benchmark object

class abupy.CoreBu.ABuStore.EStoreAbu[source]

Bases: enum.Enum

Enum type for saving backtest results

E_STORE_CUSTOM_NAME = 5

Save a backtest with a custom string as the file suffix

E_STORE_NORMAL = 0

Save an ordinary backtest

E_STORE_TEST = 2

Save a test-set trading backtest, file suffix test

E_STORE_TEST_UMP = 3

Save a test-set trading backtest using the main umpire ump, file suffix test_ump

E_STORE_TEST_UMP_WITH_EDGE = 4

Save a test-set trading backtest using the main + edge umpire ump, file suffix test_ump_with_edge

E_STORE_TRAIN = 1

Save a training-set backtest, file suffix train

abupy.CoreBu.ABuStore.load_abu_result_tuple(n_folds, store_type, custom_name=None)[source]

Read a backtest result saved with store_abu_result_tuple. The file name to read is derived from the n_folds and store_type parameters. Reads orders_pd, action_pd, capital and benchmark in turn, constructs an AbuResultTuple object and returns it

Parameters:
  • n_folds – how many years the backtest covered; only affects the file name to read
  • store_type – backtest storage type, an EStoreAbu value; only affects the file name to read
  • custom_name – custom file name, required when store_type=EStoreAbu.E_STORE_CUSTOM_NAME
Returns:

AbuResultTuple object

abupy.CoreBu.ABuStore.store_abu_result_tuple(abu_result_tuple, n_folds, store_type=None, custom_name=None)[source]

Save the AbuResultTuple backtest result returned by abu.run_loop_back. The stored file name is derived from the n_folds and store_type parameters

Parameters:
  • abu_result_tuple – an AbuResultTuple object
  • n_folds – how many years the backtest covered; only affects the stored file name
  • store_type – backtest storage type, an EStoreAbu value; only affects the stored file name
  • custom_name – custom file name, required when store_type=EStoreAbu.E_STORE_CUSTOM_NAME

Module contents

class abupy.CoreBu.AbuResultTuple[source]

Bases: abupy.CoreBu.ABuStore.AbuResultTuple

The namedtuple object returned by abu.run_loop_back:

orders_pd: pd.DataFrame of the trade orders the backtest generated
action_pd: pd.DataFrame of the trade actions the backtest generated
capital: AbuCapital instance, the capital object
benchmark: AbuBenchmark instance, the trading benchmark object

class abupy.CoreBu.EStoreAbu[source]

Bases: enum.Enum

Enum type for saving backtest results

E_STORE_CUSTOM_NAME = 5
E_STORE_NORMAL = 0
E_STORE_TEST = 2
E_STORE_TEST_UMP = 3
E_STORE_TEST_UMP_WITH_EDGE = 4
E_STORE_TRAIN = 1
class abupy.CoreBu.AbuParamBase[source]

Bases: object

Base class for objects; implements basic object-info printing and debug-inspection interfaces

classmethod get_params()[source]
to_dict(user=True)[source]

for debug: show as dict

to_series(user=True)[source]

for notebook debug: show as series

class abupy.CoreBu.FreezeAttrMixin[source]

Bases: object

Mixin that freezes external attribute setting; setting an attribute raises an exception

class abupy.CoreBu.PickleStateMixin[source]

Bases: object

Mixin for classes that need local serialization

pick_extend_work()[source]

A class mixing this in may override pick_extend_work to do its own class-specific __getstate__ work

skip_abupy_version = True
unpick_extend_work(state)[source]

A class mixing this in may override unpick_extend_work to do its own class-specific __setstate__ work

class abupy.CoreBu.Parallel(n_jobs=1, backend='multiprocessing', verbose=0, pre_dispatch='2 * n_jobs', batch_size='auto', temp_folder=None, max_nbytes='1M', mmap_mode='r')[source]

Bases: object

Wraps ProcessPoolExecutor to execute parallel tasks

abupy.CoreBu.delayed(function)[source]

Preserves function via functools.wraps and delayed_function without executing it

class abupy.CoreBu.EMarketSourceType[source]

Bases: enum.Enum

Data source. When data fetching becomes unreliable, try switching sources; a private data source can also be connected

E_MARKET_SOURCE_bd = 0
E_MARKET_SOURCE_hb_tc = 200
E_MARKET_SOURCE_nt = 2
E_MARKET_SOURCE_sn_futures = 100
E_MARKET_SOURCE_sn_futures_gb = 101
E_MARKET_SOURCE_sn_us = 3
E_MARKET_SOURCE_tx = 1
class abupy.CoreBu.EMarketTargetType[source]

Bases: enum.Enum

Traded instrument type, i.e. market type, eg. US stock market, A-share market, HK stock market, domestic futures market, US options market, TC coin market (Bitcoin etc.)

E_MARKET_TARGET_CN = 'hs'
E_MARKET_TARGET_FUTURES_CN = 'futures_cn'
E_MARKET_TARGET_FUTURES_GLOBAL = 'futures_global'
E_MARKET_TARGET_HK = 'hk'
E_MARKET_TARGET_OPTIONS_US = 'options_us'
E_MARKET_TARGET_TC = 'tc'
E_MARKET_TARGET_US = 'us'
class abupy.CoreBu.EMarketSubType[source]

Bases: enum.Enum

Sub-market (exchange) type definitions

CBOT = 'CBOT'
COIN = 'COIN'
DCE = 'DCE'
HK = 'hk'
LME = 'LME'
NYMEX = 'NYMEX'
SH = 'sh'
SHFE = 'SHFE'
SZ = 'sz'
US_N = 'NYSE'
US_OQ = 'NASDAQ'
US_OTC = 'OTCMKTS'
US_PINK = 'PINK'
US_PREIPO = 'PREIPO'
ZZCE = 'ZZCE'
class abupy.CoreBu.EMarketDataSplitMode[source]

Bases: enum.Enum

Request parameter in ABuSymbolPd: whether the data needs to be aligned and sliced against the benchmark

E_DATA_SPLIT_SE = 1
E_DATA_SPLIT_UNDO = 0
class abupy.CoreBu.EMarketDataFetchMode[source]

Bases: enum.Enum

Financial time-series data fetch mode

E_DATA_FETCH_FORCE_LOCAL = 1
E_DATA_FETCH_FORCE_NET = 2
E_DATA_FETCH_NORMAL = 0
class abupy.CoreBu.EDataCacheType[source]

Bases: enum.Enum

Financial time-series data cache type

E_DATA_CACHE_CSV = 1
E_DATA_CACHE_HDF5 = 0
E_DATA_CACHE_MONGODB = 2
abupy.CoreBu.train_test_split(*arrays, **options)[source]

Split arrays or matrices into random train and test subsets

Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to input data into a single call for splitting (and optionally subsampling) data in a oneliner.

Read more in the User Guide.

*arrays : sequence of indexables with same length / shape[0]
Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
test_size : float, int, or None (default is None)
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.
train_size : float, int, or None (default is None)
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.
stratify : array-like or None (default is None)
If not None, data is split in a stratified fashion, using this as the class labels.
splitting : list, length=2 * len(arrays)

List containing train-test split of inputs.

New in version 0.16: If the input is sparse, the output will be a scipy.sparse.csr_matrix. Else, output type is the same as the input type.

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]
class abupy.CoreBu.KFold(n, n_folds=3, shuffle=False, random_state=None)[source]

Bases: object

sklearn moved KFold into model_selection and changed its usage; that much complexity is not needed here for now, so the key sklearn code is reimplemented simply, without from sklearn.model_selection import KFold

abupy.CoreBu.learning_curve(estimator, X, y, groups=None, train_sizes=array([ 0.1, 0.325, 0.55, 0.775, 1. ]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=1, pre_dispatch='all', verbose=0)[source]

Learning curve.

Determines cross-validated training and test scores for different training set sizes.

A cross-validation generator splits the whole dataset k times in training and test data. Subsets of the training set with varying sizes will be used to train the estimator and a score for each training subset size and the test set will be computed. Afterwards, the scores will be averaged over all k runs for each training subset size.

Read more in the User Guide.

estimator : object type that implements the “fit” and “predict” methods
An object of that type which is cloned for each validation.
X : array-like, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape (n_samples) or (n_samples, n_features), optional
Target relative to X for classification or regression; None for unsupervised learning.
groups : array-like, with shape (n_samples,), optional
Group labels for the samples used while splitting the dataset into train/test set.
train_sizes : array-like, shape (n_ticks,), dtype float or int
Relative or absolute numbers of training examples that will be used to generate the learning curve. If the dtype is float, it is regarded as a fraction of the maximum size of the training set (that is determined by the selected validation method), i.e. it has to be within (0, 1]. Otherwise it is interpreted as absolute sizes of the training sets. Note that for classification the number of samples usually have to be big enough to contain at least one sample from each class. (default: np.linspace(0.1, 1.0, 5))
cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 3-fold cross validation,
  • integer, to specify the number of folds in a (Stratified)KFold,
  • An object to be used as a cross-validation generator.
  • An iterable yielding train, test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

Refer User Guide for the various cross-validation strategies that can be used here.

scoring : string, callable or None, optional, default: None
A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).
exploit_incremental_learning : boolean, optional, default: False
If the estimator supports incremental learning, this will be used to speed up fitting for different training set sizes.
n_jobs : integer, optional
Number of jobs to run in parallel (default 1).
pre_dispatch : integer or string, optional
Number of predispatched jobs for parallel execution (default is all). The option can reduce the allocated memory. The string can be an expression like ‘2*n_jobs’.
verbose : integer, optional
Controls the verbosity: the higher, the more messages.
train_sizes_abs : array, shape = (n_unique_ticks,), dtype int
Numbers of training examples that has been used to generate the learning curve. Note that the number of ticks might be less than n_ticks because duplicate entries will be removed.
train_scores : array, shape (n_ticks, n_cv_folds)
Scores on training sets.
test_scores : array, shape (n_ticks, n_cv_folds)
Scores on test set.

See examples/model_selection/plot_learning_curve.py

abupy.CoreBu.cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs')[source]

Evaluate a score by cross-validation

Read more in the User Guide.

estimator : estimator object implementing ‘fit’
The object to use to fit the data.
X : array-like
The data to fit. Can be, for example a list, or an array at least 2d.
y : array-like, optional, default: None
The target variable to try to predict in the case of supervised learning.
groups : array-like, with shape (n_samples,), optional
Group labels for the samples used while splitting the dataset into train/test set.
scoring : string, callable or None, optional, default: None
A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).
cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 3-fold cross validation,
  • integer, to specify the number of folds in a (Stratified)KFold,
  • An object to be used as a cross-validation generator.
  • An iterable yielding train, test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

Refer User Guide for the various cross-validation strategies that can be used here.

n_jobs : integer, optional
The number of CPUs to use to do the computation. -1 means ‘all CPUs’.
verbose : integer, optional
The verbosity level.
fit_params : dict, optional
Parameters to pass to the fit method of the estimator.
pre_dispatch : int, or string, optional

Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

  • None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
  • An int, giving the exact number of total jobs that are spawned
  • A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
scores : array of float, shape=(len(list(cv)),)
Array of scores of the estimator for each run of the cross validation.
>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_val_score
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
>>> print(cross_val_score(lasso, X, y))  
[ 0.33150734  0.08022311  0.03531764]
sklearn.metrics.make_scorer():
Make a scorer from a performance metric or loss function.
class abupy.CoreBu.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)[source]

Bases: sklearn.model_selection._search.BaseSearchCV

Exhaustive search over specified parameter values for an estimator.

Important members are fit, predict.

GridSearchCV implements a “fit” and a “score” method. It also implements “predict”, “predict_proba”, “decision_function”, “transform” and “inverse_transform” if they are implemented in the estimator used.

The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.

Read more in the User Guide.

estimator : estimator object.
This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.
param_grid : dict or list of dictionaries
Dictionary with parameters names (string) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
scoring : string, callable or None, default=None
A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). If None, the score method of the estimator is used.
fit_params : dict, optional
Parameters to pass to the fit method.
n_jobs : int, default=1
Number of jobs to run in parallel.
pre_dispatch : int, or string, optional

Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

  • None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
  • An int, giving the exact number of total jobs that are spawned
  • A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
iid : boolean, default=True
If True, the data is assumed to be identically distributed across the folds, and the loss minimized is the total loss per sample, and not the mean loss across the folds.
cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 3-fold cross validation,
  • integer, to specify the number of folds in a (Stratified)KFold,
  • An object to be used as a cross-validation generator.
  • An iterable yielding train, test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

Refer User Guide for the various cross-validation strategies that can be used here.

refit : boolean, default=True
Refit the best estimator with the entire dataset. If “False”, it is impossible to make predictions using this GridSearchCV instance after fitting.
verbose : integer
Controls the verbosity: the higher, the more messages.
error_score : ‘raise’ (default) or numeric
Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.
return_train_score : boolean, default=True
If 'False', the cv_results_ attribute will not include training scores.
>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svr = svm.SVC()
>>> clf = GridSearchCV(svr, parameters)
>>> clf.fit(iris.data, iris.target)
...                             
GridSearchCV(cv=None, error_score=...,
       estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                     decision_function_shape=None, degree=..., gamma=...,
                     kernel='rbf', max_iter=-1, probability=False,
                     random_state=None, shrinking=True, tol=...,
                     verbose=False),
       fit_params={}, iid=..., n_jobs=1,
       param_grid=..., pre_dispatch=..., refit=..., return_train_score=...,
       scoring=..., verbose=...)
>>> sorted(clf.cv_results_.keys())
...                             
['mean_fit_time', 'mean_score_time', 'mean_test_score',...
 'mean_train_score', 'param_C', 'param_kernel', 'params',...
 'rank_test_score', 'split0_test_score',...
 'split0_train_score', 'split1_test_score', 'split1_train_score',...
 'split2_test_score', 'split2_train_score',...
 'std_fit_time', 'std_score_time', 'std_test_score', 'std_train_score'...]
cv_results_ : dict of numpy (masked) ndarrays

A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.

For instance the below given table

param_kernel  param_gamma  param_degree  split0_test_score  ...  rank_....
'poly'        --           2             0.8                ...  2
'poly'        --           3             0.7                ...  4
'rbf'         0.1          --            0.8                ...  3
'rbf'         0.2          --            0.9                ...  1

will be represented by a cv_results_ dict of:

{
'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                             mask = [False False False False]...)
'param_gamma': masked_array(data = [-- -- 0.1 0.2],
                            mask = [ True  True False False]...),
'param_degree': masked_array(data = [2.0 3.0 -- --],
                             mask = [False False  True  True]...),
'split0_test_score'  : [0.8, 0.7, 0.8, 0.9],
'split1_test_score'  : [0.82, 0.5, 0.7, 0.78],
'mean_test_score'    : [0.81, 0.60, 0.75, 0.82],
'std_test_score'     : [0.02, 0.01, 0.03, 0.03],
'rank_test_score'    : [2, 4, 3, 1],
'split0_train_score' : [0.8, 0.9, 0.7],
'split1_train_score' : [0.82, 0.5, 0.7],
'mean_train_score'   : [0.81, 0.7, 0.7],
'std_train_score'    : [0.03, 0.03, 0.04],
'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
'mean_score_time'    : [0.007, 0.06, 0.04, 0.04],
'std_score_time'     : [0.001, 0.002, 0.003, 0.005],
'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
}

NOTE that the key 'params' is used to store a list of parameter settings dict for all the parameter candidates.

The mean_fit_time, std_fit_time, mean_score_time and std_score_time are all in seconds.

best_estimator_ : estimator
Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data. Not available if refit=False.
best_score_ : float
Score of best_estimator on the left out data.
best_params_ : dict
Parameter setting that gave the best results on the hold out data.
best_index_ : int

The index (of the cv_results_ arrays) which corresponds to the best candidate parameter setting.

The dict at search.cv_results_['params'][search.best_index_] gives the parameter setting for the best model, that gives the highest mean score (search.best_score_).

scorer_ : function
Scorer function used on the held out data to choose the best parameters for the model.
n_splits_ : int
The number of cross-validation splits (folds/iterations).

The parameters selected are those that maximize the score of the left out data, unless an explicit score is passed in which case it is used instead.

If n_jobs was set to a value higher than one, the data is copied for each point in the grid (and not n_jobs times). This is done for efficiency reasons if individual jobs take very little time, but may raise errors if the dataset is large and not enough memory is available. A workaround in this case is to set pre_dispatch. Then, the memory is copied only pre_dispatch many times. A reasonable value for pre_dispatch is 2 * n_jobs.

ParameterGrid:
generates all the combinations of a hyperparameter grid.
sklearn.model_selection.train_test_split():
utility function to split the data into a development set usable for fitting a GridSearchCV instance and an evaluation set for its final evaluation.
sklearn.metrics.make_scorer():
Make a scorer from a performance metric or loss function.
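The grid expansion that ParameterGrid performs can be approximated with itertools.product. This is a sketch of the idea, not sklearn's actual implementation:

```python
from itertools import product

def param_grid(grid):
    """Yield one dict per combination of the values in grid (ParameterGrid-style sketch)."""
    keys = sorted(grid)  # fix an order so the output is deterministic
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

combos = list(param_grid({'kernel': ['rbf'], 'gamma': [0.1, 0.2]}))
print(combos)
# [{'gamma': 0.1, 'kernel': 'rbf'}, {'gamma': 0.2, 'kernel': 'rbf'}]
```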
fit(X, y=None, groups=None)[源代码]

Run fit with all sets of parameters.

X : array-like, shape = [n_samples, n_features]
Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] or [n_samples, n_output], optional
Target relative to X for classification or regression; None for unsupervised learning.
groups : array-like, with shape (n_samples,), optional
Group labels for the samples used while splitting the dataset into train/test set.
abupy.CoreBu.signature(obj, *, follow_wrapped=True)[源代码]

Get a signature object for the passed callable.

class abupy.CoreBu.Parameter(name, kind, *, default, annotation)[源代码]

Bases: object

Represents a parameter in a function signature.

Has the following public attributes:

  • name : str
    The name of the parameter as a string.
  • default : object
    The default value for the parameter if specified. If the parameter has no default value, this attribute is set to Parameter.empty.
  • annotation
    The annotation for the parameter if specified. If the parameter has no annotation, this attribute is set to Parameter.empty.
  • kind : str
    Describes how argument values are bound to the parameter. Possible values: Parameter.POSITIONAL_ONLY, Parameter.POSITIONAL_OR_KEYWORD, Parameter.VAR_POSITIONAL, Parameter.KEYWORD_ONLY, Parameter.VAR_KEYWORD.
KEYWORD_ONLY = 3
POSITIONAL_ONLY = 0
POSITIONAL_OR_KEYWORD = 1
VAR_KEYWORD = 4
VAR_POSITIONAL = 2
annotation
default
empty

Alias of _empty

kind
name
replace(*[, name][, kind][, annotation][, default])[源代码]

Creates a customized copy of the Parameter.

class abupy.CoreBu.ThreadPoolExecutor(max_workers=None, thread_name_prefix='')[源代码]

Bases: concurrent.futures._base.Executor

shutdown(wait=True)[源代码]

Clean-up the resources associated with the Executor.

It is safe to call this method several times. Otherwise, no other methods can be called after this one.

Args:
wait: If True then shutdown will not return until all running
futures have finished executing and the resources used by the executor have been reclaimed.
submit(fn, *args, **kwargs)[源代码]

Submits a callable to be executed with the given arguments.

Schedules the callable to be executed as fn(*args, **kwargs) and returns a Future instance representing the execution of the callable.

Returns:
A Future representing the given call.
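A minimal submit/shutdown round trip; fetch here is a stand-in for a per-symbol task such as downloading kline data, not an abupy API:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(symbol):
    # stand-in for a per-symbol task (e.g. fetching one time series)
    return symbol.lower()

# the with-block calls shutdown(wait=True) on exit, so all futures finish first
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, s) for s in ['usAAPL', 'usTSLA']]
    results = [f.result() for f in futures]

print(results)  # ['usaapl', 'ustsla']
```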
class abupy.CoreBu.zip

Bases: object

zip(iter1 [,iter2 [...]]) --> zip object

Return a zip object whose .__next__() method returns a tuple where the i-th element comes from the i-th iterable argument. The .__next__() method continues until the shortest iterable in the argument sequence is exhausted and then it raises StopIteration.
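For example, the shortest-iterable behavior:

```python
# zip stops as soon as the shorter iterable is exhausted
pairs = list(zip([1, 2, 3], ['a', 'b']))
print(pairs)  # [(1, 'a'), (2, 'b')]
```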

abupy.CoreBu.xrange

Alias of range

class abupy.CoreBu.range(stop) → range object

Bases: object

range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive) to stop (exclusive) by step. range(i, j) produces i, i+1, i+2, ..., j-1. start defaults to 0, so when only stop is given, range(4) produces 0, 1, 2, 3. These are exactly the valid indices for a list of 4 elements. When step is given, it specifies the increment (or decrement).

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return index of value.

Raise ValueError if the value is not present.

start
step
stop
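The count/index methods and the start/step/stop attributes can be exercised directly:

```python
r = range(2, 10, 2)  # produces 2, 4, 6, 8 lazily

assert list(r) == [2, 4, 6, 8]
assert r.count(4) == 1   # counts occurrences without materializing the sequence
assert r.index(6) == 2   # position of 6 within the range
assert (r.start, r.step, r.stop) == (2, 2, 10)
```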
abupy.CoreBu.reduce(function, sequence[, initial]) → value

Apply a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.
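This behaves like functools.reduce; both the example from the text and the empty-sequence case with initial:

```python
from functools import reduce

total = reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])  # ((((1+2)+3)+4)+5)
print(total)  # 15

# initial serves as the default when the sequence is empty
empty = reduce(lambda x, y: x + y, [], 0)
print(empty)  # 0
```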

class abupy.CoreBu.map

Bases: object

map(func, *iterables) --> map object

Make an iterator that computes the function using arguments from each of the iterables. Stops when the shortest iterable is exhausted.
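For example, with two iterables of different lengths:

```python
# one argument is drawn from each iterable; stops at the shortest one
m = map(lambda x, y: x + y, [1, 2, 3], [10, 20])
print(list(m))  # [11, 22]
```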

class abupy.CoreBu.filter

Bases: object

filter(function or None, iterable) --> filter object

Return an iterator yielding those items of iterable for which function(item) is true. If function is None, return the items that are true.
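Both forms in one example:

```python
evens = list(filter(lambda n: n % 2 == 0, range(6)))
print(evens)  # [0, 2, 4]

# with function=None, only truthy items survive
truthy = list(filter(None, [0, 1, '', 'abc', None]))
print(truthy)  # [1, 'abc']
```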

abupy.CoreBu.Pickler

Alias of _Pickler

abupy.CoreBu.Unpickler

Alias of _Unpickler
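These are the pure-Python pickler classes; they are used exactly like pickle.Pickler/pickle.Unpickler. A round trip through an in-memory buffer (the dict content is arbitrary example data):

```python
import io
import pickle

buf = io.BytesIO()
pickle.Pickler(buf).dump({'symbol': 'usTSLA', 'n_folds': 2})

buf.seek(0)
restored = pickle.Unpickler(buf).load()
print(restored)  # {'symbol': 'usTSLA', 'n_folds': 2}
```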

class abupy.CoreBu.partial[源代码]

Bases: object

partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.

args

tuple of arguments to future partial calls

func

function object to use in future partial calls

keywords

dictionary of keyword arguments to future partial calls
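The func/args/keywords attributes are visible on any partial object; power below is a throwaway example function:

```python
from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)

assert square.func is power           # function object for future calls
assert square.args == ()              # no positional args were frozen
assert square.keywords == {'exponent': 2}
print(square(5))  # 25
```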

abupy.CoreBu.pd_resample(pd_object, rule, *args, **kwargs)[源代码]

Wrapper around pandas' resample operation; automatically selects the calling convention appropriate to the installed pandas version.

Parameters:
  • pd_object – the object to resample: pd.Series, pd.DataFrame, or any other Iterable
  • rule – the rule argument required by pandas' resample function, eg. 21D, i.e. the resampling period
Returns:
  the resampled object
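A sketch of the version dispatch such a wrapper performs, assuming pandas is installed. The name pd_resample_sketch and the naive string version check are illustrative, not abupy's actual implementation:

```python
import pandas as pd

def pd_resample_sketch(pd_object, rule, how='mean'):
    # Hypothetical sketch: old pandas accepted resample(rule, how=...),
    # modern pandas chains the aggregation method onto the resampler.
    # NOTE: lexicographic version comparison is a simplification.
    if pd.__version__ >= '0.18.0':
        return getattr(pd_object.resample(rule), how)()
    return pd_object.resample(rule, how=how)

s = pd.Series([1.0, 2.0, 3.0, 4.0],
              index=pd.date_range('2016-07-01', periods=4, freq='D'))
resampled = pd_resample_sketch(s, '2D')
print(list(resampled.values))  # [1.5, 3.5] -- mean of each 2-day bucket
```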