[Quant Frontier] Classic Academic Papers in Quantitative Trading, Part 1: Machine Learning + High-Frequency Arbitrage

Published 2023-07-27 10:09

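Quantitative trading covers a lot of ground. By trade type, we think most strategies running in the market fall into three classes: multi-factor alpha, trend following, and arbitrage. Our own academic research and live trading focus on trend-following and statistical-arbitrage strategies in the commodity futures market. Mining recent academic papers for strategy ideas is one route: many authors of quantitative trading papers are traders themselves, and academic outlets covering quantitative trading and commodity futures, such as AQR, JPM, JCM and JFM, carry plenty of this literature. Starting with this issue, we will regularly post the papers our team has recently been studying, following, and adapting for simulated and live trading, together with our study notes. This issue covers a paper on high-frequency statistical arbitrage, published in the SCI Q1 journal Expert Systems with Applications.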


1 Statistical arbitrage: "Enhancing a Pairs Trading strategy with the application of Machine Learning"


Original abstract: Pairs Trading is one of the most valuable market-neutral strategies used by hedge funds. It is particularly interesting as it overcomes the arduous process of valuing securities by focusing on relative pricing. By buying a relatively undervalued security and selling a relatively overvalued one, a profit can be made upon the pair’s price convergence. However, with the growing availability of data, it became increasingly harder to find rewarding pairs. In this work we address two problems: (i) how to find profitable pairs while constraining the search space and (ii) how to avoid long decline periods due to prolonged divergent pairs. To manage these difficulties, the application of promising Machine Learning techniques is investigated in detail. We propose the integration of an Unsupervised Learning algorithm, OPTICS, to handle problem (i). The results obtained demonstrate the suggested technique can outperform the common pairs’ search methods, achieving an average portfolio Sharpe ratio of 3.79, in comparison to 3.58 and 2.59 obtained by standard approaches. For problem (ii), we introduce a forecasting-based trading model, capable of reducing the periods of portfolio decline by 75%. Yet, this comes at the expense of decreasing overall profitability. The proposed strategy is tested using an ARMA model, an LSTM and an LSTM Encoder-Decoder. This work’s results are simulated during varying periods between January 2009 and December 2018, using 5-minutes price data from a group of 208 commodity-linked ETFs, and accounting for transaction costs.


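Put bluntly, here is what the authors did. The paper runs cross-instrument arbitrage on intraday high-frequency ETF data, in pairs-trading form. The first difficulty is pair selection: with many ETFs involved, the number of possible pairs is large. The paper uses 208 commodity-linked ETFs, so an exhaustive search covers 208 × 207 / 2 = 21,528 possible pairs. In the domestic commodity futures market, where about 40 contracts are liquid, sticking to traditional industry-chain arbitrage may miss many viable pairs, while testing every possible pair is computationally costly (40 × 39 / 2 = 780 combinations). In the stock market, with more than 4,000 names, the number of possible pairs is far larger still. The authors therefore propose a machine learning framework for selecting pairs: they first reduce the dimensionality of the instruments' return covariance matrix to extract features, then group all instruments with unsupervised learning, and finally select pairs within those groups.

Second, the spread of a pair generally mean-reverts. The authors argue that neural networks can play an important role in forecasting the spread's rate of change, so they bring in an LSTM for that forecast. The strategy and research design are carefully arranged, for example in how training and test samples are split and how the forecasting models are deployed on a rolling basis.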
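To make the forecasting idea concrete, here is a minimal sketch of how a forecast of the spread could be turned into a trading signal. It is our own illustration, not the paper's code: the function names, the normalization by the absolute current spread, and the 1% thresholds are all assumptions, and the forecast itself is taken as given (it could come from an ARMA model, an LSTM, or anything else).

```python
def predicted_change_rate(current_spread: float, forecast_spread: float) -> float:
    """Relative change implied by the forecast: (forecast - current) / |current|."""
    if current_spread == 0:
        raise ValueError("current_spread must be non-zero")
    return (forecast_spread - current_spread) / abs(current_spread)


def forecast_signal(
    current_spread: float,
    forecast_spread: float,
    long_threshold: float = 0.01,    # hypothetical: enter long if predicted rise >= 1%
    short_threshold: float = -0.01,  # hypothetical: enter short if predicted fall >= 1%
) -> int:
    """Return +1 (long the spread), -1 (short the spread) or 0 (stay flat)."""
    change = predicted_change_rate(current_spread, forecast_spread)
    if change >= long_threshold:
        return 1
    if change <= short_threshold:
        return -1
    return 0


# Example: the spread is 2.0 now and the model forecasts 2.1 for the next bar.
signal = forecast_signal(current_spread=2.0, forecast_spread=2.1)
```

In practice the thresholds would have to be calibrated on the training sample and the forecasting model refitted on a rolling basis; neither step is shown here.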










Figure: feature clustering of all the ETFs involved
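A small sketch shows why clustering shrinks the pair search. It counts all unordered pairs in a universe, then only those pairs whose two members share a cluster. The labels below are made up for illustration, and treating -1 as "unclustered" is an assumed convention (scikit-learn's OPTICS uses it for noise points). None of this is taken from the paper's code.

```python
from collections import Counter
from math import comb
from typing import Sequence


def count_all_pairs(n_assets: int) -> int:
    """Number of unordered pairs among n_assets instruments: n * (n - 1) / 2."""
    return comb(n_assets, 2)


def count_within_cluster_pairs(labels: Sequence[int], noise_label: int = -1) -> int:
    """Count unordered pairs whose two members carry the same cluster label.

    Instruments tagged with noise_label are treated as unclustered and skipped.
    """
    sizes = Counter(label for label in labels if label != noise_label)
    return sum(comb(size, 2) for size in sizes.values())


# Exhaustive search sizes for the two universes discussed in the text.
etf_pairs = count_all_pairs(208)      # 208 commodity-linked ETFs
futures_pairs = count_all_pairs(40)   # about 40 liquid domestic futures products

# Hypothetical labels for 10 instruments: three clusters plus two unclustered ones.
example_labels = [0, 0, 0, 1, 1, 2, 2, 2, -1, -1]
all_pairs_example = count_all_pairs(len(example_labels))
clustered_pairs_example = count_within_cluster_pairs(example_labels)
```

With these made-up labels, only pairs inside clusters 0, 1 and 2 remain as candidates, so any later statistical testing runs on a fraction of the exhaustive set.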




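Finally, the authors find that the high-frequency pairs strategy built on this machine learning framework clearly improves trading results. We find this very instructive. In our own high-frequency arbitrage on commodity futures, we currently run mostly calendar-spread arbitrage across many products, with models built mainly on the dominant and second-dominant contracts of a given product. Borrowing the paper's approach, we can select pairs among the 30-40 more liquid products in the domestic commodity futures market and uncover pairings beyond traditional industry-chain cross-product arbitrage.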