Past performance is no guarantee of future returns. However, when automated trading strategies are developed and evaluated using backtests on historical pricing data, there is always a tendency, intentional or not, to overfit to the past. As a result, strategies that show fantastic performance on historical data often flounder when deployed with real capital.
Quantopian is an online platform that allows users to develop, backtest, and trade algorithmic investing strategies. By pooling all strategies developed on its platform, Quantopian has constructed a huge and unique dataset of over 800,000 trading algorithms. Although Quantopian doesn’t have access to source code, it has access to each algorithm’s returns and portfolio allocations, as well as the time the algorithm was last edited. This makes it possible to compare returns over the period the author could see, and potentially overfit to, with returns on the true out-of-sample data that has accumulated since. Thomas Wiecki explores the prevalence of backtest overfitting and debunks several common myths in quantitative finance based on empirical findings. Thomas then demonstrates how he trained a machine-learning classifier on Quantopian’s dataset to predict whether an algorithm is overfit and how its future performance will likely unfold.
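The in-sample versus out-of-sample comparison described above can be illustrated with a minimal sketch. It is not Quantopian's code or the method used in the talk: the function names, the toy return series, and the choice of the annualized Sharpe ratio (with a zero risk-free rate and 252 trading days per year) as the performance metric are all assumptions made for illustration.

```python
import math
import statistics
from datetime import date, timedelta

TRADING_DAYS_PER_YEAR = 252  # common annualization convention (assumption)


def annualized_sharpe(daily_returns):
    """Annualized Sharpe ratio of daily returns; risk-free rate assumed to be zero."""
    if len(daily_returns) < 2:
        raise ValueError("need at least two returns")
    volatility = statistics.stdev(daily_returns)
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(daily_returns) / volatility * math.sqrt(TRADING_DAYS_PER_YEAR)


def split_at_last_edit(dated_returns, last_edit):
    """Split (date, return) pairs at the last-edit date.

    Returns on or before last_edit are treated as in-sample (the author could
    have tuned against them); later returns are treated as out-of-sample.
    """
    in_sample = [r for d, r in dated_returns if d <= last_edit]
    out_of_sample = [r for d, r in dated_returns if d > last_edit]
    return in_sample, out_of_sample


def sharpe_shortfall(dated_returns, last_edit):
    """In-sample Sharpe minus out-of-sample Sharpe; large positive values hint at overfitting."""
    in_sample, out_of_sample = split_at_last_edit(dated_returns, last_edit)
    return annualized_sharpe(in_sample) - annualized_sharpe(out_of_sample)


# Toy data (fabricated for illustration, calendar days used for simplicity):
# 20 days that look good before the last edit, then 20 flat days afterwards.
start = date(2015, 1, 1)
toy_returns = [0.003, -0.001] * 10 + [0.001, -0.001] * 10
dated = [(start + timedelta(days=i), r) for i, r in enumerate(toy_returns)]
last_edit = start + timedelta(days=19)

shortfall = sharpe_shortfall(dated, last_edit)
print(f"Sharpe shortfall after last edit: {shortfall:.2f}")
```

A single shortfall number like this says little on its own; the approach described in the abstract relies on a trained classifier and a large population of algorithms rather than one hand-picked metric.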
Thomas Wiecki is the lead data science researcher at Quantopian, where he uses probabilistic programming and machine learning to help build the world’s first crowdsourced hedge fund. Among other open source projects, he is involved in the development of PyMC—a probabilistic programming framework written in Python. A recognized international speaker, Thomas has given talks at various conferences and meetups across the US, Europe, and Asia. He holds a PhD from Brown University.
©2016, O’Reilly UK Ltd