Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Predicting out-of-sample performance of a large cohort of trading algorithms with machine learning

Thomas Wiecki (Quantopian)
11:15–11:55 Friday, 3 June 2016
Data science & advanced analytics
Location: Capital Suite 8/9
Level: Intermediate
Average rating: 3.50 (6 ratings)

Prerequisite knowledge

Attendees should have a basic understanding of the stock market, data analysis (regressions), and machine learning (classifiers).


Past performance is no guarantee of future returns. However, when automated trading strategies are developed and evaluated using backtests on historical pricing data, there is always a tendency, intentional or not, to overfit to the past. As a result, strategies that show fantastic performance on historical data often flounder when deployed with real capital.
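One common mechanism behind this is selection bias. If many strategy variants are backtested and only the best one is kept, its historical performance is inflated even when none of the variants has any real edge. The sketch below is purely illustrative and not taken from the talk. It assumes strategies whose daily returns are pure Gaussian noise, a zero risk-free rate, and 252 trading days per year; it picks the strategy with the best in-sample Sharpe ratio and then measures that same strategy on fresh data. All names and parameters are hypothetical.

```python
import math
import random
import statistics

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_sharpe(returns):
    """Annualized Sharpe ratio of daily returns, assuming a zero risk-free rate."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd * math.sqrt(TRADING_DAYS)


def simulate_selection_bias(n_strategies=200, n_days=TRADING_DAYS, seed=0):
    """Backtest noise-only strategies, keep the best, then test it out of sample."""
    rng = random.Random(seed)

    def noise():
        # Each hypothetical strategy has zero true edge: i.i.d. Gaussian daily returns.
        return [rng.gauss(0.0, 0.01) for _ in range(n_days)]

    in_sample = [noise() for _ in range(n_strategies)]
    out_of_sample = [noise() for _ in range(n_strategies)]

    # Select the "winner" purely on its historical (in-sample) Sharpe ratio.
    is_sharpes = [annualized_sharpe(r) for r in in_sample]
    best = max(range(n_strategies), key=is_sharpes.__getitem__)
    return is_sharpes[best], annualized_sharpe(out_of_sample[best])


if __name__ == "__main__":
    best_is, best_oos = simulate_selection_bias()
    print(f"best in-sample Sharpe:         {best_is:.2f}")
    print(f"same strategy, out of sample:  {best_oos:.2f}")
```

Because every simulated strategy is noise, the out-of-sample Sharpe ratio of the "winner" has no reason to stay high, while its in-sample Sharpe ratio is pushed upward simply by taking the maximum over many trials.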

Quantopian is an online platform that allows users to develop, backtest, and trade algorithmic investing strategies. By pooling all strategies developed on its platform, Quantopian has constructed a large and unique dataset of over 800,000 trading algorithms. Although Quantopian doesn’t have access to the algorithms’ source code, it does have their returns, their portfolio allocations, and the time each algorithm was last edited. This makes it possible to compare returns over the period the author had access to, and could potentially have overfit the model on, with any truly out-of-sample data that has accumulated since. Thomas Wiecki explores the prevalence of backtest overfitting and debunks several common myths in quantitative finance based on empirical findings. Thomas then demonstrates how he trained a machine-learning classifier on Quantopian’s dataset to predict whether an algorithm is overfit and how its future performance is likely to unfold.
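The in-sample versus out-of-sample comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Quantopian's actual pipeline. It assumes each algorithm is represented as a list of (date, daily return) pairs plus its last-edit date, uses the annualized Sharpe ratio (zero risk-free rate, 252 trading days) as the performance measure, and treats returns on or before the last edit as in-sample. The function names, the `max_drop` threshold, and the toy returns are all hypothetical.

```python
import math
import statistics
from datetime import date

TRADING_DAYS = 252  # assumed annualization factor for daily returns


def annualized_sharpe(returns):
    """Annualized Sharpe ratio of daily returns, assuming a zero risk-free rate.

    statistics.stdev raises StatisticsError for fewer than two returns.
    """
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(TRADING_DAYS)


def split_at_last_edit(daily_returns, last_edited):
    """Split (date, return) pairs at the algorithm's last-edit date.

    Returns on or before last_edited are in-sample: the author could have
    tuned the algorithm against them. Later returns are out-of-sample.
    """
    in_sample = [r for d, r in daily_returns if d <= last_edited]
    out_of_sample = [r for d, r in daily_returns if d > last_edited]
    return in_sample, out_of_sample


def is_oos_sharpes(daily_returns, last_edited):
    """Return (in-sample Sharpe, out-of-sample Sharpe) for one algorithm."""
    in_sample, out_of_sample = split_at_last_edit(daily_returns, last_edited)
    return annualized_sharpe(in_sample), annualized_sharpe(out_of_sample)


def overfit_label(is_sharpe, oos_sharpe, max_drop=1.0):
    """Hypothetical classifier target: 1 if the Sharpe ratio dropped by more
    than max_drop after the last edit, else 0. The threshold is arbitrary."""
    return int(is_sharpe - oos_sharpe > max_drop)


if __name__ == "__main__":
    # Toy daily returns, made up for illustration only.
    history = [
        (date(2015, 1, 5), 0.004),
        (date(2015, 1, 6), 0.006),
        (date(2015, 1, 7), 0.005),
        (date(2015, 1, 8), 0.001),
        (date(2015, 1, 9), -0.002),
        (date(2015, 1, 12), 0.000),
    ]
    is_sr, oos_sr = is_oos_sharpes(history, last_edited=date(2015, 1, 7))
    print(is_sr, oos_sr, overfit_label(is_sr, oos_sr))
```

Labels like these, paired with features computed only from the in-sample period, are the kind of table a classifier could be trained on; which features, labels, and model the talk actually uses are not specified here.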


Thomas Wiecki


Thomas Wiecki is the lead data science researcher at Quantopian, where he uses probabilistic programming and machine learning to help build the world’s first crowdsourced hedge fund. Among other open source projects, he is involved in the development of PyMC, a probabilistic programming framework written in Python. A recognized international speaker, Thomas has given talks at various conferences and meetups across the US, Europe, and Asia. He holds a PhD from Brown University.