Best Practices for Reproducible Research: A Case Study in Quantitative Finance

Data Science, Sutton Center / Sutton South (NY Hilton)

Reproducibility is key in quantitative research. For data scientists, easily replicated results mean greater confidence in correctness, less time investigating discrepancies, and improved robustness regardless of complexity. At the organizational level, processes that emphasize reproducibility improve robustness and help scale the depth and complexity of research.

In this presentation, I will discuss and demonstrate best practices for data scientists that emphasize the goal of reproducibility. The principles presented are generally applicable, but the examples will focus on financial research in Python. I will present a case study in quantitative asset management to outline four main processes.

First, raw data should be properly stored and managed, with an emphasis on enabling the retrieval of historical time series data anchored by a particular observation date. Second, processed data and intermediate results should be persisted so that a researcher can pinpoint the causes of discrepancies in final results. Third, code, trading signals, and model configurations should be kept under version control so that individual pieces of the computation can be rolled back, limiting the number of variables that differ when comparing new and previous output. Fourth, regular testing of the entire research data and code stack helps catch and document bugs in code as well as changes to configuration. Such testing helps attribute changes in final output either to changes in underlying fundamentals or to changes in data, computations, and configurations.
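As an illustration of the first process, here is a minimal sketch of retrieving a time series "as of" a particular observation date. It is not taken from the talk: the `Record` class, the `as_of` function, the field names, and the sample figures are all hypothetical, and a real system would read from a database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical raw-data row; the field names are assumptions."""
    value_date: date      # period the number describes
    knowledge_date: date  # date the number became available to us
    value: float


def as_of(records, observation_date):
    """Return {value_date: value} as it would have looked on observation_date."""
    snapshot = {}
    # Walk the records in the order they became known, so that a later
    # revision overwrites the earlier figure for the same value_date.
    for rec in sorted(records, key=lambda r: r.knowledge_date):
        if rec.knowledge_date <= observation_date:
            snapshot[rec.value_date] = rec.value
    return snapshot


# Made-up sample data: the Q1 figure is revised on 2012-08-01.
records = [
    Record(date(2012, 3, 31), date(2012, 4, 15), 1.00),
    Record(date(2012, 6, 30), date(2012, 7, 15), 1.10),
    Record(date(2012, 3, 31), date(2012, 8, 1), 1.05),
]

before_revision = as_of(records, date(2012, 7, 20))
after_revision = as_of(records, date(2012, 8, 15))
```

Storing the knowledge date alongside each value, instead of overwriting revised figures in place, is what lets a backtest run today see the same inputs it would have seen on the original observation date.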

The process of identifying the cause of discrepancies in results is a controlled experiment whose main inputs are raw and processed data, computational code, and configuration parameters for models and factors. Practitioners with the right tools and habits will be able to maximize the reproducibility of their results, easily diagnose differences, and ultimately be more productive and effective.
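One way to picture this controlled experiment is to fingerprint each input of a run and compare the fingerprints between two runs. The sketch below is an assumption-laden illustration, not the speaker's tooling: `fingerprint`, `changed_inputs`, the input names, and the sample values are all hypothetical.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Hash a JSON-serializable input so two runs can be compared cheaply."""
    payload = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_inputs(previous_run: dict, current_run: dict) -> list:
    """Return the names of the inputs whose fingerprints differ between runs."""
    names = sorted(set(previous_run) | set(current_run))
    return [
        name
        for name in names
        if fingerprint(previous_run.get(name)) != fingerprint(current_run.get(name))
    ]


# Hypothetical inputs recorded for two research runs (values are made up).
previous_run = {
    "raw_data": [100.0, 101.5, 99.8],
    "processed_data": [0.015, -0.0167],
    "code_version": "a1b2c3",
    "config": {"lookback_days": 60, "signal": "momentum"},
}
# Same run, except that one configuration parameter was changed.
current_run = dict(previous_run, config={"lookback_days": 90, "signal": "momentum"})

diff = changed_inputs(previous_run, current_run)
```

Here `diff` names only the configuration, so a change in final output can be attributed to that parameter and not to the data or the code.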

Chang She

Cloudera

Chang She is CTO and cofounder of Lambda Foundry. From 2011 to 2012, he served as Assistant Vice President at Barclays Capital, researching quantitative FX strategies and building research infrastructure. From 2006 to 2011, he worked at AQR Capital Management in global equities research and algorithm execution. He graduated from MIT with an M.Eng. in Computer Science and S.B. degrees in Computer Science and Political Science.
