Presented By
O’Reilly + Intel AI
Put AI to Work
April 15-18, 2019
New York, NY

Best practices for scaling modeling platforms

Scott Clark (SigOpt), Matt Greenwood (Two Sigma Investments)
4:55pm–5:35pm Wednesday, April 17, 2019
AI Business Summit, Case Studies
Location: Sutton North/Center
Secondary topics: AI case studies, Automation in machine learning and AI, Financial Services, Platforms and infrastructure

Who is this presentation for?

* DevOps engineers supporting researchers with operational solutions
* Researchers interested in learning about new tools to support model training
* Modeling platform engineers responsible for training management

Level

Intermediate

Prerequisite knowledge

* Conceptual understanding of model lifecycle management
* Conceptual understanding of hyperparameter optimization
* Conceptual understanding of infrastructure required for model training

What you'll learn

1) There are a few key ingredients to the success of any modeling platform
2) Hyperparameter optimization is a critical cog in the effectiveness of these platforms at scale
3) The efforts of companies like Two Sigma hold implications for teams developing these platforms

Description

Challenge

This case explores how Two Sigma, a leading quantitative investment and technology firm, solved their model optimization problem.

Algorithmic trading firms leverage massive amounts of data, advanced engineering, and quantitative research through every step of the investment process to maximize the returns for their customers. Parameterized models exist at the heart of each stage. Finding the optimal settings for these models is an ongoing challenge.

Some models are simple or well-studied enough to have closed-form analytic solutions. Others, like increasingly popular deep learning models, have analytic mathematical formulations that make them good targets for powerful gradient descent methods. Unfortunately, many models require full market simulations or machine learning algorithms where none of these fast optimization methods can be used.

Two Sigma tried both unsophisticated “grid search” and more sophisticated open source Bayesian optimization methods (like GPyOpt) to solve this problem. The former were far too expensive for even moderately complex models, and the latter were too brittle and inconsistent in their performance to use across modeling pipelines at scale. Furthermore, the cost of building, updating and maintaining the systems was a greater tax on Two Sigma’s resources than expected.
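
To illustrate why grid search becomes too expensive as models grow, the sketch below counts the evaluations a full grid requires. The choice of 10 candidate values per hyperparameter and the function names are illustrative assumptions, not figures or code from Two Sigma.

```python
from itertools import product


def grid_size(points_per_param: int, num_params: int) -> int:
    """Number of model evaluations a full grid search requires."""
    return points_per_param ** num_params


def grid_points(axes):
    """Enumerate every combination of the per-parameter candidate values."""
    return list(product(*axes))


if __name__ == "__main__":
    # Hypothetical setting: 10 candidate values for each hyperparameter.
    for num_params in (2, 4, 8):
        print(num_params, "parameters ->", grid_size(10, num_params), "evaluations")
```

The count grows exponentially with the number of hyperparameters, so when each evaluation is a full market simulation or training run, even a modest number of parameters puts an exhaustive grid out of reach.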

Solution

In a departure from its preference for open source or internally built tools, Two Sigma trialed SigOpt as the optimization engine in a component of its modeling platform. At first, the team benchmarked SigOpt against other methods. Soon after, Two Sigma standardized on SigOpt as the preferred optimization engine powering the platform. In the process, the team realized a few benefits.

First, SigOpt drove significant performance gains. In testing against alternative methods like GPyOpt, SigOpt delivered better results much faster. To contextualize this significant performance gain, consider one machine learning model that had particularly lengthy training cycles. Using GPyOpt, it took 24 days to tune. With SigOpt, the tuning process resulted in a more accurate model and only took 3 days to do so. That is, it resulted in a better performing model 8x faster.

Second, SigOpt offered advanced optimization features that allowed Two Sigma to solve entirely new business problems with modeling. One of the more intuitive examples is multimetric optimization, which lets teams optimize multiple metrics at the same time and analyze the Pareto-optimal frontier of solutions. This is useful in traditional machine learning scenarios where, for example, teams may need to trade accuracy against inference time.
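
As a minimal sketch of what a Pareto-optimal frontier means for the accuracy versus inference time example, the code below filters a set of candidate models down to those that no other candidate beats on both metrics. It is a generic brute-force filter, not SigOpt's algorithm or API, and the candidate values are made up for illustration.

```python
from typing import List, Tuple

# Each candidate is (accuracy, inference_time_ms). Hypothetical values only.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one.

    Accuracy is maximized; inference time is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    candidates = [(0.90, 12.0), (0.92, 20.0), (0.88, 15.0), (0.95, 45.0), (0.91, 25.0)]
    for accuracy, latency in pareto_frontier(candidates):
        print(f"accuracy={accuracy:.2f} inference_time_ms={latency:.1f}")
```

Every point left on the frontier represents a different trade-off: moving along it buys accuracy at the cost of slower inference, and the team picks the point that fits its constraints.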

Finally, SigOpt offers asynchronous parallelization of compute. Other solutions take advantage of massive clusters, but evaluate tasks in batches and wait for every task within the batch to complete before launching the next set of tasks. SigOpt’s algorithm provides a new task to evaluate as soon as one completes, meaning 100% of machines are utilized throughout the optimization process.
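
The difference between batch and asynchronous scheduling can be sketched with a small simulation. This is a simplified model under stated assumptions: task durations are known up front and fixed, whereas a real optimizer chooses each new task based on earlier results. The function names and durations are hypothetical and do not reflect SigOpt's implementation.

```python
import heapq
from typing import Sequence


def batch_makespan(durations: Sequence[float], workers: int) -> float:
    """Total time when tasks run in batches and each batch waits for its slowest task."""
    total = 0.0
    for start in range(0, len(durations), workers):
        total += max(durations[start:start + workers])
    return total


def async_makespan(durations: Sequence[float], workers: int) -> float:
    """Total time when each task starts as soon as any worker becomes free."""
    free_at = [0.0] * workers  # min-heap of times at which each worker is next idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + duration)
    return max(free_at)


def utilization(durations: Sequence[float], workers: int, makespan: float) -> float:
    """Fraction of total worker time spent doing useful work."""
    return sum(durations) / (workers * makespan)


if __name__ == "__main__":
    durations = [1, 4, 2, 3, 4, 1, 3, 2]  # hypothetical training times
    workers = 2
    for name, fn in (("batch", batch_makespan), ("async", async_makespan)):
        span = fn(durations, workers)
        print(name, "makespan:", span, "utilization:", utilization(durations, workers, span))
```

With these particular durations the batch schedule leaves workers idle while they wait for the slowest task in each batch, and the asynchronous schedule keeps both workers busy throughout. In general an asynchronous schedule can still show some idle time at the very end of a run, when the last tasks finish at different moments.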

Conclusion

During this talk, we will explore each of these scenarios more deeply. We will provide a deeper overview of this particular benchmark and what this faster time to tune practically means for teams who are building modeling platforms. Then we will explore how techniques like multimetric optimization and asynchronous parallelization combine to empower teams to implement entirely new modeling strategies with significantly greater asset utilization.

Scott Clark

SigOpt

Scott is a cofounder and CEO of SigOpt, which provides optimization tools as a service to help experts tune their machine learning models. Scott has been applying optimal learning techniques in industry and academia for years, from bioinformatics to production advertising systems. Before SigOpt, Scott worked on the Ad Targeting team at Yelp, leading academic research and outreach through projects like the Yelp Dataset Challenge and the open sourcing of MOE. Scott holds a PhD in Applied Mathematics and an MS in Computer Science from Cornell University and BS degrees in Mathematics, Physics, and Computational Physics from Oregon State University. Scott was chosen as one of Forbes’ 30 under 30 in 2016.

Matt Greenwood

Two Sigma Investments

Matt is the Chief Innovation Officer at Two Sigma Investments. Since joining Two Sigma in 2003, he has led company-wide efforts across both engineering and modeling teams. Matt oversees development of BeakerX, which extends Jupyter Notebook to support six languages, additional widgets, and one-click publication. Matt is also a board member and Venture Partner at Two Sigma Ventures and works closely with portfolio companies in both board membership and advisory capacities.

Matt began his career at Bell Labs and later moved to IBM Research, where he was responsible for early efforts in tablet computing and distributed computing. In 2000, Matt was lead developer and manager for Entrisphere, Inc., where he helped create a product providing access equipment for broadband service providers. Matt earned a BA and MA in Math from Oxford University, and an MA in Theoretical Physics from the Weizmann Institute of Science in Israel. He also holds a PhD in Mathematics from Columbia University, where he taught for many years.
