Put AI to Work
April 15-18, 2019
New York, NY

Best practices for scaling modeling platforms

Scott Clark (SigOpt), Matt Greenwood (Two Sigma Investments)
4:55pm-5:35pm Wednesday, April 17, 2019
AI Business Summit, Case Studies
Location: Sutton North/Center
Secondary topics:  AI case studies, Automation in machine learning and AI, Financial Services, Platforms and infrastructure

Who is this presentation for?

  • DevOps engineers, researchers, and modeling platform engineers

Level

Intermediate

Prerequisite knowledge

  • A basic understanding of model lifecycle management, hyperparameter optimization, and infrastructure required for model training

What you'll learn

  • Understand the key ingredients to the success of any modeling platform
  • See why hyperparameter optimization is a critical cog in the effectiveness of these platforms at scale
  • Explore Two Sigma efforts to develop these platforms

Description

Companies are increasingly building modeling platforms to empower their researchers to efficiently scale the development and productionalization of their models. Scott Clark and Matt Greenwood share a case study from a leading algorithmic trading firm to illustrate best practices for building these types of platforms in any industry. Join in to learn how Two Sigma, a leading quantitative investment and technology firm, solved its model optimization problem.

Algorithmic trading firms leverage massive amounts of data, advanced engineering, and quantitative research through every step of the investment process to maximize the returns for their customers. Parameterized models exist at the heart of each stage. Finding the optimal settings for these models is an ongoing challenge. Some models are simple or well studied enough to have closed-form analytic solutions. Others, like increasingly popular deep learning models, have analytic mathematical formulations that make them good targets for powerful gradient descent methods. Unfortunately, many models require full market simulations or machine learning algorithms where none of these fast optimization methods can be used.

Two Sigma tried both unsophisticated “grid search” and more sophisticated open source Bayesian optimization methods (like GPyOpt) to solve this problem. The former was far too expensive for even moderately complex models, and the latter were too brittle and inconsistent in their performance to use across modeling pipelines at scale. Furthermore, the cost of building, updating, and maintaining the systems was a greater tax on Two Sigma’s resources than expected.
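To see why grid search becomes too expensive so quickly, here is a minimal sketch in Python. The parameter names and values are hypothetical and are not taken from Two Sigma's models. The sketch only counts how many model evaluations an exhaustive grid requires as parameters are added.

```python
from itertools import product


def grid_size(points_per_param: int, num_params: int) -> int:
    """Number of evaluations an exhaustive grid needs."""
    return points_per_param ** num_params


def build_grid(param_values: dict) -> list:
    """Enumerate every combination of the given parameter values."""
    names = list(param_values)
    return [dict(zip(names, combo)) for combo in product(*param_values.values())]


# Hypothetical search space: three parameters, a handful of values each.
space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "depth": [2, 4, 6, 8],
    "lookback_days": [5, 20, 60],
}
configs = build_grid(space)
print(len(configs))       # 3 * 4 * 3 combinations
print(grid_size(10, 2))   # 10 values for each of 2 parameters
print(grid_size(10, 6))   # 10 values for each of 6 parameters
```

Because the count grows exponentially with the number of parameters, a grid that is affordable for two parameters can be out of reach for six, especially when each evaluation is a full market simulation or a long training run.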

In a departure from its preference for open source or internally built tools, Two Sigma trialed SigOpt as the optimization engine in one component of its modeling platform. The company first benchmarked SigOpt against other methods and then quickly standardized on it as the preferred optimization engine powering the modeling platform. In the process, the Two Sigma team realized a few benefits.

First, SigOpt drove significant performance gains. In testing against alternative methods like GPyOpt, SigOpt delivered better results much faster. To contextualize this significant performance gain, consider one machine learning model that had particularly lengthy training cycles. Using GPyOpt, it took 24 days to tune. With SigOpt, the tuning process resulted in a more accurate model and only took 3 days to do so. That is, it resulted in a better performing model 8x faster.

Second, SigOpt offered advanced optimization features that allowed Two Sigma to solve entirely new business problems with modeling. One of the more intuitive examples is multimetric optimization, which lets teams optimize multiple metrics at the same time and analyze the Pareto-optimal frontier of solutions. This is useful in traditional machine learning scenarios where, for example, teams may trade some accuracy for faster inference.
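The idea of a Pareto-optimal frontier can be illustrated with a short Python sketch. It is a generic illustration, not SigOpt's API or algorithm, and the candidate models and their numbers are made up. A configuration is on the frontier if no other configuration is at least as good on both metrics and strictly better on one.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both metrics and better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
    better = a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
    return no_worse and better


def pareto_frontier(results: list) -> list:
    """Keep only the results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Hypothetical tuning results (illustrative numbers only).
results = [
    Result("A", 0.95, 40.0),
    Result("B", 0.93, 25.0),
    Result("C", 0.90, 30.0),  # worse than B on both metrics
    Result("D", 0.85, 10.0),
    Result("E", 0.94, 45.0),  # worse than A on both metrics
]
frontier = pareto_frontier(results)
print([r.name for r in frontier])
```

Each point left on the frontier represents a different trade-off between accuracy and inference time. Choosing among them is a business decision rather than an optimization one.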

Finally, SigOpt offers asynchronous parallelization of compute. Other solutions take advantage of massive clusters but evaluate tasks in batches and wait for every task within the batch to complete before launching the next set of tasks. SigOpt’s algorithm provides a new task to evaluate as soon as one completes, meaning 100% of machines are utilized throughout the optimization process.
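The difference between batch and asynchronous scheduling can be sketched with a small deterministic simulation in Python. This models only how tasks are handed to machines. It does not model how an optimizer chooses the next configuration, and it is not SigOpt's implementation. The task durations are hypothetical.

```python
import heapq


def batch_makespan(durations: list, workers: int) -> float:
    """Run tasks in batches of `workers`; each batch waits for its slowest task."""
    total = 0.0
    for i in range(0, len(durations), workers):
        total += max(durations[i:i + workers])
    return total


def async_makespan(durations: list, workers: int) -> float:
    """Hand the next task to whichever worker becomes free first."""
    free_at = [0.0] * workers  # time at which each worker is next idle
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)
        end = start + d
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


def utilization(durations: list, workers: int, makespan: float) -> float:
    """Fraction of total machine time spent doing useful work."""
    return sum(durations) / (workers * makespan)


# Hypothetical training times (in hours) for eight evaluations on two machines.
durations = [1, 4, 2, 3, 4, 1, 2, 3]
workers = 2
batch = batch_makespan(durations, workers)
asynchronous = async_makespan(durations, workers)
print(batch, round(utilization(durations, workers, batch), 3))
print(asynchronous, round(utilization(durations, workers, asynchronous), 3))
```

In this toy example the asynchronous schedule finishes sooner because no machine waits for the slowest task in a batch. The only idle time left is at the very end of the run, when no tasks remain to hand out.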

Scott and Matt explore each of these scenarios in more depth, including a closer look at this particular benchmark and what a faster time to tune means in practice for teams building modeling platforms. They then discuss how techniques like multimetric optimization and asynchronous parallelization combine to empower teams to implement entirely new modeling strategies with significantly greater asset utilization.

Scott Clark

SigOpt

Scott Clark is a cofounder and CEO of SigOpt, providing optimization tools as a service that help experts optimally tune their machine learning models. Scott has been applying optimal learning techniques in industry and academia for years, from bioinformatics to production advertising systems. Previously, he worked on the ad targeting team at Yelp, leading the charge on academic research and outreach with projects like the Yelp Dataset Challenge and open sourcing MOE. Scott was chosen as one of Forbes’s 2016 “30 under 30.” He holds a PhD in applied mathematics, an MS in computer science from Cornell University, and BS degrees in mathematics, physics, and computational physics from Oregon State University.

Matt Greenwood

Two Sigma Investments

Matt Greenwood is the chief innovation officer at Two Sigma Investments, where he has led company-wide efforts across both engineering and modeling teams. Matt oversees development of BeakerX, which extends the Jupyter Notebook to support six languages, additional widgets, and one-click publication. Matt is also a board member and venture partner at Two Sigma Ventures and works closely with portfolio companies in both board membership and advisory capacities. Matt began his career at Bell Labs and later moved to IBM Research, where he was responsible for early efforts in tablet computing and distributed computing. Matt was also lead developer and manager at Entrisphere, where he helped create a product providing access equipment for broadband service providers. Matt holds a PhD in mathematics from Columbia University, where he taught for many years, as well as a BA and MA in math from Oxford University and an MA in theoretical physics from the Weizmann Institute of Science in Israel.
