Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Working with the data of sports

Thomas Miller (Northwestern University)
12:00pm–12:30pm Tuesday, March 6, 2018

There is a rich history of baseball fans and sports analysts using play-by-play information to compute traditional performance measures, such as batting average, on-base percentage, and slugging percentage for batters and earned-run average for pitchers. And there is no shortage of data: baseball records show team matchups, batter-pitcher matchups, outs and runners-on-base situations, and player on-field positions, while event codes represent the outcome of each play, along with runs scored. However, sports analytics today is more than a matter of analyzing box scores and play-by-play statistics. With detailed on-field or on-court data available from every game, sports teams face challenges in data management, data engineering, and analytics.
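The traditional measures named above follow standard definitions, and a minimal Python sketch may make them concrete. The `BattingLine` container, its field names, and the sample numbers are hypothetical, chosen only for illustration; they do not come from the session or from any real player's record.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one batter (hypothetical container for illustration)."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def total_bases(self) -> int:
        # Each hit type is weighted by the number of bases it earns.
        return (self.singles + 2 * self.doubles
                + 3 * self.triples + 4 * self.home_runs)


def batting_average(b: BattingLine) -> float:
    """Hits per at-bat."""
    return b.hits / b.at_bats if b.at_bats else 0.0


def on_base_percentage(b: BattingLine) -> float:
    """Times on base (H + BB + HBP) over AB + BB + HBP + SF."""
    denom = b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    return (b.hits + b.walks + b.hit_by_pitch) / denom if denom else 0.0


def slugging_percentage(b: BattingLine) -> float:
    """Total bases per at-bat."""
    return b.total_bases / b.at_bats if b.at_bats else 0.0


def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings_pitched if innings_pitched else 0.0


# Made-up sample line, not real data.
sample = BattingLine(at_bats=500, singles=100, doubles=30, triples=5,
                     home_runs=15, walks=50, hit_by_pitch=5, sacrifice_flies=5)
print(f"AVG {batting_average(sample):.3f}")      # AVG 0.300
print(f"OBP {on_base_percentage(sample):.3f}")   # OBP 0.366
print(f"SLG {slugging_percentage(sample):.3f}")  # SLG 0.470
print(f"ERA {earned_run_average(60, 180):.2f}")  # ERA 3.00
```

Each measure collapses a season of play-by-play events into a single ratio, which is part of why richer representations are attractive once more detailed data is available.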

Thomas Miller details the challenges faced by a Major League Baseball team as it sought competitive advantage through data science and deep learning. Thomas demonstrates how neural network models (methods from deep learning and natural language processing) can generate vector representations of teams and players, providing more complete measures of on-field performance. These vector representations can then be used to evaluate teams and players, predict runs scored, and guide in-game strategy.
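The abstract does not say which models the talk uses. As one possible illustration of borrowing a natural language processing method to embed players, the sketch below trains a small skip-gram model with a full softmax in NumPy, treating each plate appearance as a short "sentence" of tokens. The token names, the sequence layout, the function `train_player_vectors`, and all hyperparameters are assumptions made for this example; this is not the speaker's method.

```python
import numpy as np


def train_player_vectors(sequences, dim=8, window=2, epochs=50, lr=0.05, seed=0):
    """Learn one vector per token with skip-gram and a full softmax.

    `sequences` is a list of token lists; here each list is a hypothetical
    plate appearance of the form [batter, pitcher, event].
    """
    rng = np.random.default_rng(seed)
    vocab = sorted({tok for seq in sequences for tok in seq})
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(vocab)

    w_in = rng.normal(scale=0.1, size=(n, dim))   # input embeddings
    w_out = rng.normal(scale=0.1, size=(dim, n))  # output weights

    # (center, context) index pairs within the window.
    pairs = [(index[seq[i]], index[seq[j]])
             for seq in sequences
             for i in range(len(seq))
             for j in range(max(0, i - window), min(len(seq), i + window + 1))
             if i != j]

    for _ in range(epochs):
        for center, context in pairs:
            h = w_in[center].copy()          # hidden layer is the embedding
            scores = h @ w_out
            scores -= scores.max()           # numerical stability
            probs = np.exp(scores) / np.exp(scores).sum()
            probs[context] -= 1.0            # gradient of cross-entropy w.r.t. scores
            grad_h = w_out @ probs           # computed before w_out is updated
            w_out -= lr * np.outer(h, probs)
            w_in[center] -= lr * grad_h

    return {tok: w_in[i].copy() for tok, i in index.items()}


# Made-up plate appearances: [batter, pitcher, event code].
plate_appearances = [
    ["batter_A", "pitcher_X", "single"],
    ["batter_A", "pitcher_Y", "strikeout"],
    ["batter_B", "pitcher_X", "home_run"],
    ["batter_B", "pitcher_Y", "walk"],
    ["batter_C", "pitcher_X", "strikeout"],
]
vectors = train_player_vectors(plate_appearances)
print(vectors["batter_A"].shape)
```

Under these assumptions, tokens that appear in similar contexts tend to end up with similar vectors, and those vectors could then serve as input features to a downstream model, for example one that predicts runs scored. A realistic version would need far more data and a more scalable training objective such as negative sampling.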

Thomas Miller

Northwestern University

Thomas W. Miller is faculty director of the Data Science Program at Northwestern University, where he has developed and taught a number of courses, including practical machine learning, web information retrieval, and network data science. In addition, he consults with businesses about performance and value measurement, data science methods, information technology, and best practices for building teams of data scientists and data engineers. Thomas has written six books about data science.