Make Data Work
Oct 15–17, 2014 • New York, NY

Becoming a Scalable Data Scientist

Carlos Guestrin (Apple | University of Washington ), Alice Zheng (Amazon), Shawn Scully (Dato)
1:30pm–5:00pm Wednesday, 10/15/2014
Data Science
Location: 1 C03/1 C04
Average rating: 2.50 (10 ratings)

Data Science is perhaps the hottest profession on the market today. People with backgrounds ranging from statistics and physics to engineering and computer science are eager to transition into the field. However, designing and deploying data analysis and machine learning applications at scale remains a significant challenge. For some, machine learning algorithms and methods can seem obscure, too mathematical, and disconnected from practice. For others, writing and deploying scalable software takes significant effort that distracts from their deep analytic work.

This tutorial focuses on two interpretations of scaling up data science: enabling more of us to become data scientists, and providing simple tools that significantly decrease the effort involved in deploying data science methods at scale. Using GraphLab with a simple Python interface running on your laptop, you will learn how to use state-of-the-art machine learning algorithms in practice; the same code can then be deployed at scale on a Hadoop cluster.

More specifically, we will provide an introduction to modern machine learning methods, and we will show how practitioners are using machine learning to detect fraud, analyze social networks, and build personalized recommender services. Through these case studies, we will walk you through the common tasks followed in all applied machine learning problems, from data cleaning, through model building, to predictions and finally insight. These techniques will be demonstrated in practice, using GraphLab and Python.
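
As a rough illustration of the four stages named above (data cleaning, model building, prediction, insight), here is a minimal sketch in plain Python. It does not use GraphLab's API; the function names, the toy transaction records, and the z-score rule are all assumptions made for illustration, and the model is fitted and applied on the same small data set purely to keep the example short.

```python
from statistics import mean, stdev

def clean(records):
    """Data cleaning: drop records whose amount is missing or not numeric."""
    return [r for r in records if isinstance(r.get("amount"), (int, float))]

def fit(records):
    """Model building: estimate the mean and sample standard deviation of amounts."""
    amounts = [r["amount"] for r in records]
    return {"mean": mean(amounts), "std": stdev(amounts)}

def predict(model, record, threshold=2.0):
    """Prediction: flag a record whose absolute z-score exceeds the threshold."""
    z = abs(record["amount"] - model["mean"]) / model["std"]
    return z > threshold

def summarize(model, records):
    """Insight: collect the flagged records so they can be inspected."""
    return [r for r in records if predict(model, r)]

# Hypothetical toy transactions; record 3 has a missing amount.
raw = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": 22.5},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 19.0},
    {"id": 5, "amount": 21.0},
    {"id": 6, "amount": 18.5},
    {"id": 7, "amount": 23.0},
    {"id": 8, "amount": 500.0},
]

cleaned = clean(raw)
model = fit(cleaned)
flagged = summarize(model, cleaned)
print([r["id"] for r in flagged])
```

Keeping each stage as a separate function mirrors the workflow described in the abstract: any one stage can be swapped for a more capable implementation without touching the others.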

We will then turn to scaling up, and show how the same code can be deployed at scale on a Hadoop cluster, how to build pipelines of data analysis jobs, and how to monitor the performance and accuracy of these analyses directly from your laptop using our latest visualization techniques. Finally, we will show how to close the loop and improve the performance of your system through interactive feature engineering, optimization, and model ensembling.


Carlos Guestrin

Apple | University of Washington

Carlos Guestrin is the Amazon Professor of Machine Learning at the
Computer Science & Engineering Department of the University of
Washington. He is also a co-founder and CEO of GraphLab Inc.,
focusing on large-scale machine learning and graph analytics. His
previous positions include the Finmeccanica Associate Professor at
Carnegie Mellon University and senior researcher at the Intel Research
Lab in Berkeley. Carlos received his PhD and Master's degrees from Stanford
University, and a Mechatronics Engineer degree from the University of
Sao Paulo, Brazil. Carlos’ work has been recognized by awards at a
number of conferences and two journals: KDD 2007 and 2010, IPSN 2005
and 2006, VLDB 2004, NIPS 2003 and 2007, UAI 2005, ICML 2005, AISTATS
2010, JAIR in 2007 & 2012, and JWRPM in 2009. He is also a recipient
of the ONR Young Investigator Award, NSF Career Award, Alfred P. Sloan
Fellowship, IBM Faculty Fellowship, the Siebel Scholarship and the
Stanford Centennial Teaching Assistant Award. Carlos was named one of
the 2008 ‘Brilliant 10’ by Popular Science Magazine, and received the
IJCAI Computers and Thought Award and the Presidential Early Career
Award for Scientists and Engineers (PECASE). He is a former member of
the Information Sciences and Technology (ISAT) advisory group for


Alice Zheng


Alice is the Director of Data Science at GraphLab, a Seattle-based startup that offers powerful large-scale machine learning and graph analytics tools. She loves playing with data and enabling others to play with data. She is a tool builder and an expert in Machine Learning algorithms. Her research spans software diagnosis, computer network security, and social network analysis. Prior to joining GraphLab, she was a researcher at Microsoft Research, Redmond. She holds Ph.D. and B.A. degrees in Computer Science, and a B.A. in Mathematics, all from U.C. Berkeley.


Shawn Scully


Shawn is the Director of Product at GraphLab, where he helps make it easy to build cool experiences with data. He is a data geek who loves inspired technologies, businesses, and gadgets. His technical background spans recommendation systems, business analytics, physics simulations, and energy. He holds a PhD in Materials Science from Stanford University and a BA in Physics from Cornell University.