Make Data Work
Oct 15–17, 2014 • New York, NY

Schedule: Data Science sessions

Inside the world of data practitioners, from the hard science of the latest algorithms and advances in machine learning to the thorny issues of cultural change and team-building.

Wednesday, October 15

9:00am–12:30pm Wednesday, 10/15/2014
Location: 1 E8/1 E9
Jeroen Janssens (Data Science Workshops)
Average rating: ***..
(3.96, 27 ratings)
The command line, although invented decades ago, remains an amazing environment for doing data science. By combining small, yet powerful, command-line tools you can quickly obtain, scrub, explore, visualize, and model your data. In this hands-on tutorial you will gain a solid understanding of how to leverage the power of the command line and integrate it into your existing data science workflow.
9:00am–5:00pm Wednesday, 10/15/2014
Location: 1 E12/1 E13
Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory), Brian Granger (Cal Poly San Luis Obispo), Andy Terrel (NumFOCUS), Peter Wang (Anaconda), Jake Vanderplas (eScience Institute, University of Washington), Olivier Grisel (Inria & scikit-learn), Travis Oliphant (Anaconda), Wes McKinney (Two Sigma Investments), Trent Nelson (Continuum Analytics), Kayur Patel (Google), Kester Tong (Google)
Average rating: ****.
(4.43, 14 ratings)
Python has become an increasingly important part of the data engineering and analytics tool landscape. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib for visualization, SciPy, scikit-learn, and how to scale Python performance, including how to handle large, distributed data sets.
9:00am–5:00pm Wednesday, 10/15/2014
Location: 1 E16/1 E17
Hadley Wickham (Rice University / RStudio), Winston Chang (RStudio), Garrett Grolemund (RStudio), Joseph Allaire (RStudio, Inc.), Yihui Xie (RStudio, Inc.)
Average rating: *****
(5.00, 10 ratings)
From advanced visualization, collaboration, and reproducibility to data manipulation, R Day at Strata covers a raft of current topics that analysts and R users need to pay attention to. The R Day tutorials come from leading luminaries and R committers, the folks keeping the R ecosystem abreast of the challenges facing analysts and others who work with data.
1:30pm–5:00pm Wednesday, 10/15/2014
SOLD OUT
Location: 1 C03/1 C04
Carlos Guestrin (Apple | University of Washington), Alice Zheng (Amazon), Shawn Scully (Dato)
Average rating: **...
(2.50, 10 ratings)
This tutorial focuses on hands-on data science skills from prototyping to production. Using GraphLab tools, we walk through multiple case studies such as fraud detection, social network analysis, and building personalized recommendation services.

Thursday, October 16

11:00am–11:40am Thursday, 10/16/2014
Location: 1D
Claudia Perlich (Dstillery)
Average rating: ****.
(4.60, 5 ratings)
There is a symbiotic relationship between predictive modeling and Big Data. Performance gets better with more data, and predictive models demonstrate, like few other techniques, the value of Big Data. However, there is a surprising paradox: when you need models most, even all the data is not enough or is just not suitable. So in this day and age of Big Data there remains an art to predictive modeling.
11:50am–12:30pm Thursday, 10/16/2014
Location: 1D
Joseph Adler (Confluent), Hilary Mason (Fast Forward Labs), Scott Nicholson (Poynt), Lucian Lita (Intuit), Roger Magoulas (O'Reilly Media)
Average rating: **...
(2.91, 11 ratings)
In this debate, two teams of the world's best data scientists will debate the following proposition: "If you can't code, you can't be a data scientist."
1:45pm–2:25pm Thursday, 10/16/2014
Location: 1 E8/1 E9
Laurie Skelly (Datascope Analytics)
Average rating: **...
(2.71, 14 ratings)
Data scientists wear many hats -- how do you train a ready-for-prime-time data scientist in twelve weeks? We'll share some of the choices and models we used to create the Metis Data Science Bootcamp and select its first cohort of students.
1:45pm–2:25pm Thursday, 10/16/2014
Location: 1D
Chris Harland (Microsoft)
Average rating: ****.
(4.17, 6 ratings)
An increasingly common task in data science is the measurement and attribution of experimental impact. Using examples from healthcare.gov, Microsoft advertising, and Bing experimentation, we will explore the strengths, weaknesses, and pitfalls of techniques for dealing with impact and attribution in scenarios and data for which controlled experiments were not possible or otherwise not performed.
2:35pm–3:15pm Thursday, 10/16/2014
Location: 1D
Vitaly Gordon (LinkedIn)
Average rating: ***..
(3.82, 11 ratings)
A talk about how the largest professional social network in the world is digitally mapping the global economy to connect talent with opportunity at massive scale.
4:15pm–4:55pm Thursday, 10/16/2014
Location: 1D
Brian Granger (Cal Poly San Luis Obispo), Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory)
Average rating: *****
(5.00, 4 ratings)
The IPython Notebook is an open-source, web-based interactive computing environment. The Notebook enables users to author documents that combine live code, descriptive text, mathematical equations, images, videos, and arbitrary HTML. This talk will describe how IPython is evolving to support a wide range of programming languages relevant in data science, including Python, Julia, and R.
5:05pm–5:45pm Thursday, 10/16/2014
Location: 1D
Juan Miguel Lavista (Microsoft)
Average rating: ***..
(3.43, 7 ratings)
In the US alone, we make roughly 40 billion queries every month. From the time we wake up, using search engines is one of the top things we do online. This talk will show some examples of how this data can be used, from fun things like determining which city wakes up earliest to more complex scenarios like finding adverse drug interactions.

Friday, October 17

11:00am–11:40am Friday, 10/17/2014
Location: 1D
Beau Cronin (Embedding.js)
Average rating: **...
(2.83, 6 ratings)
What does AI mean in 2014, and where is it headed? Every day brings news of purported breakthroughs, and some of the new applications are certainly impressive, but the field has witnessed boom/bust cycles before. What are the challenges that lie ahead this time? This talk will provide an overview of the state of the field, as well as a critical framework for thinking about the years ahead.
11:50am–12:30pm Friday, 10/17/2014
Location: 1D
Douglas Moore (Think Big Analytics)
Average rating: **...
(2.92, 12 ratings)
We debunk some popular approaches and attitudes we have encountered over the course of more than 50 real-world Big Data implementations. We will describe each anti-pattern and its appeal, but also why it fails and how to do it right.
1:45pm–2:25pm Friday, 10/17/2014
Location: 1D
Vishal Chowdhary (Microsoft)
Average rating: ***..
(3.67, 6 ratings)
Microsoft Translator currently supports 100+ languages. We constantly improve translation quality and add new scenarios, all with a constant team size. This session describes a production-scale ML architecture using MS Translator as a case study. You will learn a mental model for approaching your ML problem and concrete Do’s and Don’ts for the various components of the ML system architecture.
2:05pm–2:25pm Friday, 10/17/2014
Location: 1 E8/1 E9
Lauro Lins (AT&T Labs)
Average rating: **...
(2.75, 4 ratings)
Nanocubes is an open source project that can be used to visually explore large spatiotemporal datasets at interactive rates using a web browser.
2:35pm–3:15pm Friday, 10/17/2014
Location: 1 E10/1 E11
Bahman Bahmani (Stanford University)
Average rating: ***..
(3.60, 5 ratings)
As in a game of chess, successful use of machine learning techniques against adaptive adversaries, such as spammers and intruders, requires designing learning algorithms that anticipate the opponent’s response to them. In this talk, we present techniques to design robust machine learning algorithms for adversarial environments and provide clarifying attack-defense examples.
2:35pm–3:15pm Friday, 10/17/2014
Location: 1D
Tags: fashion
Karen Moon (Trendalytics), Vijay Subramanian (Rent the Runway), Liza Kindred (Lullabot)
Average rating: **...
(2.50, 2 ratings)
Karen Moon, Co-founder and CEO, Trendalytics
4:15pm–4:55pm Friday, 10/17/2014
Location: 1D
Cliff Click (0xdata)
Average rating: ****.
(4.50, 4 ratings)
H2O presents the world's fastest Distributed Parallel GBM. GBM is an ML algorithm used to win many recent Kaggle competitions, and is well known for its high-quality results.
5:05pm–5:45pm Friday, 10/17/2014
Location: 1 E6/1 E7
Josh Levy (Vast)
Average rating: ***..
(3.00, 1 rating)
By reducing friction from deploying models and comparing competing models, data scientists can focus on high-value efforts. At Vast we've experimented with tools and strategies for this while shipping a suite of data products for consumers and agents in the midst of some of life’s biggest purchases. I'll share best practices and lessons learned, and help you free up time for the fun stuff.