Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

featured conference sessions

11:20am–12:00pm Thursday, 10/01/2015
Kurt Brown (Netflix)
The Netflix Data Platform is a constantly evolving, large-scale infrastructure running in the AWS cloud. We are especially focused on performance and ease of use, with initiatives including Presto integration, Spark, and our big data portal and API. This talk will dive into the various technologies we use, the motivations behind our approach, and the business benefits we get.
2:55pm–3:35pm Wednesday, 09/30/2015
Mike Lee Williams (Cloudera Fast Forward Labs)
Because of the way sentiment analysis algorithms are trained, they systematically amplify the voices of those who express themselves unsubtly and aggressively. I will extrapolate from this observation to show the ways in which supervised machine learning has the potential to amplify social and economic privilege.
2:25pm–2:45pm Thursday, 10/01/2015
Allen Downey (Olin College of Engineering)
Bayesian methods are well suited to business applications because they provide concrete guidance for decision-making under uncertainty. But many data science teams lack the background to take advantage of these methods. In this presentation I will explain these advantages and suggest ways for teams to develop skills and add Bayesian methods to their toolkit.
1:15pm–1:55pm Thursday, 10/01/2015
Daniel Weeks (Netflix)
The Big Data Platform team at Netflix continues to push big data processing in the cloud with the addition of Spark to our platform. Recent enhancements to Spark allow us to use it effectively for processing against a 10+ petabyte warehouse backed by S3. We will share our experiences with production jobs and their performance, along with the pains and gains of deploying Spark at scale on YARN.
1:15pm–1:55pm Thursday, 10/01/2015
Michael Freeman (University of Washington)
Data-driven decision-making can only be properly executed when the decision makers understand both the underlying data and the types of manipulations that have been applied to it. In this session, we’ll explore what exactly we "do" to data (aggregation, "cleaning," statistical modeling, machine learning), and how to visually communicate the processes and implications of our work.