Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Featured conference sessions

11:00am–11:40am Thursday, 03/31/2016
Michael Armbrust (Databricks)
Michael Armbrust explores real-time analytics with Spark from interactive queries to streaming.
2:40pm–3:20pm Wednesday, 03/30/2016
BayesDB enables rapid prototyping and incremental refinement of statistical models by combining a model-independent declarative query language, BQL, with machine-assisted modeling and compositional models. Richard Tibbetts and Vikash Mansinghka explore the applications of BayesDB for analyzing and understanding developmental economics data in collaboration with the Gates Foundation.
1:50pm–2:30pm Wednesday, 03/30/2016
Data scientists inhabit such an ever-changing landscape of languages, packages, and frameworks that it can be easy to succumb to tool fatigue. If this sounds familiar, you may have missed the increasing popularity of Linux containers in the DevOps world, in particular Docker. Michelangelo D'Agostino demonstrates why Docker deserves a place in every data scientist’s toolkit.
4:20pm–5:00pm Thursday, 03/31/2016
Traditional data-warehousing techniques are sometimes limited by the scalability of the implementation tools themselves. Arun Thangamani explains how the advanced architectural approaches taken by tools like Apache Phoenix and HBase enable new, highly scalable live-analytics solutions using the same traditional techniques, and showcases a successful implementation at CDK.
4:20pm–5:00pm Thursday, 03/31/2016
Sreeni Iyer (Quad Analytix), Anurag Bhardwaj (Quad Analytix)
Typically, 8–10% of product URLs in ecommerce sites are misclassified. Sreeni Iyer and Anurag Bhardwaj discuss a machine-learning-based solution that relies on an innovative fusion of text- and image-based classifiers, along with a human touch to handle edge cases, to automatically classify product URLs according to a canonical taxonomic organization with a high F-score.
1:50pm–2:30pm Wednesday, 03/30/2016
Dean Wampler (Lightbend)
The success of Apache Spark is bringing developers to Scala. For big data, the JVM uses memory inefficiently, causing significant GC challenges. Spark's Project Tungsten fixes these problems with custom data layouts and code generation. Dean Wampler gives an overview of Spark, explaining ongoing improvements and what we should do to improve Scala and the JVM for big data.
4:20pm–5:00pm Thursday, 03/31/2016
Joseph Turian (Workday), Alex Nisnevich (Bayes Impact)
Next-gen UIs will allow people to use plain English to interact with software. However, current published research focuses on abstract understanding, not on translating English into concrete software actions. Joseph Turian and Alex Nisnevich outline UPSHOT's English-to-SQL semantic parser and demonstrate how to build your own English-to-“your software application” parser.