Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Schedule: Data innovations sessions

11:20am–12:00pm Wednesday, 09/28/2016
Location: 1 E 07/1 E 08 Level: Intermediate
Yonik Seeley (Cloudera)
Average rating: ****.
(4.00, 2 ratings)
Yonik Seeley explores recent Apache Solr features in the areas of faceting and analytics, including parallel SQL, streaming expressions, distributed join, and distributed graph queries, as well as the trade-offs of different approaches and strategies for maximizing scalability.
11:20am–12:00pm Wednesday, 09/28/2016
Location: 3D 10 Level: Intermediate
Owen O'Malley (Hortonworks)
Average rating: ****.
(4.92, 12 ratings)
Picking the best data format depends on what kind of data you have and how you plan to use it. Owen O'Malley outlines the performance differences between formats in different use cases and offers an overview of the advantages and disadvantages of each to help you improve the performance of your applications.
1:15pm–1:55pm Wednesday, 09/28/2016
Location: 1 E 07/1 E 08 Level: Beginner
Tags: pydata
Brian Granger (Cal Poly San Luis Obispo), Sylvain Corlay (QuantStack), Jason Grout (Bloomberg LP)
Average rating: ****.
(4.75, 12 ratings)
Brian Granger, Sylvain Corlay, and Jason Grout offer an overview of JupyterLab, the next-generation user interface for Project Jupyter. JupyterLab places Jupyter Notebooks within a powerful interface in which the building blocks of interactive computing can be assembled to support a wide range of data science workflows.
2:05pm–2:45pm Wednesday, 09/28/2016
Location: 1 E 07/1 E 08 Level: Beginner
Stuart Lynn (CartoDB), Andy Eschbacher (CARTO)
Average rating: ****.
(4.14, 7 ratings)
Geospatial analysis can provide deep insights into many datasets. Unfortunately, the key tools for unlocking these insights (geospatial statistics, machine learning, and meaningful cartography) remain inaccessible to nontechnical audiences. Stuart Lynn and Andy Eschbacher explore the design challenges in making these tools accessible and integrating them into an intuitive location intelligence platform.
2:55pm–3:35pm Wednesday, 09/28/2016
Location: 1 E 07/1 E 08 Level: Advanced
Tags: real-time
Julien Le Dem (WeWork), Jacques Nadeau (Dremio)
Average rating: ****.
(4.33, 6 ratings)
In pursuit of speed, big data is evolving toward columnar execution. The solid foundation laid by Arrow and Parquet for a shared columnar representation across the ecosystem promises a great future. Julien Le Dem and Jacques Nadeau discuss the future of columnar data processing and the hardware trends it takes advantage of, like RDMA, SSDs, and nonvolatile memory.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: 1 E 07/1 E 08 Level: Beginner
Himanshu Gupta (Yahoo)
Average rating: ***..
(3.33, 3 ratings)
Himanshu Gupta explains why Yahoo has been increasingly investing in interactive analytics and how it leverages Druid to power a variety of internal- and external-facing data applications.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: 3D 12 Level: Intermediate
Siva Raghupathy (Amazon Web Services)
Average rating: ****.
(4.85, 13 ratings)
Siva Raghupathy demonstrates how to use Hadoop innovations in conjunction with Amazon Web Services (cloud) innovations.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: 1 E 07/1 E 08 Level: Intermediate
Kurt Brown (Netflix)
Average rating: ****.
(4.22, 9 ratings)
The Netflix data platform is constantly evolving, but fundamentally it's an all-cloud platform at a massive scale (40+ PB and over 700 billion new events per day) focused on empowering developers. Kurt Brown dives into the current technology landscape at Netflix and offers some thoughts on what the future holds.
11:20am–12:00pm Thursday, 09/29/2016
Location: 1 E 07/1 E 08 Level: Intermediate
Ryan Blue (Netflix)
Average rating: ****.
(4.71, 7 ratings)
Netflix is exploring new avenues for data processing where traditional approaches fail to scale. Ryan Blue explains how Netflix is building on Parquet to enhance its 40+ petabyte warehouse, combining Parquet's features with Presto and Spark to boost ETL and interactive queries. Information about tuning Parquet is hard to find. Ryan shares what he's learned, creating the missing guide you need.
1:15pm–1:55pm Thursday, 09/29/2016
Location: 1 E 07/1 E 08 Level: Beginner
Tags: real-time
Tyler Akidau (Google)
Average rating: ****.
(4.67, 3 ratings)
Tyler Akidau offers a whirlwind tour of the conceptual building blocks of massive-scale data processing systems over the last decade, comparing and contrasting systems at Google with popular open source systems in use today.
2:05pm–2:45pm Thursday, 09/29/2016
Location: 1 E 07/1 E 08 Level: Intermediate
Xavier Léauté (Confluent)
Ever wondered what it takes to scale Kafka, Samza, and Druid to handle complex, heterogeneous analytics workloads at petabyte size? Xavier Léauté discusses his experience scaling Metamarkets's real-time processing to over 3 million events per second and shares the challenges encountered and lessons learned along the way.
2:05pm–2:45pm Thursday, 09/29/2016
Location: 3D 12 Level: Beginner
Thomas Phelan (BlueData)
Average rating: ****.
(4.11, 18 ratings)
Many initiatives for running applications inside containers have been scoped to run on a single host. Using Docker containers for large-scale environments poses new challenges, especially for big data applications like Hadoop. Thomas Phelan shares lessons learned and some tips and tricks on how to Dockerize your big data applications in a reliable, scalable, and high-performance environment.
2:55pm–3:35pm Thursday, 09/29/2016
Location: 1 E 07/1 E 08 Level: Non-technical
Bart van Leeuwen (Netage)
Average rating: *****
(5.00, 2 ratings)
Smart data allows fire services to better protect the people they serve and keep their firefighters safe. The combination of open and nonpublic data used in a smart way generates new insights in both preparation and operations. Bart van Leeuwen discusses how the fire service is benefiting from open standards and best practices.
2:55pm–3:35pm Thursday, 09/29/2016
Location: 1 C04 / 1 C05 Level: Beginner
Haoyuan Li (Alluxio)
Average rating: ****.
(4.00, 1 rating)
Haoyuan Li offers an overview of Alluxio (formerly Tachyon), a memory-speed virtual distributed storage system. In the past year, the Alluxio project experienced a tremendous improvement in performance and scalability and was extended with key new features. This year, the goal is to make Alluxio accessible to an even wider set of users through a focus on security, new language bindings, and APIs.
4:35pm–5:15pm Thursday, 09/29/2016
Location: 1 E 07/1 E 08 Level: Intermediate
Tags: real-time
Fangjin Yang (Imply)
Average rating: *****
(5.00, 4 ratings)
Cluster computing frameworks such as Hadoop or Spark are tremendously beneficial in processing and deriving insights from data. However, long query latencies make these frameworks suboptimal choices to power interactive applications. Fangjin Yang discusses using Druid for analytics and explains why the architecture is well suited to power analytic dashboards.
4:35pm–5:15pm Thursday, 09/29/2016
Location: 3D 10 Level: Intermediate
Jeffrey Carpenter (DataStax)
Average rating: ****.
(4.33, 3 ratings)
Jeff Carpenter describes how data modeling can be a key enabler of microservice architectures for transactional and analytics systems, including service identification, schema design, and event streaming.