Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Schedule: Big data and the Cloud sessions

Learn to deploy cloud computing platforms such as Amazon Web Services, Google Cloud, and Microsoft Azure—including migrating BI & SQL and integrating tools like Hadoop, Spark, R, TensorFlow & BigQuery.

9:00am - 5:00pm Monday, March 13 & Tuesday, March 14
Location: 213
Secondary topics:  Architecture, Cloud
Jesse Anderson (Big Data Institute)
Average rating: 4.00 (1 rating)
To handle real-time big data, you need to solve two difficult problems: how to ingest that much data and how to process it. Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company.
9:00am - 12:30pm Tuesday, March 14, 2017
Location: LL20 D
Secondary topics:  Cloud
Radhika Ravirala (Amazon Web Services), Ryan Nienhuis (Amazon Web Services), Ben Snively (Amazon Web Services (AWS)), Dario Rivera (Amazon Web Services (AWS))
Average rating: 3.50 (2 ratings)
Want to ramp up your knowledge of Amazon's big data web services and launch your first big data application on the cloud? Ben Snively, Radhika Ravirala, Ryan Nienhuis, and Dario Rivera walk you through building a big data application using open source technologies, such as Apache Hadoop, Spark, and Zeppelin, and AWS managed services, such as Amazon EMR, Amazon Kinesis, and more.
9:00am - 12:30pm Tuesday, March 14, 2017
Location: LL21 A Level: Intermediate
Secondary topics:  Architecture, Cloud
Jennifer Wu (Cloudera), Eugene Fratkin (Cloudera), Andrei Savu (Cloudera), Tony Wu (Cloudera)
Average rating: 4.50 (2 ratings)
Jennifer Wu, Eugene Fratkin, Andrei Savu, and Tony Wu explore best practices for Hadoop deployments in the public cloud and provide detailed guidance for deploying, configuring, and managing Hive, Spark, and Impala in the public cloud.
9:00am - 12:30pm Tuesday, March 14, 2017
Location: LL20 C
Edd Wilder-James (Google), Ellen Friedman (MapR Technologies), Jim Scott (MapR Technologies), Gabriela Queiroz (R-Ladies), Melanie Warrick (Google), Aneesh Karve (Quilt)
Data 101 introduces you to core principles of data architecture, teaches you how to build and manage successful data teams, and inspires you to do more with your data through real-world applications. Setting the foundation for deeper dives on the following days of Strata + Hadoop World, Data 101 reinforces data fundamentals and helps you focus on how data can solve your business problems.
1:30pm - 5:00pm Tuesday, March 14, 2017
Location: 210 D/H
Secondary topics:  Architecture, Cloud
James Malone (Google), John Mikula (Google Cloud)
Average rating: 2.00 (6 ratings)
James Malone explores using managed Spark and Hadoop solutions in public clouds alongside cloud products for storage, analysis, and message queues to meet enterprise requirements via the Spark and Hadoop ecosystem.
11:00am - 11:40am Wednesday, March 15, 2017
Location: 210 A/E Level: Intermediate
Secondary topics:  Architecture, Cloud
Sriram Ganesan (Qubole), Prakhar Jain (Qubole)
Average rating: 3.00 (2 ratings)
Qubole started out by offering Hadoop as a service in AWS. Over time, it extended its big data capabilities beyond Hadoop and its cloud infrastructure support beyond AWS. Sriram Ganesan and Prakhar Jain explain how and why Qubole built Cloudman, a simple, cloud-agnostic, multipurpose provisioning tool that can be extended to support additional engines and clouds.
11:50am - 12:30pm Wednesday, March 15, 2017
Location: 210 A/E
Secondary topics:  Architecture, Cloud
Andrei Savu (Cloudera), Jennifer Wu (Cloudera)
Average rating: 3.00 (3 ratings)
Cloud infrastructure, with a scalable data store and elastic compute, is particularly well suited for large-scale data engineering workloads. Andrei Savu and Jennifer Wu explore the latest cloud technologies and outline cost, security, and ease-of-use considerations for data engineers.
1:50pm - 2:30pm Wednesday, March 15, 2017
Location: 210 A/E Level: Intermediate
Secondary topics:  Architecture, Cloud
Henry Robinson (Cloudera), Alex Gutow (Cloudera)
Henry Robinson and Alex Gutow explain how to take advantage of the flexibility and cost-effectiveness of the cloud for your BI and SQL analytic workloads, using Apache Hadoop and Apache Impala (incubating) to provide the same functionality, partner ecosystem, and flexibility as on-premises deployments.
2:40pm - 3:20pm Wednesday, March 15, 2017
Location: LL21 E/F Level: Beginner
Secondary topics:  Architecture, Cloud
Paige Liu (Microsoft), John Zhuge (Netflix)
Paige Liu and John Zhuge explore the options and trade-offs to consider when building a Cloudera cluster on Microsoft Azure. They explain how to deploy and scale a Cloudera cluster on Azure and how to connect it with other Azure services to build enterprise-grade, end-to-end big data solutions.
2:40pm - 3:20pm Wednesday, March 15, 2017
Location: 230 A Level: Intermediate
Secondary topics:  Architecture, Data Platform
Gwen Shapira (Confluent), Bob Lehmann (Bayer)
Average rating: 4.50 (2 ratings)
Gwen Shapira and Bob Lehmann share their experience and patterns building a cross-data-center streaming data platform for Monsanto. Learn how to facilitate your move to the cloud while "keeping the lights on" for legacy applications. In addition to integrating private and cloud data centers, you'll discover how to establish a solid foundation for a transition from batch to stream processing.
2:40pm - 3:20pm Wednesday, March 15, 2017
Location: 210 A/E Level: Intermediate
Secondary topics:  Cloud
Shubham Tagra (Qubole)
Shubham Tagra offers an introduction to RubiX, a lightweight, cross-engine caching solution that works well with optimized columnar formats by caching only the required amount of data. RubiX can be used with any data analytics engine that reads data from remote sources via the Hadoop FileSystem interface without any changes to the source code of those engines.
4:20pm - 5:00pm Wednesday, March 15, 2017
Location: LL20 D Level: Beginner
Secondary topics:  Architecture, IoT, Manufacturing, Platform, Streaming
Kishore Reddipalli (GE)
Average rating: 3.00 (1 rating)
Kishore Reddipalli explores how to stream data at large scale from the edge to the cloud to the client, detect anomalies, analyze machine data both in stream and at rest in industrial settings, and optimize industrial operations by providing real-time insights and recommendations using big data technologies.
4:20pm - 5:00pm Wednesday, March 15, 2017
Location: 210 A/E
Secondary topics:  Cloud
Mark Donsky (Okera), Sudhanshu Arora (Cloudera)
Average rating: 4.00 (1 rating)
Big data needs governance. Governance empowers data scientists to find, trust, and use data on their own, yet it can be overwhelming to know where to start, especially if your big data environment extends beyond your enterprise to the cloud. Mark Donsky and Sudhanshu Arora share a step-by-step approach to kick-start your big data governance initiatives.
5:10pm - 5:50pm Wednesday, March 15, 2017
Location: LL21 C/D
Secondary topics:  Cloud
Anand Iyer (Cloudera), Eugene Fratkin (Cloudera)
Average rating: 5.00 (1 rating)
Spark workloads and the public cloud have both been rapidly gaining adoption in mainstream enterprises. Anand Iyer and Eugene Fratkin discuss new developments in Spark and take an in-depth look at the intersection of the latest Spark and cloud technologies.
5:10pm - 5:50pm Wednesday, March 15, 2017
Location: LL21 E/F Level: Intermediate
Secondary topics:  Architecture, Cloud, Geospatial
Naghman Waheed (Bayer Crop Science), Martin Mendez-Costabel (Bayer Crop Science)
Average rating: 4.00 (1 rating)
Recently, the volume of data collected from farmers' fields via sensors, rovers, drones, in-cabin technologies, and other sources has forced Monsanto to rethink its geospatial processing capabilities. Naghman Waheed and Martin Mendez-Costabel explain how Monsanto built a scalable geospatial platform using cloud and open source technologies.
5:10pm - 5:50pm Wednesday, March 15, 2017
Location: 210 A/E
Secondary topics:  Architecture, Cloud
Dale Kim (Arcadia Data)
Big data applications in the cloud are becoming more about the global distribution and access of data than about easier deployments. Dale Kim shares insights on architecting big data applications for the cloud, using an example reference application his team built and published as context for describing several key requirements for cloud-based environments.
11:50am - 12:30pm Thursday, March 16, 2017
Location: LL21 C/D Level: Beginner
Secondary topics:  Architecture
Haoyuan Li (Alluxio), Gene Pang (Alluxio)
Average rating: 4.00 (1 rating)
Alluxio (formerly Tachyon) is an open source memory-speed virtual distributed storage system. The project has seen substantial improvements in performance and scalability and has been extended with key new features. Haoyuan Li and Gene Pang explore Alluxio's goal of making its product accessible to an even wider set of users through a focus on security, new language bindings, and APIs.