Presented By O’Reilly and Cloudera

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Big data and data science in the cloud sessions

1:30pm–5:00pm Tuesday, 09/11/2018

Building your first big data application on AWS

Location: 1E 12/13 Level: Intermediate

Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Paul Sears (Amazon Web Services), Bruno Faria (Amazon Web Services)

Average rating: 2.86 (7 ratings)

Want to learn how to use Amazon's big data web services to launch your first big data application in the cloud? Jorge Lopez, Radhika Ravirala, Paul Sears, and Bruno Faria walk you through building a big data application using a combination of open source technologies and AWS managed services.

1:30pm–5:00pm Tuesday, 09/11/2018

Running multidisciplinary big data workloads in the cloud

Location: 1E 14 Level: Intermediate

Sudhanshu Arora (Cloudera), Stefan Salandy (Cloudera), Suraj Acharya (Cloudera), Brandon Freeman (Cloudera), Jason Wang (Cloudera), Shravan Pabba (Cloudera)

Attend this tutorial to learn how to run a data analytics pipeline in the cloud and integrate data engineering and data analytics workflows. You'll explore considerations and best practices for cloud data analytics pipelines and see how to share metadata across workloads in a big data PaaS.

1:15pm–1:55pm Wednesday, 09/12/2018

The evolution of Netflix's S3 data warehouse

Location: 1A 10 Level: Intermediate

Secondary topics: Data Platforms

Ryan Blue (Netflix), Daniel Weeks (Netflix)

Average rating: 5.00 (3 ratings)

In the last few years, Netflix's data warehouse has grown to more than 100 PB in S3. Ryan Blue and Daniel Weeks share lessons learned, the tools Netflix currently uses and those it has retired, and the improvements it is rolling out, including Iceberg, a new table format for S3.

2:55pm–3:35pm Wednesday, 09/12/2018

Optimizing Apache Impala for a cloud-based data warehouse

Location: 1A 10 Level: Intermediate

Greg Rahn (Cloudera)

Average rating: 5.00 (1 rating)

Cloud object stores are becoming the bedrock of cloud data warehouses for modern data-driven enterprises, and data teams increasingly need to query data stored in S3 or ADLS directly. Greg Rahn and Mostafa Mokhtar discuss optimal end-to-end workflows and technical considerations for using Apache Impala over object stores for your cloud data warehouse.

5:25pm–6:05pm Wednesday, 09/12/2018

Circuit breakers to safeguard for garbage in, garbage out

Location: 1A 23/24 Level: Beginner

Secondary topics: Data Integration and Data Pipelines, Financial Services

Sandeep Uttamchandani (Intuit)

Do your analysts always trust the insights generated by your data platform? Reliable insights are critical for use cases in the financial sector. Sandeep Uttamchandani outlines a circuit breaker pattern developed for data pipelines, similar to the common design pattern used in service architectures, that detects and corrects problems so that insights remain reliable.

2:00pm–2:40pm Thursday, 09/13/2018

Job recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL

Location: 1A 15/16 Level: Intermediate

Secondary topics: Deep Learning, Media, Marketing, Advertising

Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)

Can the talent industry make job search and matching more relevant and personalized for candidates by using deep learning techniques? Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to use BigDL, a distributed deep learning framework on Apache Spark, to predict a candidate’s probability of applying to specific jobs based on their résumé.

3:30pm–4:10pm Thursday, 09/13/2018

Self-service modern analytics on the GovCloud

Location: 1A 21/22 Level: Intermediate

Ramesh Krishnan (Lockheed Martin), Steven Morgan (Lockheed Martin)

Average rating: 4.00 (1 rating)

Lockheed Martin is a data-driven company with a massive variety and volume of data. To extract the most value from its information assets, the company is constantly exploring ways to enable effective self-service scenarios. Ramesh Krishnan and Steve Morgan discuss Lockheed Martin's journey into modern analytics and explore its analytics platform focused on leveraging AWS GovCloud.

3:30pm–4:10pm Thursday, 09/13/2018

Cassandra versus cloud databases

Location: 1A 23/24 Level: Beginner

Jonathan Ellis (DataStax)

Average rating: 4.50 (2 ratings)

Is open source Apache Cassandra still relevant in an era of hosted cloud databases? Jonathan Ellis discusses Cassandra’s strengths and weaknesses relative to Amazon DynamoDB, Microsoft CosmosDB, and Google Cloud Spanner.

4:20pm–5:00pm Thursday, 09/13/2018

Deep learning on audio in Azure to detect sounds in real time

Location: 1A 15/16 Level: Beginner

Secondary topics: Deep Learning

Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)

Average rating: 5.00 (3 ratings)

In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, over 360 million people worldwide are deaf or hard of hearing. Swetha Machanavajhala and Xiaoyong Zhu explain how to make the auditory world inclusive and meet the great demand in other sectors by applying deep learning on audio in Azure.

4:20pm–5:00pm Thursday, 09/13/2018

Building turnkey recommendations for 5% of internet video

Location: 1A 21/22 Level: Intermediate

Secondary topics: Deep Learning, Media, Marketing, Advertising, Recommendation Systems

Nir Yungster (JW Player), Kamil Sindi (JW Player)

JW Player, the world’s largest network-independent video platform (representing 5% of global internet video), provides on-demand recommendations as a service to thousands of media publishers. Nir Yungster and Kamil Sindi explain how the company is systematically improving model performance while navigating the many engineering challenges and unique needs of the diverse publishers it serves.

4:20pm–5:00pm Thursday, 09/13/2018

Best practices for developing an enterprise data hub to collect and analyze 1 TB of data a day from multiple services with Apache Kafka and Google Cloud Platform

Location: 1A 23/24 Level: Beginner

Secondary topics: Data Integration and Data Pipelines

Kenji Hayashida (Recruit Lifestyle Co., Ltd.), Toru Sasaki (NTT DATA Corporation)

Average rating: 4.50 (2 ratings)

Recruit Group and NTT DATA Corporation have developed a data hub platform built on Apache Kafka. The platform handles around 1 TB/day of application logs generated by a number of services in Recruit Group. Kenji Hayashida and Toru Sasaki share best practices and lessons learned on topics such as schema evolution and network architecture.
