Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Big data and data science in the cloud sessions

1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 12/13 Level: Intermediate
Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Paul Sears (Amazon Web Services), Bruno Faria (Amazon Web Services)
Average rating: **...
(2.86, 7 ratings)
Want to learn how to use Amazon's big data web services to launch your first big data application in the cloud? Jorge Lopez, Radhika Ravirala, Paul Sears, and Bruno Faria walk you through building a big data application using a combination of open source technologies and AWS managed services.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 14 Level: Intermediate
Sudhanshu Arora (Cloudera), Stefan Salandy (Cloudera), Suraj Acharya (Cloudera), Brandon Freeman (Cloudera), Jason Wang (Cloudera), Shravan Pabba (Cloudera)
Attend this tutorial to learn how to successfully run a data analytics pipeline in the cloud and integrate data engineering and data analytics workflows. You'll explore considerations and best practices for data analytics pipelines in the cloud and, along the way, see how to share metadata across workloads in a big data PaaS.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 10 Level: Intermediate
Secondary topics: Data Platforms
Ryan Blue (Netflix), Daniel Weeks (Netflix)
Average rating: *****
(5.00, 3 ratings)
In the last few years, Netflix's data warehouse has grown to more than 100 PB in S3. Ryan Blue and Daniel Weeks share lessons learned, the tools Netflix currently uses and those it has retired, and the improvements it is rolling out, including Iceberg, a new table format for S3.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 10 Level: Intermediate
Greg Rahn (Cloudera)
Average rating: *****
(5.00, 1 rating)
Cloud object stores are becoming the bedrock of cloud data warehouses for modern data-driven enterprises, and data teams increasingly need to query data stored in S3 or ADLS directly. Greg Rahn and Mostafa Mokhtar discuss optimal end-to-end workflows and technical considerations for using Apache Impala over object stores for your cloud data warehouse.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 23/24 Level: Beginner
Secondary topics: Data Integration and Data Pipelines, Financial Services
Do your analysts always trust the insights generated by your data platform? Ensuring insights are always reliable is critical for use cases in the financial sector. Sandeep Uttamchandani outlines a circuit breaker pattern developed for data pipelines, similar to the common design pattern used in service architectures, that detects and corrects problems so that insights stay reliable.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics: Deep Learning, Media, Marketing, Advertising
Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)
Can the talent industry make job search and matching more relevant and personalized for a candidate by using deep learning techniques? Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to use BigDL, a distributed deep learning framework, on Apache Spark to predict a candidate’s probability of applying to specific jobs based on their résumé.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 21/22 Level: Intermediate
Ramesh Krishnan (Lockheed Martin), Steven Morgan (Lockheed Martin)
Average rating: ****.
(4.00, 1 rating)
Lockheed Martin is a data-driven company with a massive variety and volume of data. To extract the most value from its information assets, the company is constantly exploring ways to enable effective self-service scenarios. Ramesh Krishnan and Steve Morgan discuss Lockheed Martin's journey into modern analytics and explore its analytics platform focused on leveraging AWS GovCloud.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 23/24 Level: Beginner
Jonathan Ellis (DataStax)
Average rating: ****.
(4.50, 2 ratings)
Is open source Apache Cassandra still relevant in an era of hosted cloud databases? Jonathan Ellis discusses Cassandra’s strengths and weaknesses relative to Amazon DynamoDB, Microsoft CosmosDB, and Google Cloud Spanner.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Beginner
Secondary topics: Deep Learning
Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)
Average rating: *****
(5.00, 3 ratings)
In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, over 360 million people worldwide are deaf or hard of hearing. Swetha Machanavajhala and Xiaoyong Zhu explain how to make the auditory world inclusive and meet the great demand in other sectors by applying deep learning on audio in Azure.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics: Deep Learning, Media, Marketing, Advertising, Recommendation Systems
Nir Yungster (JW Player), Kamil Sindi (JW Player)
JW Player, the world’s largest network-independent video platform (representing 5% of global internet video), provides on-demand recommendations as a service to thousands of media publishers. Nir Yungster and Kamil Sindi explain how the company is systematically improving model performance while navigating the many engineering challenges and unique needs of the diverse publishers it serves.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 23/24 Level: Beginner
Secondary topics: Data Integration and Data Pipelines
Kenji Hayashida (Recruit Lifestyle co., ltd.), Toru Sasaki (NTT DATA Corporation)
Average rating: ****.
(4.50, 2 ratings)
Recruit Group and NTT DATA Corporation have developed a data hub platform built on Apache Kafka. The platform handles around 1 TB/day of application logs generated by a number of services in Recruit Group. Kenji Hayashida and Toru Sasaki share best practices and lessons learned on topics such as schema evolution and network architecture.