Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Tutorials

These expert-led presentations on Tuesday, March 26 give you a chance to dive deep into the subject matter. Please note: to attend tutorials, you must register for a Gold or Silver pass, which does not include access to training courses on Monday or Tuesday.

Tuesday, March 26

9:00am-12:30pm Tuesday, March 26, 2019
Location: 2007
Secondary topics:  Data Integration and Data Pipelines, Data preparation, data governance, and data lineage, Model lifecycle management
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
This hands-on tutorial examines production use of ML in streaming data pipelines, including how to do periodic model retraining and low-latency scoring in live streams. We'll discuss Kafka as the data backplane, pros and cons of microservices vs. systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.
9:00am-12:30pm Tuesday, March 26, 2019
Location: 2011
Secondary topics:  AI and Data technologies in the cloud, Deep Learning, Media, Marketing, Advertising
David Arpin (Amazon Web Services)
Learn how to use the Amazon SageMaker platform to build a machine learning model to recommend products to customers based on their past preferences.
9:00am-12:30pm Tuesday, March 26, 2019
Location: 2002
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Martin Gorner (Google)
Get hands-on with recurrent neural networks (RNNs) and TensorFlow. Discover what makes RNNs so powerful for time series analysis.
9:00am-12:30pm Tuesday, March 26, 2019
Location: 2009
Secondary topics:  Deep Learning, Text and Language processing and analysis
David Talby (Pacific AI), Alexander Thomas (Indeed), Claudiu Branzan (G2 Web Services)
This is a hands-on tutorial for scalable NLP using the highly performant, highly scalable open-source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
9:00am-12:30pm Tuesday, March 26, 2019
Location: 2005
Mark Madsen (Think Big Analytics), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
9:00am-12:30pm Tuesday, March 26, 2019
Location: 2006
Secondary topics:  AI and machine learning in the enterprise
Jonathan Seidman (Cloudera), Ted Malaska (Capital One)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. In this presentation, we’ll provide guidance and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.
9:00am-12:30pm Tuesday, March 26, 2019
Location: 2004
Secondary topics:  Streaming, realtime analytics, and IoT
Jeff Bean (dA)
This hands-on session introduces Flink via the SQL interface. You will receive an overview of stream processing and a survey of Apache Flink and its various modes of use. Then we’ll use Flink to run SQL queries on data streams and contrast this with the Flink DataStream API.
9:00am-12:30pm Tuesday, March 26, 2019
Location: 2001
Secondary topics:  Ethics, Security and Privacy
Iman Saleh (Intel), Cory Ilo (Intel), Cindy Tseng (Intel)
From healthcare to smart homes to autonomous vehicles, new applications of autonomous systems are raising ethical concerns, including bias, transparency, and privacy. In this tutorial, we will demonstrate tools and capabilities that can help data scientists address these concerns. The tools help bridge the gap between ethicists and regulators on one side and machine learning practitioners on the other.
9:00am-12:30pm Tuesday, March 26, 2019
Location: 2003
Secondary topics:  AI and machine learning in the enterprise
Joshua Poduska (Domino Data Lab)
The honeymoon era of data science is ending; accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders deliver measurable impact on an increasing share of an enterprise's KPIs. Join Joshua Poduska to learn how leading organizations take a holistic approach to people, process, and technology to build a sustainable competitive advantage.
9:00am-12:30pm Tuesday, March 26, 2019
Location: 2008
Secondary topics:  Data preparation, data governance, and data lineage, Storage
Santosh Kumar (Cloudera)
Cloudera SDX provides unified metadata control, simplifies administration, and maintains context as well as data lineage across storage services, workloads, and operating environments. In this three-hour tutorial, we cover the background to SDX before diving deep into its moving parts and getting hands-on with setting it up. You'll leave with all the skills and experience you need to set up your own SDX.
9:00am-5:00pm Tuesday, March 26, 2019
Location: 2022
Alex Kudriashova (Astro Digital), Jonathan Francis (Starbucks), JoLynn Lavin (General Mills, Inc), Robin Way (Corios), June Andrews (GE), kyungtaak Noh (SK Telecom), Taposh Dutta Roy (Kaiser Permanente), Sabrina Dahlgren (Kaiser Permanente), Craig Rowley (Columbia Sportswear), Ambal Balakrishnan (IBM), Benjamin Glicksberg (UCSF)
Hear practical insights from household brands and global companies: the challenges they tackled, the approaches they took, and both the benefits and the drawbacks of their solutions.
9:00am-5:00pm Tuesday, March 26, 2019
Location: 2024
Susan Etlinger (Altimeter Group), Alistair Croll (Solve For Interesting), Shannon Vallor (Santa Clara University), Danielle Cass (Workday), Bradley Voytek (UC San Diego and Uber, Inc.), Jana Eggers (Nara Logics), Yiannis Kanellopoulos (Code4Thought), Kathy Baxter (Salesforce)
In this day-long event, academics, practitioners, and innovators dive deep into the thorny issues of data, privacy, bias, and morality that are at the forefront of today's headlines.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2004
Secondary topics:  Streaming, realtime analytics, and IoT
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL-on-Anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from gigabytes to petabytes. In this tutorial, attendees will learn Presto usage and best practices, with optional hands-on exercises.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2001
Secondary topics:  Ethics
Patrick Hall (H2O.ai | George Washington University)
If machine learning can lead to financial gains for your organization, why isn’t everyone doing it? One reason is that training machine learning systems with transparent inner workings and auditable predictions is difficult. This talk will present the good, bad, and downright ugly lessons learned from the presenters’ years of experience in implementing solutions for interpretable machine learning.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2002
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Model lifecycle management
Abhishek Kumar (Publicis Sapient), Dr. Vijay Srinivas Agneeswaran (Publicis Sapient)
This tutorial describes deep learning-based recommender and personalisation systems that we have built for clients. It focuses on TensorFlow Serving and MLflow for end-to-end productionization, including model serving, Dockerization, reproducibility, and experimentation, and shows how to use Kubernetes for deployment and orchestration of ML-based micro-architectures.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2008
Secondary topics:  AI and Data technologies in the cloud
Jason Wang (Cloudera), Tony Wu (Cloudera), Vinithra Varadharajan (Cloudera)
Moving to the cloud poses challenges ranging from re-architecting to be cloud-native to keeping data context consistent across workloads that span multiple clusters on-prem and in the cloud. First, we’ll cover cloud architecture and its challenges in depth; second, you’ll use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2003
Secondary topics:  AI and machine learning in the enterprise, Ethics, Security and Privacy
Andrew Burt (Immuta), Steve Touw (Immuta), Richard Geering (Immuta), Joe Regensburger (Immuta), Alfred Rossi (Immuta)
This tutorial will provide a hands-on overview of how to train, validate, and audit machine learning (ML) models in practice. As ML becomes increasingly important for businesses and data science teams alike, managing its risks is quickly becoming one of the biggest challenges to the technology’s widespread adoption.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2006
Secondary topics:  AI and machine learning in the enterprise
Sourav Dey (Manifold), Alex Ng (Manifold)
Many teams are still run as if data science is mainly about experimentation, but those days are over. Now it must be turnkey to take models into production. Sourav Dey and Alex Ng explain how to streamline a machine learning project and help your engineers work as an integrated part of your production teams, using a Lean AI process and the Orbyter package for Docker-first data science.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2007
Secondary topics:  AI and Data technologies in the cloud, Model lifecycle management
Holden Karau (Google), Francesca Lazzeri (Microsoft), Trevor Grant (IBM), Ilan Filonenko (Bloomberg LP)
This workshop will quickly introduce what Kubeflow is and how we can use it to train and serve models across different cloud environments (and on-prem). We’ll have a script ready to do the initial setup work so you can jump (almost) straight into training a model on one cloud, and then look at how to set up serving in another cluster/cloud. We will start with a simple model, with follow-up links.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2011
Secondary topics:  AI and machine learning in the enterprise
Chi-Yi Kuan (LinkedIn), Yongzheng Zhang (LinkedIn), Julie Wang (LinkedIn), Xiaojing Dong (LinkedIn), Wei Di (LinkedIn)
Thanks to the rapid growth in data resources, business leaders commonly appreciate both the challenge and the importance of mining information from data. In this tutorial, a group of well-respected data scientists will share their experiences and successes in leveraging emerging techniques to support intelligent decisions that lead to impactful outcomes at LinkedIn.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2009
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Jason Dai (Intel), Yuhao Yang (Intel), Jennie Wang (Intel), Guoqiong Song (Intel)
In this tutorial, we will show how to build and productionize deep learning applications for big data using Analytics Zoo (https://github.com/intel-analytics/analytics-zoo), a unified analytics + AI platform that seamlessly unites Spark, TensorFlow, Keras, and BigDL programs into an integrated pipeline. We will draw on real-world use cases such as JD.com, MLSListings, World Bank, Baosight, and Midea/KUKA.
1:30pm-5:00pm Tuesday, March 26, 2019
Location: 2005
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines, Storage, Streaming, realtime analytics, and IoT
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). In this tutorial, we lead the audience through the landscape of state-of-the-art systems for each stage of an end-to-end pipeline for processing real-time data (messaging, compute, and storage), and through algorithms for extracting insights (e.g., heavy hitters and quantiles) from data streams.