Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Tutorials

These expert-led presentations on Tuesday, March 26 give you a chance to dive deep into the subject matter. Please note: to attend tutorials, you must register for a Gold or Silver pass, which does not include access to training courses on Monday or Tuesday.

Tuesday, March 26

9:00am–12:30pm Tuesday, March 26, 2019
Location: 2007
Secondary topics:  Data Integration and Data Pipelines, Data preparation, data governance, and data lineage, Model lifecycle management
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
Average rating: ***..
(3.85, 13 ratings)
Boris Lublinsky and Dean Wampler walk you through using ML in streaming data pipelines, with periodic model retraining and low-latency scoring in live streams. You'll explore using Kafka as a data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.
9:00am–12:30pm Tuesday, March 26, 2019
Location: 2002
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Martin Gorner (Google)
Average rating: ****.
(4.50, 4 ratings)
Martin Gorner leads a hands-on introduction to recurrent neural networks and TensorFlow. Join in to discover what makes RNNs so powerful for time series analysis.
9:00am–12:30pm Tuesday, March 26, 2019
Location: 2009
Secondary topics:  Deep Learning, Text and Language processing and analysis
David Talby (Pacific AI), Alex Thomas (Indeed), Claudiu Branzan (Accenture AI)
Average rating: ****.
(4.75, 8 ratings)
David Talby, Alex Thomas, and Claudiu Branzan lead a hands-on introduction to scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
9:00am–12:30pm Tuesday, March 26, 2019
Location: 2005
Mark Madsen (Teradata), Todd Walter (Teradata)
Average rating: ****.
(4.21, 28 ratings)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
9:00am–12:30pm Tuesday, March 26, 2019
Location: 2006
Secondary topics:  AI and machine learning in the enterprise
Jonathan Seidman (Cloudera), Ted Malaska (Capital One)
Average rating: ****.
(4.00, 6 ratings)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Jonathan Seidman and Ted Malaska share guidance and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.
9:00am–12:30pm Tuesday, March 26, 2019
Location: 2004
Secondary topics:  Streaming, realtime analytics, and IoT
Fabian Hueske (Ververica)
Average rating: *****
(5.00, 1 rating)
Fabian Hueske offers an overview of Apache Flink via the SQL interface, covering stream processing and Flink's various modes of use. Then you'll use Flink to run SQL queries on data streams and contrast this with the Flink DataStream API.
9:00am–12:30pm Tuesday, March 26, 2019
Location: 2001
Secondary topics:  Ethics, Security and Privacy
Iman Saleh (Intel), Cory Ilo (Intel), Cindy Tseng (Intel)
Average rating: *****
(5.00, 3 ratings)
From healthcare to smart home to autonomous vehicles, new applications of autonomous systems are raising ethical concerns about a host of issues, including bias, transparency, and privacy. Iman Saleh, Cory Ilo, and Cindy Tseng demonstrate tools and capabilities that can help data scientists address these concerns and bridge the gap between ethicists, regulators, and machine learning practitioners.
9:00am–12:30pm Tuesday, March 26, 2019
Location: 2003
Secondary topics:  AI and machine learning in the enterprise
Joshua Poduska (Domino Data Lab), Kimberly Shenk (NakedPoppy), Mac Steele (Domino Data Lab)
Average rating: ****.
(4.60, 15 ratings)
The honeymoon era of data science is ending, and accountability is coming. Successful data science leaders must deliver measurable impact on an increasing share of an enterprise's KPIs. Joshua Poduska, Kimberly Shenk, and Mac Steele explain how leading organizations take a holistic approach to people, process, and technology to build a sustainable competitive advantage.
9:00am–12:30pm Tuesday, March 26, 2019
Location: 2008
Secondary topics:  Data preparation, data governance, and data lineage, Storage
Santosh Kumar (Cloudera), Andre Araujo (Cloudera), Wim Stoop (Cloudera)
Average rating: *****
(5.00, 1 rating)
Cloudera SDX provides unified metadata control, simplifies administration, and maintains context and data lineage across storage services, workloads, and operating environments. Santosh Kumar, Andre Araujo, and Wim Stoop offer an overview of SDX before diving deep into the moving parts and guiding you through setting it up. You'll leave with the skills to set up your own SDX.
9:00am–5:00pm Tuesday, March 26, 2019
Location: 2022
Alex Kudriashova (Astro Digital), Jonathan Francis (Starbucks), JoLynn Lavin (General Mills), Robin Way (Corios), June Andrews (GE), Kyungtaak Noh (SK Telecom), Taposh DuttaRoy (Kaiser Permanente), Sabrina Dahlgren (Kaiser Permanente), Craig Rowley (Columbia Sportswear), Ambal Balakrishnan (IBM), Benjamin Glicksberg (UCSF), Patrick Lucey (STATS), Rhonda Textor (True Fit)
Hear practical insights from household brands and global companies: the challenges they tackled, the approaches they took, and the benefits (and drawbacks) of their solutions.
9:00am–5:00pm Tuesday, March 26, 2019
Location: 2024
Susan Etlinger (Altimeter Group), Alistair Croll (Solve For Interesting), Jake Metcalf (Ethical Resolve), Emanuel Moss (Data & Society), Bradley Voytek (UC San Diego), Jonathan Foster (Microsoft), Yiannis Kanellopoulos (Code4Thought), Kathy Baxter (Salesforce), Bulbul Gupta (Socos Labs), Brian Rieger (Labelbox), Carole Piovesan (INQ Data Law), Jana Eggers (Nara Logics), Irina Raicu (Santa Clara University), Brian Green (Santa Clara University), Tim O'Reilly (O'Reilly Media), Rachel Thomas (fast.ai), Rumman Chowdhury (Accenture), Stuart Buck (Arnold Ventures)
In this day-long event, academics, practitioners, and innovators dive deep into the thorny issues of data, privacy, bias, and morality that are at the forefront of today's headlines.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2004
Secondary topics:  Streaming, realtime analytics, and IoT
Matt Fuller (Starburst)
Average rating: ***..
(3.57, 7 ratings)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2001
Secondary topics:  Ethics
Patrick Hall (H2O.ai | George Washington University)
Average rating: ****.
(4.00, 9 ratings)
If machine learning can lead to financial gains for your organization, why isn’t everyone doing it? One reason is that training machine learning systems with transparent inner workings and auditable predictions is difficult. Patrick Hall details the good, bad, and downright ugly lessons learned from his years of experience implementing solutions for interpretable machine learning.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2002
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Model lifecycle management
Abhishek Kumar (Publicis Sapient), Pramod Singh (Publicis Sapient)
Average rating: ****.
(4.17, 6 ratings)
Abhishek Kumar and Pramod Singh walk you through deep learning-based recommender and personalization systems they've built for clients. Join in to learn how to use TensorFlow Serving and MLflow for end-to-end productionalization, including model serving, Dockerization, reproducibility, and experimentation, and Kubernetes for deployment and orchestration of ML-based microarchitectures.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2008
Secondary topics:  AI and Data technologies in the cloud
Jason Wang (Cloudera), Brandon Freeman (Cloudera), Michael Kohs (Cloudera), Akihiro Ishikawa (Cloudera), Toby Ferguson (Cloudera)
Average rating: ***..
(3.20, 5 ratings)
There are many challenges with moving multidisciplinary big data workloads to the cloud and running them. Jason Wang, Brandon Freeman, Michael Kohs, Akihiro Nishikawa, and Toby Ferguson explore cloud architecture and its challenges and walk you through using Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2003
Secondary topics:  AI and machine learning in the enterprise, Ethics, Security and Privacy
Andrew Burt (Immuta), Steven Touw (Immuta), Richard Geering (Immuta), Joseph Regensburger (Immuta), Alfred Rossi (Immuta)
Average rating: *****
(5.00, 2 ratings)
As ML becomes increasingly important for businesses and data science teams alike, managing its risks is quickly becoming one of the biggest challenges to the technology’s widespread adoption. Join Andrew Burt, Steven Touw, Richard Geering, Joseph Regensburger, and Alfred Rossi for a hands-on overview of how to train, validate, and audit ML models in practice.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2006
Secondary topics:  AI and machine learning in the enterprise
Sourav Dey (Manifold), Alex Ng (Manifold)
Average rating: ****.
(4.25, 4 ratings)
Many teams are still run as if data science is mainly about experimentation, but those days are over. Now it must offer turnkey solutions to take models into production. Sourav Dey and Alex Ng explain how to streamline an ML project and help your engineers work as an integrated part of your production teams, using a Lean AI process and the Orbyter package for Docker-first data science.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2007
Secondary topics:  AI and Data technologies in the cloud, Model lifecycle management
Holden Karau (Google), Francesca Lazzeri (Microsoft), Trevor Grant (IBM)
Average rating: ***..
(3.00, 2 ratings)
Holden Karau, Francesca Lazzeri, and Trevor Grant offer an overview of Kubeflow and walk you through using it to train and serve models across different cloud environments (and on-premises). You'll use a script to do the initial setup work, so you can jump (almost) straight into training a model on one cloud and then look at how to set up serving in another cluster/cloud.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2011
Secondary topics:  AI and machine learning in the enterprise
Chi-Yi Kuan (LinkedIn), Tiger Zhang (LinkedIn), Xiaojing Dong (LinkedIn), Burcu Baran (LinkedIn), Emily Huang (LinkedIn)
Average rating: ****.
(4.43, 14 ratings)
Thanks to the rapid growth in data resources, business leaders now appreciate the importance (and the challenge) of mining information from data. Join in as a group of LinkedIn's data scientists share their experiences successfully leveraging emerging techniques to assist in intelligent decision making.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2009
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Jason Dai (Intel), Yuhao Yang (Intel), Jiao (Jennie) Wang (Intel), Guoqiong Song (Intel)
Average rating: ***..
(3.00, 6 ratings)
Jason Dai, Yuhao Yang, Jennie Wang, and Guoqiong Song explain how to build and productionize deep learning applications for big data with Analytics Zoo, a unified analytics and AI platform that seamlessly unites Spark, TensorFlow, Keras, and BigDL programs into an integrated pipeline. They draw on real-world use cases from JD.com, MLSListings, the World Bank, Baosight, and Midea/KUKA.
1:30pm–5:00pm Tuesday, March 26, 2019
Location: 2005
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines, Storage, Streaming, realtime analytics, and IoT
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Average rating: **...
(2.67, 12 ratings)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). Arun Kejariwal and Karthik Ramasamy walk you through the state-of-the-art systems for each stage of an end-to-end data processing pipeline for real-time data (messaging, compute, and storage), as well as algorithms for extracting insights (e.g., heavy hitters and quantiles) from data streams.