Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Tutorials

These expert-led presentations on Tuesday, September 11 give you a chance to dive deep into the subject matter. To attend tutorials, you must register for a Gold or Silver pass; these passes do not include access to training courses on Tuesday.

Tuesday, September 11

9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Deep Learning, Text and Language processing and analysis
Garrett Hoffman (StockTwits)
Average rating: ****.
(4.75, 4 ratings)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include word2vec, recurrent neural networks and variants (LSTM, GRU), and convolutional neural networks. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 11 Level: Intermediate
Secondary topics:  Data preparation, governance and privacy, Ethics and Privacy
Mark Donsky (Okera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera), Ifigeneia Derekli (Cloudera), Camila Hiskey (Cloudera)
Average rating: ****.
(4.50, 2 ratings)
New regulations such as GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads. Mark Donsky, Syed Rafice, Mubashir Kazia, Ifigeneia Derekli, and Camila Hiskey share hands-on best practices for meeting these challenges, with special attention paid to GDPR. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 10 Level: Intermediate
David Arpin (Amazon Web Services)
Average rating: **...
(2.80, 10 ratings)
David Arpin walks you through building a machine learning application, from data manipulation to algorithm training to deployment to a real-time prediction endpoint, using Spark and Amazon SageMaker. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 12/14 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise
Joshua Poduska (Domino Data Lab), Patrick Harrison (S&P Global)
Average rating: ****.
(4.29, 7 ratings)
The honeymoon era of data science is ending, and accountability is coming. Successful data science leaders deliver measurable impact on an increasing share of an enterprise’s KPIs. Joshua Poduska and Patrick Harrison detail how leading organizations have taken a holistic approach to people, process, and technology to build a sustainable competitive advantage. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 14 Level: Beginner
Viviana Acquaviva (CUNY New York City College of Technology)
Average rating: ****.
(4.75, 4 ratings)
Using interesting, diverse publicly available datasets and actual problems in astronomy research, Viviana Acquaviva leads an intermediate tutorial on machine learning. You'll learn how to customize algorithms and evaluation metrics required by scientific applications and discover best practices for choosing, developing, and evaluating machine learning algorithms in "real-world" datasets. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 09 Level: Intermediate
James Bednar (Anaconda)
Average rating: ****.
(4.60, 5 ratings)
Python lets you solve data science problems by stitching together packages from the Python ecosystem, but it can be difficult to assemble the right tools to solve real-world problems. James Bednar walks you through using the 15+ packages covered by the new PyViz.org initiative to make it simple to build interactive plots and dashboards, even for large, streaming, and highly multidimensional data. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 07/08 Level: Intermediate
Tim Berglund (Confluent)
Average rating: ****.
(4.33, 3 ratings)
Tim Berglund leads this solid introduction to Apache Kafka as a streaming data platform. You'll cover the internal architecture, APIs, and platform components like Kafka Connect and Kafka Streams, then finish with an exercise processing streaming data using KSQL, the new SQL-like declarative stream processing language for Kafka. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 12/13 Level: Intermediate
Secondary topics:  Data Platforms
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Average rating: ***..
(3.12, 8 ratings)
Arun Kejariwal and Karthik Ramasamy lead a journey through the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline, covering messaging frameworks, streaming computing frameworks, storage frameworks for real-time data, and more. They also share case studies from the IoT, gaming, and healthcare industries, as well as their experience operating these systems at internet scale. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Data Platforms
Mark Madsen (Think Big Analytics), Todd Walter (Teradata)
Average rating: ***..
(3.50, 10 ratings)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 06 Level: Intermediate
Secondary topics:  Model lifecycle management
Dan Crankshaw (UC Berkeley RISELab)
Average rating: *****
(5.00, 1 rating)
Dan Crankshaw offers an overview of the current challenges in deploying machine learning applications into production and the current state of prediction serving infrastructure. He then leads a deep dive into the Clipper serving system and shows you how to get started. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Ethics and Privacy, Health and Medicine
Patrick Hall (H2O.ai | George Washington University), Avni Wadhwa (H2O.ai), Mark Chan (H2O.ai)
Average rating: ****.
(4.50, 4 ratings)
Transparency, auditability, and stability are crucial for business adoption and human acceptance of complex machine learning models. Patrick Hall, Avni Wadhwa, and Mark Chan share practical and productizable approaches for explaining, testing, and visualizing machine learning models using open source, Python-friendly tools such as GraphViz, H2O, and XGBoost. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Recommendation Systems
Dr. Vijay Srinivas Agneeswaran (Publicis Sapient), Abhishek Kumar (Publicis Sapient)
Average rating: ****.
(4.40, 5 ratings)
Abhishek Kumar and Vijay Srinivas Agneeswaran offer an introduction to deep learning-based recommendation and learning-to-rank systems using TensorFlow. You'll learn how to build a deep learning recommender system based on intent prediction, modeled on a real-world implementation for an ecommerce client. Read more.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1A 08
Alistair Croll (Solve For Interesting), Robert Passarella (Alpha Features), Amro Alkhatib (National Health Insurance Company-Daman), Mridul Mishra (Fidelity Investments), Patrick Angeles (Cloudera), James Psota (Panjiva), Andreas Kohlmaier (Munich Re), Paul Lashmet (Arcadia Data), Nick Curcuru (Mastercard), Robin Way (Corios), Theresa Johnson (Airbnb), Jane Tran (Unqork), Swatee Singh (American Express)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry. Read more.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1E 10
Paco Nathan (derwen.ai), Katharina Warzel (EveryMundo), Mike Berger (Mount Sinai Health System), Sam Helmich (Deere & Company), Stephanie Fischer (datanizing GmbH), Maryam Jahanshahi (TapRecruit), Greg Quist (SmartCover Systems), Ann Nguyen (Whole Whale), Steve Otto (Navistar), Jennifer Lim (Cerner), S Anand (Gramener), Ian Brooks (Hortonworks)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Deep Learning
Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera)
Average rating: *....
(1.00, 1 rating)
Vartika Singh, Alan Silva, Alex Bleakley, Steven Totman, Mirko Kämpf, and Syed Nasar outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed)
Average rating: ***..
(3.00, 7 ratings)
David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial for scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 23/24 Level: Intermediate
Dean Wampler (Lightbend), Boris Lublinsky (Lightbend)
Average rating: ***..
(3.67, 3 ratings)
Dean Wampler and Boris Lublinsky walk you through building streaming apps as microservices using Akka Streams and Kafka Streams. Dean and Boris discuss the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead. You'll also discover a few ML model serving ideas along the way. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 06/07 Level: Advanced
Secondary topics:  Data Platforms
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
Average rating: ***..
(3.12, 8 ratings)
Using Customer 360 and the internet of things as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, including components like Kafka, Flink, Kudu, Spark Streaming, and Spark SQL, along with modern storage engines, to enable new forms of data processing and analytics. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Bruno Gonçalves (JPMorgan Chase & Co.)
Average rating: ***..
(3.14, 7 ratings)
Time series are everywhere around us. Understanding them requires taking into account the sequence of values seen in previous steps and even long-term temporal correlations. Join Bruno Gonçalves to learn how to use recurrent neural networks to model and forecast time series and discover the advantages and disadvantages of recurrent neural networks with respect to more traditional approaches. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 15/16 Level: Beginner
Secondary topics:  Machine Learning in the enterprise
Average rating: **...
(2.67, 9 ratings)
Janet Forbes, Danielle Leighton, and Lindsay Brin lead a primer on crafting well-conceived data science projects that uncover valuable business insights. Using case studies and hands-on skills development, Janet, Danielle, and Lindsay walk you through essential techniques for effecting real business change. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 10 Level: Intermediate
Jeroen Janssens (Data Science Workshops B.V.)
Average rating: ***..
(3.00, 3 ratings)
The Unix command line remains an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools, you can quickly scrub, explore, and model your data as well as hack together prototypes. Join Jeroen Janssens for a hands-on workshop based on his book Data Science at the Command Line. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 09 Level: Intermediate
Secondary topics:  Model lifecycle management
Brian Foo (Google), Holden Karau (Google), Jay Smith (Google)
Average rating: **...
(2.00, 7 ratings)
TensorFlow and Keras are popular libraries for training deep models due to hardware accelerator support. Brian Foo, Jay Smith, and Holden Karau explain how to bring deep learning models from training to serving in a cloud production environment. You'll learn how to unit-test, export, package, deploy, optimize, serve, monitor, and test models using Docker and TensorFlow Serving in Kubernetes. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 06 Level: Intermediate
Carolyn Duby (Hortonworks)
Carolyn Duby shows you how to find the cybersecurity threat needle in your event haystack using Apache Metron: a real-time, horizontally scalable open source platform. After this interactive overview of the platform's major features, you'll be ready to analyze your own haystack back at the office. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 11 Level: Intermediate
Secondary topics:  Ethics and Privacy
Aileen Nielsen (Skillman Consulting)
Average rating: ****.
(4.00, 4 ratings)
There is mounting evidence that the widespread deployment of machine learning and artificial intelligence in business and government applications is reproducing or even amplifying existing prejudices and social inequalities. Aileen Nielsen demonstrates how to identify and avoid bias and other unfairness in your analyses. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 14 Level: Intermediate
Sudhanshu Arora (Cloudera), Stefan Salandy (Cloudera), Suraj Acharya (Cloudera), Brandon Freeman (Cloudera), Jason Wang (Cloudera), Shravan Pabba (Cloudera)
Attend this tutorial to learn how to successfully run a data analytics pipeline in the cloud and integrate data engineering and data analytic workflows. You'll explore considerations and best practices for data analytics pipelines in the cloud and, along the way, see how to share metadata across workloads in a big data PaaS. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 12/13 Level: Intermediate
Jorge A. Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Paul Sears (Amazon Web Services), Bruno Faria (Amazon Web Services)
Average rating: **...
(2.86, 7 ratings)
Want to learn how to use Amazon's big data web services to launch your first big data application in the cloud? Jorge Lopez, Radhika Ravirala, Paul Sears, and Bruno Faria walk you through building a big data application using a combination of open source technologies and AWS managed services. Read more.