Sep 23–26, 2019

Tutorials

These expert-led presentations on Tuesday, September 24 give you a chance to dive deep into the subject matter. Please note: to attend tutorials, you must be registered for a Gold or Silver pass. This does not include access to training courses on Monday or Tuesday.

Tuesday, September 24

9:00am–12:30pm
Location: 1E 10
Secondary topics:  Data Integration and Data Processing, Deep dive into specific tools, platforms, or frameworks, Streaming and IoT
Ricardo Ferreira (Confluent)
Building stream processing applications is certainly one of the hot topics in the IT community. Though a lot has been said on the subject, one might say that building stream processing applications is the new teenage sex. This tutorial aims to change this by introducing KSQL, the stream processing query engine built on top of Apache Kafka.
9:00am–12:30pm
Location: 1E 15/16
Secondary topics:  Privacy and Security
Mark Donsky (Okera)
New regulations such as CCPA and GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads that span on-prem, private cloud, multi-cloud, and hybrid cloud. We will share hands-on best practices for meeting these challenges, with special attention to CCPA.
9:00am–12:30pm
Location: 1A 23/24
Secondary topics:  Text and Language processing and analysis
Alice Zhao (Metis)
As data scientists, we are known to crunch numbers, but what happens when we run into text data? In this tutorial, I will walk through the steps to turn text data into a format that a machine can understand, share some of the most popular text analytics techniques, and showcase several natural language processing (NLP) libraries in Python, including NLTK, TextBlob, spaCy, and gensim.
9:00am–12:30pm
Location: 1A 10
Secondary topics:  Culture and Organization
Rossella Blatt Vital (Wonderlic)
Creating and leading a successful ML strategy is an elegant orchestration of many components: mastering the key ML concepts, operationalizing the ML workflow, prioritizing the highest-value projects, building a high-performing team, nurturing strategic partnerships, aligning with the company’s mission, and more. This tutorial shares insights and lessons learned on how to create and lead a flourishing ML practice.
9:00am–12:30pm
Location: 1A 12/14
Secondary topics:  Culture and Organization, Model Development, Governance, Operations
Sourav Dey (Manifold), Jakov Kucan (Manifold)
In this tutorial, we will walk through the six steps of our Lean AI process and explain how they help your ML engineers work as an integrated part of your development and production teams. We will also walk through a hands-on example using real-world data from one of our client companies, so you can get up and running with Docker and Orbyter and see first-hand how streamlined they can make...
9:00am–12:30pm
Location: 1E 08
Secondary topics:  Deep Learning
Bruno Goncalves (Data For Science, Inc)
Students will learn, in a hands-on way, the theoretical foundations and principal ideas underlying deep learning and neural networks. The code structure of the implementations provided is meant to closely resemble the way Keras is structured so that, by the end of the course, students will be prepared to dive deeper into the deep learning applications of their choice.
9:00am–12:30pm
Location: 1E 14
Secondary topics:  Cloud Platforms and SaaS, Data Management and Storage
Jason Wang (Cloudera), Tony Wu (Cloudera), Vinithra Varadharajan (Cloudera)
Moving to the cloud poses challenges ranging from re-architecting to be cloud-native to keeping data context consistent across workloads that span multiple clusters on-prem and in the cloud. First, we’ll cover cloud architecture and its challenges in depth; second, you’ll use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
9:00am–12:30pm
Location: 1E 09
Secondary topics:  Cloud Platforms and SaaS, Data, Analytics, and AI Architecture, Streaming and IoT, Temporal data and time-series analytics
Arun Kejariwal (Facebook), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley)
In this tutorial, we shall walk the audience through the landscape of streaming systems and give an overview of the inception and growth of the serverless paradigm. Next, we shall present a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and paint a bird’s-eye view of the application domains where Pulsar functions can be leveraged.
9:00am–12:30pm
Location: 1A 21/22
Secondary topics:  Model Development, Governance, Operations
Jules Damji (Databricks)
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work.
9:00am–12:30pm
Location: 1E 11
Secondary topics:  Deep dive into specific tools, platforms, or frameworks, Streaming and IoT
Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera)
Too many edge devices and agents: how does one control and manage them? How do we handle the difficulty of collecting real-time data and, most importantly, the trouble of updating a specific set of agents with edge applications? Get your hands dirty with Cloudera Edge Management, which addresses these challenges with ease.
9:00am–12:30pm
Location: 1E 12/13
Secondary topics:  Data Management and Storage, Deep dive into specific tools, platforms, or frameworks
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today.
9:00am–5:00pm
Location: 1A 08
Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Nitzan Mekel-Bobrov (Capital One), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), Moto Tohda (Tokyo Century (USA) Inc.), Mikheil Nadareishvili (TBC Bank), Jennifer Kloke (Ayasdi)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
9:00am–5:00pm
Location: 1A 06
Richard Evans (Statistics Canada), Rosaria Silipo (KNIME), Leah Xu (Spotify), Arup Nanda (Capital One), Victoriya Kalmanovich (Navy), Shreya Sharma (Expedia Inc.), Martin Mendez-Costabel (Bayer Crop Science), Gloria Macia (Roche AG), Gwen Campbell (Revibe Technologies, Inc), Moise Convolbo (Rakuten), Muhammed Idris (Capria VC | TeraCrunch)
From banking to biotech, retail to government, every business sector is changing in the face of abundant data. Get better at defining business problems and applying data solutions at Strata.
1:30pm–5:00pm
Location: 1A 10
Secondary topics:  Culture and Organization
Mac Steele (Domino Data Lab), Nick Elprin (Domino Data Lab)
The honeymoon era of data science is ending, and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders must deliver measurable impact on an increasing share of an enterprise’s KPIs. Attendees will learn how leading organizations take a holistic approach to people, process, and technology to build a sustainable competitive advantage.
1:30pm–5:00pm
Location: 1E 10
Secondary topics:  Model Development, Governance, Operations
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
This hands-on tutorial examines production use of ML in streaming data pipelines: how to do periodic model retraining and low-latency scoring in live streams. We'll discuss Kafka as the data backplane, pros and cons of microservices vs. systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.
1:30pm–5:00pm
Location: 1A 23/24
Secondary topics:  Deep dive into specific tools, platforms, or frameworks, Text and Language processing and analysis
David Talby (Pacific AI), Alex Thomas (Indeed), Saif Addin Ellafi (John Snow Labs)
This is a hands-on tutorial on state-of-the-art NLP using the highly performant, highly scalable open-source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
1:30pm–5:00pm
Location: 1E 08
Secondary topics:  Streaming and IoT, Temporal data and time-series analytics
Sophie Watson (Red Hat), William Benton (Red Hat)
In this hands-on workshop, we’ll introduce several data structures that let you answer interesting queries about massive data sets in fixed amounts of space and constant time. This seems like magic, but we'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications.
1:30pm–5:00pm
Location: 1E 11
Secondary topics:  Culture and Organization
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. In this presentation, we’ll provide guidance and best practices from planning to implementation, based on years of experience working with companies to deliver successful data projects.
1:30pm–5:00pm
Location: 1E 15/16
Secondary topics:  Privacy and Security
Carolyn Duby (Hortonworks)
Bring your laptop, roll up your sleeves, and get ready to crunch some cyber security events with Apache Metron, an open source big data cyber security platform. Learn how Metron finds actionable events in real time.
1:30pm–5:00pm
Location: 1E 12/13
Secondary topics:  Cloud Platforms and SaaS, Data Management and Storage, Data, Analytics, and AI Architecture
Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)
Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management require a different mindset in AWS than traditional RDBMS design does. In this session, you will learn important considerations for choosing the right database based on your use cases and access patterns while migrating an application to, or building a new application in, the cloud.
1:30pm–5:00pm
Location: 1A 21/22
Secondary topics:  Cloud Platforms and SaaS, Deep dive into specific tools, platforms, or frameworks
Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)
In this workshop, we’ll introduce the Amazon SageMaker machine learning platform, followed by a high-level discussion of recommender systems. Next, we’ll dig into different machine learning approaches for recommender systems.
1:30pm–5:00pm
Location: 1E 14
Secondary topics:  Deep dive into specific tools, platforms, or frameworks, Streaming and IoT
Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera)
Kafka is omnipresent and is the backbone of not only streaming analytics applications but data lakes as well. The challenge is understanding what is going on overall in the Kafka cluster, including performance, issues, and message flows. This session gives you hands-on experience visualizing your entire Kafka environment end to end and simplifying Kafka operations via SMM.
1:30pm–5:00pm
Location: 1E 09
Secondary topics:  Cloud Platforms and SaaS, Data, Analytics, and AI Architecture
Mark Madsen (Teradata), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
1:30pm–5:00pm
Location: 1A 12/14
Secondary topics:  Deep Learning, Financial Services, Text and Language processing and analysis
Garrett Hoffman (StockTwits)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include word2vec, recurrent neural networks and variants (LSTM, GRU), and convolutional neural networks.
