Sep 23–26, 2019

Tutorials

These expert-led presentations on Tuesday, September 24 give you a chance to dive deep into the subject matter. Please note: to attend tutorials, you must be registered for a Gold or Silver pass; does not include access to training courses on Monday or Tuesday.

Learn more about a free 1-day training course, Machine Learning for the Enterprise (sponsored by IBM), available to Bronze pass holders.

Tuesday, September 24

9:00am12:30pm
Location: 1E 10
Viktor Gamov (Confluent)
Building stream processing applications is certainly one of the hot topics in the IT community. But if you've ever thought you needed to be a programmer to do stream processing and build stream processing data pipelines, think again. Viktor Gamov explores KSQL, the stream processing query engine built on top of Apache Kafka. Read more.
9:00am12:30pm
Location: 1E 15/16
Mark Donsky (Okera), Lars George (Okera), Michael Ernest (Dataiku), Ifigeneia Derekli (Cloudera)
New regulations drive compliance, governance, and security challenges for big data. Infosec and security groups must ensure a secured and governed environment across workloads that span on-premises, private cloud, multicloud, and hybrid cloud. Mark Donsky, Lars George, Michael Ernest, and Ifigeneia Derekli outline hands-on best practices for meeting these challenges with special attention to CCPA. Read more.
9:00am12:30pm
Location: 1A 23/24
Alice Zhao (Metis)
As a data scientist, we are known to crunch numbers, but you need to decide what to do when you run into text data. Alice Zhao walks you through the steps to turn text data into a format that a machine can understand, explores some of the most popular text analytics techniques, and showcases several natural language processing (NLP) libraries in Python, including NLTK, TextBlob, spaCy, and gensim. Read more.
9:00am12:30pm
Location: 1A 10
Rossella Blatt Vital (Wonderlic), Ross Piper (Wonderlic), Daniel Schmerling (Wonderlic)
Creating and leading a successful ML strategy is an elegant orchestration of many components: master key ML concepts, operationalize ML workflow, prioritize highest-value projects, build a high-performing team, nurture strategic partnerships, align with the company’s mission, etc. Rossella Blatt Vital details insights and lessons learned in how to create and lead a flourishing ML practice. Read more.
9:00am12:30pm
Location: 1A 12/14
Sourav Dey (Manifold), Jakov Kucan (Manifold)
Sourav Dey and Jakov Kucan walk you through the six steps of the Lean AI process and explain how it helps your ML engineers work as an an integrated part of your development and production teams. You'll get a hands-on example using real-world data, so you can get up and running with Docker and Orbyter and see firsthand how streamlined they can make your workflow. Read more.
9:00am12:30pm
Location: 1E 12/13
Bruno Goncalves (Data For Science, Inc)
You'll go hands-on to learn the theoretical foundations and principal ideas underlying deep learning and neural networks. Bruno Gonçalves provides the code structure of the implementations that closely resembles the way Keras is structured, so that by the end of the course, you'll be prepared to dive deeper into the deep learning applications of your choice. Read more.
9:00am12:30pm
Location: 1E 14
James Morantus (Cloudera), Tony Huinker (Cloudera), Naren Koneru (Cloudera), Ramachandran Venkatesh (Cloudera), Gunther Hagleitner (Cloudera), Olli Draese (Cloudera)
Organizations now run diverse, multidisciplinary, big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long running in nature. There are many challenges with moving these workloads to the cloud. In this talk we start off with a technical deep... Read more.
9:00am12:30pm
Location: 1E 09
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley)
Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems and examine the inception and growth of the serverless paradigm. You'll take a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions and get a bird’s-eye view of the application domains where you can leverage Pulsar functions. Read more.
9:00am12:30pm
Location: 1A 21
Jules Damji (Databricks)
ML development brings many new complexities beyond the software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information. Jules Damji walks you through MLflow, an open source project that simplifies the entire ML lifecycle, to solve this problem. Read more.
9:00am12:30pm
Location: 1E 11
Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera), Andre Araujo (Cloudera), Hemanth Yamijala (Cloudera)
There are too many edge devices and agents, and you need to control and manage them. Purnima Reddy Kuchikulla, Timothy Spann, Abdelkrim Hadjidj, and Andre Araujo walk you through handling the difficulty in collecting real-time data and the trouble with updating a specific set of agents with edge applications. Get your hands dirty with CEM, which addresses these challenges with ease. Read more.
9:00am12:30pm
Location: 1E 08
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today. Read more.
9:00am5:00pm
Location: 1A 08
Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Brian Lynch (TD Bank Group), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), Karan Jaswal (Cinchy), Moto Tohda (Tokyo Century (USA)), Viridiana Lourdes (Ayasdi), Peter Swartz (Altana Trade), Mikheil Nadareishvili (TBC Bank)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry. Read more.
9:00am5:00pm
Location: 1A 06
David Boyle (Audience Strategies), Richard Evans (Statistics Canada), Rosaria Silipo (KNIME), Leah Xu (Spotify), Arup Nanda (Capital One), Victoriya Kalmanovich (Navy), Tusharadri Mukherjee (Lenovo), David Boyle (Audience Strategies), Richard Evans (Statistics Canada), Leah Xu (Spotify), Victoriya Kalmanovich (Navy), Moise Convolbo (Rakuten), Martin Mendez-Costabel (Bayer Crop Science), gloria macia (Roche AG), Gwen Campbell (Revibe Technologies), Moise Convolbo (Rakuten), Muhammed Idris (Capria VC | TeraCrunch)
From banking to biotech, retail to government, every business sector is changing in the face of abundant data. Get better at defining business problems and applying data solutions at Strata. Read more.
1:30pm5:00pm
Location: 1A 10
Alexander Izydorczyk (Coatue Managment), Benjamin Singleton (JetBlue), Joshua Poduska (Domino Data Lab)
The honeymoon era of data science is ending and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders must deliver measurable impact on an increasing share of an enterprise’s KPIs. The speakers explore how leading organizations take a holistic approach to people, process, and technology to build a sustainable advantage. Read more.
1:30pm5:00pm
Location: 1E 15/16
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
Boris Lublinsky and Dean Wampler examine ML use in streaming data pipelines, how to do periodic model retraining, and low-latency scoring in live streams. Learn about Kafka as the data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, metadata tracking, and more. Read more.
1:30pm5:00pm
Location: 1A 23/24
David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture)
David Talby, Alex Thomas, Saif Addin Ellafi, and Claudiu Branzan walk you through state-of-the-art natural language processing (NLP) using the highly performant, highly scalable open source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve. Read more.
1:30pm5:00pm
Location: 1E 11
Sophie Watson (Red Hat), William Benton (Red Hat)
Go hands-on with Sophie Watson and William Benton to examine data structures that let you answer interesting queries about massive datasets in fixed amounts of space and constant time. This seems like magic, but they'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications. Read more.
1:30pm5:00pm
Location: 1E 10
Ted Malaska (Capital One), Jonathan Seidman (Cloudera), Matthew Schumpert (Cloudera, Inc.), Raman Rajasekhar (Cloudera Inc), Krishna Maheshwari (Cloudera)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Ted Malaska and Jonathan Seidman detail guidelines and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects. Read more.
1:30pm5:00pm
Location: 1E 08
Carolyn Duby (Cloudera), Madhan Neethiraj (Cloudera), Michael Gregory (Cloudera), Sangeeta Doraiswamy (cloudera)
Bring your laptop, roll up your sleeves, and get ready to crunch some cybersecurity events with Apache Metron, an open source big data cybersecurity platform. Carolyn Duby walks you through how Metron finds actionable events in real time. Read more.
1:30pm5:00pm
Location: 1E 09
Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)
Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management requires a different mindset in AWS when compared to traditional RDBMS design. Gowrishankar Balasubramanian and Rajeev Srinivasan explore considerations in choosing the right database for your use case and access pattern while migrating or building a new application on the cloud. Read more.
1:30pm5:00pm
Location: 1A 21
Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)
Karthik Sonti, Emily Webber, and Varun Rao Bhamidimarri introduce you to the Amazon SageMaker machine learning platform and provide a high-level discussion of recommender systems. You'll dig into different machine learning approaches for recommender systems, including common methods such as matrix factorization as well as newer embedding approaches. Read more.
1:30pm5:00pm
Location: 1E 14
Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera), Attila Kanto (Cloudera), Tony Wu (Cloudera)
Kafka is omnipresent and the backbone of streaming analytics applications and data lakes. The challenge is understanding what's going on overall in the Kafka cluster, including performance, issues, and message flows. Purnima Reddy Kuchikulla and Dan Chaffelson walk you through a hands-on experience to visualize the entire Kafka environment end-to-end and simplify Kafka operations via SMM. Read more.
1:30pm5:00pm
Location: 1E 12/13
Mark Madsen (Teradata), Todd Walter (Archimedata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.
1:30pm5:00pm
Location: 1A 12/14
Garrett Hoffman (StockTwits)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include Word2Vec, recurrent neural networks (RNNs) and variants (long short-term memory [LSTM] and gated recurrent unit [GRU]), and convolutional neural networks. Read more.

    Contact us

    confreg@oreilly.com

    For conference registration information and customer service

    partners@oreilly.com

    For more information on community discounts and trade opportunities with O’Reilly conferences

    strataconf@oreilly.com

    For information on exhibiting or sponsoring a conference

    pr@oreilly.com

    For media/analyst press inquires