Sep 23–26, 2019
 
1A 12/14
9:00am Tutorial Streamlining a Machine Learning Project Team Sourav Dey (Manifold), Alex Ng (Manifold)
1:30pm Tutorial Deep learning methods for natural language processing Garrett Hoffman (StockTwits)
1A 15/16
9:00am Training Hands-on data science with Python (Day 2) Michael Cullan (The Data Incubator)
1A 21/22
9:00am Tutorial Managing the Complete Machine Learning Lifecycle with MLflow Jules Damji (Databricks)
1:30pm Tutorial Building a recommender system with Amazon ML Services Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)
1A 23/24
9:00am Tutorial Introduction to Natural Language Processing in Python Alice Zhao (Metis)
1:30pm Tutorial Natural language understanding at scale with Spark NLP David Talby (Pacific AI), Alex Thomas (Indeed), Saif Addin Ellafi (John Snow Labs)
1E 09
9:00am Tutorial Serverless Streaming Architectures and Algorithms for the Enterprise Arun Kejariwal (Facebook), Karthik Ramasamy (Streamlio)
1:30pm Tutorial Architecting a data platform for enterprise use Mark Madsen (Teradata), Todd Walter (Teradata)
1E 12/13
9:00am Tutorial Learning Presto: SQL on anything Matt Fuller (Starburst)
1:30pm Tutorial From relational databases to Cloud databases, using the right tool for the right job. Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)
1E 14
9:00am Tutorial Running multidisciplinary big data workloads in the cloud Jason Wang (Cloudera), Tony Wu (Cloudera), Vinithra Varadharajan (Cloudera)
1:30pm Tutorial Kafka/SMM(Streams Messaging Manager) Crash Course Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera)
1A 01/02
9:00am Training Big data for managers (Day 2) Michael Li (The Data Incubator), Ana Hocevar (The Data Incubator)
1A 03
9:00am Training Recommendation System using Deep Learning (Day 2) Bargava Subramanian (Binaize Labs), Amit Kapoor (narrativeVIZ Consulting)
1E 06
9:00am Training Professional Kafka development (Day 2) Jesse Anderson (Big Data Institute)
1E 15/16
1:30pm Tutorial Apache Metron: Open source cybersecurity at scale Carolyn Duby (Hortonworks)
1A 17
9:00am Training Building a serverless big data application on AWS (Day 2) Jorge Lopez (Amazon Web Services)
1A 18
1E 07
9:00am Training Machine learning from scratch in TensorFlow (Day 2) Dylan Bargteil (The Data Incubator)
1A 06
9:00am Tutorial Data Case Studies Richard Evans (Statistics Canada), Rosaria Silipo (KNIME), Leah Xu (Spotify), Arup Nanda (Priceline), Victoriya Kalmanovich (Navy), Shreya Sharma (Expedia Inc.), Martin Mendez-Costabel (Bayer Crop Science), Gloria Macia (Roche AG), Gwen Campbell (Revibe Technologies, Inc), Moise Convolbo (Rakuten)
1A 08
9:00am Tutorial Findata Day Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Nitzan Mekel-Bobrov (Capital One), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), elva fernandez (American Express), Moto Tohda (Tokyo Century (USA) Inc.), Mikheil Nadareishvili (TBC Bank), Jennifer Kloke (Ayasdi)
1A 10
9:00am Tutorial Building and Leading a Successful AI Practice for your Organization Rossella Blatt Vital (Wonderlic)
1:30pm Tutorial Managing data science in the enterprise Mac Steele (Domino Data Lab), Nick Elprin (Domino Data Lab)
1E 08
9:00am Tutorial Deep Learning from Scratch Bruno Goncalves (Data For Science, Inc)
1:30pm Tutorial Sketching data and other magic tricks Sophie Watson (Red Hat), William Benton (Red Hat)
1E 10
9:00am Tutorial Real-time SQL Stream Processing at Scale with Apache Kafka and KSQL Ricardo Ferreira (Confluent)
1:30pm Tutorial Hands-on Machine Learning with Kafka-based Streaming Pipelines Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
1E 11
9:00am Tutorial IoT - Cloudera Edge Management Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera)
1:30pm Tutorial Foundations for Successful Data Projects Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
5:00pm Opening Reception | Room: Expo Hall - 3B
12:30pm Lunch | Room: Lunch
10:30am Morning break sponsored by Microsoft | Room: Break
3:00pm Afternoon break sponsored by Dataiku | Room: Break
9:00am-12:30pm (3h 30m) Data Science, Machine Learning, & AI Culture and Organization, Model Development, Governance, Operations
Streamlining a Machine Learning Project Team
Sourav Dey (Manifold), Alex Ng (Manifold)
Many teams are still run as if data science were mainly about experimentation, but those days are over. Data science must now offer turnkey solutions for taking models into production. We'll explain how to streamline an ML project and help your engineers work as an integrated part of production teams, using a Lean AI process and the Orbyter package for Docker-first data science.
1:30pm-5:00pm (3h 30m) Data Science, Machine Learning, & AI Deep Learning, Financial Services, Text and Language processing and analysis
Deep learning methods for natural language processing
Garrett Hoffman (StockTwits)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include word2vec, recurrent neural networks and variants (LSTM, GRU), and convolutional neural networks.
9:00am-5:00pm (8h) Data Science, Machine Learning, & AI
Hands-on data science with Python (Day 2)
Michael Cullan (The Data Incubator)
Michael Cullan walks you through developing a machine learning pipeline, from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python.
9:00am-12:30pm (3h 30m) Data Science, Machine Learning, & AI Model Development, Governance, Operations
Managing the Complete Machine Learning Lifecycle with MLflow
Jules Damji (Databricks)
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike traditional software developers, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce their work.
1:30pm-5:00pm (3h 30m) Data Science, Machine Learning, & AI Cloud Platforms and SaaS, Deep dive into specific tools, platforms, or frameworks
Building a recommender system with Amazon ML Services
Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)
In this workshop, we’ll introduce the Amazon SageMaker machine learning platform, followed by a high-level discussion of recommender systems. Next, we’ll dig into different machine learning approaches for recommender systems.
9:00am-12:30pm (3h 30m) Data Science, Machine Learning, & AI Text and Language processing and analysis
Introduction to Natural Language Processing in Python
Alice Zhao (Metis)
As data scientists, we are known for crunching numbers, but what happens when we run into text data? In this tutorial, I will walk through the steps to turn text data into a format that a machine can understand, share some of the most popular text analytics techniques, and showcase several natural language processing (NLP) libraries in Python, including NLTK, TextBlob, spaCy, and gensim.
1:30pm-5:00pm (3h 30m) Data Science, Machine Learning, & AI Deep dive into specific tools, platforms, or frameworks, Text and Language processing and analysis
Natural language understanding at scale with Spark NLP
David Talby (Pacific AI), Alex Thomas (Indeed), Saif Addin Ellafi (John Snow Labs)
This is a hands-on tutorial on state-of-the-art NLP using the highly performant, highly scalable open-source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
9:00am-12:30pm (3h 30m) Data Engineering and Architecture, Streaming and IoT Cloud Platforms and SaaS, Data, Analytics, and AI Architecture, Streaming and IoT, Temporal data and time-series analytics
Serverless Streaming Architectures and Algorithms for the Enterprise
Arun Kejariwal (Facebook), Karthik Ramasamy (Streamlio)
In this tutorial, we shall walk the audience through the landscape of streaming systems and give an overview of the inception and growth of the serverless paradigm. Next, we shall present a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar Functions, and give a bird’s-eye view of the application domains where Pulsar Functions can be used.
1:30pm-5:00pm (3h 30m) Data Engineering and Architecture Cloud Platforms and SaaS, Data, Analytics, and AI Architecture
Architecting a data platform for enterprise use
Mark Madsen (Teradata), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
9:00am-12:30pm (3h 30m) Data Engineering and Architecture Data Management and Storage, Deep dive into specific tools, platforms, or frameworks
Learning Presto: SQL on anything
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today.
1:30pm-5:00pm (3h 30m) Data Engineering and Architecture Cloud Platforms and SaaS, Data Management and Storage, Data, Analytics, and AI Architecture
From relational databases to Cloud databases, using the right tool for the right job.
Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)
Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management in AWS requires a different mindset than traditional RDBMS design. In this session, you will learn the important considerations in choosing the right database for your use cases and access patterns, whether you are migrating an existing application or building a new one in the cloud.
9:00am-12:30pm (3h 30m) Data Engineering and Architecture Cloud Platforms and SaaS, Data Management and Storage
Running multidisciplinary big data workloads in the cloud
Jason Wang (Cloudera), Tony Wu (Cloudera), Vinithra Varadharajan (Cloudera)
Moving to the cloud poses challenges ranging from re-architecting to be cloud-native to maintaining data context consistency across workloads that span multiple clusters on-prem and in the cloud. First, we’ll cover cloud architecture and its challenges in depth; second, you’ll use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
1:30pm-5:00pm (3h 30m) Data Engineering and Architecture, Streaming and IoT Deep dive into specific tools, platforms, or frameworks, Streaming and IoT
Kafka/SMM(Streams Messaging Manager) Crash Course
Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera)
Kafka is omnipresent and is the backbone not only of streaming analytics applications but of data lakes as well. The challenge is understanding what is going on overall in the Kafka cluster, including performance, issues, and message flows. This session gives you hands-on experience visualizing your entire Kafka environment end-to-end and shows how SMM simplifies Kafka operations.
9:00am-5:00pm (8h) Strata Business Summit
Big data for managers (Day 2)
Michael Li (The Data Incubator), Ana Hocevar (The Data Incubator)
Michael Li and Ana Hocevar offer a nontechnical overview of AI and data science. Learn common techniques, how to apply them in your organization, and common pitfalls to avoid. You’ll pick up the language and develop a framework to be able to effectively engage with technical experts and utilize their input and analysis for your business’s strategic priorities and decision making.
9:00am-5:00pm (8h) Data Science, Machine Learning, & AI
Recommendation System using Deep Learning (Day 2)
Bargava Subramanian (Binaize Labs), Amit Kapoor (narrativeVIZ Consulting)
In this two-day workshop, you will learn the different paradigms of recommendation systems and get an introduction to deep learning-based approaches. By the end of the workshop, you will have enough practical hands-on knowledge to build, select, deploy, and maintain a recommendation system for your problem.
9:00am-5:00pm (8h) Data Engineering and Architecture
Professional Kafka development (Day 2)
Jesse Anderson (Big Data Institute)
Jesse Anderson offers an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL.
9:00am-12:30pm (3h 30m) Security and Privacy Privacy and Security
Getting ready for CCPA: securing data lakes for heavy privacy regulation
Mark Donsky (Okera)
New regulations such as CCPA and GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads that span on-prem, private cloud, multi-cloud, and hybrid cloud. We will share hands-on best practices for meeting these challenges, with special attention to CCPA.
1:30pm-5:00pm (3h 30m) Security and Privacy Privacy and Security
Apache Metron: Open source cybersecurity at scale
Carolyn Duby (Hortonworks)
Bring your laptop, roll up your sleeves, and get ready to crunch some cybersecurity events with Apache Metron, an open source big data cybersecurity platform. Learn how Metron finds actionable events in real time.
9:00am-5:00pm (8h) Data Engineering and Architecture
Building a serverless big data application on AWS (Day 2)
Jorge Lopez (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join in to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.
9:00am-5:00pm (8h) Data Science, Machine Learning, & AI
Expand your data science and machine learning skills with Python, R, SQL, Spark, and TensorFlow (Day 2)
Ian Cook (Cloudera)
Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools.
9:00am-5:00pm (8h) Data Science, Machine Learning, & AI
Machine learning from scratch in TensorFlow (Day 2)
Dylan Bargteil (The Data Incubator)
The TensorFlow library provides for the use of computational graphs, with automatic parallelization across resources. This architecture is ideal for implementing neural networks. Dylan Bargteil offers an overview of TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API with several hands-on applications.
9:00am-5:00pm (8h)
Data Case Studies
Richard Evans (Statistics Canada), Rosaria Silipo (KNIME), Leah Xu (Spotify), Arup Nanda (Priceline), Victoriya Kalmanovich (Navy), Shreya Sharma (Expedia Inc.), Martin Mendez-Costabel (Bayer Crop Science), Gloria Macia (Roche AG), Gwen Campbell (Revibe Technologies, Inc), Moise Convolbo (Rakuten)
From banking to biotech, retail to government, every business sector is changing in the face of abundant data. Get better at defining business problems and applying data solutions at Strata.
9:00am-5:00pm (8h)
Findata Day
Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Nitzan Mekel-Bobrov (Capital One), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), elva fernandez (American Express), Moto Tohda (Tokyo Century (USA) Inc.), Mikheil Nadareishvili (TBC Bank), Jennifer Kloke (Ayasdi)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
9:00am-12:30pm (3h 30m) Executive Briefing and best practices, Strata Business Summit Culture and Organization
Building and Leading a Successful AI Practice for your Organization
Rossella Blatt Vital (Wonderlic)
Creating and leading a successful ML strategy is an elegant orchestration of many components: mastering the key ML concepts, operationalizing the ML workflow, prioritizing the highest-value projects, building a high-performing team, nurturing strategic partnerships, aligning with the company’s mission, and more. This tutorial shares insights and lessons learned on how to create and lead a flourishing ML practice.
1:30pm-5:00pm (3h 30m) Executive Briefing and best practices, Strata Business Summit Culture and Organization
Managing data science in the enterprise
Mac Steele (Domino Data Lab), Nick Elprin (Domino Data Lab)
The honeymoon era of data science is ending, and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders must deliver measurable impact on an increasing share of an enterprise’s KPIs. Attendees will learn how leading organizations take a holistic approach to people, process, and technology to build a sustainable competitive advantage.
9:00am-12:30pm (3h 30m) Data Science, Machine Learning, & AI Deep Learning
Deep Learning from Scratch
Bruno Goncalves (Data For Science, Inc)
Students will learn, in a hands-on way, the theoretical foundations and principal ideas underlying deep learning and neural networks. The code structure of the implementations provided is meant to closely resemble the way Keras is structured, so that by the end of the course, students will be prepared to dive deeper into the deep learning applications of their choice.
1:30pm-5:00pm (3h 30m) Data Science, Machine Learning, & AI Streaming and IoT, Temporal data and time-series analytics
Sketching data and other magic tricks
Sophie Watson (Red Hat), William Benton (Red Hat)
In this hands-on workshop, we’ll introduce several data structures that let you answer interesting queries about massive data sets in fixed amounts of space and constant time. This seems like magic, but we'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications.
9:00am-12:30pm (3h 30m) Data Engineering and Architecture Data Integration and Data Processing, Deep dive into specific tools, platforms, or frameworks, Streaming and IoT
Real-time SQL Stream Processing at Scale with Apache Kafka and KSQL
Ricardo Ferreira (Confluent)
Building stream processing applications is certainly one of the hot topics in the IT community. Though much has been said about the subject, one might say that building stream processing applications is still more talked about than actually done. This tutorial aims to change that by introducing KSQL, the stream processing query engine built on top of Apache Kafka.
1:30pm-5:00pm (3h 30m) Data Engineering and Architecture Model Development, Governance, Operations
Hands-on Machine Learning with Kafka-based Streaming Pipelines
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
This hands-on tutorial examines production use of ML in streaming data pipelines: how to do periodic model retraining and low-latency scoring in live streams. We'll discuss Kafka as the data backplane, the pros and cons of microservices vs. systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.
9:00am-12:30pm (3h 30m) Data Engineering and Architecture, Streaming and IoT Deep dive into specific tools, platforms, or frameworks, Streaming and IoT
IoT - Cloudera Edge Management
Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera)
With so many edge devices and agents, how does one control and manage them? How do we handle the difficulty of collecting real-time data and, most importantly, the trouble of updating a specific set of agents with edge applications? Get your hands dirty with Cloudera Edge Management, which addresses these challenges with ease.
1:30pm-5:00pm (3h 30m) Data Engineering and Architecture Culture and Organization
Foundations for Successful Data Projects
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. In this presentation, we’ll provide guidance and best practices from planning to implementation, based on years of experience working with companies to deliver successful data projects.
5:00pm-6:30pm (1h 30m)
Opening Reception
Enjoy delicious snacks and beverages with fellow Strata attendees, speakers, and sponsors at the Opening Reception, happening immediately after tutorials on Tuesday.
12:30pm-1:30pm (1h)
Break: Lunch
10:30am-11:00am (30m)
Break: Morning break sponsored by Microsoft
3:00pm-3:30pm (30m)
Break: Afternoon break sponsored by Dataiku

Contact us

confreg@oreilly.com

For conference registration information and customer service

partners@oreilly.com

For more information on community discounts and trade opportunities with O’Reilly conferences

strataconf@oreilly.com

For information on exhibiting or sponsoring a conference

Contact list

View a complete list of Strata Data Conference contacts