Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA
 
LL20 A
9:00am Data Case Studies Mike Prorock (mesur.io)
LL20 C
9:00am Developing a Modern Enterprise Data Strategy John Akred (Silicon Valley Data Science), Cindi Thompson (Silicon Valley Data Science)
1:30pm Managing data science in the enterprise Nick Elprin (Domino Data Lab)
LL20 D
LL21 B
9:00am Building your first big data application on AWS Jorge A. Lopez (Amazon Web Services)
1:30pm Deploying deep learning with TensorFlow Ron Bodkin (Google)
LL21 C/D
9:00am Using R and Python for scalable data science, machine learning, and AI Mario Inchiosa (Microsoft), Vanja Paunic (Microsoft), Robert Horton (Microsoft), Debraj GuhaThakurta (Microsoft), Ali Zaidi (Microsoft), Tomas Singliar (Microsoft), John-Mark Agosta (Microsoft)
1:30pm A/B Testing at Scale: Accelerating Software Innovation Ronny Kohavi (Microsoft), Alex Deng (Microsoft), Pavel Dmitriev (Microsoft), Paul Raff (Microsoft)
LL21 E/F
9:00am Getting started with TensorFlow Yufeng Guo (Google), Amy Unruh (Google)
1:30pm Deep Learning Based Search and Recommendation Systems Using TensorFlow Abhishek Kumar (Sapient), Dr. Vijay Srinivas Agneeswaran (SapientNitro)
210 A/E
9:00am Understanding data at scale leveraging Spark and Deep Learning Frameworks. Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
1:30pm Natural language understanding at scale with spaCy and Spark NLP David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed)
210 C/G
9:00am Stream processing with Kafka Tim Berglund (Confluent)
1:30pm Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams Dean Wampler (Lightbend), Boris Lublinsky (Lightbend)
210 D/H
9:00am Learning PyTorch by building a recommender system Mo Patel (Independent), Neejole Patel (Virginia Tech)
LL20 B
9:00am Media and Ad Tech Day Ray Bernard (SuprFanz), Jennifer Webb (SuprFanz)
LL21 A
9:00am A deep dive into running data analytic workloads in the cloud Philip Langdale (Cloudera), Eugene Fratkin (Cloudera), Vinithra Varadharajan (Cloudera), Jennifer Wu (Cloudera)
210 B/F
9:00am Modern real-time streaming architectures Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio)
1:30pm Architecting A Data Platform John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
10:30am Break | Room: Break
12:30pm Break | Room: Lunch
3:00pm Break | Room: Lunch
5:00pm Opening Reception | Room: Hall 1, 2, 3
9:00am-5:00pm (8h) Strata Business Summit
Data Case Studies
Mike Prorock (mesur.io)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions.
9:00am-12:30pm (3h 30m) Data-driven business management, Strata Business Summit
Developing a Modern Enterprise Data Strategy
John Akred (Silicon Valley Data Science), Cindi Thompson (Silicon Valley Data Science)
Big data and data science have great potential for accelerating business, but how do you reconcile the business opportunity with the sea of possible technologies? Data should serve the strategic imperatives of a business—those key aspirations that will define an organization’s future vision. In this tutorial, we explain how to create a modern data strategy that powers data-driven business.
1:30pm-5:00pm (3h 30m) Data-driven business management, Strata Business Summit
Managing data science in the enterprise
Nick Elprin (Domino Data Lab)
The honeymoon era of data science is ending, and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders deliver measurable impact on an increasing share of an enterprise's KPIs. Nick Elprin details how leading organizations have taken a holistic approach to people, process, and technology to build a sustainable competitive advantage.
9:00am-5:00pm (8h) Data science and machine learning
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML
Join in for an introduction to Apache Spark 2.0 core concepts with a focus on Spark's machine learning library, using text mining on real-world data as the primary end-to-end use case.
9:00am-12:30pm (3h 30m) Big data and data science in the cloud, Data engineering and architecture
Building your first big data application on AWS
Jorge A. Lopez (Amazon Web Services)
Want to learn how to use Amazon's big data web services to launch your first big data application on the cloud? Jorge Lopez walks you through building a big data application using a combination of open source technologies and AWS managed services.
1:30pm-5:00pm (3h 30m) Data engineering and architecture
Deploying deep learning with TensorFlow
Ron Bodkin (Google)
TensorFlow and Keras are popular libraries for machine learning because of their support for deep learning and GPU deployment. Join Ron Bodkin to learn how to execute these libraries in production with vision and recommendation models and how to export, package, deploy, optimize, serve, monitor, and test models using Docker and TensorFlow Serving in Kubernetes.
9:00am-12:30pm (3h 30m) Data science and machine learning
Using R and Python for scalable data science, machine learning, and AI
Mario Inchiosa (Microsoft), Vanja Paunic (Microsoft), Robert Horton (Microsoft), Debraj GuhaThakurta (Microsoft), Ali Zaidi (Microsoft), Tomas Singliar (Microsoft), John-Mark Agosta (Microsoft)
R and Python top the list of languages used in data science and machine learning, and data scientists and engineers fluent in one of these languages are increasingly marketable. Come learn how to build and operationalize machine learning models using distributed functions and do scalable, end-to-end data science in R and Python on single machines, Spark clusters, and cloud-based infrastructure.
1:30pm-5:00pm (3h 30m) Big data and data science in the cloud, Data science and machine learning, Data-driven business management
A/B Testing at Scale: Accelerating Software Innovation
Ronny Kohavi (Microsoft), Alex Deng (Microsoft), Pavel Dmitriev (Microsoft), Paul Raff (Microsoft)
Controlled experiments, including A/B tests, have revolutionized the way software is being developed, with new ideas objectively evaluated with real users. We provide an intro and lessons learned from one of the largest A/B testing platforms on the planet, running at Microsoft and executing over 10K experiments/year.
9:00am-12:30pm (3h 30m) Data science and machine learning
Getting started with TensorFlow
Yufeng Guo (Google), Amy Unruh (Google)
Yufeng Guo and Amy Unruh walk you through training and deploying a machine learning system using TensorFlow, a popular open source library. Yufeng and Amy take you from a conceptual overview all the way to building complex classifiers and explain how you can apply deep learning to complex problems in science and industry.
1:30pm-5:00pm (3h 30m) Data science and machine learning, Media, entertainment, and advertising
Deep Learning Based Search and Recommendation Systems Using TensorFlow
Abhishek Kumar (Sapient), Dr. Vijay Srinivas Agneeswaran (SapientNitro)
Key takeaways: (1) an introduction to deep learning, covering networks such as RBMs, conv nets, and autoencoders; (2) an introduction to recommendation systems and why deep learning is required for hybrid systems; (3) a complete hands-on TensorFlow tutorial, including TensorBoard; and (4) an end-to-end view of deep learning-based recommendation and learning-to-rank systems.
9:00am-12:30pm (3h 30m) Data science and machine learning
Understanding data at scale leveraging Spark and Deep Learning Frameworks.
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
We walk through approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, and text), leveraging Spark, its extended ecosystem of libraries, and deep learning frameworks. We use sample data and code for each to understand implementation nuances, then highlight the bottlenecks and solutions for data and models at scale.
1:30pm-5:00pm (3h 30m) Data science and machine learning
Natural language understanding at scale with spaCy and Spark NLP
David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed)
Natural language processing is a key component in many data science systems that must understand or reason about text. This is a hands-on tutorial for scalable NLP using spaCy for building annotation pipelines, Spark NLP for building distributed natural language machine-learned pipelines, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.
9:00am-12:30pm (3h 30m) Data engineering and architecture, Streaming systems and real-time applications
Stream processing with Kafka
Tim Berglund (Confluent)
Tim Berglund leads a basic architectural introduction to Kafka and walks you through using Kafka Streams and KSQL to process streaming data.
1:30pm-5:00pm (3h 30m) Data engineering and architecture, Streaming systems and real-time applications
Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams
Dean Wampler (Lightbend), Boris Lublinsky (Lightbend)
Join Dean Wampler and Boris Lublinsky to learn how to build two microservice streaming applications based on Kafka using Akka Streams and Kafka Streams for data processing. You'll explore the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead.
9:00am-12:30pm (3h 30m) Big data and data science in the cloud, Data science and machine learning
Learning PyTorch by building a recommender system
Mo Patel (Independent), Neejole Patel (Virginia Tech)
Since its arrival in early 2017, PyTorch has won over many deep learning researchers and developers due to its dynamic computation framework. Mo Patel and Neejole Patel walk you through using PyTorch to build a content recommendation model.
1:30pm-5:00pm (3h 30m) Data science and machine learning, Visualization and user experience
Custom interactive visualizations and dashboards for one billion datapoints on a laptop in 30 lines of Python
James Bednar (Anaconda), Philipp Rudiger (Anaconda)
Python lets you solve data science problems by stitching together packages from its ecosystem, but it can be difficult to choose packages that work well together. James Bednar and Philipp Rudiger walk you through a concise, fast, easily customizable, and fully reproducible recipe for interactive visualization of millions or billions of datapoints—all in just 30 lines of Python code.
9:00am-5:00pm (8h) Strata Business Summit
Media and Ad Tech Day
Ray Bernard (SuprFanz), Jennifer Webb (SuprFanz)
Hear from innovators in ad tech, measurement, automation, and audience engagement about where the media industry is today—and where it's likely to go next.
9:00am-12:30pm (3h 30m) Big data and data science in the cloud, Data engineering and architecture
A deep dive into running data analytic workloads in the cloud
Philip Langdale (Cloudera), Eugene Fratkin (Cloudera), Vinithra Varadharajan (Cloudera), Jennifer Wu (Cloudera)
Vinithra Varadharajan, Philip Langdale, Eugene Fratkin, and Jennifer Wu lead a deep dive into running data analytic workloads in a managed service capacity in the public cloud and highlight cloud infrastructure best practices.
1:30pm-5:00pm (3h 30m) Data engineering and architecture
How to use Impala query plan and profile to fix performance issues
Juan Yu (Cloudera)
Apache Impala (incubating) is an exceptional, best-of-breed massively parallel processing SQL query engine that is a fundamental component of the big data software stack. Juan Yu explores the cost model the Impala planner uses, how Impala optimizes queries, how to identify performance bottlenecks through the query plan and profile, and how to drive Impala to its full potential.
9:00am-12:30pm (3h 30m) Data engineering and architecture, Streaming systems and real-time applications
Modern real-time streaming architectures
Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio)
Across diverse segments in industry, there has been a shift in focus from big data to fast data. Karthik Ramasamy, Sanjeev Kulkarni, and Sijie Guo walk you through state-of-the-art streaming architectures, streaming frameworks, and streaming algorithms, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.
1:30pm-5:00pm (3h 30m) Data engineering and architecture
Architecting A Data Platform
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
What are the essential components of a data platform? This tutorial explains how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
10:30am-11:00am (30m)
Break
12:30pm-1:30pm (1h)
Break
3:00pm-3:30pm (30m)
Break
5:00pm-6:30pm (1h 30m)
Opening Reception
Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors.