Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA
 
LL20 A
9:00am Data Case Studies Madhav Madaboosi (BP), Meenakshisundaram Thandavarayan (Infosys), Matt Conners (Microsoft), Katie Malone (Civis Analytics), Mike Prorock (mesur.io), Thomas Miller (Northwestern University), Ann Nguyen (Whole Whale), Jennie Shin (Kaiser Permanente), Valentin Bercovici (PencilDATA), Wayde Fleener (General Mills), Joe Dumoulin (Next IT), Jules Malin (GoPro), Taylor Martin (O'Reilly Media), Divya Ramachandran (Captricity)
LL20 C
9:00am Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments Mark Donsky (Okera), Andre Araujo (Cloudera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera)
1:30pm Natural language understanding at scale with spaCy and Spark NLP David Talby (Pacific AI), Claudiu Branzan (Accenture), Alex Thomas (John Snow Labs)
LL20 D
LL21 B
9:00am Building your first big data application on AWS Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Paul Sears (Amazon Web Services), Ryan Nienhuis (Amazon Web Services), Randy Ridgley (Amazon Web Services)
1:30pm Deploying deep learning with TensorFlow Ron Bodkin (Google), Brian Foo (Google)
LL21 C/D
9:00am Using R and Python for scalable data science, machine learning, and AI Mario Inchiosa (Microsoft), Vanja Paunic (Microsoft), Robert Horton (Microsoft), Debraj GuhaThakurta (Microsoft), Ali-Kazim Zaidi (Microsoft), Tomas Singliar (Microsoft), John-Mark Agosta (Microsoft)
1:30pm A/B testing at scale: Accelerating software innovation Ronny Kohavi (Microsoft), Alex Deng (Microsoft), Somit Gupta (Microsoft), Paul Raff (Microsoft)
LL21 E/F
9:00am Getting started with TensorFlow Martin Görner (Google)
1:30pm Deep learning-based search and recommendation systems using TensorFlow Abhishek Kumar (Publicis Sapient), Vijay Agneeswaran (Walmart Labs)
210 A/E
9:00am Big data analytics and machine learning techniques to drive and grow business Burcu Baran (LinkedIn), Wei Di (LinkedIn), Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn)
1:30pm Managing data science in the enterprise Nick Elprin (Domino Data Lab)
210 C/G
9:00am Stream processing with Kafka Tim Berglund (Confluent)
1:30pm Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)
210 D/H
9:00am A deep dive into running data analytic workloads in the cloud Jason Wang (Cloudera), Mala Ramakrishnan (Cloudera), Stefan Salandy (Cloudera), Aishwarya Venkataraman (Cloudera), Vinithra Varadharajan (Cloudera), Aaron Myers (Cloudera)
LL20 B
9:00am Media and Ad Tech Day David Boyle (Audience Strategies), Violeta Hennessey (Warner Bros.), April Chen (Civis Analytics), Sridhar Alla (BlueWhale), Noah Gift (UC Davis), Blake Irvine (Netflix), Kevin Lyons (Nielsen Marketing Cloud), Jennifer Webb (SuprFanz), Rizwan Patel (Caesars Entertainment), Anthony Accardo (Disney), Amanda Gerdes (Blizzard Entertainment), Aneesh Karve (Quilt), Pete Skomoroch (Workday)
LL21 A
9:00am Learning PyTorch by building a recommender system Mo Patel (Independent), Neejole Patel (Virginia Tech)
210 B/F
9:00am Modern real-time streaming architectures Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (StreamNative), Arun Kejariwal (Independent)
1:30pm Time series data: Architecture and use cases Ted Malaska (Capital One)
10:30am Morning break | Room: Executive Concourse
12:30pm Lunch | Room: 230 A-C
3:00pm Afternoon break | Room: Executive Concourse
5:00pm Opening Reception | Room: Hall 1, 2, 3
6:30pm Ignite Strata San Jose | Room: Grand Ballroom 220
9:00am-5:00pm (8h) Strata Business Summit
Data Case Studies
Madhav Madaboosi (BP), Meenakshisundaram Thandavarayan (Infosys), Matt Conners (Microsoft), Katie Malone (Civis Analytics), Mike Prorock (mesur.io), Thomas Miller (Northwestern University), Ann Nguyen (Whole Whale), Jennie Shin (Kaiser Permanente), Valentin Bercovici (PencilDATA), Wayde Fleener (General Mills), Joe Dumoulin (Next IT), Jules Malin (GoPro), Taylor Martin (O'Reilly Media), Divya Ramachandran (Captricity)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions.
9:00am-12:30pm (3h 30m) Data engineering and architecture, Law, ethics, and governance
Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments
Mark Donsky (Okera), Andre Araujo (Cloudera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera)
New regulations are driving compliance, governance, and security challenges for big data, and infosec and security groups must ensure a consistently secured and governed environment across multiple workloads that span a variety of deployments. Mark Donsky, Andre Araujo, Syed Rafice, and Mubashir Kazia walk you through securing a Hadoop cluster, with special attention to GDPR.
1:30pm-5:00pm (3h 30m) Data science and machine learning
Natural language understanding at scale with spaCy and Spark NLP
David Talby (Pacific AI), Claudiu Branzan (Accenture), Alex Thomas (John Snow Labs)
Natural language processing is a key component in many data science systems. David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial on scalable NLP, using spaCy for building annotation pipelines, Spark NLP for building distributed natural language machine-learned pipelines, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.
9:00am-5:00pm (8h) Data science and machine learning
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML
Joseph Kambourakis (Databricks)
Join Joseph Kambourakis for an introduction to Apache Spark 2.0 core concepts with a focus on Spark's machine learning library, using text mining on real-world data as the primary end-to-end use case.
9:00am-12:30pm (3h 30m) Big data and data science in the cloud, Data engineering and architecture
Building your first big data application on AWS
Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Paul Sears (Amazon Web Services), Ryan Nienhuis (Amazon Web Services), Randy Ridgley (Amazon Web Services)
Want to learn how to use Amazon's big data web services to launch your first big data application in the cloud? Jorge Lopez walks you through building a big data application using a combination of open source technologies and AWS managed services.
1:30pm-5:00pm (3h 30m) Data engineering and architecture
Deploying deep learning with TensorFlow
Ron Bodkin (Google), Brian Foo (Google)
TensorFlow and Keras are popular libraries for machine learning because of their support for deep learning and GPU deployment. Join Ron Bodkin and Brian Foo to learn how to run these libraries in production with vision and recommendation models and how to export, package, deploy, optimize, serve, monitor, and test models using Docker and TensorFlow Serving in Kubernetes.
9:00am-12:30pm (3h 30m) Data science and machine learning
Using R and Python for scalable data science, machine learning, and AI
Mario Inchiosa (Microsoft), Vanja Paunic (Microsoft), Robert Horton (Microsoft), Debraj GuhaThakurta (Microsoft), Ali-Kazim Zaidi (Microsoft), Tomas Singliar (Microsoft), John-Mark Agosta (Microsoft)
R and Python top the list of languages used in data science and machine learning, and data scientists and engineers fluent in one of these languages are increasingly marketable. Come learn how to build and operationalize machine learning models using distributed functions and do scalable, end-to-end data science in R and Python on single machines, Spark clusters, and cloud-based infrastructure.
1:30pm-5:00pm (3h 30m) Big data and data science in the cloud, Data science and machine learning, Data-driven business management
A/B testing at scale: Accelerating software innovation
Ronny Kohavi (Microsoft), Alex Deng (Microsoft), Somit Gupta (Microsoft), Paul Raff (Microsoft)
Controlled experiments such as A/B tests have revolutionized the way software is developed, allowing real users to objectively evaluate new ideas. Ronny Kohavi, Alex Deng, Somit Gupta, and Paul Raff lead an introduction to A/B testing and share lessons learned from running one of the largest A/B testing platforms on the planet at Microsoft, where it executes more than 10,000 experiments a year.
9:00am-12:30pm (3h 30m) Data science and machine learning
Getting started with TensorFlow
Martin Görner (Google)
Martin Görner walks you through training and deploying a machine learning system using the popular open source library TensorFlow. Martin takes you from a conceptual overview all the way to building complex classifiers and explains how you can apply deep learning to complex problems in science and industry.
1:30pm-5:00pm (3h 30m) Data science and machine learning, Media, entertainment, and advertising
Deep learning-based search and recommendation systems using TensorFlow
Abhishek Kumar (Publicis Sapient), Vijay Agneeswaran (Walmart Labs)
Abhishek Kumar and Vijay Srinivas Agneeswaran offer an introduction to deep learning-based recommendation and learning-to-rank systems using TensorFlow. You'll learn how to build a deep learning recommender system based on intent prediction, drawing on a real-world implementation for an ecommerce client.
9:00am-12:30pm (3h 30m) Data science and machine learning, Data-driven business management, Strata Business Summit
Big data analytics and machine learning techniques to drive and grow business
Burcu Baran (LinkedIn), Wei Di (LinkedIn), Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn)
Burcu Baran, Wei Di, Michael Li, and Chi-Yi Kuan walk you through the big data analytics and data science lifecycle and share their experience and lessons learned leveraging advanced analytics and machine learning techniques such as predictive modeling to drive and grow business at LinkedIn.
1:30pm-5:00pm (3h 30m) Data-driven business management, Strata Business Summit
Managing data science in the enterprise
Nick Elprin (Domino Data Lab)
The honeymoon era of data science is ending, and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders deliver measurable impact on an increasing share of an enterprise's KPIs. Nick Elprin details how leading organizations have taken a holistic approach to people, process, and technology to build a sustainable competitive advantage.
9:00am-12:30pm (3h 30m) Data engineering and architecture, Streaming systems and real-time applications
Stream processing with Kafka
Tim Berglund (Confluent)
Tim Berglund leads a basic architectural introduction to Kafka and walks you through using Kafka Streams and KSQL to process streaming data.
1:30pm-5:00pm (3h 30m) Data engineering and architecture, Streaming systems and real-time applications
Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams
Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)
Join Dean Wampler and Boris Lublinsky to learn how to build two microservice streaming applications based on Kafka, using Akka Streams and Kafka Streams for data processing. You'll explore the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose those instead.
9:00am-12:30pm (3h 30m) Big data and data science in the cloud, Data engineering and architecture
A deep dive into running data analytic workloads in the cloud
Jason Wang (Cloudera), Mala Ramakrishnan (Cloudera), Stefan Salandy (Cloudera), Aishwarya Venkataraman (Cloudera), Vinithra Varadharajan (Cloudera), Aaron Myers (Cloudera)
Aishwarya Venkataraman, Jason Wang, Mala Ramakrishnan, Stefan Salandy, and Vinithra Varadharajan lead a deep dive into running data analytic workloads in a managed service capacity in the public cloud and highlight cloud infrastructure best practices.
1:30pm-5:00pm (3h 30m) Data science and machine learning, Visualization and user experience
Custom interactive visualizations and dashboards for one billion datapoints on a laptop in 30 lines of Python
James Bednar (Anaconda), Philipp Rudiger (Anaconda)
Python lets you solve data science problems by stitching together packages from its ecosystem, but it can be difficult to choose packages that work well together. James Bednar and Philipp Rudiger walk you through a concise, fast, easily customizable, and fully reproducible recipe for interactive visualization of millions or billions of datapoints—all in just 30 lines of Python code.
9:00am-5:00pm (8h) Strata Business Summit
Media and Ad Tech Day
David Boyle (Audience Strategies), Violeta Hennessey (Warner Bros.), April Chen (Civis Analytics), Sridhar Alla (BlueWhale), Noah Gift (UC Davis), Blake Irvine (Netflix), Kevin Lyons (Nielsen Marketing Cloud), Jennifer Webb (SuprFanz), Rizwan Patel (Caesars Entertainment), Anthony Accardo (Disney), Amanda Gerdes (Blizzard Entertainment), Aneesh Karve (Quilt), Pete Skomoroch (Workday)
Hear from innovators in ad tech, measurement, automation, and audience engagement about where the media industry is today—and where it's likely to go next.
9:00am-12:30pm (3h 30m) Big data and data science in the cloud, Data science and machine learning, Graphs and Time-series
Learning PyTorch by building a recommender system
Mo Patel (Independent), Neejole Patel (Virginia Tech)
Since its arrival in early 2017, PyTorch has won over many deep learning researchers and developers due to its dynamic computation framework. Mo Patel and Neejole Patel walk you through using PyTorch to build a content recommendation model.
1:30pm-5:00pm (3h 30m) Data engineering and architecture
How to use Impala's query plan and profile to fix performance issues
Juan Yu (Cloudera)
Apache Impala (incubating) is an exceptional, best-of-breed massively parallel processing SQL query engine that is a fundamental component of the big data software stack. Juan Yu demystifies the cost model used by the Impala planner and how Impala optimizes queries, and explains how to identify performance bottlenecks through the query plan and profile and how to drive Impala to its full potential.
9:00am-12:30pm (3h 30m) Data engineering and architecture, Streaming systems and real-time applications, Graphs and Time-series
Modern real-time streaming architectures
Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (StreamNative), Arun Kejariwal (Independent)
Across diverse segments in industry, there has been a shift in focus from big data to fast data. Karthik Ramasamy, Sanjeev Kulkarni, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming architectures, streaming frameworks, and streaming algorithms, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.
1:30pm-5:00pm (3h 30m) Data engineering and architecture, Graphs and Time-series
Time series data: Architecture and use cases
Ted Malaska (Capital One)
If you have data that has a time factor to it, then you need to think in terms of time series datasets. Ted Malaska explores time series in all of its forms, from tumbling windows to sessionization in batch or in streaming. You'll gain exposure to the tools and background you need to be successful in the world of time-oriented data.
10:30am-11:00am (30m)
Break: Morning break
12:30pm-1:30pm (1h)
Break: Lunch
3:00pm-3:30pm (30m)
Break: Afternoon break
5:00pm-6:30pm (1h 30m)
Opening Reception
Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors.
6:30pm-8:00pm (1h 30m)
Ignite Strata San Jose
Ignite is happening at Strata on Tuesday, March 6. Join us for a fun, high-energy evening of five-minute talks, all aspiring to live up to the Ignite motto: Enlighten us, but make it quick.