9:00am Data Case Studies Madhav Madaboosi (BP), Meenakshisundaram Thandavarayan (Infosys), Matt Conners (Microsoft), Katie Malone (Civis Analytics), Mike Prorock (mesur.io), Thomas Miller (Northwestern University), Ann Nguyen (Whole Whale), Jennie Shin (Kaiser Permanente), Valentin Bercovici (PencilDATA), Wayde Fleener (General Mills), Joe Dumoulin (Next IT), Jules Malin (GoPro), Taylor Martin (O'Reilly Media), Divya Ramachandran (Captricity)

LL20 C

9:00am Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments Mark Donsky (Okera), Andre Araujo (Cloudera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera)

1:30pm Natural language understanding at scale with spaCy and Spark NLP David Talby (Pacific AI), Claudiu Branzan (Accenture), Alex Thomas (John Snow Labs)

LL20 D

9:00am Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML Joseph Kambourakis (Databricks)

LL21 B

9:00am Building your first big data application on AWS Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Paul Sears (Amazon Web Services), Ryan Nienhuis (Amazon Web Services), Randy Ridgley (Amazon Web Services)

1:30pm Deploying deep learning with TensorFlow Ron Bodkin (Google), Brian Foo (Google)

LL21 C/D

9:00am Using R and Python for scalable data science, machine learning, and AI Mario Inchiosa (Microsoft), Vanja Paunic (Microsoft), Robert Horton (Microsoft), Debraj GuhaThakurta (Microsoft), Ali-Kazim Zaidi (Microsoft), Tomas Singliar (Microsoft), John-Mark Agosta (Microsoft)

1:30pm A/B testing at scale: Accelerating software innovation Ronny Kohavi (Microsoft), Alex Deng (Microsoft), Somit Gupta (Microsoft), Paul Raff (Microsoft)

LL21 E/F

9:00am Getting started with TensorFlow Martin Görner (Google)

1:30pm Deep learning-based search and recommendation systems using TensorFlow Abhishek Kumar (Publicis Sapient), Vijay Agneeswaran (Walmart Labs)

210 A/E

9:00am Big data analytics and machine learning techniques to drive and grow business Burcu Baran (LinkedIn), Wei Di (LinkedIn), Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn)

1:30pm Managing data science in the enterprise Nick Elprin (Domino Data Lab)

210 C/G

9:00am Stream processing with Kafka Tim Berglund (Confluent)

1:30pm Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)

210 D/H

9:00am A deep dive into running data analytic workloads in the cloud Jason Wang (Cloudera), Mala Ramakrishnan (Cloudera), Stefan Salandy (Cloudera), Aishwarya Venkataraman (Cloudera), Vinithra Varadharajan (Cloudera), Aaron Myers (Cloudera)

1:30pm Custom interactive visualizations and dashboards for one billion datapoints on a laptop in 30 lines of Python James Bednar (Anaconda), Philipp Rudiger (Anaconda)

LL20 B

9:00am Media and Ad Tech Day David Boyle (Audience Strategies), Violeta Hennessey (Warner Bros.), April Chen (Civis Analytics), Sridhar Alla (BlueWhale), Noah Gift (UC Davis), Blake Irvine (Netflix), Kevin Lyons (Nielsen Marketing Cloud), Jennifer Webb (SuprFanz), Rizwan Patel (Caesars Entertainment), Anthony Accardo (Disney), Amanda Gerdes (Blizzard Entertainment), Aneesh Karve (Quilt), Pete Skomoroch (Workday)

LL21 A

9:00am Learning PyTorch by building a recommender system Mo Patel (Independent), Neejole Patel (Virginia Tech)

1:30pm How to use Impala's query plan and profile to fix performance issues Juan Yu (Cloudera)

210 B/F

9:00am Modern real-time streaming architectures Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (StreamNative), Arun Kejariwal (Independent)

1:30pm Time series data: Architecture and use cases Ted Malaska (Capital One)

6:30pm Ignite Strata San Jose | Room: Grand Ballroom 220

5:00pm Opening Reception | Room: Hall 1, 2, 3

10:30am Morning break | Room: Executive Concourse

3:00pm Afternoon break | Room: Executive Concourse

12:30pm Lunch | Room: 230 A-C

9:00am-5:00pm (8h) Strata Business Summit

Data Case Studies

Madhav Madaboosi (BP), Meenakshisundaram Thandavarayan (Infosys), Matt Conners (Microsoft), Katie Malone (Civis Analytics), Mike Prorock (mesur.io), Thomas Miller (Northwestern University), Ann Nguyen (Whole Whale), Jennie Shin (Kaiser Permanente), Valentin Bercovici (PencilDATA), Wayde Fleener (General Mills), Joe Dumoulin (Next IT), Jules Malin (GoPro), Taylor Martin (O'Reilly Media), Divya Ramachandran (Captricity)

Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions.

9:00am-12:30pm (3h 30m) Data engineering and architecture, Law, ethics, and governance

Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments

Mark Donsky (Okera), Andre Araujo (Cloudera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera)

New regulations are driving compliance, governance, and security challenges for big data, and infosec and security groups must ensure a consistently secured and governed environment across multiple workloads that span a variety of deployments. Mark Donsky, Andre Araujo, Syed Rafice, and Mubashir Kazia walk you through securing a Hadoop cluster, with special attention to GDPR.

1:30pm-5:00pm (3h 30m) Data science and machine learning

Natural language understanding at scale with spaCy and Spark NLP

David Talby (Pacific AI), Claudiu Branzan (Accenture), Alex Thomas (John Snow Labs)

Natural language processing is a key component in many data science systems. David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial on scalable NLP, using spaCy for building annotation pipelines, Spark NLP for building distributed natural language machine-learned pipelines, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.

9:00am-5:00pm (8h) Data science and machine learning

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML

Joseph Kambourakis (Databricks)

Join Joseph Kambourakis for an introduction to Apache Spark 2.0 core concepts with a focus on Spark's machine learning library, using text mining on real-world data as the primary end-to-end use case.

9:00am-12:30pm (3h 30m) Big data and data science in the cloud, Data engineering and architecture

Building your first big data application on AWS

Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Paul Sears (Amazon Web Services), Ryan Nienhuis (Amazon Web Services), Randy Ridgley (Amazon Web Services)

Want to learn how to use Amazon's big data web services to launch your first big data application in the cloud? Jorge Lopez walks you through building a big data application using a combination of open source technologies and AWS managed services.

1:30pm-5:00pm (3h 30m) Data engineering and architecture

Deploying deep learning with TensorFlow

Ron Bodkin (Google), Brian Foo (Google)

TensorFlow and Keras are popular libraries for machine learning because of their support for deep learning and GPU deployment. Join Ron Bodkin and Brian Foo to learn how to execute these libraries in production with vision and recommendation models and how to export, package, deploy, optimize, serve, monitor, and test models using Docker and TensorFlow Serving in Kubernetes.

9:00am-12:30pm (3h 30m) Data science and machine learning

Using R and Python for scalable data science, machine learning, and AI

Mario Inchiosa (Microsoft), Vanja Paunic (Microsoft), Robert Horton (Microsoft), Debraj GuhaThakurta (Microsoft), Ali-Kazim Zaidi (Microsoft), Tomas Singliar (Microsoft), John-Mark Agosta (Microsoft)

R and Python top the list of languages used in data science and machine learning, and data scientists and engineers fluent in one of these languages are increasingly marketable. Come learn how to build and operationalize machine learning models using distributed functions and do scalable, end-to-end data science in R and Python on single machines, Spark clusters, and cloud-based infrastructure.

1:30pm-5:00pm (3h 30m) Big data and data science in the cloud, Data science and machine learning, Data-driven business management

A/B testing at scale: Accelerating software innovation

Ronny Kohavi (Microsoft), Alex Deng (Microsoft), Somit Gupta (Microsoft), Paul Raff (Microsoft)

Controlled experiments such as A/B tests have revolutionized the way software is being developed, allowing real users to objectively evaluate new ideas. Ronny Kohavi, Alex Deng, Somit Gupta, and Paul Raff lead an introduction to A/B testing and share lessons learned from one of the largest A/B testing platforms on the planet, running at Microsoft, which executes over 10K experiments a year.

9:00am-12:30pm (3h 30m) Data science and machine learning

Getting started with TensorFlow

Martin Görner (Google)

Martin Görner walks you through training and deploying a machine learning system using the popular open source library TensorFlow. Martin takes you from a conceptual overview all the way to building complex classifiers and explains how you can apply deep learning to complex problems in science and industry.

1:30pm-5:00pm (3h 30m) Data science and machine learning, Media, entertainment, and advertising

Deep learning-based search and recommendation systems using TensorFlow

Abhishek Kumar (Publicis Sapient), Vijay Agneeswaran (Walmart Labs)

Abhishek Kumar and Vijay Srinivas Agneeswaran offer an introduction to deep learning-based recommendation and learning-to-rank systems using TensorFlow. You'll learn how to build a deep learning recommender system based on intent prediction, drawn from a real-world implementation for an ecommerce client.

9:00am-12:30pm (3h 30m) Data science and machine learning, Data-driven business management, Strata Business Summit

Big data analytics and machine learning techniques to drive and grow business

Burcu Baran (LinkedIn), Wei Di (LinkedIn), Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn)

Burcu Baran, Wei Di, Michael Li, and Chi-Yi Kuan walk you through the big data analytics and data science lifecycle and share their experience and lessons learned leveraging advanced analytics and machine learning techniques such as predictive modeling to drive and grow business at LinkedIn.

1:30pm-5:00pm (3h 30m) Data-driven business management, Strata Business Summit

Managing data science in the enterprise

Nick Elprin (Domino Data Lab)

The honeymoon era of data science is ending, and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders deliver measurable impact on an increasing share of an enterprise's KPIs. Nick Elprin details how leading organizations have taken a holistic approach to people, process, and technology to build a sustainable competitive advantage.

9:00am-12:30pm (3h 30m) Data engineering and architecture, Streaming systems and real-time applications

Stream processing with Kafka

Tim Berglund (Confluent)

Tim Berglund leads a basic architectural introduction to Kafka and walks you through using Kafka Streams and KSQL to process streaming data.

1:30pm-5:00pm (3h 30m) Data engineering and architecture, Streaming systems and real-time applications

Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams

Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)

Join Dean Wampler and Boris Lublinsky to learn how to build two microservice streaming applications based on Kafka using Akka Streams and Kafka Streams for data processing. You'll explore the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead.

9:00am-12:30pm (3h 30m) Big data and data science in the cloud, Data engineering and architecture

A deep dive into running data analytic workloads in the cloud

Jason Wang (Cloudera), Mala Ramakrishnan (Cloudera), Stefan Salandy (Cloudera), Aishwarya Venkataraman (Cloudera), Vinithra Varadharajan (Cloudera), Aaron Myers (Cloudera)

Aishwarya Venkataraman, Jason Wang, Mala Ramakrishnan, Stefan Salandy, and Vinithra Varadharajan lead a deep dive into running data analytic workloads in a managed service capacity in the public cloud and highlight cloud infrastructure best practices.

1:30pm-5:00pm (3h 30m) Data science and machine learning, Visualization and user experience

Custom interactive visualizations and dashboards for one billion datapoints on a laptop in 30 lines of Python

James Bednar (Anaconda), Philipp Rudiger (Anaconda)

Python lets you solve data science problems by stitching together packages from its ecosystem, but it can be difficult to choose packages that work well together. James Bednar and Philipp Rudiger walk you through a concise, fast, easily customizable, and fully reproducible recipe for interactive visualization of millions or billions of datapoints—all in just 30 lines of Python code.

9:00am-5:00pm (8h) Strata Business Summit

Media and Ad Tech Day

David Boyle (Audience Strategies), Violeta Hennessey (Warner Bros.), April Chen (Civis Analytics), Sridhar Alla (BlueWhale), Noah Gift (UC Davis), Blake Irvine (Netflix), Kevin Lyons (Nielsen Marketing Cloud), Jennifer Webb (SuprFanz), Rizwan Patel (Caesars Entertainment), Anthony Accardo (Disney), Amanda Gerdes (Blizzard Entertainment), Aneesh Karve (Quilt), Pete Skomoroch (Workday)

Hear from innovators in ad tech, measurement, automation, and audience engagement about where the media industry is today—and where it's likely to go next.

9:00am-12:30pm (3h 30m) Big data and data science in the cloud, Data science and machine learning, Graphs and Time-series

Learning PyTorch by building a recommender system

Mo Patel (Independent), Neejole Patel (Virginia Tech)

Since its arrival in early 2017, PyTorch has won over many deep learning researchers and developers due to its dynamic computation framework. Mo Patel and Neejole Patel walk you through using PyTorch to build a content recommendation model.

1:30pm-5:00pm (3h 30m) Data engineering and architecture

How to use Impala's query plan and profile to fix performance issues

Juan Yu (Cloudera)

Apache Impala (incubating) is an exceptional, best-of-breed massively parallel processing SQL query engine that is a fundamental component of the big data software stack. Juan Yu demystifies the cost model the Impala planner uses and how Impala optimizes queries, then explains how to identify performance bottlenecks through the query plan and profile and how to drive Impala to its full potential.

9:00am-12:30pm (3h 30m) Data engineering and architecture, Streaming systems and real-time applications, Graphs and Time-series

Modern real-time streaming architectures

Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (StreamNative), Arun Kejariwal (Independent)

Across diverse segments in industry, there has been a shift in focus from big data to fast data. Karthik Ramasamy, Sanjeev Kulkarni, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming architectures, streaming frameworks, and streaming algorithms, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.

1:30pm-5:00pm (3h 30m) Data engineering and architecture, Graphs and Time-series

Time series data: Architecture and use cases

Ted Malaska (Capital One)

If you have data that has a time factor to it, then you need to think in terms of time series datasets. Ted Malaska explores time series in all of its forms, from tumbling windows to sessionization in batch or in streaming. You'll gain exposure to the tools and background you need to be successful in the world of time-oriented data.

6:30pm-8:00pm (1h 30m)

Ignite Strata San Jose

Ignite is happening at Strata on Tuesday, March 6. Join us for a fun, high-energy evening of five-minute talks—all aspiring to live up to the Ignite motto: Enlighten us, but make it quick.

5:00pm-6:30pm (1h 30m)

Opening Reception

Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors.

10:30am-11:00am (30m)

Break: Morning break

3:00pm-3:30pm (30m)

Break: Afternoon break

12:30pm-1:30pm (1h)

Break: Lunch

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com