Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

S11A

11:15 The cloud is expensive, so build your own redundant Hadoop clusters. Stuart Pook (Criteo)

12:05 Using a global data fabric to run a mixed cloud deployment Jim Scott (NVIDIA)

14:05 Analytics in the cloud: Building a modern cloud-based big data warehouse Greg Rahn (Cloudera)

14:55 Data science across data sources with Apache Arrow Tomer Shiran (Dremio)

16:35 Making stateless containers reliable and available even with stateful applications Paul Curtis (Weaveworks)

17:25 Practical advice for driving down the cost of cloud big data platforms Christopher Royles (Cloudera)

S11B

11:15 Web analytics at scale with Druid at Naver Jason Heo (Naver), Dooyong Kim (Navercorp)

12:05 Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks Baolong Mao (JD.com), Yiran Wu (JD.com), Yupeng Fu (Alluxio)

14:05 Audi's journey to an enterprise big data platform Carsten Herbe (Audi Business Innovation GmbH), Matthias Graunitz (Audi AG)

14:55 Elastic map matching using Cloudera Altus and Apache Spark Timo Graen (Volkswagen AG), Robert Neumann (Ultra Tendency)

16:35 Improving DevOps and QA efficiency using machine learning and NLP methods Ran Taig (Dell), Omer Sagi (Dell)

17:25 Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am Holden Karau (Independent), Rachel Warren (Salesforce Einstein)

Capital Suite 7

11:15 Architecting data platforms for cybersecurity Charaka Goonatilake (Panaseer)

12:05 Hadoop under attack: Securing data in a banking domain Federico Leven (ReactoData)

14:05 GPU-accelerated threat detection with GOAI Joshua Patterson (NVIDIA), Chau Dang (NVIDIA)

14:55 The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense Lee Blum (Verint Systems)

16:35 How to protect big data in a containerized environment Thomas Phelan (HPE BlueData)

17:25 Security, governance, and cloud analytics, oh my! Nikki Rouda (Cloudera), Nick Curcuru (Mastercard)

Capital Suite 8/9

11:15 Processing fast data with Apache Spark: A tale of two APIs Gerard Maas (Lightbend)

12:05 How BT delivers better broadband and TV using Spark and Kafka Phillip Radley (BT)

14:05 Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka Michael Noll (Confluent)

14:55 Multi-data center and multitenant durable messaging with Apache Pulsar Ivan Kelly (Streamlio)

16:35 Kafka in jail: Running Kafka in container-orchestrated clusters Sean Glover (Lightbend)

17:25 Stream processing for the practitioner: Blueprints for common stream processing use cases with Apache Flink Aljoscha Krettek (Ververica)

Capital Suite 10/11

11:15 Data science survival and growth within the corporate jungle: An easyJet case study Alberto Rey Villaverde (easyJet), Grigorios Mingas (easyJet)

12:05 Risk-sharing pools: Winning zero-sum games through machine learning Baiju Devani (Aviva Canada), Etienne Chasse St-Laurent (Aviva Canada)

14:05 Building a healthcare decision support system for ICD10/HCC coding through deep learning Manas Ranjan Kar (Episource)

14:55 DataOps: Nine steps to transform your data science impact Harvinder Atwal (Moneysupermarket)

16:35 Narrative extraction: Analyzing the world’s narratives through natural language understanding Naveed Ghaffar (Narrative Economics), Rashed Iqbal (UCLA)

17:25 Rent, rain, and regulations: Leveraging structure in big data to predict criminal activity Jorie Koster-Hale (Dataiku)

Capital Suite 12

11:15 How will the GDPR impact machine learning? Steven Touw (Immuta)

12:05 Fairness and diversity in online social systems Elisa Celis (EPFL)

14:05 StreamDM: Advanced data science with Spark Streaming Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)

14:55 Machine learning for time series: What works and what doesn't Mikio Braun (Zalando)

16:35 Correlation analysis on live data streams Arun Kejariwal (Independent), Francois Orsini (MZ)

17:25 Code Property Graph: A modern, queryable data storage for source code Fabian Yamaguchi (ShiftLeft)

Capital Suite 13

11:15 Distributed training of deep learning models Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Ilia Karmanov (Microsoft)

12:05 Deep learning for recommender systems Nick Pentreath (IBM)

14:05 Real-time deep learning on video streams Eran Avidan (Intel)

14:55 Deep computer vision for manufacturing Aurélien Géron (Kiwisoft)

16:35 Using Siamese CNNs for removing duplicate entries from real estate listing databases Sergey Ermolin (Intel), Olga Ermolin (MLS Listings)

17:25 Using LSTMs to aid professional translators Darren Cook (QQ Trend)

Capital Suite 14

11:15 Finding bias in social media recommendations Guillaume Chaslot (AlgoTransparency)

12:05 Data visualization in a big data world Jeff Fletcher (Cloudera)

14:05 Designing ethical artificial intelligence Jivan Virdee (Fjord), Hollie Lubbock (Fjord)

14:55 The business leader’s guide to designing indispensable analytics solutions and data products Brian O'Neill (Designing for Analytics)

16:35 Architectural design for interactive visualization Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)

17:25 Democratizing data within your organization Mark Grover (Lyft), Deepak Tiwari (Lyft)

Capital Suite 15/16

11:15 Leveraging public-private partnerships using data analytics for economic insights Audrey Lobo-Pulo (Phoensight), Nicholas O'Donnell (LinkedIn)

12:05 The app trap: Why every mobile app and mobile operator needs anomaly detection Ira Cohen (Anodot)

14:05 Solving data cleaning and unification using human-guided machine learning Ihab Ilyas (University of Waterloo)

14:55 Successful data cultures: Inclusivity, empathy, retention, and results Kim Nilsson (Pivigo), Phil Harvey (Microsoft)

16:35 Data Collaboratives Jude Mccorry (The Data Lab), Mahmood Adil (NHS National Services Scotland)

17:25 Blind men and elephants: What’s missing from your big data? Richard Goyder (IMC Business Architecture | Scaled Insights), Barry Singleton (IMC Business Architecture)

Capital Suite 17

11:15 Executive Briefing: GDPR—Getting your data ready for heavy, new EU privacy regulations Mark Donsky (Okera), Syed Rafice (Cloudera)

12:05 Executive Briefing: Becoming a data-driven enterprise—A maturity model Teresa Tung (Accenture), Jean-Luc Chatelain (Accenture)

14:05 Executive Briefing: Lessons learned managing data science projects—Adopting a team data science process Danielle Dean (iRobot)

14:55 Executive Briefing: What you need to know about fast data Dean Wampler (Anyscale)

16:35 Executive Briefing: BI on big data Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)

17:25 Executive Briefing: Why machine-learned models crash and burn in production and what to do about it David Talby (Pacific AI)

Expo Hall

11:15 Revolutionizing the newsroom with artificial intelligence Dan Gilbert (News UK), Jonathan Leslie (Pivigo)

12:05 Interpretable AI: Can we trust machine learning? Konstantinos Georgatzis (QuantumBlack), Martha Imprialou (QuantumBlack)

14:05 Time for a new relation: Going from RDBMS to a graph database Patrick McFadin (DataStax)

14:55 Machine-learned model quality monitoring in fast data and streaming applications Emre Velipasaoglu (Lightbend)

16:35 Data-driven ecosystems in the automotive industry Tobias Burger (BMW Group), Joshua Goerner (BMW AG)

Capital Suite 2/3

11:15 Putting AI to work for business: It's a journey. (sponsored by IBM) Carlo Appugliese (IBM)

14:05 A tale of two BI standards: Data warehouses and data lakes (sponsored by Arcadia Data) Randy Lea (Arcadia Data)

14:55 The IoT and AI for good (sponsored by Hitachi Vantara) Wael Elrifai (Hitachi Vantara)

16:35 The eAGLE accelerator: How to speed up migrations from legacy ETL to big data implementations Enric Biosca Trias (everis), Angel Valencia (everis)

17:25 Batch and real-time processing in LINE's log analysis platform Wataru Yukawa (LINE)

Capital Suite 4

11:15 Enabling data-driven development for autonomous driving at BMW (sponsored by BMW) Miha Pelko (BMW Group), Aleksandr Melkonyan (BMW AG)

12:05 Cloud-native data science with Anaconda, Docker, and Kubernetes (sponsored by Anaconda) Mathew Lodge (Anaconda)

14:05 Operationalizing live data to benefit business (sponsored by WANdisco) Steve Kilgore (WANdisco)

14:55 Incorporating data sources inside and outside of the data center (sponsored by Cisco) Chiang Yang (Cisco)

16:35 Fortune 100 lessons: Architecting data lakes for real-time analytics and AI (sponsored by Attunity) Ted Orme (Attunity)

Auditorium

9:00 Wednesday opening welcome Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

9:05 Charting a data journey to the cloud Mick Hollison (Cloudera), Sven Loeffler (Deutsche Telekom), Robert Neumann (Ultra Tendency)

9:20 Journey to GDPR compliance Alison Howard (Microsoft)

9:35 Humans and the machine: Machine learning in context (sponsored by IBM) Jean Francois Puget (IBM Analytics)

9:45 Building a stronger data ecosystem Ben Lorica (O'Reilly)

9:55 The Paradise Papers: Behind the scenes with the ICIJ Pierre Romera (International Consortium of Investigative Journalists (ICIJ))

10:15 Data protection and innovation Eva Kaili (European Parliament | The Science and Technology Options Assessment Panel)

8:15 Speed Networking | Room: Auditorium Foyer

8:45 Coffee break sponsored by Confluent (7:30 - 9:00) | Room: Auditorium Foyer

10:45 Morning break | Room: Expo Hall (Capital Hall 24)

12:45 Lunch sponsored by IBM; Wednesday Topic Tables at lunch | Room: Expo Hall (Capital Hall 24)

12:45 Wednesday Business Summit Lunch | Room: Expo Hall - SBS lunch (Capital Hall 24)

12:45 Women's Networking Lunch | Room: S11A

15:35 Afternoon break sponsored by Airbus | Room: Expo Hall (Capital Hall 24)

18:05 Expo Hall Reception | Room: Expo Hall (Capital Hall 24)

20:00 Data After Dark: A Night in Shoreditch (sponsored by Domino and Cloudera) | Room: Shoreditch

11:15-11:55 (40m) Big data and data science in the cloud, Data engineering and architecture

The cloud is expensive, so build your own redundant Hadoop clusters.

Stuart Pook (Criteo)

Criteo has a production cluster of 2K nodes running over 300K jobs a day in the company's own data centers. These clusters were meant to provide a redundant solution to Criteo's storage and compute needs. Stuart Pook offers an overview of the project, shares challenges and lessons learned, and discusses Criteo's progress in building another cluster to survive the loss of a full DC.

12:05-12:45 (40m) Data engineering and architecture, Streaming systems and real-time applications

Using a global data fabric to run a mixed cloud deployment

Jim Scott (NVIDIA)

Creating a business solution is a lot of work. Instead of building to run on a single cloud provider, it is far more cost effective to leverage the cloud as infrastructure as a service (IaaS). Jim Scott explains why a global data fabric is a requirement for running on all cloud providers simultaneously.

14:05-14:45 (40m) Big data and data science in the cloud, Data engineering and architecture

Analytics in the cloud: Building a modern cloud-based big data warehouse

Greg Rahn (Cloudera)

For many organizations, the next big data warehouse will be in the cloud. Greg Rahn shares considerations for evaluating the cloud for analytics and big data warehousing, including different architectural approaches to optimize price and performance.

14:55-15:35 (40m) Big data and data science in the cloud, Data engineering and architecture

Data science across data sources with Apache Arrow

Tomer Shiran (Dremio)

It's often impractical for organizations to physically consolidate all data into one system. Tomer Shiran offers an overview of Apache Arrow, an open source columnar, in-memory data representation that enables analytical systems and data sources to exchange and process data in real time, simplifying and accelerating data access without having to copy all data into one location.

16:35-17:15 (40m) Big data and data science in the cloud, Data engineering and architecture

Making stateless containers reliable and available even with stateful applications

Paul Curtis (Weaveworks)

The flexibility advantage conferred by containers depends on their ephemeral nature, so it’s useful to keep containers stateless. However, many applications require state—access to a scalable persistence layer that supports real mutable files, tables, and streams. Paul Curtis demonstrates how to make containerized applications reliable, available, and performant, even with stateful applications.

17:25-18:05 (40m) Big data and data science in the cloud, Data engineering and architecture

Practical advice for driving down the cost of cloud big data platforms

Christopher Royles (Cloudera)

Big data and cloud deployments return huge benefits in flexibility and economics but can also result in runaway costs and failed projects. Drawing on his production experience, Christopher Royles shares tips and best practices for determining initial sizing, strategic planning, and longer-term operation, helping you deliver an efficient platform, reduce costs, and implement a successful project.

11:15-11:55 (40m) Data engineering and architecture Data Platforms, Media, Advertising, Entertainment

Web analytics at scale with Druid at Naver

Jason Heo (Naver), Dooyong Kim (Navercorp)

Naver.com is the largest search engine in Korea, with a 70% share of the Korean search market, and it handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver's web analytics system, built with Druid.

12:05-12:45 (40m) Data engineering and architecture, Data-driven business management Data Platforms, E-commerce and Retail, Transportation and Logistics

Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks

Baolong Mao (JD.com), Yiran Wu (JD.com), Yupeng Fu (Alluxio)

Baolong Mao, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average.

14:05-14:45 (40m) Data engineering and architecture Data Platforms, Transportation and Logistics

Audi's journey to an enterprise big data platform

Carsten Herbe (Audi Business Innovation GmbH), Matthias Graunitz (Audi AG)

Carsten Herbe and Matthias Graunitz detail Audi's journey from a Hadoop proof of concept to a multitenant enterprise platform, sharing lessons learned, the decisions Audi made, and how a number of use cases are implemented using the platform.

14:55-15:35 (40m) Data engineering and architecture Transportation and Logistics

Elastic map matching using Cloudera Altus and Apache Spark

Timo Graen (Volkswagen AG), Robert Neumann (Ultra Tendency)

Map-matching applications exist in almost every telematics use case and are therefore crucial to all car manufacturers. Timo Graen and Robert Neumann detail the architecture behind Volkswagen Commercial Vehicle’s Altus-based map-matching application and lead a live demo featuring a map matching job in Altus.

16:35-17:15 (40m) Data engineering and architecture, Data-driven business management, Streaming systems and real-time applications Text and Language processing and analysis

Improving DevOps and QA efficiency using machine learning and NLP methods

Ran Taig (Dell), Omer Sagi (Dell)

DevOps and QA engineers spend a significant amount of time investigating recurring issues. These issues are often represented by large configuration and log files, so investigating whether two issues are duplicates can be a very tedious task. Ran Taig and Omer Sagi outline a solution that leverages NLP and machine learning algorithms to automatically identify duplicate issues.

17:25-18:05 (40m) Data engineering and architecture, Streaming systems and real-time applications

Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am

Holden Karau (Independent), Rachel Warren (Salesforce Einstein)

Apache Spark is an amazing distributed system, but part of the bargain we've made with the infrastructure daemons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. Holden Karau, Rachel Warren, and Anya Bida explore auto-tuning jobs using systems like Apache Beam, Mahout, and internal Spark ML jobs as workloads.

11:15-11:55 (40m) Data engineering and architecture Security and Privacy

Architecting data platforms for cybersecurity

Charaka Goonatilake (Panaseer)

Data is becoming a crucial weapon to secure an organization against cyber threats. Charaka Goonatilake shares strategies for designing effective data platforms for cybersecurity using big data technologies, such as Spark and Hadoop, and explains how these platforms are being used in real-world examples of data-driven security.

12:05-12:45 (40m) Data engineering and architecture, Law, ethics, and governance, Platform security and cybersecurity Security and Privacy

Hadoop under attack: Securing data in a banking domain

Federico Leven (ReactoData)

The apparent difficulty of managing Hadoop compared to more traditional and proprietary data products makes some companies wary of the Hadoop ecosystem, but managing security is becoming more accessible in the Hadoop space, particularly in the Cloudera stack. Federico Leven offers an overview of an end-to-end security deployment on Hadoop and the data and security governance policies implemented.

14:05-14:45 (40m) Big data and data science in the cloud, Data engineering and architecture, Data-driven business management, Emerging technologies and case studies, Platform security and cybersecurity, Streaming systems and real-time applications Security and Privacy

GPU-accelerated threat detection with GOAI

Joshua Patterson (NVIDIA), Chau Dang (NVIDIA)

Joshua Patterson and Mike Wendt explain how NVIDIA used GPU-accelerated open source technologies to improve its cyberdefense platforms by leveraging software from the GPU Open Analytics Initiative (GOAI) and how the company accelerated anomaly detection with more efficient machine learning models, faster deployment, and more granular data exploration.

14:55-15:35 (40m) Data engineering and architecture, Emerging technologies and case studies, Streaming systems and real-time applications

The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense

Lee Blum (Verint Systems)

Lee Blum offers an overview of Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results.

16:35-17:15 (40m) Data engineering and architecture, Platform security and cybersecurity Security and Privacy

How to protect big data in a containerized environment

Thomas Phelan (HPE BlueData)

Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE), but TDE can be difficult to configure and manage—issues that are only compounded when running on Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them.

17:25-18:05 (40m) Big data and data science in the cloud, Data engineering and architecture, Law, ethics, and governance, Platform security and cybersecurity Security and Privacy

Security, governance, and cloud analytics, oh my!

Nikki Rouda (Cloudera), Nick Curcuru (Mastercard)

Having so many cloud-based analytics services available is a dream come true. However, it's a nightmare to manage proper security and governance across all those different services. Nikki Rouda and Nick Curcuru share advice on how to minimize the risk and effort in protecting and managing data for multidisciplinary analytics and explain how to avoid the hassle and extra cost of siloed approaches.

11:15-11:55 (40m) Data engineering and architecture, Streaming systems and real-time applications

Processing fast data with Apache Spark: A tale of two APIs

Gerard Maas (Lightbend)

Apache Spark has two streaming APIs: Spark Streaming and Structured Streaming. Gerard Maas offers a critical overview of their differences in key aspects of a streaming application, from the API user experience to dealing with time and with state and machine learning capabilities, and shares practical guidance on picking one or combining both to implement resilient streaming pipelines.

12:05-12:45 (40m) Data engineering and architecture Telecom

How BT delivers better broadband and TV using Spark and Kafka

Phillip Radley (BT)

In the past year, British Telecom has added a streaming network analytics use case to its multitenant data platform. Phillip Radley demonstrates how the solution works and explains how it delivers better broadband and TV services, using Kafka and Spark on YARN and HDFS encryption.

14:05-14:45 (40m) Data engineering and architecture, Streaming systems and real-time applications

Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka

Michael Noll (Confluent)

Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL.

14:55-15:35 (40m) Data engineering and architecture, Law, ethics, and governance, Streaming systems and real-time applications

Multi-data center and multitenant durable messaging with Apache Pulsar

Ivan Kelly (Streamlio)

Ivan Kelly offers an overview of Apache Pulsar, a durable, distributed messaging system, underpinned by Apache BookKeeper, that provides the enterprise features necessary to guarantee that your data is where it should be and only accessible by those who should have access.

16:35-17:15 (40m) Data engineering and architecture, Emerging technologies and case studies, Streaming systems and real-time applications

Kafka in jail: Running Kafka in container-orchestrated clusters

Sean Glover (Lightbend)

Kafka is best suited to run close to the metal on dedicated machines in static clusters, but these clusters are quickly becoming extinct. Companies want mixed-use clusters that take advantage of every resource available. Sean Glover offers an overview of leading Kafka implementations on DC/OS and Kubernetes to explore how reliably they run Kafka in container-orchestrated clusters.

17:25-18:05 (40m) Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications

Stream processing for the practitioner: Blueprints for common stream processing use cases with Apache Flink

Aljoscha Krettek (Ververica)

Aljoscha Krettek offers an overview of the modern stream processing space, details the challenges posed by stateful and event-time-aware stream processing, and shares core archetypes (“application blueprints”) for stream processing drawn from real-world use cases with Apache Flink.

11:15-11:55 (40m) Data science and machine learning, Data-driven business management Transportation and Logistics

Data science survival and growth within the corporate jungle: An easyJet case study

Alberto Rey Villaverde (easyJet), Grigorios Mingas (easyJet)

Because in-house data science teams work with a range of business functions, traditional data science processes are often too abstract to cope with the complexity of these environments. Alberto Rey Villaverde and Grigorios Mingas share case studies from easyJet that highlight some unpredictable hurdles related to requirements, data, infrastructure, and deployment and explain how they solved them.

12:05-12:45 (40m) Data science and machine learning Financial Services

Risk-sharing pools: Winning zero-sum games through machine learning

Baiju Devani (Aviva Canada), Etienne Chasse St-Laurent (Aviva Canada)

Risk-sharing pools allow insurers to get rid of risks they are forced to insure in highly regulated markets. Insurers thus cede both the risk and its premium. But are they ceding the right risk or simply giving up premium? Baiju Devani and Étienne Chassé St-Laurent share an applied machine learning approach that leverages an ensemble of models to gain a distinctive market advantage.

14:05-14:45 (40m) Data science and machine learning, Data-driven business management, Emerging technologies and case studies

Building a healthcare decision support system for ICD10/HCC coding through deep learning

Manas Ranjan Kar (Episource)

Episource is building a scalable NLP engine to help summarize medical charts and extract medical coding opportunities and their dependencies to recommend best possible ICD10 codes. Manas Ranjan Kar offers an overview of the wide variety of deep learning algorithms involved and the complex in-house training-data creation exercises that were required to make it work.

14:55-15:35 (40m) Data science and machine learning

DataOps: Nine steps to transform your data science impact

Harvinder Atwal (Moneysupermarket)

Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, and shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, and more.

16:35-17:15 (40m) Data science and machine learning Text and Language processing and analysis

Narrative extraction: Analyzing the world’s narratives through natural language understanding

Naveed Ghaffar (Narrative Economics), Rashed Iqbal (UCLA)

Narratives are significant vectors of rapid change in culture, economic behavior, and the Zeitgeist of a society. Narrative economics studies the impact of popular human-interest stories on economic fluctuations. Naveed Ghaffar and Rashed Iqbal outline a framework that uses natural language understanding to extract and analyze narratives in human communication.

17:25-18:05 (40m) Data science and machine learning, Emerging technologies and case studies, Law, ethics, and governance

Rent, rain, and regulations: Leveraging structure in big data to predict criminal activity

Jorie Koster-Hale (Dataiku)

Because crime is affected by a number of geospatial and temporal features, predicting crime poses a unique technical challenge. Jorie Koster-Hale shares an approach using a combination of open source data, machine learning, time series modeling, and geostatistics to determine where crime will occur, what predicts it, and what we can do to prevent it in the future.

11:15-11:55 (40m) Data science and machine learning, Law, ethics, and governance Security and Privacy

How will the GDPR impact machine learning?

Steven Touw (Immuta)

The Strata Data conference in London takes place during one of the most important weeks in the history of data regulation, as the GDPR begins to be enforced. Steven Touw explores the effects of the GDPR on deploying machine learning models in the EU.

12:05-12:45 (40m) Data science and machine learning Media, Advertising, Entertainment, Security and Privacy

Fairness and diversity in online social systems

Elisa Celis (EPFL)

There is a pressing need to design new algorithms that are socially responsible in how they learn and socially optimal in the manner in which they use information. Elisa Celis explores the emergence of bias in algorithmic decision making and presents first steps toward developing a systematic framework to control biases in classical problems, such as data summarization and personalization.

14:05-14:45 (40m) Data science and machine learning, Streaming systems and real-time applications Telecom, Time Series and Graphs

StreamDM: Advanced data science with Spark Streaming

Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)

Heitor Murilo Gomes and Albert Bifet offer an overview of StreamDM, a real-time analytics open source software library built on top of Spark Streaming, developed at Huawei's Noah’s Ark Lab and Télécom ParisTech.

14:55-15:35 (40m) Data science and machine learning E-commerce and Retail, Financial Services, Time Series and Graphs

Machine learning for time series: What works and what doesn't

Mikio Braun (Zalando)

Time series data has many applications in industry, in particular predicting the future based on historical data. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn't.

16:35-17:15 (40m) Data science and machine learning Time Series and Graphs

Correlation analysis on live data streams

Arun Kejariwal (Independent), Francois Orsini (MZ)

The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.

17:25-18:05 (40m) Data science and machine learning Security and Privacy, Time Series and Graphs

Code Property Graph: A modern, queryable data storage for source code

Fabian Yamaguchi (ShiftLeft)

Fabian Yamaguchi offers an overview of Code Property Graph (CPG), a unique approach that allows the functional elements of code to be represented in an interconnected graph of data and control flows, which enables semantic information about code to be stored scalably on distributed graph databases over the web while allowing them to be rapidly accessed.

11:15-11:55 (40m) Data science and machine learning

Distributed training of deep learning models

Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Ilia Karmanov (Microsoft)

Mathew Salvaris, Miguel Gonzalez-Fierro, and Ilia Karmanov offer a comparison of two platforms for running distributed deep learning training in the cloud, using a ResNet network trained on the ImageNet dataset as an example. You'll examine the performance of each as the number of nodes scales and learn some tips and tricks as well as some pitfalls to watch out for.

12:05-12:45 (40m) Data science and machine learning E-commerce and Retail, Media, Advertising, Entertainment

Deep learning for recommender systems

Nick Pentreath (IBM)

In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice.

14:05-14:45 (40m) Data science and machine learning, Streaming systems and real-time applications Security and Privacy

Real-time deep learning on video streams

Eran Avidan (Intel)

Deep learning is revolutionizing many domains within computer vision, but doing real-time analysis is challenging. Eran Avidan offers an overview of a novel architecture based on Redis, Docker, and TensorFlow that enables real-time analysis of high-resolution streaming video.

14:55-15:35 (40m) Data science and machine learning, Emerging technologies and case studies

Deep computer vision for manufacturing

Aurélien Géron (Kiwisoft)

Convolutional neural networks (CNNs) can now complete many computer vision tasks with superhuman ability. This will have a large impact on manufacturing by improving anomaly detection, product classification, analytics, and more. Aurélien Géron details common CNN architectures, explains how they can be applied to manufacturing, and covers potential challenges along the way.

16:35-17:15 (40m) Big data and data science in the cloud, Data science and machine learning Data Integration and Data Pipelines sessions, Media, Advertising, Entertainment

Using Siamese CNNs for removing duplicate entries from real estate listing databases

Sergey Ermolin (Intel), Olga Ermolin (MLS Listings)

Aggregation of geospecific real estate databases results in duplicate entries for properties located near geographical boundaries. Sergey Ermolin and Olga Ermolin detail an approach for identifying duplicate entries via the analysis of images that accompany real estate listings that leverages a transfer learning Siamese architecture based on VGG-16 CNN topology.

17:25-18:05 (40m) Data science and machine learning, Emerging technologies and case studies Text and Language processing and analysis

Using LSTMs to aid professional translators

Darren Cook (QQ Trend)

Darren Cook demonstrates how to use LSTMs, state-of-the-art tokenizers, dictionaries, and other data sources to tackle translation, focusing on one of the most difficult language pairs: Japanese to English.

11:15-11:55 (40m) Data science and machine learning, Law, ethics, and governance Media, Advertising, Entertainment, Security and Privacy

Finding bias in social media recommendations

Guillaume Chaslot (AlgoTransparency)

An increasing number of ex-Google and ex-Facebook employees state that social media is starting to control us rather than the other way around. How can we determine if social media is a pure reflection of people's interests or if it pushes us toward specific narratives? Guillaume Chaslot explores methodologies to find out which narratives are favored by social media recommendation engines.

12:05-12:45 (40m) Data science and machine learning, Visualization and user experience Visualization, Design, and UX

Data visualization in a big data world

Jeff Fletcher (Cloudera)

As big data adoption grows, Apache Hadoop, Apache Spark, and machine learning technologies are increasingly being used to analyze ever-larger datasets, but we still have to keep telling stories about the data and making sure the message is clear. Jeff Fletcher details the tools and techniques that are relevant to data visualization practitioners working with large datasets and predictive models.

14:05-14:45 (40m) Data science and machine learning, Data-driven business management, Law, ethics, and governance

Designing ethical artificial intelligence

Jivan Virdee (Fjord), Hollie Lubbock (Fjord)

Artificial intelligence systems are powerful agents of change in our society, but as this technology becomes increasingly prevalent—transforming our understanding of ourselves and our society—issues around ethics and regulation will arise. Jivan Virdee and Hollie Lubbock explore how to address fairness, accountability, and the long-term effects on our society when designing with data.

14:55-15:35 (40m) Data science and machine learning, Data-driven business management, Visualization and user experience Visualization, Design, and UX

The business leader’s guide to designing indispensable analytics solutions and data products

Brian O'Neill (Designing for Analytics)

Gartner says 85%+ of big data projects will fail. Your own company may have even spent millions on a recent project that isn’t really delivering the value or UX everyone hoped for. Brian O'Neill explains why CDOs, PMs, and business leaders who leverage design to prioritize utility, usability, and customer value will realize the best ROIs and demonstrates how to start evaluating your UX.

16:35-17:15 (40m) Data science and machine learning, Visualization and user experience Visualization, Design, and UX

Architectural design for interactive visualization

Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)

Creating visualizations for data science requires an interactive setup that works at scale. Bargava Subramanian and Amit Kapoor explore the key architectural design considerations for such a system and discuss the four key trade-offs in this design space: rendering for data scale, computation for interaction speed, adapting to data complexity, and being responsive to data velocity.

17:25-18:05 (40m) Data science and machine learning

Democratizing data within your organization

Mark Grover (Lyft), Deepak Tiwari (Lyft)

Sure, you’ve got the best and fastest running SQL engine, but you’ve still got some problems: Users don’t know which tables exist or what they contain; sometimes bad things happen to your data, and you need to regenerate partitions but there is no tool to do so. Mark Grover and Deepak Tiwari explain how to make your team and your larger organization more productive when it comes to consuming data.

11:15-11:55 (40m) Strata Business Summit Financial Services

Leveraging public-private partnerships using data analytics for economic insights

Audrey Lobo-Pulo (Phoensight), Nicholas O'Donnell (LinkedIn)

In October 2017, LinkedIn and the Australian Treasury teamed up to gain a deeper understanding of the Australian labor market through new data insights, which may inform economic policy and directly benefit society. Audrey Lobo-Pulo and Nick O'Donnell share some of the discoveries from this collaboration as well as the practicalities of working in a public-private partnership.

12:05-12:45 (40m) Data-driven business management, Strata Business Summit, Streaming systems and real-time applications Telecom, Time Series and Graphs

The app trap: Why every mobile app and mobile operator needs anomaly detection

Ira Cohen (Anodot)

The mobile world has so many moving parts that a simple change to one element can cause havoc somewhere else, resulting in issues that annoy users and cause revenue leaks. Ira Cohen outlines ways to use anomaly detection to track everything mobile, from the service and roaming to specific apps, to fully optimize your mobile offerings.

14:05-14:45 (40m) Data science and machine learning Data Integration and Data Pipelines sessions

Solving data cleaning and unification using human-guided machine learning

Ihab Ilyas (University of Waterloo)

Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas provides insight into various techniques and discusses how machine learning, human expertise, and problem semantics collectively can deliver a scalable, high-accuracy solution.

14:55-15:35 (40m) Data-driven business management, Strata Business Summit

Successful data cultures: Inclusivity, empathy, retention, and results

Kim Nilsson (Pivigo), Phil Harvey (Microsoft)

Our lives are being transformed by data, changing our understanding of work, play, and health. Every organization can take advantage of this resource, but something is holding us back: us. Kim Nilsson and Phil Harvey explain how to build a successful data culture that embeds data at the heart of every organization through people and delivers success through empathy, communication, and humanity.

16:35-17:15 (40m) Data-driven business management, Emerging technologies and case studies, Law, ethics, and governance, Strata Business Summit

Data Collaboratives

Jude McCorry (The Data Lab), Mahmood Adil (NHS National Services Scotland)

Jude McCorry and Mahmood Adil offer an overview of Data Collaboratives, a new form of collaboration beyond the public-private partnership model, in which participants from different sectors exchange data, skills, leadership, and knowledge to solve complex problems facing children in Scotland and worldwide.

17:25-18:05 (40m) Data-driven business management, Emerging technologies and case studies, Strata Business Summit

Blind men and elephants: What’s missing from your big data?

Richard Goyder (IMC Business Architecture | Scaled Insights), Barry Singleton (IMC Business Architecture)

Big data analytics tends to focus on what is easily available, which is by and large data about what has already happened, the implicit assumption being that past behavior will predict future behavior. Organizations already possess data they aren’t exploiting. Barry Singleton and Richard Goyder explain how, with the right tools, it can be used to develop far more powerful predictive algorithms.

11:15-11:55 (40m) Executive Briefing, Law, ethics, and governance, Strata Business Summit Financial Services, Security and Privacy

Executive Briefing: GDPR—Getting your data ready for heavy, new EU privacy regulations

Mark Donsky (Okera), Syed Rafice (Cloudera)

In May 2018, the General Data Protection Regulation (GDPR) goes into effect for firms doing business in the EU, but many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Mark Donsky and Syed Rafice outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations.

12:05-12:45 (40m) Data-driven business management, Executive Briefing, Strata Business Summit

Executive Briefing: Becoming a data-driven enterprise—A maturity model

Teresa Tung (Accenture), Jean-Luc Chatelain (Accenture)

A data-driven enterprise maximizes the value of its data. But how do enterprises emerging from technology and organization silos get there? Teresa Tung and Jean-Luc Chatelain explain how to create a data-driven enterprise maturity model that spans technology and business requirements and walk you through use cases that bring the model to life.

14:05-14:45 (40m) Executive Briefing, Strata Business Summit

Executive Briefing: Lessons learned managing data science projects—Adopting a team data science process

Danielle Dean (iRobot)

Danielle Dean covers the basics of managing data science projects, including the data science lifecycle, and offers an overview of an internal approach at Microsoft called the Team Data Science Process (TDSP). Join in to learn more about the typical priorities of data science teams and the keys to success in engaging and creating value with data science.

14:55-15:35 (40m) Data-driven business management, Executive Briefing, Strata Business Summit, Streaming systems and real-time applications

Executive Briefing: What you need to know about fast data

Dean Wampler (Anyscale)

Streaming data systems, so-called fast data, promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler outlines what you need to know to exploit fast data successfully.

16:35-17:15 (40m) Executive Briefing, Strata Business Summit

Executive Briefing: BI on big data

Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)

If your goal is to provide data to an analyst rather than a data scientist, what’s the best way to deliver analytics? There are 70+ BI tools in the market and a dozen or more SQL- or OLAP-on-Hadoop open source projects. Mark Madsen and Shant Hovsepian discuss the trade-offs between a number of architectures that provide self-service access to data.

17:25-18:05 (40m) Data-driven business management, Executive Briefing, Strata Business Summit Managing and Deploying Machine Learning

Executive Briefing: Why machine-learned models crash and burn in production and what to do about it

David Talby (Pacific AI)

Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.

11:15-11:55 (40m) Data science and machine learning, Data-driven business management, Emerging technologies and case studies, Expo Hall Media, Advertising, Entertainment

Revolutionizing the newsroom with artificial intelligence

Dan Gilbert (News UK), Jonathan Leslie (Pivigo)

In the era of 24-hour news and online newspapers, editors in the newsroom must quickly and efficiently make sense of the enormous amounts of data that they encounter and make decisions about their content. Daniel Gilbert and Jonathan Leslie discuss an ongoing partnership between News UK and Pivigo in which a team of data science trainees helped develop an AI platform to assist in this task.

12:05-12:45 (40m) Data science and machine learning, Expo Hall

Interpretable AI: Can we trust machine learning?

Konstantinos Georgatzis (QuantumBlack), Martha Imprialou (QuantumBlack)

Konstantinos Georgatzis and Martha Imprialou explain how to interpret the predictions given by your black-box model and how machine learning is helping to drive decision making today.

14:05-14:45 (40m) Data engineering and architecture, Expo Hall Time Series and Graphs

Time for a new relation: Going from RDBMS to a graph database

Patrick McFadin (DataStax)

Graph databases are becoming mainstream. Patrick McFadin explains how to use the knowledge you have gained from your years of working with relational databases in this brave new world. There are many similarities but also some significant differences that can open up completely new use cases. If you're deciding whether to take the plunge into graph databases, this is the talk for you.

14:55-15:35 (40m) Data science and machine learning, Expo Hall, Streaming systems and real-time applications Managing and Deploying Machine Learning

Machine-learned model quality monitoring in fast data and streaming applications

Emre Velipasaoglu (Lightbend)

Most machine learning algorithms are designed to work on stationary data, but real-life streaming data is rarely stationary. Models lose prediction accuracy over time if they are not retrained. Without model quality monitoring, retraining decisions are suboptimal and costly. Emre Velipasaoglu reviews monitoring methods, focusing on their applicability in fast data and streaming applications.

16:35-17:15 (40m) Data engineering and architecture, Expo Hall

Data-driven ecosystems in the automotive industry

Tobias Burger (BMW Group), Joshua Goerner (BMW AG)

The BMW Group IT team drives the usage of data-driven technologies and forms the nucleus of a data-centric culture inside of the organization. Tobias Bürger and Joshua Görner discuss the E-to-E relationship of data and models and share best practices for scaling applications in real-world environments.

11:15-11:55 (40m) Sponsored

Putting AI to work for business: It's a journey. (sponsored by IBM)

Carlo Appugliese (IBM)

What was once science fiction has now become reality as multiple AI consumer-based solutions have hit the market over the last few years. In turn, consumers have become more comfortable interacting with AI. But has AI really lived up to the hype? For consumers, perhaps not yet. However, AI for business is a different (and more valuable) animal. Carlo Appugliese details how business can put AI to work.

14:05-14:45 (40m) Sponsored

A tale of two BI standards: Data warehouses and data lakes (sponsored by Arcadia Data)

Randy Lea (Arcadia Data)

Business intelligence (BI) and analytics on data lakes have had limited success. Data lakes often fall short because they are mostly used by data scientists and not by business users. Randy Lea explains why existing BI tools work well for data warehouses but not data lakes and why modern BI tools designed for data lakes should represent the second BI standard in enterprises today.

14:55-15:35 (40m) Sponsored

The IoT and AI for good (sponsored by Hitachi Vantara)

Wael Elrifai (Hitachi Vantara)

Wael Elrifai shares his experiences working in the IoT and AI spaces, covering complexities, pitfalls, and opportunities to explain why innovation isn’t just good for business—it's a societal imperative.

16:35-17:15 (40m) Data engineering and architecture

The eAGLE accelerator: How to speed up migrations from legacy ETL to big data implementations

Enric Biosca Trias (everis), Angel Valencia (everis)

Enric Biosca offers an overview of the eAGLE accelerator, which speeds up migration processes from legacy ETL to big data implementations by enabling auditing, lineage, and translation of legacy code for big data. Along the way, Enric demonstrates how graph and automatic translation technologies help companies reduce their migration times.

17:25-18:05 (40m) Data engineering and architecture

Batch and real-time processing in LINE's log analysis platform

Wataru Yukawa (LINE)

LINE, one of the most popular messaging applications in Asia, offers many services, such as its news application. These services sometimes depend on real-time processing. Wataru Yukawa offers an overview of LINE's web tracking system, which consists of the JavaScript SDK, NGINX, Fluentd, Kafka, Elasticsearch, and Hadoop, and explains how it helps with batch and real-time processing.

11:15-11:55 (40m) Sponsored

Enabling data-driven development for autonomous driving at BMW (sponsored by BMW)

Miha Pelko (BMW Group), Aleksandr Melkonyan (BMW AG)

The development of autonomous driving cars requires the handling of huge amounts of data produced by test vehicles and solving a number of critical challenges specific to the automotive industry. Miha Pelko and Aleksandr Melkonyan outline these challenges and explain how BMW is overcoming them by adapting and reinventing existing big data solutions for autonomous driving.

12:05-12:45 (40m) Sponsored

Cloud-native data science with Anaconda, Docker, and Kubernetes (sponsored by Anaconda)

Mathew Lodge (Anaconda)

The days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. Mathew Lodge demonstrates that it's just as easy to deploy Python as it is Java, using containers and Kubernetes. Welcome to the future.

14:05-14:45 (40m) Sponsored

Operationalizing live data to benefit business (sponsored by WANdisco)

Steve Kilgore (WANdisco)

Today, every company is a data company. Business success depends on putting large volumes of live data to work to drive competitive advantage. Paul Phillips details how some of the world’s largest companies have achieved 100% uptime while moving massive live datasets and halving their hardware requirements.

14:55-15:35 (40m) Sponsored

Incorporating data sources inside and outside of the data center (sponsored by Cisco)

Chiang Yang (Cisco)

Han Yang explains how Cisco is leveraging big data and analytics and details how the company is helping customers to incorporate data sources from the internet of things and deploy machine learning at the edge and at the enterprise.

16:35-17:15 (40m) Sponsored

Fortune 100 lessons: Architecting data lakes for real-time analytics and AI (sponsored by Attunity)

Ted Orme (Attunity)

Modern analytics and AI initiatives require an adaptable data lake with a multistage architectural design to effectively ingest, stage, and provision specific datasets in real time. Ted Orme discusses his experience at Attunity creating a real-time data integration solution for Fortune 100 organizations and shares best practices and lessons learned along the way.

9:00-9:05 (5m)

Wednesday opening welcome

Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes.

9:05-9:20 (15m)

Charting a data journey to the cloud

Mick Hollison (Cloudera), Sven Loeffler (Deutsche Telekom), Robert Neumann (Ultra Tendency)

What happens when you combine near-limitless data with on-demand access to powerful analytics and compute? For Deutsche Telekom, the results have been transformative. Mick Hollison, Sven Löffler, and Robert Neumann explain how Deutsche Telekom is harnessing machine learning and analytics in the cloud to build Europe’s largest and most powerful IoT data marketplace.

9:20-9:35 (15m)

Journey to GDPR compliance

Alison Howard (Microsoft)

May 25, the day the GDPR goes into effect, is an important milestone for data protection in the EU and elsewhere, but the journey to GDPR compliance neither begins nor ends there. Alison Howard explains how Microsoft, one of the world’s largest companies, with operations across the EU and around the globe, has prepared for May 25 and beyond.

9:35-9:45 (10m) Sponsored keynote

Humans and the machine: Machine learning in context (sponsored by IBM)

Jean-François Puget (IBM Analytics)

On the way to active analytics for business, we have to answer two big questions: What must happen to data before running machine learning algorithms, and how should machine learning output be used to generate actual business value? Jean-François Puget demonstrates the vital role of human context in answering those questions.

9:45-9:55 (10m)

Building a stronger data ecosystem

Ben Lorica (O'Reilly)

To enable the machine learning applications of the future, there remain many interesting and challenging data problems we need to tackle as a community. Ben Lorica discusses some of the pressing problems we're facing as we collect and store data, particularly in an era when our machine learning models require huge amounts of labeled data.

9:55-10:10 (15m)

The Paradise Papers: Behind the scenes with the ICIJ

Pierre Romera (International Consortium of Investigative Journalists (ICIJ))

Last November, the International Consortium of Investigative Journalists (ICIJ) published the Paradise Papers, a yearlong investigation on the offshore dealings of multinational companies and the wealthy. Pierre Romera offers a behind-the-scenes look into the process and explores the challenges in handling 1.4 TB of data and making it available securely to journalists all over the world.

10:15-10:30 (15m)

Data protection and innovation

Eva Kaili (European Parliament | The Science and Technology Options Assessment Panel)

Keynote with Eva Kaili

8:15-8:45 (30m)

Speed Networking

Gather before keynotes on Wednesday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees.

8:45-9:00 (15m)

Break: Coffee break sponsored by Confluent (7:30 - 9:00)

19:05-20:00 (55m)

Plenary

10:45-11:15 (30m)

Break: Morning break

12:45-14:05 (1h 20m)

Wednesday Topic Tables at lunch

Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.

12:45-14:05 (1h 20m)

Wednesday Business Summit Lunch

Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers.

12:45-14:05 (1h 20m)

Women's Networking Lunch

If you’re looking to find like minds and make new professional connections, come to the Women's Networking Lunch on Wednesday.

15:35-16:35 (1h)

Break: Afternoon break sponsored by Airbus

18:05-19:05 (1h)

Expo Hall Reception

Unwind after a long day of sessions with small bites and drinks while networking with Strata attendees, exhibitors, and sponsors.

20:00-22:00 (2h)

Data After Dark: A Night in Shoreditch (sponsored by Domino and Cloudera)

Enjoy great food and drink at Data After Dark: A Night in Shoreditch. Be sure to take in the street art as you make your way between Zigfrid von Underbelly and Trapeze Bar.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
