Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK
 
S11A
S11B
11:15 Web analytics at scale with Druid at Naver Jason Heo (Naver), Dooyong Kim (Navercorp)
12:05 Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks Baolong Mao (JD.com), Yiran Wu (JD.com), Yupeng Fu (Alluxio)
14:05 Audi's journey to an enterprise big data platform Carsten Herbe (Audi Business Innovation GmbH), Matthias Graunitz (Audi AG)
14:55 Elastic map matching using Cloudera Altus and Apache Spark Timo Graen (Volkswagen AG), Robert Neumann (Ultra Tendency)
17:25 Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am Holden Karau (Independent), Rachel Warren (Salesforce Einstein)
Capital Suite 7
11:15 Architecting data platforms for cybersecurity Charaka Goonatilake (Panaseer)
12:05 Hadoop under attack: Securing data in a banking domain Federico Leven (ReactoData)
14:05 GPU-accelerated threat detection with GOAI Joshua Patterson (NVIDIA), Chau Dang (NVIDIA)
16:35 How to protect big data in a containerized environment Thomas Phelan (HPE BlueData)
17:25 Security, governance, and cloud analytics, oh my! Nikki Rouda (Cloudera), Nick Curcuru (Mastercard)
Capital Suite 8/9
Capital Suite 10/11
11:15 Data science survival and growth within the corporate jungle: An easyJet case study Alberto Rey Villaverde (easyJet), Grigorios Mingas (easyJet)
12:05 Risk-sharing pools: Winning zero-sum games through machine learning Baiju Devani (Aviva Canada), Etienne Chasse St-Laurent (Aviva Canada)
14:55 DataOps: Nine steps to transform your data science impact Harvinder Atwal (Moneysupermarket)
16:35 Narrative extraction: Analyzing the world’s narratives through natural language understanding Naveed Ghaffar (Narrative Economics), Rashed Iqbal (UCLA)
Capital Suite 12
11:15 How will the GDPR impact machine learning? Steven Touw (Immuta)
14:05 StreamDM: Advanced data science with Spark Streaming Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
16:35 Correlation analysis on live data streams Arun Kejariwal (Independent), Francois Orsini (MZ)
Capital Suite 13
11:15 Distributed training of deep learning models Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Ilia Karmanov (Microsoft)
12:05 Deep learning for recommender systems Nick Pentreath (IBM)
14:05 Real-time deep learning on video streams Eran Avidan (Intel)
14:55 Deep computer vision for manufacturing Aurélien Géron (Kiwisoft)
16:35 Using Siamese CNNs for removing duplicate entries from real estate listing databases Sergey Ermolin (Intel), Olga Ermolin (MLS Listings)
17:25 Using LSTMs to aid professional translators Darren Cook (QQ Trend)
Capital Suite 14
11:15 Finding bias in social media recommendations Guillaume Chaslot (AlgoTransparency)
12:05 Data visualization in a big data world Jeff Fletcher (Cloudera)
14:05 Designing ethical artificial intelligence Jivan Virdee (Fjord), Hollie Lubbock (Fjord)
16:35 Architectural design for interactive visualization Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)
17:25 Democratizing data within your organization Mark Grover (Lyft), Deepak Tiwari (Lyft)
Capital Suite 15/16
11:15 Leveraging public-private partnerships using data analytics for economic insights Audrey Lobo-Pulo (Phoensight), Nicholas O'Donnell (LinkedIn)
14:55 Successful data cultures: Inclusivity, empathy, retention, and results Kim Nilsson (Pivigo), Phil Harvey (Microsoft)
16:35 Data Collaboratives Jude McCorry (The Data Lab), Mahmood Adil (NHS National Services Scotland)
17:25 Blind men and elephants: What’s missing from your big data? Richard Goyder (IMC Business Architecture | Scaled Insights), Barry Singleton (IMC Business Architecture)
Capital Suite 17
12:05 Executive Briefing: Becoming a data-driven enterprise—A maturity model Teresa Tung (Accenture), Jean-Luc Chatelain (Accenture)
16:35 Executive Briefing: BI on big data Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)
Expo Hall
11:15 Revolutionizing the newsroom with artificial intelligence Dan Gilbert (News UK), Jonathan Leslie (Pivigo)
12:05 Interpretable AI: Can we trust machine learning? Konstantinos Georgatzis (QuantumBlack), Martha Imprialou (QuantumBlack)
16:35 Data-driven ecosystems in the automotive industry Tobias Burger (BMW Group), Joshua Goerner (BMW AG)
Capital Suite 2/3
14:55 The IoT and AI for good (sponsored by Hitachi Vantara) Wael Elrifai (Hitachi Vantara)
Capital Suite 4
11:15 Enabling data-driven development for autonomous driving at BMW (sponsored by BMW) Miha Pelko (BMW Group), Aleksandr Melkonyan (BMW AG)
Auditorium
9:00 Wednesday opening welcome Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
9:05 Charting a data journey to the cloud Mick Hollison (Cloudera), Sven Loeffler (Deutsche Telekom), Robert Neumann (Ultra Tendency)
9:20 Journey to GDPR compliance Alison Howard (Microsoft)
9:45 Building a stronger data ecosystem Ben Lorica (O'Reilly)
9:55 The Paradise Papers: Behind the scenes with the ICIJ Pierre Romera (International Consortium of Investigative Journalists (ICIJ))
10:15 Data protection and innovation Eva Kaili (European Parliament | The Science and Technology Options Assessment Panel)
8:15 Speed Networking | Room: Auditorium Foyer
8:45 Coffee break sponsored by Confluent (7:30 - 9:00) | Room: Auditorium Foyer
19:05
10:45 Morning break | Room: Expo Hall (Capital Hall 24)
12:45 Lunch sponsored by IBM Wednesday Topic Tables at lunch | Room: Expo Hall (Capital Hall 24)
12:45 Wednesday Business Summit Lunch | Room: Expo Hall - SBS lunch (Capital Hall 24)
12:45 Women's Networking Lunch | Room: S11A
15:35 Afternoon break sponsored by Airbus | Room: Expo Hall (Capital Hall 24)
18:05 Expo Hall Reception | Room: Expo Hall (Capital Hall 24)
11:15-11:55 (40m) Big data and data science in the cloud, Data engineering and architecture
The cloud is expensive, so build your own redundant Hadoop clusters.
Stuart Pook (Criteo)
Criteo has a production cluster of 2K nodes running over 300K jobs a day in the company's own data centers. These clusters were meant to provide a redundant solution to Criteo's storage and compute needs. Stuart Pook offers an overview of the project, shares challenges and lessons learned, and discusses Criteo's progress in building another cluster to survive the loss of a full DC.
12:05-12:45 (40m) Data engineering and architecture, Streaming systems and real-time applications
Using a global data fabric to run a mixed cloud deployment
Jim Scott (NVIDIA)
Creating a business solution is a lot of work. Instead of building to run on a single cloud provider, it is far more cost effective to leverage the cloud as infrastructure as a service (IaaS). Jim Scott explains why a global data fabric is a requirement for running on all cloud providers simultaneously.
14:05-14:45 (40m) Big data and data science in the cloud, Data engineering and architecture
Analytics in the cloud: Building a modern cloud-based big data warehouse
Greg Rahn (Cloudera)
For many organizations, the next big data warehouse will be in the cloud. Greg Rahn shares considerations for evaluating the cloud for analytics and big data warehousing, including different architectural approaches to optimize price and performance.
14:55-15:35 (40m) Big data and data science in the cloud, Data engineering and architecture
Data science across data sources with Apache Arrow
Tomer Shiran (Dremio)
It's often impractical for organizations to physically consolidate all data into one system. Tomer Shiran offers an overview of Apache Arrow, an open source columnar, in-memory data representation that enables analytical systems and data sources to exchange and process data in real time, simplifying and accelerating data access without having to copy all data into one location.
16:35-17:15 (40m) Big data and data science in the cloud, Data engineering and architecture
Making stateless containers reliable and available even with stateful applications
Paul Curtis (Weaveworks)
The flexibility advantage conferred by containers depends on their ephemeral nature, so it’s useful to keep containers stateless. However, many applications require state—access to a scalable persistence layer that supports real mutable files, tables, and streams. Paul Curtis demonstrates how to make containerized applications reliable, available, and performant, even with stateful applications.
17:25-18:05 (40m) Big data and data science in the cloud, Data engineering and architecture
Practical advice for driving down the cost of cloud big data platforms
Christopher Royles (Cloudera)
Big data and cloud deployments return huge benefits in flexibility and economics but can also result in runaway costs and failed projects. Drawing on his production experience, Christopher Royles shares tips and best practices for determining initial sizing, strategic planning, and longer-term operation, helping you deliver an efficient platform, reduce costs, and implement a successful project.
11:15-11:55 (40m) Data engineering and architecture Data Platforms, Media, Advertising, Entertainment
Web analytics at scale with Druid at Naver
Jason Heo (Naver), Dooyong Kim (Navercorp)
Naver.com is the largest search engine in Korea, with a 70% share of the Korean search market, and it handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver's web analytics system, built with Druid.
12:05-12:45 (40m) Data engineering and architecture, Data-driven business management Data Platforms, E-commerce and Retail, Transportation and Logistics
Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks
Baolong Mao (JD.com), Yiran Wu (JD.com), Yupeng Fu (Alluxio)
Baolong Mao, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, the JDPresto framework has seen a 10x performance improvement on average.
14:05-14:45 (40m) Data engineering and architecture Data Platforms, Transportation and Logistics
Audi's journey to an enterprise big data platform
Carsten Herbe (Audi Business Innovation GmbH), Matthias Graunitz (Audi AG)
Carsten Herbe and Matthias Graunitz detail Audi's journey from a Hadoop proof of concept to a multitenant enterprise platform, sharing lessons learned, the decisions Audi made, and how a number of use cases are implemented using the platform.
14:55-15:35 (40m) Data engineering and architecture Transportation and Logistics
Elastic map matching using Cloudera Altus and Apache Spark
Timo Graen (Volkswagen AG), Robert Neumann (Ultra Tendency)
Map-matching applications exist in almost every telematics use case and are therefore crucial to all car manufacturers. Timo Graen and Robert Neumann detail the architecture behind Volkswagen Commercial Vehicle’s Altus-based map-matching application and lead a live demo featuring a map matching job in Altus.
16:35-17:15 (40m) Data engineering and architecture, Data-driven business management, Streaming systems and real-time applications Text and Language processing and analysis
Improving DevOps and QA efficiency using machine learning and NLP methods
Ran Taig (Dell), Omer Sagi (Dell)
DevOps and QA engineers spend a significant amount of time investigating reoccurring issues. These issues are often represented by large configuration and log files, so the process of investigating whether two issues are duplicates can be a very tedious task. Ran Taig and Omer Sagi outline a solution that leverages NLP and machine learning algorithms to automatically identify duplicate issues.
17:25-18:05 (40m) Data engineering and architecture, Streaming systems and real-time applications
Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am
Holden Karau (Independent), Rachel Warren (Salesforce Einstein)
Apache Spark is an amazing distributed system, but part of the bargain we've made with the infrastructure daemons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. Holden Karau, Rachel Warren, and Anya Bida explore auto-tuning jobs using systems like Apache Beam, Mahout, and internal Spark ML jobs as workloads.
11:15-11:55 (40m) Data engineering and architecture Security and Privacy
Architecting data platforms for cybersecurity
Charaka Goonatilake (Panaseer)
Data is becoming a crucial weapon to secure an organization against cyber threats. Charaka Goonatilake shares strategies for designing effective data platforms for cybersecurity using big data technologies, such as Spark and Hadoop, and explains how these platforms are being used in real-world examples of data-driven security.
12:05-12:45 (40m) Data engineering and architecture, Law, ethics, and governance, Platform security and cybersecurity Security and Privacy
Hadoop under attack: Securing data in a banking domain
Federico Leven (ReactoData)
The apparent difficulty of managing Hadoop compared to more traditional and proprietary data products makes some companies wary of the Hadoop ecosystem, but managing security is becoming more accessible in the Hadoop space, particularly in the Cloudera stack. Federico Leven offers an overview of an end-to-end security deployment on Hadoop and the data and security governance policies implemented.
14:05-14:45 (40m) Big data and data science in the cloud, Data engineering and architecture, Data-driven business management, Emerging technologies and case studies, Platform security and cybersecurity, Streaming systems and real-time applications Security and Privacy
GPU-accelerated threat detection with GOAI
Joshua Patterson (NVIDIA), Chau Dang (NVIDIA)
Joshua Patterson and Mike Wendt explain how NVIDIA used GPU-accelerated open source technologies to improve its cyberdefense platforms by leveraging software from the GPU Open Analytics Initiative (GOAI) and how the company accelerated anomaly detection with more efficient machine learning models, faster deployment, and more granular data exploration.
14:55-15:35 (40m) Data engineering and architecture, Emerging technologies and case studies, Streaming systems and real-time applications
The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense
Lee Blum (Verint Systems)
Lee Blum offers an overview of Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results.
16:35-17:15 (40m) Data engineering and architecture, Platform security and cybersecurity Security and Privacy
How to protect big data in a containerized environment
Thomas Phelan (HPE BlueData)
Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE), but TDE can be difficult to configure and manage—issues that are only compounded when running on Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them.
17:25-18:05 (40m) Big data and data science in the cloud, Data engineering and architecture, Law, ethics, and governance, Platform security and cybersecurity Security and Privacy
Security, governance, and cloud analytics, oh my!
Nikki Rouda (Cloudera), Nick Curcuru (Mastercard)
Having so many cloud-based analytics services available is a dream come true. However, it's a nightmare to manage proper security and governance across all those different services. Nikki Rouda and Nick Curcuru share advice on how to minimize the risk and effort in protecting and managing data for multidisciplinary analytics and explain how to avoid the hassle and extra cost of siloed approaches.
11:15-11:55 (40m) Data engineering and architecture, Streaming systems and real-time applications
Processing fast data with Apache Spark: A tale of two APIs
Gerard Maas (Lightbend)
Apache Spark has two streaming APIs: Spark Streaming and Structured Streaming. Gerard Maas offers a critical overview of their differences in key aspects of a streaming application, from the API user experience to dealing with time and with state and machine learning capabilities, and shares practical guidance on picking one or combining both to implement resilient streaming pipelines.
12:05-12:45 (40m) Data engineering and architecture Telecom
How BT delivers better broadband and TV using Spark and Kafka
Phillip Radley (BT)
In the past year, British Telecom has added a streaming network analytics use case to its multitenant data platform. Phillip Radley demonstrates how the solution works and explains how it delivers better broadband and TV services, using Kafka and Spark on YARN and HDFS encryption.
14:05-14:45 (40m) Data engineering and architecture, Streaming systems and real-time applications
Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka
Michael Noll (Confluent)
Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL.
14:55-15:35 (40m) Data engineering and architecture, Law, ethics, and governance, Streaming systems and real-time applications
Multi-data center and multitenant durable messaging with Apache Pulsar
Ivan Kelly (Streamlio)
Ivan Kelly offers an overview of Apache Pulsar, a durable, distributed messaging system, underpinned by Apache BookKeeper, that provides the enterprise features necessary to guarantee that your data is where it should be and only accessible by those who should have access.
16:35-17:15 (40m) Data engineering and architecture, Emerging technologies and case studies, Streaming systems and real-time applications
Kafka in jail: Running Kafka in container-orchestrated clusters
Sean Glover (Lightbend)
Kafka is best suited to run close to the metal on dedicated machines in static clusters, but these clusters are quickly becoming extinct. Companies want mixed-use clusters that take advantage of every resource available. Sean Glover offers an overview of leading Kafka implementations on DC/OS and Kubernetes to explore how reliably they run Kafka in container-orchestrated clusters.
17:25-18:05 (40m) Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications
Stream processing for the practitioner: Blueprints for common stream processing use cases with Apache Flink
Aljoscha Krettek (Ververica)
Aljoscha Krettek offers an overview of the modern stream processing space, details the challenges posed by stateful and event-time-aware stream processing, and shares core archetypes (“application blueprints”) for stream processing drawn from real-world use cases with Apache Flink.
11:15-11:55 (40m) Data science and machine learning, Data-driven business management Transportation and Logistics
Data science survival and growth within the corporate jungle: An easyJet case study
Alberto Rey Villaverde (easyJet), Grigorios Mingas (easyJet)
Because in-house data science teams work with a range of business functions, traditional data science processes are often too abstract to cope with the complexity of these environments. Alberto Rey Villaverde and Grigorios Mingas share case studies from easyJet that highlight some unpredictable hurdles related to requirements, data, infrastructure, and deployment and explain how they solved them.
12:05-12:45 (40m) Data science and machine learning Financial Services
Risk-sharing pools: Winning zero-sum games through machine learning
Baiju Devani (Aviva Canada), Etienne Chasse St-Laurent (Aviva Canada)
Risk-sharing pools allow insurers to get rid of risks they are forced to insure in highly regulated markets. Insurers thus cede both the risk and its premium. But are they ceding the right risk or simply giving up premium? Baiju Devani and Étienne Chassé St-Laurent share an applied machine learning approach that leverages an ensemble of models to gain a distinctive market advantage.
14:05-14:45 (40m) Data science and machine learning, Data-driven business management, Emerging technologies and case studies
Building a healthcare decision support system for ICD10/HCC coding through deep learning
Manas Ranjan Kar (Episource)
Episource is building a scalable NLP engine to help summarize medical charts and extract medical coding opportunities and their dependencies to recommend best possible ICD10 codes. Manas Ranjan Kar offers an overview of the wide variety of deep learning algorithms involved and the complex in-house training-data creation exercises that were required to make it work.
14:55-15:35 (40m) Data science and machine learning
DataOps: Nine steps to transform your data science impact
Harvinder Atwal (Moneysupermarket)
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, and shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, and more.
16:35-17:15 (40m) Data science and machine learning Text and Language processing and analysis
Narrative extraction: Analyzing the world’s narratives through natural language understanding
Naveed Ghaffar (Narrative Economics), Rashed Iqbal (UCLA)
Narratives are significant vectors of rapid change in culture, economic behavior, and the Zeitgeist of a society. Narrative economics studies the impact of popular human-interest stories on economic fluctuations. Naveed Ghaffar and Rashed Iqbal outline a framework that uses natural language understanding to extract and analyze narratives in human communication.
17:25-18:05 (40m) Data science and machine learning, Emerging technologies and case studies, Law, ethics, and governance
Rent, rain, and regulations: Leveraging structure in big data to predict criminal activity
Jorie Koster-Hale (Dataiku)
Because crime is affected by a number of geospatial and temporal features, predicting crime poses a unique technical challenge. Jorie Koster-Hale shares an approach using a combination of open source data, machine learning, time series modeling, and geostatistics to determine where crime will occur, what predicts it, and what we can do to prevent it in the future.
11:15-11:55 (40m) Data science and machine learning, Law, ethics, and governance Security and Privacy
How will the GDPR impact machine learning?
Steven Touw (Immuta)
The Strata Data conference in London takes place during one of the most important weeks in the history of data regulation, as the GDPR begins to be enforced. Steven Touw explores the effects of the GDPR on deploying machine learning models in the EU.
12:05-12:45 (40m) Data science and machine learning Media, Advertising, Entertainment, Security and Privacy
Fairness and diversity in online social systems
Elisa Celis (EPFL)
There is a pressing need to design new algorithms that are socially responsible in how they learn and socially optimal in the manner in which they use information. Elisa Celis explores the emergence of bias in algorithmic decision making and presents first steps toward developing a systematic framework to control biases in classical problems, such as data summarization and personalization.
14:05-14:45 (40m) Data science and machine learning, Streaming systems and real-time applications Telecom, Time Series and Graphs
StreamDM: Advanced data science with Spark Streaming
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
Heitor Murilo Gomes and Albert Bifet offer an overview of StreamDM, a real-time analytics open source software library built on top of Spark Streaming, developed at Huawei's Noah’s Ark Lab and Télécom ParisTech.
14:55-15:35 (40m) Data science and machine learning E-commerce and Retail, Financial Services, Time Series and Graphs
Machine learning for time series: What works and what doesn't
Mikio Braun (Zalando)
Time series data has many applications in industry, in particular predicting the future based on historical data. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn't.
16:35-17:15 (40m) Data science and machine learning Time Series and Graphs
Correlation analysis on live data streams
Arun Kejariwal (Independent), Francois Orsini (MZ)
The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.
17:25-18:05 (40m) Data science and machine learning Security and Privacy, Time Series and Graphs
Code Property Graph: A modern, queryable data storage for source code
Fabian Yamaguchi (ShiftLeft)
Fabian Yamaguchi offers an overview of Code Property Graph (CPG), a unique approach that represents the functional elements of code in an interconnected graph of data and control flows. This enables semantic information about code to be stored scalably on distributed graph databases over the web while remaining rapidly accessible.
11:15-11:55 (40m) Data science and machine learning
Distributed training of deep learning models
Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Ilia Karmanov (Microsoft)
Mathew Salvaris, Miguel Gonzalez-Fierro, and Ilia Karmanov offer a comparison of two platforms for running distributed deep learning training in the cloud, using a ResNet network trained on the ImageNet dataset as an example. You'll examine the performance of each as the number of nodes scales and learn some tips and tricks as well as some pitfalls to watch out for.
12:05-12:45 (40m) Data science and machine learning E-commerce and Retail, Media, Advertising, Entertainment
Deep learning for recommender systems
Nick Pentreath (IBM)
In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice.
14:05-14:45 (40m) Data science and machine learning, Streaming systems and real-time applications Security and Privacy
Real-time deep learning on video streams
Eran Avidan (Intel)
Deep learning is revolutionizing many domains within computer vision, but doing real-time analysis is challenging. Eran Avidan offers an overview of a novel architecture based on Redis, Docker, and TensorFlow that enables real-time analysis of high-resolution streaming video.
14:55-15:35 (40m) Data science and machine learning, Emerging technologies and case studies
Deep computer vision for manufacturing
Aurélien Géron (Kiwisoft)
Convolutional neural networks (CNNs) can now complete many computer vision tasks with superhuman ability. This will have a large impact on manufacturing by improving anomaly detection, product classification, analytics, and more. Aurélien Géron details common CNN architectures, explains how they can be applied to manufacturing, and covers potential challenges along the way.
16:35-17:15 (40m) Big data and data science in the cloud, Data science and machine learning Data Integration and Data Pipelines sessions, Media, Advertising, Entertainment
Using Siamese CNNs for removing duplicate entries from real estate listing databases
Sergey Ermolin (Intel), Olga Ermolin (MLS Listings)
Aggregation of geospecific real estate databases results in duplicate entries for properties located near geographical boundaries. Sergey Ermolin and Olga Ermolin detail an approach that identifies duplicate entries by analyzing the images that accompany real estate listings, leveraging a transfer learning Siamese architecture based on the VGG-16 CNN topology.
17:25-18:05 (40m) Data science and machine learning, Emerging technologies and case studies Text and Language processing and analysis
Using LSTMs to aid professional translators
Darren Cook (QQ Trend)
Darren Cook demonstrates how to use LSTMs, state-of-the-art tokenizers, dictionaries, and other data sources to tackle translation, focusing on one of the most difficult language pairs: Japanese to English.
11:15-11:55 (40m) Data science and machine learning, Law, ethics, and governance Media, Advertising, Entertainment, Security and Privacy
Finding bias in social media recommendations
Guillaume Chaslot (AlgoTransparency)
An increasing number of ex-Google and ex-Facebook employees state that social media is starting to control us rather than the other way around. How can we determine if social media is a pure reflection of people's interests or if it pushes us toward specific narratives? Guillaume Chaslot explores methodologies to find out which narratives are favored by social media recommendation engines.
12:05-12:45 (40m) Data science and machine learning, Visualization and user experience Visualization, Design, and UX
Data visualization in a big data world
Jeff Fletcher (Cloudera)
As big data adoption grows, Apache Hadoop, Apache Spark, and machine learning technologies are increasingly being used to analyze ever-larger datasets, but we still have to keep telling stories about the data and making sure the message is clear. Jeff Fletcher details the tools and techniques that are relevant to data visualization practitioners working with large datasets and predictive models.
14:05-14:45 (40m) Data science and machine learning, Data-driven business management, Law, ethics, and governance
Designing ethical artificial intelligence
Jivan Virdee (Fjord), Hollie Lubbock (Fjord)
Artificial intelligence systems are powerful agents of change in our society, but as this technology becomes increasingly prevalent—transforming our understanding of ourselves and our society—issues around ethics and regulation will arise. Jivan Virdee and Hollie Lubbock explore how to address fairness, accountability, and the long-term effects on our society when designing with data.
14:55-15:35 (40m) Data science and machine learning, Data-driven business management, Visualization and user experience Visualization, Design, and UX
The business leader’s guide to designing indispensable analytics solutions and data products
Brian O'Neill (Designing for Analytics)
Gartner says 85%+ of big data projects will fail. Your own company may have even spent millions on a recent project that isn’t really delivering the value or UX everyone hoped for. Brian O'Neill explains why CDOs, PMs, and business leaders who leverage design to prioritize utility, usability, and customer value will realize the best ROIs and demonstrates how to start evaluating your UX.
16:35-17:15 (40m) Data science and machine learning, Visualization and user experience Visualization, Design, and UX
Architectural design for interactive visualization
Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)
Creating visualizations for data science requires an interactive setup that works at scale. Bargava Subramanian and Amit Kapoor explore the key architectural design considerations for such a system and discuss the four key trade-offs in this design space: rendering for data scale, computation for interaction speed, adapting to data complexity, and being responsive to data velocity.
17:25-18:05 (40m) Data science and machine learning
Democratizing data within your organization
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Sure, you’ve got the best and fastest running SQL engine, but you’ve still got some problems: Users don’t know which tables exist or what they contain; sometimes bad things happen to your data, and you need to regenerate partitions but there is no tool to do so. Mark Grover and Deepak Tiwari explain how to make your team and your larger organization more productive when it comes to consuming data.
11:15-11:55 (40m) Strata Business Summit Financial Services
Leveraging public-private partnerships using data analytics for economic insights
Audrey Lobo-Pulo (Phoensight), Nicholas O'Donnell (LinkedIn)
In October 2017, LinkedIn and the Australian Treasury teamed up to gain a deeper understanding of the Australian labor market through new data insights, which may inform economic policy and directly benefit society. Audrey Lobo-Pulo and Nick O'Donnell share some of the discoveries from this collaboration as well as the practicalities of working in a public-private partnership.
12:05-12:45 (40m) Data-driven business management, Strata Business Summit, Streaming systems and real-time applications Telecom, Time Series and Graphs
The app trap: Why every mobile app and mobile operator needs anomaly detection
Ira Cohen (Anodot)
The mobile world has so many moving parts that a simple change to one element can cause havoc somewhere else, resulting in issues that annoy users and cause revenue leaks. Ira Cohen outlines ways to use anomaly detection to track everything mobile, from the service and roaming to specific apps, to fully optimize your mobile offerings.
14:05-14:45 (40m) Data science and machine learning Data Integration and Data Pipelines sessions
Solving data cleaning and unification using human-guided machine learning
Ihab Ilyas (University of Waterloo)
Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas provides insight into various techniques and discusses how machine learning, human expertise, and problem semantics collectively can deliver a scalable, high-accuracy solution.
14:55-15:35 (40m) Data-driven business management, Strata Business Summit
Successful data cultures: Inclusivity, empathy, retention, and results
Kim Nilsson (Pivigo), Phil Harvey (Microsoft)
Our lives are being transformed by data, changing our understanding of work, play, and health. Every organization can take advantage of this resource, but something is holding us back: us. Kim Nilsson and Phil Harvey explain how to build a successful data culture that embeds data at the heart of every organization through people and delivers success through empathy, communication, and humanity.
16:35-17:15 (40m) Data-driven business management, Emerging technologies and case studies, Law, ethics, and governance, Strata Business Summit
Data Collaboratives
Jude McCorry (The Data Lab), Mahmood Adil (NHS National Services Scotland)
Jude McCorry and Mahmood Adil offer an overview of Data Collaboratives, a new form of collaboration beyond the public-private partnership model, in which participants from different sectors exchange data, skills, leadership, and knowledge to solve complex problems facing children in Scotland and worldwide.
17:25-18:05 (40m) Data-driven business management, Emerging technologies and case studies, Strata Business Summit
Blind men and elephants: What’s missing from your big data?
Richard Goyder (IMC Business Architecture | Scaled Insights), Barry Singleton (IMC Business Architecture)
Big data analytics tends to focus on what is easily available, which is by and large data about what has already happened, the implicit assumption being that past behavior will predict future behavior. Organizations already possess data they aren’t exploiting. Barry Singleton and Richard Goyder explain how, with the right tools, it can be used to develop far more powerful predictive algorithms.
11:15-11:55 (40m) Executive Briefing, Law, ethics, and governance, Strata Business Summit Financial Services, Security and Privacy
Executive Briefing: GDPR—Getting your data ready for heavy, new EU privacy regulations
Mark Donsky (Okera), Syed Rafice (Cloudera)
In May 2018, the General Data Protection Regulation (GDPR) goes into effect for firms doing business in the EU, but many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Mark Donsky and Syed Rafice outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations.
12:05-12:45 (40m) Data-driven business management, Executive Briefing, Strata Business Summit
Executive Briefing: Becoming a data-driven enterprise—A maturity model
Teresa Tung (Accenture), Jean-Luc Chatelain (Accenture)
A data-driven enterprise maximizes the value of its data. But how do enterprises emerging from technology and organization silos get there? Teresa Tung and Jean-Luc Chatelain explain how to create a data-driven enterprise maturity model that spans technology and business requirements and walk you through use cases that bring the model to life.
14:05-14:45 (40m) Executive Briefing, Strata Business Summit
Executive Briefing: Lessons learned managing data science projects—Adopting a team data science process
Danielle Dean (iRobot)
Danielle Dean covers the basics of managing data science projects, including the data science lifecycle, and offers an overview of an internal approach at Microsoft called the Team Data Science Process (TDSP). Join in to learn more about the typical priorities of data science teams and the keys to success on engaging and creating value with data science.
14:55-15:35 (40m) Data-driven business management, Executive Briefing, Strata Business Summit, Streaming systems and real-time applications
Executive Briefing: What you need to know about fast data
Dean Wampler (Anyscale)
Streaming data systems, so-called fast data, promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler outlines what you need to know to exploit fast data successfully.
16:35-17:15 (40m) Executive Briefing, Strata Business Summit
Executive Briefing: BI on big data
Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)
If your goal is to provide data to an analyst rather than a data scientist, what’s the best way to deliver analytics? There are 70+ BI tools in the market and a dozen or more SQL- or OLAP-on-Hadoop open source projects. Mark Madsen and Shant Hovsepian discuss the trade-offs between a number of architectures that provide self-service access to data.
17:25-18:05 (40m) Data-driven business management, Executive Briefing, Strata Business Summit Managing and Deploying Machine Learning
Executive Briefing: Why machine-learned models crash and burn in production and what to do about it
David Talby (Pacific AI)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.
11:15-11:55 (40m) Data science and machine learning, Data-driven business management, Emerging technologies and case studies, Expo Hall Media, Advertising, Entertainment
Revolutionizing the newsroom with artificial intelligence
Dan Gilbert (News UK), Jonathan Leslie (Pivigo)
In the era of 24-hour news and online newspapers, editors in the newsroom must quickly and efficiently make sense of the enormous amounts of data that they encounter and make decisions about their content. Daniel Gilbert and Jonathan Leslie discuss an ongoing partnership between News UK and Pivigo in which a team of data science trainees helped develop an AI platform to assist in this task.
12:05-12:45 (40m) Data science and machine learning, Expo Hall
Interpretable AI: Can we trust machine learning?
Konstantinos Georgatzis (QuantumBlack), Martha Imprialou (QuantumBlack)
Konstantinos Georgatzis and Martha Imprialou explain how to interpret the predictions given by your black-box model and how machine learning is helping to drive decision making today.
14:05-14:45 (40m) Data engineering and architecture, Expo Hall Time Series and Graphs
Time for a new relation: Going from RDBMS to a graph database
Patrick McFadin (DataStax)
Graph databases are becoming mainstream. Patrick McFadin explains how to use the knowledge you have gained from your years of working with relational databases in this brave new world. There are many similarities but also some significant differences that can open up completely new use cases. If you're deciding whether to take the plunge into graph databases, this is the talk for you.
14:55-15:35 (40m) Data science and machine learning, Expo Hall, Streaming systems and real-time applications Managing and Deploying Machine Learning
Machine-learned model quality monitoring in fast data and streaming applications
Emre Velipasaoglu (Lightbend)
Most machine learning algorithms are designed to work on stationary data, but real-life streaming data is rarely stationary. Models lose prediction accuracy over time if they are not retrained. Without model quality monitoring, retraining decisions are suboptimal and costly. Emre Velipasaoglu reviews monitoring methods, focusing on their applicability in fast data and streaming applications.
16:35-17:15 (40m) Data engineering and architecture, Expo Hall
Data-driven ecosystems in the automotive industry
Tobias Burger (BMW Group), Joshua Goerner (BMW AG)
The BMW Group IT team drives the usage of data-driven technologies and forms the nucleus of a data-centric culture inside of the organization. Tobias Bürger and Joshua Görner discuss the E-to-E relationship of data and models and share best practices for scaling applications in real-world environments.
11:15-11:55 (40m) Sponsored
Putting AI to work for business: It's a journey. (sponsored by IBM)
Carlo Appugliese (IBM)
What was once science fiction has now become reality as multiple consumer-based AI solutions have hit the market over the last few years. In turn, consumers have become more comfortable interacting with AI. But has AI really lived up to the hype? For consumers, perhaps not yet. However, AI for business is a different (and more valuable) animal. Carlo Appugliese details how businesses can put AI to work.
14:05-14:45 (40m) Sponsored
A tale of two BI standards: Data warehouses and data lakes (sponsored by Arcadia Data)
Randy Lea (Arcadia Data)
Business intelligence (BI) and analytics on data lakes have had limited success. Data lakes often fall short because they are mostly used by data scientists and not by business users. Randy Lea explains why existing BI tools work well for data warehouses but not data lakes and why modern BI tools designed for data lakes should represent the second BI standard in enterprises today.
14:55-15:35 (40m) Sponsored
The IoT and AI for good (sponsored by Hitachi Vantara)
Wael Elrifai (Hitachi Vantara)
Wael Elrifai shares his experiences working in the IoT and AI spaces, covering complexities, pitfalls, and opportunities to explain why innovation isn’t just good for business—it's a societal imperative.
16:35-17:15 (40m) Data engineering and architecture
The eAGLE accelerator: How to speed up migrations from legacy ETL to big data implementations
Enric Biosca Trias (everis), Angel Valencia (everis)
Enric Biosca offers an overview of the eAGLE accelerator, which speeds up migration processes from legacy ETL to big data implementations by enabling auditing, lineage, and translation of legacy code for big data. Along the way, Enric demonstrates how graph and automatic translation technologies help companies reduce their migration times.
17:25-18:05 (40m) Data engineering and architecture
Batch and real-time processing in LINE's log analysis platform
Wataru Yukawa (LINE)
LINE, one of the most popular messaging applications in Asia, offers many services, such as its news application. These services sometimes depend on real-time processing. Wataru Yukawa offers an overview of LINE's web tracking system, which consists of the JavaScript SDK, NGINX, Fluentd, Kafka, Elasticsearch, and Hadoop, and explains how it helps with batch and real-time processing.
11:15-11:55 (40m) Sponsored
Enabling data-driven development for autonomous driving at BMW (sponsored by BMW)
Miha Pelko (BMW Group), Aleksandr Melkonyan (BMW AG)
The development of autonomous driving cars requires the handling of huge amounts of data produced by test vehicles and solving a number of critical challenges specific to the automotive industry. Miha Pelko and Aleksandr Melkonyan outline these challenges and explain how BMW is overcoming them by adapting and reinventing existing big data solutions for autonomous driving.
12:05-12:45 (40m) Sponsored
Cloud-native data science with Anaconda, Docker, and Kubernetes (sponsored by Anaconda)
Mathew Lodge (Anaconda)
The days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. Mathew Lodge demonstrates that it's just as easy to deploy Python as it is Java, using containers and Kubernetes. Welcome to the future.
14:05-14:45 (40m) Sponsored
Operationalizing live data to benefit business (sponsored by WANdisco)
Steve Kilgore (WANdisco)
Today, every company is a data company. Business success depends on putting large volumes of live data to work to drive competitive advantage. Paul Phillips details how some of the world’s largest companies have achieved 100% uptime while moving massive live datasets and halving their hardware requirements.
14:55-15:35 (40m) Sponsored
Incorporating data sources inside and outside of the data center (sponsored by Cisco)
Chiang Yang (Cisco)
Han Yang explains how Cisco is leveraging big data and analytics and details how the company is helping customers to incorporate data sources from the internet of things and deploy machine learning at the edge and at the enterprise.
16:35-17:15 (40m) Sponsored
Fortune 100 lessons: Architecting data lakes for real-time analytics and AI (sponsored by Attunity)
Ted Orme (Attunity)
Modern analytics and AI initiatives require an adaptable data lake with a multistage architectural design to effectively ingest, stage, and provision specific datasets in real time. Ted Orme discusses his experience at Attunity creating a real-time data integration solution for Fortune 100 organizations and shares best practices and lessons learned along the way.
9:00-9:05 (5m)
Wednesday opening welcome
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes.
9:05-9:20 (15m)
Charting a data journey to the cloud
Mick Hollison (Cloudera), Sven Loeffler (Deutsche Telekom), Robert Neumann (Ultra Tendency)
What happens when you combine near-limitless data with on-demand access to powerful analytics and compute? For Deutsche Telekom, the results have been transformative. Mick Hollison, Sven Löffler, and Robert Neumann explain how Deutsche Telekom is harnessing machine learning and analytics in the cloud to build Europe’s largest and most powerful IoT data marketplace.
9:20-9:35 (15m)
Journey to GDPR compliance
Alison Howard (Microsoft)
May 25, the day the GDPR goes into effect, is an important milestone for data protection in the EU and elsewhere, but the journey to GDPR compliance neither begins nor ends there. Alison Howard explains how Microsoft, one of the world’s largest companies, with operations across the EU and around the globe, has prepared for May 25 and beyond.
9:35-9:45 (10m) Sponsored keynote
Humans and the machine: Machine learning in context (sponsored by IBM)
Jean-François Puget (IBM Analytics)
On the way to active analytics for business, we have to answer two big questions: What must happen to data before running machine learning algorithms, and how should machine learning output be used to generate actual business value? Jean-François Puget demonstrates the vital role of human context in answering those questions.
9:45-9:55 (10m)
Building a stronger data ecosystem
Ben Lorica (O'Reilly)
To enable the machine learning applications of the future, there remain many interesting and challenging data problems we need to tackle as a community. Ben Lorica discusses some of the pressing problems we're facing as we collect and store data, particularly in an era when our machine learning models require huge amounts of labeled data.
9:55-10:10 (15m)
The Paradise Papers: Behind the scenes with the ICIJ
Pierre Romera (International Consortium of Investigative Journalists (ICIJ))
Last November, the International Consortium of Investigative Journalists (ICIJ) published the Paradise Papers, a yearlong investigation on the offshore dealings of multinational companies and the wealthy. Pierre Romera offers a behind-the-scenes look into the process and explores the challenges in handling 1.4 TB of data and making it available securely to journalists all over the world.
10:15-10:30 (15m)
Data protection and innovation
Eva Kaili (European Parliament | The Science and Technology Options Assessment Panel)
Keynote with Eva Kaili
8:15-8:45 (30m)
Speed Networking
Gather before keynotes on Wednesday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jump-start your networking.
8:45-9:00 (15m)
Break: Coffee break sponsored by Confluent (7:30 - 9:00)
19:05-20:00 (55m)
Plenary
10:45-11:15 (30m)
Break: Morning break
12:45-14:05 (1h 20m)
Wednesday Topic Tables at lunch
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
12:45-14:05 (1h 20m)
Wednesday Business Summit Lunch
Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers.
12:45-14:05 (1h 20m)
Women's Networking Lunch
If you’re looking to find like minds and make new professional connections, come to the Women's Networking Lunch on Wednesday.
15:35-16:35 (1h)
Break: Afternoon break sponsored by Airbus
18:05-19:05 (1h)
Expo Hall Reception
Unwind after a long day of sessions with small bites and drinks while networking with Strata attendees, exhibitors, and sponsors.
20:00-22:00 (2h)
Data After Dark: A Night in Shoreditch (sponsored by Domino and Cloudera)
Enjoy great food and drink at Data After Dark: A Night in Shoreditch. Be sure to take in the street art as you make your way between Zigfrid von Underbelly and Trapeze Bar.