Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Monday, 21/05/2018

9:00

9:00–17:00 Monday, 21/05/2018
Location: Capital Suite 1
Behzad Bordbar (Cloudera)
Behzad Bordbar demonstrates how to implement typical data science workflows using Apache Spark. You'll learn how to wrangle and explore data using Spark SQL DataFrames and how to build, evaluate, and tune machine learning models using Spark MLlib.
9:00–17:00 Monday, 21/05/2018
Location: Capital Suite 7
Zachary Glassman (The Data Incubator)
Zachary Glassman offers a foundation in building intelligent business applications using machine learning, walking you through all the steps of developing a machine learning pipeline, from prototyping to production. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and extend these models into two applications using real-world datasets.
9:00–17:00 Monday, 21/05/2018
Jesse Anderson (Big Data Institute)
To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company.
9:00–17:00 Monday, 21/05/2018
Location: Capital Suite 17
Dana Mastropole (The Data Incubator)
The TensorFlow library uses data flow graphs for numerical computation, with automatic parallelization across several CPUs or GPUs. This architecture makes it ideal for implementing neural networks and other machine learning algorithms. Dana Mastropole details TensorFlow's capabilities through its Python interface.
9:00–17:00 Monday, 21/05/2018
Location: London Suite 2
Angie Ma (ASI)
Angie Ma offers a condensed introduction to key data science and machine learning concepts and techniques, showing you what is (and isn't) possible with these exciting new tools and how they can benefit your organization.

10:30

10:30–11:00 Monday, 21/05/2018
Location: Capital Suite Foyer
Coffee break (30m)

12:30

12:30–13:30 Monday, 21/05/2018
Location: Capital Suite Foyer
Lunch (1h)

15:00

15:00–15:30 Monday, 21/05/2018
Location: Capital Suite Foyer
Afternoon break (30m)

Tuesday, 22/05/2018

9:00

9:00–17:00 Tuesday, 22/05/2018
Location: Capital Suite 2/3
Dan Jeavons (Shell), Hollie Lubbock (Fjord), Jivan Virdee (Fjord), Fausto Morales (Arundo), Marty Cochrane (Arundo), Jane McConnell (Teradata), Paul Ibberson (Teradata), Kevin Parent (Conduce), Javier Esplugas (DHL Supply Chain), Viola Melis (Typeform), Dave Fitch (The Data Lab), Federica Mutti (Data Reply), Maria Assunta Palmieri (Data Reply), Niranjan Thomas (Dow Jones), Erik Elgersma (FrieslandCampina)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits (and drawbacks) of their solutions.
9:00–17:00 Tuesday, 22/05/2018
Location: Capital Suite 4
Paul Lashmet (Arcadia Data), Olaf Hein (ORDIX AG), Konrad Sippel (Deutsche Börse), Paul Damien Lynn (Nordea), Mikheil Nadareishvili (TBC Bank), Anthony Culligan (SETL), Robert Passarella (Alpha Features), Louise Beaumont (Publicis Groupe | techUK | NPSO), Alistair Croll (Solve For Interesting)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
9:00–12:30 Tuesday, 22/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 8 Level: Non-technical
Secondary topics:  Visualization, Design, and UX
Radhika Dutt (Radical Product), Geordie Kaytes (Fresh Tilled Soil), Nidhi Aggarwal (Radical Product)
These days it’s easy for companies to say, “We measure everything!” The problem is, most popular metrics may not be appropriate or relevant for your business. Measurement isn’t free and should be done strategically. Radhika Dutt, Geordie Kaytes, and Nidhi Aggarwal explain how to align measurement with your product strategy so you can measure what matters for your business.
9:00–12:30 Tuesday, 22/05/2018
Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 9 Level: Non-technical
Secondary topics:  Security and Privacy
Aurélie Pols (Mind Your Privacy)
Aurélie Pols walks you through a "5+5 pillars" framework for GDPR readiness, explaining what the GDPR means to data-fueled businesses. You'll learn how to attribute responsibility to assure compliance and build toward ethical data practices, minimizing risk for your company while fostering trust with your clients.
9:00–12:30 Tuesday, 22/05/2018 Secondary topics:  Text and Language processing and analysis
Barbara Fusinska (Google)
Natural language processing techniques help address tasks like text classification, information extraction, and content generation. Barbara Fusinska offers an overview of natural language processing and walks you through building a bag-of-words representation, using Python and its machine learning libraries, and then using it for text classification.
9:00–17:00 Tuesday, 22/05/2018
Big data and data science in the cloud
Location: Capital Suite 11 Level: Intermediate
Concepcion Diaz walks you through building a complete machine learning pipeline from ingest, exploration, training, and evaluation to deployment and prediction.
9:00–12:30 Tuesday, 22/05/2018
Arun Kejariwal (MZ), Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio)
The need for instant data-driven insights has led to the proliferation of messaging and streaming frameworks. Karthik Ramasamy, Sanjeev Kulkarni, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming frameworks, algorithms, and architectures, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.
9:00–12:30 Tuesday, 22/05/2018
Data engineering and architecture
Location: Capital Suite 13 Level: Intermediate
Mala Ramakrishnan (Cloudera), Eugene Fratkin (Cloudera), Mark Samson (Cloudera), Vinithra Varadharajan (Cloudera), Jason Wang (Cloudera)
The cloud enables the delivery of solutions to single multipurpose clusters offering hyperscale storage decoupled from elastic, on-demand computing. Mala Ramakrishnan, Eugene Fratkin, and Mark Samson detail new paradigms to effectively run production-level pipelines with minimal operational overhead. Join in to learn how to remove barriers to data discovery, metadata sharing, and access control.
9:00–12:30 Tuesday, 22/05/2018
Data engineering and architecture
Location: Capital Suite 14 Level: Intermediate
Secondary topics:  Data Platforms
Mark Madsen (Think Big Analytics), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
9:00–12:30 Tuesday, 22/05/2018
Data science and machine learning
Location: Capital Suite 15 Level: Intermediate
Vartika Singh (Cloudera), Juan Yu (Cloudera)
Vartika Singh and Juan Yu outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks.

10:30

10:30–11:00 Tuesday, 22/05/2018
Location: Capital Suite Foyer
Morning break (30m)

12:30

12:30–13:30 Tuesday, 22/05/2018
Location: N11
Lunch (1h)

13:30

13:30–17:00 Tuesday, 22/05/2018
Strata Business Summit
Location: Capital Suite 8 Level: Intermediate
Nick Elprin (Domino Data Lab)
The honeymoon era of data science is ending: accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders deliver measurable impact on an increasing share of an enterprise's KPIs. Nick Elprin outlines a holistic approach to people, process, and technology to build a sustainable competitive advantage.
13:30–17:00 Tuesday, 22/05/2018
Visualization and user experience
Location: Capital Suite 9 Level: Non-technical
Secondary topics:  Visualization, Design, and UX
Danyel Fisher (Microsoft Research), Miriah Meyer (University of Utah)
Danyel Fisher and Miriah Meyer explore the human side of data analysis and visualization, covering operationalization, the process of reducing vague problems to specific tasks, and how to choose a visual representation that addresses those tasks. Along the way, they also discuss single views and explain how to link them into multiple views.
13:30–17:00 Tuesday, 22/05/2018
Data science and machine learning
Location: Capital Suite 10 Level: Beginner
Secondary topics:  E-commerce and Retail, Media, Advertising, Entertainment
Neejole Patel (Virginia Tech)
Since its arrival in early 2017, PyTorch has won over many deep learning researchers and developers due to its dynamic computation framework. Neejole Patel walks you through using PyTorch to build a content recommendation model.
13:30–17:00 Tuesday, 22/05/2018
Data science and machine learning
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
David Talby (Pacific AI), Claudiu Branzan (G2 Web Services)
Natural language processing is a key component in many data science systems. David Talby and Claudiu Branzan lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, Spark NLP for building distributed natural language machine-learned pipelines, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.
13:30–17:00 Tuesday, 22/05/2018
Data engineering and architecture
Location: Capital Suite 13 Level: Advanced
Secondary topics:  Data Platforms
Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera)
Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Flink, Kudu, Spark Streaming, and Spark SQL and modern storage engines to enable new forms of data processing and analytics.
13:30–17:00 Tuesday, 22/05/2018
Dean Wampler (Lightbend), Boris Lublinsky (Lightbend)
Dean Wampler and Boris Lublinsky walk you through building streaming apps as microservices using Akka Streams and Kafka Streams. Along the way, Dean and Boris discuss the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead.
13:30–17:00 Tuesday, 22/05/2018
Law, ethics, and governance, Platform security and cybersecurity
Location: Capital Suite 15 Level: Intermediate
Secondary topics:  Security and Privacy
Mark Donsky (Cloudera), Steffen Maerkl (Cloudera)
Hybrid big data deployments present significant new security risks. Security admins must ensure a consistently secured and governed experience for end users and administrators across multiple workloads that span on-premises, private cloud, multicloud, and hybrid cloud deployments. Mark Donsky shares best practices for meeting these challenges as he walks you through securing a Hadoop cluster.

15:00

15:00–15:30 Tuesday, 22/05/2018
Location: Capital Suite Foyer
Afternoon break (30m)

17:00

17:00–18:00 Tuesday, 22/05/2018
Location: Expo Hall (Capital Hall 24)
Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors.

19:00

19:00–21:00 Tuesday, 22/05/2018
Location: Various locations
Get to know your fellow attendees over dinner. We've made reservations for you at some of the most sought-after restaurants in town. This is a great chance to make new connections and sample some of the great cuisine London has to offer.

Wednesday, 23/05/2018

8:15

8:15–8:45 Wednesday, 23/05/2018
Location: TBD
Gather before keynotes on Wednesday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking.

8:45

8:45–9:00 Wednesday, 23/05/2018
Location: Auditorium Foyer
Coffee break (8:30 - 9:00) (15m)

9:00

9:00–9:05 Wednesday, 23/05/2018
Location: Auditorium
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes.

9:25

9:25–9:40 Wednesday, 23/05/2018
Location: Auditorium
Alison Howard (Microsoft)
Alison Howard, Attorney, Microsoft

10:00

10:00–10:15 Wednesday, 23/05/2018
Location: Auditorium
Pierre Romera (International Consortium of Investigative Journalists (ICIJ))
Last November, the International Consortium of Investigative Journalists (ICIJ) published the Paradise Papers, a yearlong investigation on the offshore dealings of multinational companies and the wealthy. Pierre Romera offers a behind-the-scenes look into the process and explores the challenges in handling 1.4 TB of data and making it available securely to journalists all over the world.

10:20

10:20–10:35 Wednesday, 23/05/2018
Location: Auditorium
Eva Kaili (European Parliament | The Science and Technology Options Assessment Panel)
Keynote with Eva Kaili

10:45

10:45–11:15 Wednesday, 23/05/2018
Location: Expo Hall (Capital Hall 24)
Morning break (30m)

11:15

11:15–11:55 Wednesday, 23/05/2018
Data engineering and architecture
Location: Capital Suite 7 Level: Intermediate
Secondary topics:  Security and Privacy
Charaka Goonatilake (Panaseer)
Data is becoming a crucial weapon to secure an organization against cyber threats. Charaka Goonatilake shares strategies for designing effective data platforms for cybersecurity using big data technologies, such as Spark and Hadoop, and explains how these platforms are being used in real-world examples of data-driven security.
11:15–11:55 Wednesday, 23/05/2018
Executive Briefing, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 17 Level: Intermediate
Secondary topics:  Financial Services, Security and Privacy
Mark Donsky (Cloudera), Steven Ross (Cloudera)
In May 2018, the General Data Protection Regulation (GDPR) goes into effect for firms doing business in the EU, but many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Mark Donsky outlines the capabilities your data environment needs to simplify compliance with GDPR and future regulations.
11:15–11:55 Wednesday, 23/05/2018
Data science and machine learning, Law, ethics, and governance
Location: Capital Suite 12 Level: Non-technical
Secondary topics:  Security and Privacy
Andrew Burt (Immuta)
The Strata Data conference in London takes place during one of the most important weeks in the history of data regulation, as GDPR begins to be enforced. Andrew Burt explores the effects of the GDPR on deploying machine learning models in the EU.
11:15–11:55 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 13 Level: Advanced
Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Ilia Karmanov (Microsoft)
Mathew Salvaris, Miguel Gonzalez-Fierro, and Ilia Karmanov offer a comparison of two platforms for running distributed deep learning training in the cloud, using a ResNet network trained on the ImageNet dataset as an example. You'll examine the performance of each as the number of nodes scales and learn some tips and tricks as well as some pitfalls to watch out for.
11:15–11:55 Wednesday, 23/05/2018
Data science and machine learning, Visualization and user experience
Location: Capital Suite 14 Level: Intermediate
Secondary topics:  Visualization, Design, and UX
Jeff Fletcher (Cloudera)
As big data adoption grows, Apache Hadoop, Apache Spark, and machine learning technologies are increasingly being used to analyze ever-larger datasets, but we still have to keep telling stories about the data and making sure the message is clear. Jeff Fletcher details the tools and techniques that are relevant to data visualization practitioners working with large datasets and predictive models.
11:15–11:55 Wednesday, 23/05/2018
Stuart Pook (Criteo)
Criteo has a production cluster of 2K nodes running over 300K jobs a day in the company's own data centers. These clusters were meant to provide a redundant solution to Criteo's storage and compute needs. Stuart Pook offers an overview of the project, shares challenges and lessons learned, and discusses Criteo's progress in building another cluster to survive the loss of a full DC.
11:15–11:55 Wednesday, 23/05/2018
Data engineering and architecture
Location: S11B Level: Intermediate
Secondary topics:  Data Platforms, Media, Advertising, Entertainment
Jason Heo (Naver), Dooyong Kim (Navercorp)
naver.com is the largest search engine in Korea, with a 70% share of the Korean search market. The speakers' team handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver's web analytics system, built with Druid. They outline the architecture, share techniques for speedup, explain how they implement the Spark Druid Connector and how to use it, and detail how...
11:15–11:55 Wednesday, 23/05/2018
Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Intermediate
Gerard Maas (Lightbend)
Apache Spark has two streaming APIs: Spark Streaming and Structured Streaming. Gerard Maas offers a critical overview of their differences in key aspects of a streaming application, from the API user experience to dealing with time and with state and machine learning capabilities, and shares practical guidance on picking one or combining both to implement resilient streaming pipelines.
11:15–11:55 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions
Ihab Ilyas (University of Waterloo | Tamr)
Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas provides insight into various techniques and discusses how machine learning, human expertise, and problem semantics collectively can deliver a scalable, high-accuracy solution.
11:15–11:55 Wednesday, 23/05/2018
Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical
Secondary topics:  Financial Services
Audrey Lobo-Pulo (The Australian Treasury), Nick O'Donnell (LinkedIn)
In October 2017, LinkedIn and the Australian Treasury teamed up to gain a deeper understanding of the Australian labor market through new data insights, which may inform economic policy and directly benefit society. Audrey Lobo-Pulo and Nick O'Donnell share some of the discoveries from this collaboration as well as the practicalities of working in a public-private partnership.
11:15–11:55 Wednesday, 23/05/2018 Secondary topics:  Media, Advertising, Entertainment
Daniel Gilbert (News UK), Jonathan Leslie (Pivigo)
In the era of 24-hour news and online newspapers, editors in the newsroom must quickly and efficiently make sense of the enormous amounts of data that they encounter and make decisions about their content. Daniel Gilbert and Jonathan Leslie discuss an ongoing partnership between News UK and Pivigo in which a team of data science trainees helped develop an AI platform to help in this task.

12:05

12:05–12:45 Wednesday, 23/05/2018 Secondary topics:  Security and Privacy
Federico Leven (ReactoData)
The apparent difficulty of managing Hadoop compared to more traditional and proprietary data products makes some companies wary of the Hadoop ecosystem, but managing security is becoming more accessible in the Hadoop space, particularly in the Cloudera stack. Federico Leven offers an overview of an end-to-end security deployment on Hadoop and the data and security governance policies implemented.
12:05–12:45 Wednesday, 23/05/2018
Data-driven business management, Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Non-technical
Teresa Tung (Accenture Labs), Jean-Luc Chatelain (Accenture)
A data-driven enterprise maximizes the value of its data. But how do enterprises emerging from technology and organization silos get there? Teresa Tung and Jean-Luc Chatelain explain how to create a data-driven enterprise maturity model that spans technology and business requirements and walk you through use cases that bring the model to life.
12:05–12:45 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 12
Secondary topics:  Media, Advertising, Entertainment, Security and Privacy
Elisa Celis (EPFL)
There is a pressing need to design new algorithms that are socially responsible in how they learn and socially optimal in the manner in which they use information. Elisa Celis explores the emergence of bias in algorithmic decision making and presents first steps toward developing a systematic framework to control biases in classical problems, such as data summarization and personalization.
12:05–12:45 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  E-commerce and Retail, Media, Advertising, Entertainment
In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice.
12:05–12:45 Wednesday, 23/05/2018 Secondary topics:  Media, Advertising, Entertainment, Security and Privacy
Guillaume Chaslot (AlgoTransparency)
An increasing number of ex-Google and ex-Facebook employees state that social media is starting to control us rather than the other way around. How can we determine if social media is a pure reflection of people's interests or if it pushes us toward specific narratives? Guillaume Chaslot explores methodologies to find out which narratives are favored by social media recommendation engines.
12:05–12:45 Wednesday, 23/05/2018
Jim Scott (MapR Technologies)
Creating a business solution is a lot of work. Instead of building to run on a single cloud provider, it is far more cost effective to leverage the cloud as infrastructure as a service (IaaS). Jim Scott explains why a global data fabric is a requirement for running on all cloud providers simultaneously.
12:05–12:45 Wednesday, 23/05/2018 Secondary topics:  Data Platforms, E-commerce and Retail, Transportation and Logistics
Mao Baolong (JD.com), Yiran Wu (JD.com), Yupeng Fu (Alluxio)
Mao Baolong, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average.
12:05–12:45 Wednesday, 23/05/2018
Data engineering and architecture
Location: Capital Suite 8/9 Level: Beginner
Secondary topics:  Telecom
In the past year, British Telecom has added a streaming network analytics use case to its multitenant data platform. Phillip Radley demonstrates how the solution works and explains how it delivers better broadband and TV services, using Kafka and Spark on YARN and HDFS encryption.
12:05–12:45 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate
Secondary topics:  Financial Services
Baiju Devani (Aviva Canada), Étienne Chassé St-Laurent (Aviva Canada)
Risk-sharing pools allow insurers to get rid of risks they are forced to insure in highly regulated markets. Insurers thus cede both the risk and its premium. But are they ceding the right risk or simply giving up premium? Baiju Devani and Étienne Chassé St-Laurent share an applied machine learning approach that leverages an ensemble of models to gain a distinctive market advantage.
12:05–12:45 Wednesday, 23/05/2018 Secondary topics:  Telecom, Time Series and Graphs
Ira Cohen (Anodot)
The mobile world has so many moving parts that a simple change to one element can cause havoc somewhere else, resulting in issues that annoy users and cause revenue leaks. Ira Cohen outlines ways to use anomaly detection to track everything mobile, from the service and roaming to specific apps, to fully optimize your mobile offerings.
12:05–12:45 Wednesday, 23/05/2018
Data science and machine learning
Location: Expo Hall Level: Intermediate
Konstantinos Georgatzis (QuantumBlack), Martha Imprialou (QuantumBlack)
Konstantinos Georgatzis and Martha Imprialou explain how to interpret the predictions given by your black-box model and how machine learning is helping to drive decision making today.

12:45

12:45–14:05 Wednesday, 23/05/2018
Location: Expo Hall (Capital Hall 24)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
12:45–14:05 Wednesday, 23/05/2018
Location: Expo Hall - SBS lunch (Capital Hall 24)
Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers.

14:05

14:05–14:45 Wednesday, 23/05/2018 Secondary topics:  Security and Privacy
Joshua Patterson (NVIDIA), Mike Wendt (NVIDIA)
Joshua Patterson and Mike Wendt explain how NVIDIA used GPU-accelerated open source technologies to improve its cyberdefense platforms by leveraging software from the GPU Open Analytics Initiative (GOAI) and how the company accelerated anomaly detection with more efficient machine learning models, faster deployment, and more granular data exploration.
14:05–14:45 Wednesday, 23/05/2018
Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Beginner
Danielle Dean (Microsoft)
Danielle Dean covers the basics of managing data science projects, including the data science lifecycle, and offers an overview of an internal approach at Microsoft called the Team Data Science Process (TDSP). Join in to learn more about the typical priorities of data science teams and the keys to success in engaging and creating value with data science.
14:05–14:45 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 4
Paul Phillips (WANdisco)
Today, every company is a data company. Business success depends on putting large volumes of live data to work to drive competitive advantage. Paul Phillips details how some of the world’s largest companies have achieved 100% uptime while moving massive live data sets and halving their hardware requirements.
14:05–14:45 Wednesday, 23/05/2018 Secondary topics:  Telecom, Time Series and Graphs
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Huawei)
Heitor Murilo Gomes and Albert Bifet offer an overview of StreamDM, a real-time analytics open source software library built on top of Spark Streaming, developed at Huawei Noah’s Ark Lab and Telecom ParisTech.
14:05–14:45 Wednesday, 23/05/2018 Secondary topics:  Security and Privacy
Eran Avidan (Intel)
Deep learning is revolutionizing many domains within computer vision, but doing real-time analysis is challenging. Eran Avidan offers an overview of a novel architecture based on Redis, Docker, and TensorFlow that enables real-time analysis of high-resolution streaming video.
14:05–14:45 Wednesday, 23/05/2018
Jivan Virdee (Fjord), Hollie Lubbock (Fjord)
Artificial intelligence systems are powerful agents of change in our society, but as this technology becomes increasingly prevalent, transforming our understanding of ourselves and our society, issues around ethics and regulation will arise. Jivan Virdee and Hollie Lubbock explore how to address fairness, accountability, and the long-term effects on our society when designing with data.
14:05–14:45 Wednesday, 23/05/2018
Greg Rahn (Cloudera)
For many organizations, the next big data warehouse will be in the cloud. Greg Rahn shares considerations for evaluating the cloud for analytics and big data warehousing, including different architectural approaches to optimize price and performance.
14:05–14:45 Wednesday, 23/05/2018
Data engineering and architecture
Location: S11B Level: Intermediate
Secondary topics:  Data Platforms, Transportation and Logistics
Carsten Herbe (Audi Business Innovation GmbH), Matthias Graunitz (Audi AG)
Carsten Herbe and Matthias Graunitz detail Audi's journey from a Hadoop proof of concept to a multitenant enterprise platform, sharing lessons learned, the decisions Audi made, and how a number of use cases are implemented using the platform.
14:05–14:45 Wednesday, 23/05/2018
Michael Noll (Confluent)
Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL. Read more.
14:05–14:45 Wednesday, 23/05/2018
Manas Ranjan Kar (Episource)
Episource is building a scalable NLP engine to help summarize medical charts and extract medical coding opportunities and their dependencies to recommend the best possible ICD-10 codes. Manas Ranjan Kar offers an overview of the wide variety of deep learning algorithms involved and the complex in-house training-data creation exercises that were required to make it work. Read more.
14:05–14:45 Wednesday, 23/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Beginner
Secondary topics:  Transportation and Logistics
Because in-house data science teams work with a range of business functions, traditional data science processes are often too abstract to cope with the complexity of these environments. Alberto Rey Villaverde and Grigorios Mingas share case studies from easyJet that highlight some unpredictable hurdles related to requirements, data, infrastructure, and deployment and explain how they solved them. Read more.
14:05–14:45 Wednesday, 23/05/2018
Secondary topics:  Managing and Deploying Machine Learning
Emre Velipasaoglu (Lightbend)
Most machine learning algorithms are designed to work on stationary data, but real-life streaming data is rarely stationary. Models lose prediction accuracy over time if they are not retrained. Without model quality monitoring, retraining decisions are suboptimal and costly. Emre Velipasaoglu reviews monitoring methods, focusing on their applicability in fast data and streaming applications. Read more.

14:55

14:55–15:35 Wednesday, 23/05/2018
Lee Blum (Verint Systems)
Lee Blum offers an overview of Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results. Read more.
14:55–15:35 Wednesday, 23/05/2018
Dean Wampler (Lightbend)
Streaming data systems, so-called fast data, promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler outlines what you need to know to exploit fast data successfully. Read more.
14:55–15:35 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 4
Han Yang (Cisco Systems)
Han Yang explains how Cisco is leveraging big data and analytics and details how the company is helping customers to incorporate data sources from the internet of things and deploy machine learning at the edge and at the enterprise. Read more.
14:55–15:35 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  E-commerce and Retail, Financial Services, Time Series and Graphs
Mikio Braun (Zalando SE)
Time series data has many applications in industry, in particular predicting the future based on historical data. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn't. Read more.
14:55–15:35 Wednesday, 23/05/2018
Aurélien Géron (Kiwisoft)
Convolutional neural networks (CNN) can now complete many computer vision tasks with superhuman ability. This will have a large impact in manufacturing by improving anomaly detection, product classification, analytics, and more. Aurélien Géron details common CNN architectures, explains how they can be applied to manufacturing, and covers potential challenges along the way. Read more.
14:55–15:35 Wednesday, 23/05/2018
Secondary topics:  Visualization, Design, and UX
Brian O'Neill (Designing for Analytics)
Gartner says 85%+ of big data projects will fail. Your own company may have even spent millions on a recent project that isn’t really delivering the value or UX everyone hoped for. Brian O'Neill explains why CDOs, PMs, and business leaders who leverage design to prioritize utility, usability, and customer value will realize the best ROIs and demonstrates how to start evaluating your UX. Read more.
14:55–15:35 Wednesday, 23/05/2018
Tomer Shiran (Dremio)
It's often impractical for organizations to physically consolidate all data into one system. Tomer Shiran offers an overview of Apache Arrow, an open source columnar, in-memory data representation that enables analytical systems and data sources to exchange and process data in real time, simplifying and accelerating data access without having to copy all data into one location. Read more.
14:55–15:35 Wednesday, 23/05/2018
Data engineering and architecture
Location: S11B Level: Beginner
Secondary topics:  Transportation and Logistics
Dr.-Ing. Michael Nolting (Volkswagen Commercial Vehicles)
Map matching applications exist in almost every telematics use case and are therefore crucial to all car manufacturers. Michael Nolting details the architecture behind Volkswagen Commercial Vehicles’ Altus-based map matching application and leads a live demo featuring a map matching job in Altus. Read more.
14:55–15:35 Wednesday, 23/05/2018
Ivan Kelly (Streamlio)
Ivan Kelly offers an overview of Apache Pulsar, a durable, distributed messaging system, underpinned by Apache BookKeeper, that provides the enterprise features necessary to guarantee that your data is where it should be and only accessible by those who should have access. Read more.
14:55–15:35 Wednesday, 23/05/2018
Location: Capital Suite 10/11
Harvinder Atwal (Moneysupermarket)
This talk demonstrates experience-based solutions for increasing your velocity of value creation, including agile prioritisation and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. Read more.
14:55–15:35 Wednesday, 23/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical
Kim Nilsson (Pivigo), Phil Harvey (Microsoft)
Our lives are being transformed by data, changing our understanding of work, play, and health. Every organization can take advantage of this resource, but something is holding us back: us. Kim Nilsson and Phil Harvey explain how to build a successful data culture that embeds data at the heart of every organization through people and delivers success through empathy, communication, and humanity. Read more.
14:55–15:35 Wednesday, 23/05/2018
Data engineering and architecture
Location: Expo Hall Level: Intermediate
Tobias Bürger (BMW Group)
The BMW Group IT team drives the usage of data-driven technologies and forms the nucleus of a data-centric culture inside of the organization. Tobias Bürger discusses the E-to-E relationship of data and models and shares best practices for scaling applications in real-world environments. Read more.

15:35

15:35–16:35 Wednesday, 23/05/2018
Location: Expo Hall (Capital Hall 24)
Afternoon break (1h)

16:35

16:35–17:15 Wednesday, 23/05/2018
Data engineering and architecture, Platform security and cybersecurity
Location: Capital Suite 7 Level: Non-technical
Secondary topics:  Security and Privacy
Thomas Phelan (BlueData)
Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE), but TDE can be difficult to configure and manage—issues that are only compounded when running on Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them. Read more.
16:35–17:15 Wednesday, 23/05/2018
Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Beginner
Mark Madsen (Think Big Analytics), Shant Hovsepian (Arcadia Data)
If your goal is to provide data to an analyst rather than a data scientist, what’s the best way to deliver analytics? There are 70+ BI tools in the market and a dozen or more SQL- or OLAP-on-Hadoop open source projects. Mark Madsen and Shant Hovsepian discuss the trade-offs between a number of architectures that provide self-service access to data. Read more.
16:35–17:15 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Time Series and Graphs
The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making. Read more.
16:35–17:15 Wednesday, 23/05/2018
Big data and data science in the cloud, Data science and machine learning
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions, Media, Advertising, Entertainment
Olga Ermolin (MLS Listings)
Aggregation of geospecific real estate databases results in duplicate entries for properties located near geographical boundaries. Olga Ermolin details an approach for identifying duplicate entries by analyzing the images that accompany real estate listings, using transfer learning with a Siamese architecture based on the VGG-16 CNN topology. Read more.
16:35–17:15 Wednesday, 23/05/2018
Data science and machine learning, Visualization and user experience
Location: Capital Suite 14 Level: Beginner
Secondary topics:  Visualization, Design, and UX
Bargava Subramanian (Impel Labs), Amit Kapoor (narrativeVIZ Consulting)
Creating visualizations for data science requires an interactive setup that works at scale. Bargava Subramanian and Amit Kapoor explore the key architectural design considerations for such a system and discuss the four key trade-offs in this design space: rendering for data scale, computation for interaction speed, adapting to data complexity, and being responsive to data velocity. Read more.
16:35–17:15 Wednesday, 23/05/2018
Paul Curtis (MapR Technologies)
The flexibility advantage conferred by containers depends on their ephemeral nature, so it’s useful to keep containers stateless. However, many applications require state—access to a scalable persistence layer that supports real mutable files, tables, and streams. Paul Curtis demonstrates how to make containerized applications reliable, available, and performant, even with stateful applications. Read more.
16:35–17:15 Wednesday, 23/05/2018
Secondary topics:  Text and Language processing and analysis
Ran Taig (Dell), Omer Sagi (Dell)
DevOps and QA engineers devote a significant amount of time to investigating recurring issues. These issues are often represented by large configuration and log files, so the process of investigating whether two issues are duplicates can be a very tedious task. Ran Taig and Omer Sagi outline a solution that leverages NLP and machine learning algorithms to automatically identify duplicate issues. Read more.
16:35–17:15 Wednesday, 23/05/2018
Sean Glover (Lightbend)
Kafka is best suited to run close to the metal on dedicated machines in static clusters, but these clusters are quickly becoming extinct. Companies want mixed-use clusters that take advantage of every resource available. Sean Glover offers an overview of leading Kafka implementations on DC/OS and Kubernetes to explore how reliably they run Kafka in container-orchestrated clusters. Read more.
16:35–17:15 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Non-technical
Secondary topics:  Text and Language processing and analysis
Naveed Ghaffar (Narrative Economics), Rashed Iqbal (UCLA)
Narratives are significant vectors of rapid change in culture, economic behavior, and the Zeitgeist of a society. Narrative economics studies the impact of popular human-interest stories on economic fluctuations. Naveed Ghaffar and Rashed Iqbal outline a framework that uses natural language understanding to extract and analyze narratives in human communication. Read more.
16:35–17:15 Wednesday, 23/05/2018
Jude Mccorry (The Data Lab)
Jude Mccorry offers an overview of Data Collaboratives, a new form of collaboration beyond the public-private partnership model, in which participants from different sectors exchange data, skills, leadership, and knowledge to solve complex problems facing children in Scotland and worldwide. Read more.

17:25

17:25–18:05 Wednesday, 23/05/2018
Secondary topics:  Security and Privacy
Nikki Rouda (Cloudera), Nick Curcuru (Mastercard)
Having so many cloud-based analytics services available is a dream come true. However, it's a nightmare to manage proper security and governance across all those different services. Nikki Rouda and Nick Curcuru share advice on how to minimize the risk and effort in protecting and managing data for multidisciplinary analytics and explain how to avoid the hassle and extra cost of siloed approaches. Read more.
17:25–18:05 Wednesday, 23/05/2018
Data-driven business management, Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Intermediate
Secondary topics:  Managing and Deploying Machine Learning
David Talby (Pacific AI)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries. Read more.
17:25–18:05 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Security and Privacy, Time Series and Graphs
Fabian Yamaguchi (ShiftLeft)
Fabian Yamaguchi offers an overview of Code Property Graph (CPG), a unique approach that allows the functional elements of code to be represented in an interconnected graph of data and control flows, which enables semantic information about code to be stored scalably on distributed graph databases over the web while allowing them to be rapidly accessed. Read more.
17:25–18:05 Wednesday, 23/05/2018
Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
Darren Cook (QQ Trend)
Darren Cook demonstrates how to use LSTMs, state-of-the-art tokenizers, dictionaries, and other data sources to tackle translation, focusing on one of the most difficult language pairs: Japanese to English. Read more.
17:25–18:05 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 14 Level: Intermediate
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Sure, you’ve got the best and fastest running SQL engine, but you’ve still got some problems: Users don’t know which tables exist or what they contain; sometimes bad things happen to your data, and you need to regenerate partitions but there is no tool to do so. Mark Grover explains how to make your team and your larger organization more productive when it comes to consuming data. Read more.
17:25–18:05 Wednesday, 23/05/2018
Christopher Royles (Cloudera)
Big data and cloud deployments return huge benefits in flexibility and economics but can also result in runaway costs and failed projects. Drawing on his production experience, Christopher Royles shares tips and best practices for determining initial sizing, strategic planning, and longer-term operation, helping you deliver an efficient platform, reduce costs, and implement a successful project. Read more.
17:25–18:05 Wednesday, 23/05/2018
Holden Karau (Google), Rachel Warren (Salesforce Einstein), Anya Bida (Salesforce)
Apache Spark is an amazing distributed system, but part of the bargain we've made with the infrastructure daemons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. Holden Karau, Rachel Warren, and Anya Bida explore auto-tuning jobs using systems like Apache Beam, Mahout, and internal Spark ML jobs as workloads. Read more.
17:25–18:05 Wednesday, 23/05/2018
Aljoscha Krettek (data Artisans)
Aljoscha Krettek offers an overview of the modern stream processing space, details the challenges posed by stateful and event-time-aware stream processing, and shares core archetypes ("application blueprints") for stream processing drawn from real-world use cases with Apache Flink. Read more.
17:25–18:05 Wednesday, 23/05/2018
Jorie Koster-Hale (Dataiku)
Because it's affected by a number of geospatial and temporal features, predicting crime poses a unique technical challenge. Jorie Koster-Hale shares an approach using a combination of open source data, machine learning, time series modeling, and geostatistics to determine where crime will occur, what predicts it, and what we can do to prevent it in the future. Read more.
17:25–18:05 Wednesday, 23/05/2018
Stuart Sherman (IMC Business Architecture), Richard Goyder (IMC Business Architecture | Scaled Insights)
Big data analytics tends to focus on what is easily available, which is by and large data about what has already happened, the implicit assumption being that past behavior will predict future behavior. Organizations already possess data they aren’t exploiting. Stuart Sherman and Richard Goyder explain how, with the right tools, it can be used to develop far more powerful predictive algorithms. Read more.

18:05

18:05–19:05 Wednesday, 23/05/2018
Location: Expo Hall (Capital Hall 24)
Unwind after a long day of sessions with small bites and drinks while networking with Strata attendees, exhibitors, and sponsors. Read more.

19:05

19:05–20:00 Wednesday, 23/05/2018
Location: TBD
TBC

20:00

20:00–22:00 Wednesday, 23/05/2018
Location: TBD (DAD)
Enjoy great food and drink at Data After Dark: Pub Crawl while admiring street art and making your way between Zigfrid Von Underbelly and Trapeze bars. Read more.

Thursday, 24/05/2018

8:15

8:15–8:45 Thursday, 24/05/2018
Location: TBD
Gather before keynotes on Thursday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees. Read more.

8:45

8:45–9:00 Thursday, 24/05/2018
Location: Auditorium Foyer
Coffee break (8:30 - 9:00) (15m)

9:00

9:00–9:05 Thursday, 24/05/2018
Location: Auditorium
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

9:05

9:05–9:20 Thursday, 24/05/2018
Location: Auditorium
Christine Hung (Spotify)
Keynote with Christine Hung Read more.

9:25

9:25–9:35 Thursday, 24/05/2018
Location: Auditorium
Mikio Braun (Zalando SE)
Keynote with Mikio Braun Read more.

9:50

9:50–10:00 Thursday, 24/05/2018
Location: Auditorium
Zubin Siganporia (QED Analytics)
Some industries have been exploring the power of data analytics and machine learning for several years. Others, such as healthcare and law, have made relatively little use of such techniques. Using these two industries as case studies, Zubin Siganporia will share recent examples of how machine learning has provided large and immediate improvements over traditional approaches. Read more.

10:10

10:10–10:20 Thursday, 24/05/2018
Location: Auditorium
Christine Foster (The Alan Turing Institute)
Keynote with Christine Foster Read more.

10:25

10:25–10:40 Thursday, 24/05/2018
Location: Auditorium
Martha Lane Fox, founder and chair, doteveryone.org.uk Read more.

10:45

10:45–11:15 Thursday, 24/05/2018
Location: Expo Hall (Capital Hall 24)
Morning break (30m)

11:15

11:15–11:55 Thursday, 24/05/2018
Secondary topics:  Data Platforms, Managing and Deploying Machine Learning, Media, Advertising, Entertainment
Kinnary Jangla (Pinterest)
Having trouble coordinating development of your production ML system between a team of developers? Microservices drifting and causing problems debugging? Kinnary Jangla explains how Pinterest Dockerized the services powering its Home Feed and how it impacted the engineering productivity of its ML teams while increasing uptime and ease of deployment. Read more.
11:15–11:55 Thursday, 24/05/2018
Executive Briefing, Strata Business Summit
Location: Capital Suite 17
Mike Olson (Cloudera)
Mike Olson shares examples of real-world machine learning applications, explores a variety of challenges in putting these capabilities into production—the speed with which technology is moving, cloud versus in-data-center consumption, security and regulatory compliance, and skills and agility in getting data and answers into the right hands—and outlines proven ways to meet them. Read more.
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 12 Level: Beginner
Jeroen Janssens (Data Science Workshops)
"Anyone who does not have the command line at their beck and call is really missing something," tweeted Tim O'Reilly when Jeroen Janssens's Data Science at the Command Line was recently made available online for free. Join Jeroen to learn what you're missing out on if you're not applying the command line and many of its power tools to typical data science problems. Read more.
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 13 Level: Beginner
Secondary topics:  Managing and Deploying Machine Learning
Ramesh Sridharan (Captricity)
Most uses of deep learning involve models trained with large datasets. Ramesh Sridharan explains how Captricity uses deep learning with tiny datasets at scale, training thousands of models using tens to hundreds of examples each. These models are dynamically trained using an automatic deployment framework, and carefully chosen metrics further exploit error properties of the resulting models. Read more.
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 14 Level: Intermediate
Secondary topics:  Managing and Deploying Machine Learning
Ted Dunning (MapR Technologies)
Ted Dunning offers an overview of the rendezvous architecture, geared to deal with much of the complexity involved in deploying models to production, thus allowing more time to be spent thinking and doing real data science. Ted covers the ideas behind the architecture, practical scenarios, and advantages and disadvantages of the architecture. Read more.
11:15–11:55 Thursday, 24/05/2018
Secondary topics:  Data Platforms, E-commerce and Retail
Neelesh Srinivas Salian offers an overview of the compute infrastructure used by the data science team at Stitch Fix, covering the architecture, tools within the larger ecosystem, and the challenges that the team overcame along the way. Read more.
11:15–11:55 Thursday, 24/05/2018
Data engineering and architecture
Location: S11B Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions, Data Platforms, Media, Advertising, Entertainment
Irene Gonzálvez (Spotify)
Irene Gonzálvez shares Spotify's process for ensuring data quality, covering why and how the company became aware of its importance, the products it has developed, and future strategy. Read more.
11:15–11:55 Thursday, 24/05/2018
Secondary topics:  Visualization, Design, and UX
Erin Recachinas (Zoomdata)
The value of real-time streaming analytics with historical data is immense. Big data application Zoomdata updates historical dashboards in real time without complex reaggregations, but streaming in the age of the IoT requires handling of data in volumes not seen in traditional feeds. Erin Recachinas explains how Zoomdata moved to a scalable microservice architecture for streaming sources. Read more.
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Non-technical
Secondary topics:  Text and Language processing and analysis
Radim Řehůřek (RARE Technologies Ltd.)
Radim Řehůřek shares lessons learned and tips for successful R&D in applied data science. You'll learn the primary gaps between the academic and industry skill sets, what businesses should look out for when applying cutting-edge research in practice, what researchers can do to increase the impact of their research, and what companies can do to promote, reward, and nurture good quality ML research. Read more.
11:15–11:55 Thursday, 24/05/2018
Location: Capital Suite 15/16
TBC
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning
Location: Expo Hall Level: Beginner
Secondary topics:  Time Series and Graphs
Jared Lander (Lander Analytics)
Temporal data is being produced in ever greater quantity, but fortunately our time series capabilities are keeping pace. Jared Lander explores techniques for modeling time series, from traditional methods such as ARMA to more modern tools such as Prophet and machine learning models like XGBoost and neural nets. Along the way, Jared shares theory and code for training these models. Read more.

12:05

12:05–12:45 Thursday, 24/05/2018
Secondary topics:  Managing and Deploying Machine Learning
Nanda Vijaydev (BlueData), Thomas Phelan (BlueData)
In the past, you needed a high-end proprietary stack for advanced machine learning, but today, you can use open source machine learning and deep learning algorithms available with distributed computing technologies like Apache Spark and GPUs. Nanda Vijaydev and Thomas Phelan demonstrate how to deploy a TensorFlow and Spark with NVIDIA CUDA stack on Docker containers in a multitenant environment. Read more.
12:05–12:45 Thursday, 24/05/2018
Executive Briefing, Strata Business Summit
Location: Capital Suite 17
Nicolaus Henke (McKinsey & Company)
After decades of extravagant promises, artificial intelligence is finally starting to deliver real-life benefits to early adopters. However, we’re still early in the cycle of adoption. Nicolaus Henke explains where investment is going, patterns of AI adoption and value capture by enterprises, and how the value potential of AI across sectors and business functions is beginning to emerge. Read more.
12:05–12:45 Thursday, 24/05/2018
Secondary topics:  Financial Services
Calum Murray (Intuit)
Machine learning-based applications are becoming the new norm. Calum Murray shares five use cases at Intuit that use the data of over 60 million users to create delightful experiences for customers by solving repetitive tasks, freeing them up to spend time more productively or solving very complex tasks with simplicity and elegance. Read more.
12:05–12:45 Thursday, 24/05/2018
Location: Capital Suite 13
TBC
12:05–12:45 Thursday, 24/05/2018
Jacques Nadeau (Dremio)
Jacques Nadeau offers an overview of a new Apache-licensed lightweight distributed in-memory cache that allows multiple applications to consume Arrow directly using the Arrow RPC and IPC protocols. You'll explore the system design and deployment architecture, learn how data science, analytical, and custom applications can all leverage the cache simultaneously, and see a live demo. Read more.
12:05–12:45 Thursday, 24/05/2018
Secondary topics:  Transportation and Logistics
Mark Grover (Lyft), Ted Malaska (Blizzard Entertainment)
Many details go into building a big data system for speed, from determining a respectable latency before data becomes accessible and deciding where to store the data to solving multiregion problems—or even knowing just what data you have and where stream processing fits in. Mark Grover and Ted Malaska share challenges, best practices, and lessons learned doing big data processing and analytics at scale and at speed. Read more.
12:05–12:45 Thursday, 24/05/2018
Big data and data science in the cloud, Data engineering and architecture
Location: Capital Suite 8/9 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions
Adesh Rao (Qubole), Abhishek Somani (Qubole)
Adesh Rao and Abhishek Somani share a framework for materialized views in SQL-on-Hadoop engines that automatically suggests, creates, uses, invalidates, and refreshes views created on top of data for optimal performance and strict correctness. Read more.
12:05–12:45 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate
Secondary topics:  Financial Services
Mike Lee Williams (Cloudera Fast Forward Labs)
Interpretable models result in more accurate, safer and more profitable machine learning products, but interpretability can be hard to ensure. Michael Lee Williams examines the growing business case for interpretability, explores concrete applications including churn, finance and healthcare, and demonstrates the use of LIME, an open source, model-agnostic tool you can apply to your models today. Read more.
12:05–12:45 Thursday, 24/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Intermediate
Martin Goodson (Evolution AI)
How can AI become part of our business processes? Should we entrust critical decisions to completely autonomous systems? Drawing on projects from businesses and UK government agencies, Martin Goodson explains how to increase confidence in AI systems and manage the transition to an AI-driven organization. Read more.
12:05–12:45 Thursday, 24/05/2018
Secondary topics:  Time Series and Graphs
Erik Nordström (Timescale)
Erik Nordström explains how and why to use PostgreSQL as a Prometheus backend to support complex questions (and get a proper SQL interface), offers an overview of pg_prometheus, a custom Prometheus datatype, and prometheus-postgresql-adapter, a remote storage adapter for PostgreSQL, and shares his experience with TimescaleDB, which enables PostgreSQL to scale for classic monitoring volumes. Read more.

12:45

12:45–14:05 Thursday, 24/05/2018
Location: Expo Hall (Capital Hall 24)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.
12:45–14:05 Thursday, 24/05/2018
Location: Expo Hall - SBS lunch (Capital Hall 24)
Join Strata Business Summit speakers and attendees for a networking lunch on Thursday. Read more.

14:05

14:05–14:45 Thursday, 24/05/2018
Data engineering and architecture
Location: Capital Suite 7 Level: Beginner
Secondary topics:  Managing and Deploying Machine Learning
Guillaume Salou shares OVH's approach to continuous deployment of machine learning models, which involved building a full stack of automated machine learning. Automated machine learning allows the company to rebuild models efficiently and keep models up to date with fresh data brought by its data convergence tool. Read more.
14:05–14:45 Thursday, 24/05/2018
Executive Briefing, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 17 Level: Beginner
Secondary topics:  Security and Privacy, Telecom
Alasdair Allan (Babilim Light Industries)
The increasing ubiquity of the internet of things has put a new focus on data privacy. Big data is all very well when it's harvested quietly and stealthily, but when your things tattle on you behind your back, it's a very different matter altogether. Alasdair Allan explains why the internet of things brings with it a whole new set of big data problems that can't be ignored. Read more.
14:05–14:45 Thursday, 24/05/2018
Data science and machine learning, Data-driven business management
Location: Capital Suite 12 Level: Beginner
Kaylea Haynes (Peak)
Deciding how much stock to hold is a challenge for hire businesses. There is a fine balance between holding enough stock to fulfill hires and holding so much that overall utilization is too low to achieve the return on investment. Kaylea Haynes shares a case study on forecasting the demand for thousands of assets across multiple locations. Read more.
14:05–14:45 Thursday, 24/05/2018
Secondary topics:  Data Platforms, Managing and Deploying Machine Learning
Moty Fania (Intel)
Moty Fania explains how Intel implemented an AI inference platform to enable internal visual inspection use cases and shares lessons learned along the way. The platform is based on open source technologies and was designed for real-time streaming and online actuation. Read more.
14:05–14:45 Thursday, 24/05/2018
Thomas Dinsmore (DataRobot)
Data science transforms organizations, but executives often struggle to build a culture of open data science and transition from legacy commercial analytic tools. However, there are clear best practices to accelerate adoption and success with open data science. Thomas Dinsmore shares a model to help organizations begin the journey, build momentum, and reduce reliance on legacy software. Read more.
14:05–14:45 Thursday, 24/05/2018
Secondary topics:  Data Platforms, Time Series and Graphs
Tony Xing (Microsoft), Bixiong Xu (Microsoft)
Tony Xing and Bixiong Xu offer an overview of Project Kensho, Microsoft's one-stop shop for business incident monitoring and automated insights. Tony and Bixiong cover the technology's evolution, architecture, and algorithms, along with its benefits and trade-offs. Along the way, they share a case study on Bing Ads key metrics monitoring and automated diagnostic insights. Read more.
14:05–14:45 Thursday, 24/05/2018
Kostas Kloudas (data Artisans)
Complex event processing (CEP) helps detect patterns over continuous streams of data. DNA sequencing, fraud detection, shipment tracking with specific characteristics (e.g., contaminated goods), and user activity analysis fall into this category. Kostas Kloudas offers an overview of Flink's CEP library and explains the benefits of its integration with Flink. Read more.
14:05–14:45 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Beginner
Paco Nathan (O'Reilly Media)
Human in the loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. Such systems are mostly automated, with exceptions referred to human experts, who help train the machines further. Paco Nathan offers an overview of HITL from the perspective of a business manager, focusing on use cases within O'Reilly Media. Read more.
14:05–14:45 Thursday, 24/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Beginner
Michael Li (The Data Incubator), Sol Rashidi (Royal Caribbean Cruise Lines), Philipp Diesinger (Boehringer Ingelheim), Julie Shin (Citigroup)
What are the latest initiatives and use cases around data and AI? How are data and AI reshaping industries? How do we foster a culture of data and innovation within a larger enterprise? What are some of the challenges of implementing AI within the enterprise setting? Michael Li moderates a panel of four experts in different industries to answer these questions and more. Read more.
14:05–14:45 Thursday, 24/05/2018
Data science and machine learning
Location: Expo Hall Level: Intermediate
Secondary topics:  Financial Services, Text and Language processing and analysis
David Talby (Pacific AI), Saif Addin Ellafi (John Snow Labs), Paul Parau (UiPath)
Spark NLP is an open source library that natively extends Spark ML to provide natural language understanding capabilities with performance and scale that were not possible to date. David Talby explains how Spark NLP was used to augment the Recognos smart data extraction platform in order to automatically infer fuzzy, implied, and complex facts from long financial documents. Read more.

14:55

14:55–15:35 Thursday, 24/05/2018
Data engineering and architecture, Data-driven business management
Location: Capital Suite 7 Level: Intermediate
Secondary topics:  Financial Services, Managing and Deploying Machine Learning
Hope Wang (Intuit)
A machine learning platform is not just the sum of its parts; the key is how it supports the model lifecycle end to end. Hope Wang explains how to manage various artifacts and their associations, automate deployment to support the lifecycle of a model, and build a cohesive machine learning platform. Read more.
14:55–15:35 Thursday, 24/05/2018
Executive Briefing, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 17 Level: Non-technical
Secondary topics:  Security and Privacy
Kate Vang (DataKind UK), Christine Henry (DataKind UK)
Not a day goes by without reading headlines about the fear of AI or how technology seems to be dividing us more than bringing us together. DataKind UK is passionate about using machine learning and artificial intelligence for social good. Kate Vang and Christine Henry explain what socially conscious AI looks like and what DataKind is doing to make it a reality. Read more.
14:55–15:35 Thursday, 24/05/2018
David Asboth (Cox Automotive Data Solutions), Shaun McGirr (Cox Automotive Data Solutions)
Cox Automotive is the world’s largest automotive service organization, which means it can combine data from across the entire vehicle lifecycle. Cox is on a journey to turn this data into insights. David Asboth and Shaun McGirr share their experience building up a data science team at Cox and scaling the company's data science process from laptop to Hadoop cluster. Read more.
14:55–15:35 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Financial Services, Time Series and Graphs
Francesca Lazzeri (Microsoft), Jaya Mathew (Microsoft)
Advancements in computing technologies and ecommerce platforms have amplified the risk of online fraud, which results in billions of dollars of loss for the financial industry. This trend has urged companies to consider AI techniques, including deep learning, for fraud detection. Francesca Lazzeri and Jaya Mathew explain how to operationalize deep learning models with Azure ML to prevent fraud. Read more.
14:55–15:35 Thursday, 24/05/2018
Data engineering and architecture
Location: S11A Level: Intermediate
Secondary topics:  Time Series and Graphs
Jim Webber (Neo4j)
Jim Webber details how Neo4j mixes the strongly consistent Raft protocol with async log shipping and provides a strong consistency guarantee, causal consistency, which means you can always at least read your own writes, even in very large multi-data-center clusters. Read more.
14:55–15:35 Thursday, 24/05/2018
Secondary topics:  Data Platforms
Alvin Heib (Cloudera), Guy Leroux (Atos)
Alvin Heib and Guy Leroux offer an overview of ClickFox, a platform able to cope with high-performance analytical needs, from bits and bytes to solving customer needs, covering the platform's virtualization, big data, and analytical layers. Read more.
14:55–15:35 Thursday, 24/05/2018
Secondary topics:  Data Integration and Data Pipelines sessions
Eugene Kirpichov (Google)
Apache Beam offers users a novel programming model in which the classic batch-streaming dichotomy is erased and ships with a rich set of I/O connectors to popular storage systems. Eugene Kirpichov explains why Beam has made these connectors flexible and modular—a key component of which is Splittable DoFn, a novel programming model primitive that unifies data ingestion between batch and streaming. Read more.
14:55–15:35 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate
Elena Terenzi (Microsoft), Michael Lanzetta (Microsoft)
Olivia Klose and Elena Terenzi offer an overview of a collaboration between Microsoft and the Royal Holloway University that applied deep learning to locate illegal small-scale mines in Ghana using satellite imagery, scaled training using Kubernetes, and investigated the mines' impact on surrounding populations and environment. Read more.
14:55–15:35 Thursday, 24/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical
Secondary topics:  Data Platforms, Managing and Deploying Machine Learning
Simon Chan (Salesforce)
The promises of AI are great, but taking the steps to implement AI within an enterprise is challenging. The secret behind enterprise AI success often traces back to the underlying platform that accelerates AI development at scale. Based on years of experience helping executives establish AI product strategies, Simon Chan helps you discover the AI platform journey that is right for your business. Read more.
14:55–15:35 Thursday, 24/05/2018
Data engineering and architecture
Location: Expo Hall Level: Beginner
Secondary topics:  Data Integration and Data Pipelines sessions, Data Platforms
Stamatis Stefanakos (D ONE AG)
Switzerland-based startup WinJi capitalizes on two current megatrends: big data and renewable energy. Stamatis Stefanakos offers an overview of WinJi's TruePower Asset Management Platform, covering the overall architecture and the motivation behind it, the physics behind the data, and the business case. Read more.

15:35

15:35–16:35 Thursday, 24/05/2018
Location: Expo Hall (Capital Hall 24)
Afternoon break (1h)

16:35

16:35–17:15 Thursday, 24/05/2018
Giuseppe D'Alessio (ING Group)
Giuseppe D'Alessio details ING's DevOps journey, covering its impact on people, processes, and tools as well as best practices and pitfalls. Giuseppe concludes with a concrete example of using analytics and streaming technology on real-time applications. Read more.
16:35–17:15 Thursday, 24/05/2018
Data-driven business management, Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Intermediate
Kevin Sigliano (IE Business School)
Financial and consumer ROI demands that business leaders understand the drivers and dynamics of digital transformation and big data. Kevin Sigliano explains why disrupting value propositions and continuous innovation are critical if you wish to dramatically improve the way your company engages customers and creates value, and to maximize financial results. Read more.
16:35–17:15 Thursday, 24/05/2018
Big data and data science in the cloud, Data science and machine learning
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Telecom
Sven Löffler (T-Systems)
Sven Löffler offers an overview of the Data Intelligence Hub, T-Systems's implementation of the Fraunhofer Industrial Data Space: a reference architecture for standardized and secure data exchange between industries in the context of the internet of things. Read more.
16:35–17:15 Thursday, 24/05/2018
Data science and machine learning, Visualization and user experience
Location: Capital Suite 13 Level: Intermediate
Amit Kapoor (narrativeVIZ Consulting), Bargava Subramanian (Impel Labs)
Amit Kapoor and Bargava Subramanian lead three live demos of deep learning (DL) done in the browser—building explorable explanations to aid insight, building model inference applications, and rapid prototyping and training an ML model—using the emerging client-side JavaScript libraries for DL. Read more.
16:35–17:15 Thursday, 24/05/2018
Jason Bell (MastodonC)
Jason Bell offers an overview of a self-learning knowledge system that uses Apache Kafka and Deeplearning4j to accept data, apply training to a neural network, and output predictions. Jason covers the system design, the rationale behind it, and the implications of using streaming data with deep learning and artificial intelligence. Read more.
16:35–17:15 Thursday, 24/05/2018
Secondary topics:  Data Platforms
Naghman Waheed (Monsanto), Brian Arnold (Monsanto)
There are a number of tools that make it easy to implement a data lake. However, most lack the essential features that prevent your data lake from turning into a data swamp. Naghman Waheed and Brian Arnold offer an overview of Monsanto's Data Historian platform, which can ingest, store, and access datasets without compromising ease of use, governance, or security. Read more.
16:35–17:15 Thursday, 24/05/2018
Flavio Junqueira (Dell EMC)
Stream processing is in the spotlight. Enabling low-latency insights and actions out of continuously generated data is compelling to a number of application domains, and the ability to adapt to workload variations is critical to many applications. Flavio Junqueira explores Pravega, a stream store that scales streams automatically and enables applications to scale downstream by signaling changes. Read more.
16:35–17:15 Thursday, 24/05/2018
Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 10/11 Level: Beginner
Secondary topics:  Financial Services
Jonathan Leslie (Pivigo), Tom Harrison (Hackney Council), Maryam Qurashi (Pivigo)
One major challenge to social housing is determining how best to target interventions when tenants fall behind on rent payments. Jonathan Leslie and Tom Harrison discuss a recent project in which a team of data scientist trainees helped Hackney Council devise a more efficient, targeted strategy to detect and prioritize such situations. Read more.
16:35–17:15 Thursday, 24/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Intermediate
Tags: us
Quantitative measurement is the key to scaling businesses, processes, and products and making them better. It sounds easy: just pick a number and improve it. However, actually choosing a metric is an exploration of a many-dimensional space with no map and no guide. Until now. Join Ketan Gangatirkar to learn how to choose the right metrics so you can build a better product and a better business. Read more.