Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Monday, 21/05/2018

7:30

7:30–9:00 Monday, 21/05/2018
Location: Capital Suite Foyer
Coffee break (1h 30m)

9:00

SOLD OUT
9:00–17:00 Monday, 21/05/2018
Location: Capital Suite 1
Behzad Bordbar (Cloudera)
Behzad Bordbar demonstrates how to implement typical data science workflows using Apache Spark. You'll learn how to wrangle and explore data using Spark SQL DataFrames and how to build, evaluate, and tune machine learning models using Spark MLlib. Read more.
9:00–17:00 Monday, 21/05/2018
Location: Capital Suite 7
Zachary Glassman (The Data Incubator)
Zachary Glassman offers a foundation in building intelligent business applications using machine learning, walking you through all the steps of developing a machine learning pipeline, from prototyping to production. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and extend these models into two applications using real-world datasets. Read more.
9:00–17:00 Monday, 21/05/2018
Jesse Anderson (Big Data Institute)
Average rating: *****
(5.00, 1 rating)
To handle real-time big data, you need to solve two difficult problems: How do you ingest that much data, and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company. Read more.
9:00–17:00 Monday, 21/05/2018
Location: Capital Suite 17
Dana Mastropole (The Data Incubator)
The TensorFlow library enables the use of data flow graphs for numerical computations, with automatic parallelization across several CPUs or GPUs. This architecture makes it ideal for implementing neural networks and other machine learning algorithms. Dana Mastropole details TensorFlow's capabilities through its Python interface. Read more.
9:00–17:00 Monday, 21/05/2018
Location: London Suite 2
Jean Innes (ASI Data Science), Matthew Ward (ASI Data Science)
Jean Innes, Matthew Ward, Emanuele Haerens, and Alli Paget lead a condensed introduction to key data science and machine learning concepts and techniques, showing you what is (and isn't) possible with these exciting new tools and how they can benefit your organization. Read more.

10:30

10:30–11:00 Monday, 21/05/2018
Location: Capital Suite Foyer
Coffee break (30m)

12:30

12:30–13:30 Monday, 21/05/2018
Location: Capital Suite Foyer
Lunch (1h)

15:00

15:00–15:30 Monday, 21/05/2018
Location: Capital Suite Foyer
Afternoon break (30m)

Tuesday, 22/05/2018

7:30

7:30–9:00 Tuesday, 22/05/2018
Location: Auditorium Foyer
Coffee break sponsored by Redis Labs (1h 30m)

9:00

9:00–17:00 Tuesday, 22/05/2018
Location: Capital Suite 2/3
Dan Jeavons (Shell), Hollie Lubbock (Fjord), Jivan Virdee (Fjord), Fausto Morales (Arundo), Marty Cochrane (Arundo), Jane McConnell (Teradata), Paul Ibberson (Teradata), Michael Troughton (Conduce), Jonathan Genah (DHL Supply Chain), Allison Nau (Cox Automotive UK), Dave Fitch (The Data Lab), Maria Assunta Palmieri (Data Reply), Niranjan Thomas (Dow Jones), Erik Elgersma (FrieslandCampina), Viola Melis (Typeform), Carme Artigas (Synergic Partners), Nuria Bombardo (Pepsico)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions. Read more.
9:00–17:00 Tuesday, 22/05/2018
Location: Capital Suite 4
Paul Lashmet (Arcadia Data), Anthony Culligan (SETL), Konrad Sippel (Deutsche Börse), Paul Lynn (Nordea), Mikheil Nadareishvili (TBC Bank), Olaf Hein (ORDIX AG), Robert Passarella (Alpha Features), Louise Beaumont (Publicis Groupe | techUK | NPSO), Alistair Croll (Solve For Interesting), Christina Erlwein-Sayer (OptiRisk Systems), Angelique Mohring (GainX), Saeed Amen (Cuemacro), Gisele Frederick (Zingr.io)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry. Read more.
9:00–12:30 Tuesday, 22/05/2018
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)
Average rating: ***..
(3.67, 3 ratings)
The need for instant data-driven insights has led to the proliferation of messaging and streaming frameworks. Karthik Ramasamy, Arun Kejariwal, and Ivan Kelly walk you through state-of-the-art streaming frameworks, algorithms, and architectures, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them. Read more.
9:00–12:30 Tuesday, 22/05/2018
Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 9 Level: Non-technical
Secondary topics:  Security and Privacy
Aurélie Pols (Mind Your Privacy)
Average rating: *****
(5.00, 1 rating)
Aurélie Pols walks you through a "5+5 pillars" framework for GDPR readiness, explaining what the GDPR means to data-fueled businesses. You'll learn how to attribute responsibility to assure compliance and build toward ethical data practices, minimizing risk for your company while fostering trust with your clients. Read more.
9:00–12:30 Tuesday, 22/05/2018 Secondary topics:  Text and Language processing and analysis
Barbara Fusinska (Google)
Average rating: ****.
(4.33, 3 ratings)
Natural language processing techniques help address tasks like text classification, information extraction, and content generation. Barbara Fusinska offers an overview of natural language processing and walks you through building a bag-of-words representation, using Python and its machine learning libraries, and then using it for text classification. Read more.
9:00–17:00 Tuesday, 22/05/2018
Big data and data science in the cloud
Location: Capital Suite 11 Level: Intermediate
Carl Osipov (Google)
Carl Osipov walks you through building a complete machine learning pipeline from ingest, exploration, training, and evaluation to deployment and prediction. Read more.
9:00–12:30 Tuesday, 22/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 12 Level: Non-technical
Secondary topics:  Visualization, Design, and UX
Radhika Dutt (Radical Product), Geordie Kaytes (Fresh Tilled Soil), Nidhi Aggarwal (Radical Product)
Average rating: ****.
(4.00, 2 ratings)
These days it’s easy for companies to say, “We measure everything!” The problem is, most popular metrics may not be appropriate or relevant for your business. Measurement isn’t free and should be done strategically. Radhika Dutt, Geordie Kaytes, and Nidhi Aggarwal explain how to align measurement with your product strategy so you can measure what matters for your business. Read more.
9:00–12:30 Tuesday, 22/05/2018
Data engineering and architecture
Location: Capital Suite 13 Level: Intermediate
Eugene Fratkin (Cloudera), Vinithra Varadharajan (Cloudera), Mael Ropars (Cloudera), Jason Wang (Cloudera)
Average rating: *****
(5.00, 1 rating)
Vinithra Varadharajan, Jason Wang, Eugene Fratkin, and Mael Ropars detail new paradigms to effectively run production-level pipelines with minimal operational overhead. Join in to learn how to remove barriers to data discovery, metadata sharing, and access control. Read more.
9:00–12:30 Tuesday, 22/05/2018
Data engineering and architecture
Location: Capital Suite 14 Level: Intermediate
Secondary topics:  Data Platforms
Mark Madsen (Teradata), Todd Walter (Archimedata)
Average rating: ****.
(4.29, 7 ratings)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.
9:00–12:30 Tuesday, 22/05/2018
Data science and machine learning
Location: Capital Suite 15 Level: Intermediate
Vartika Singh (Cloudera), Juan Yu (Cloudera), Marton Balassi (Cloudera), Steven Totman (Cloudera)
Average rating: ***..
(3.75, 4 ratings)
Vartika Singh, Marton Balassi, Steven Totman, and Juan Yu outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks. Read more.

10:30

10:30–11:00 Tuesday, 22/05/2018
Location: Capital Suite Foyer
Morning break (30m)

12:30

12:30–13:30 Tuesday, 22/05/2018
Location: N11
Lunch sponsored by IBM (1h)

13:30

13:30–17:00 Tuesday, 22/05/2018
Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)
Average rating: ***..
(3.25, 4 ratings)
Dean Wampler and Boris Lublinsky walk you through building streaming apps as microservices using Akka Streams and Kafka Streams. Along the way, Dean and Boris discuss the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead. Read more.
13:30–17:00 Tuesday, 22/05/2018
Visualization and user experience
Location: Capital Suite 9 Level: Non-technical
Secondary topics:  Visualization, Design, and UX
Danyel Fisher (Honeycomb.io), Miriah Meyer (University of Utah)
Average rating: ****.
(4.00, 4 ratings)
Danyel Fisher and Miriah Meyer explore the human side of data analysis and visualization, covering operationalization, the process of reducing vague problems to specific tasks, and how to choose a visual representation that addresses those tasks. Along the way, they also discuss single views and explain how to link them into multiple views. Read more.
13:30–17:00 Tuesday, 22/05/2018
Location: Capital Suite 10
TBC
SOLD OUT
13:30–17:00 Tuesday, 22/05/2018
Data engineering and architecture
Location: Capital Suite 12 Level: Advanced
Secondary topics:  Data Platforms
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
Average rating: ****.
(4.33, 3 ratings)
Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Flink, Kudu, Spark Streaming, and Spark SQL and modern storage engines to enable new forms of data processing and analytics. Read more.
13:30–17:00 Tuesday, 22/05/2018
Data science and machine learning
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
David Talby (Pacific AI), Claudiu Branzan (Accenture)
Average rating: ****.
(4.33, 3 ratings)
Natural language processing is a key component in many data science systems. David Talby and Claudiu Branzan lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, Spark NLP for building distributed natural language machine-learned pipelines, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings. Read more.
13:30–17:00 Tuesday, 22/05/2018
Strata Business Summit
Location: Capital Suite 14 Level: Intermediate
Dan Enthoven (Domino Data Lab)
The honeymoon era of data science is ending, and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders deliver measurable impact on an increasing share of an enterprise's KPIs. Dan Enthoven outlines a holistic approach to people, process, and technology to build a sustainable competitive advantage. Read more.
13:30–17:00 Tuesday, 22/05/2018
Law, ethics, and governance, Platform security and cybersecurity
Location: Capital Suite 15 Level: Intermediate
Secondary topics:  Security and Privacy
Mark Donsky (Okera), Steffen Maerkl (Cloudera), André Araujo (Cloudera)
Hybrid big data deployments present significant new security risks. Security admins must ensure a consistently secured and governed experience for end users and administrators across multiple workloads. Mark Donsky, Steffen Maerkl, and André Araujo share best practices for meeting these challenges as they walk you through securing a Hadoop cluster. Read more.

15:00

15:00–15:30 Tuesday, 22/05/2018
Location: Capital Suite Foyer
Afternoon break (30m)

17:00

17:00–18:00 Tuesday, 22/05/2018
Location: Expo Hall (Capital Hall 24)
Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors. Read more.

19:00

19:00–21:00 Tuesday, 22/05/2018
Location: Various locations
Get to know your fellow attendees over dinner. We've made reservations for you at some of the most sought-after restaurants in town. This is a great chance to make new connections and sample some of the great cuisine London has to offer. Read more.

Wednesday, 23/05/2018

8:15

8:15–8:45 Wednesday, 23/05/2018
Location: Auditorium Foyer
Average rating: *****
(5.00, 1 rating)
Gather before keynotes on Wednesday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees. Read more.

8:45

8:45–9:00 Wednesday, 23/05/2018
Location: Auditorium Foyer
Coffee break sponsored by Confluent (7:30 - 9:00) (15m)

9:00

9:00–9:05 Wednesday, 23/05/2018
Location: Auditorium
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: *....
(1.00, 1 rating)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes. Read more.

9:05

9:05–9:20 Wednesday, 23/05/2018
Location: Auditorium
Mick Hollison (Cloudera), Sven Loeffler (Deutsche Telekom), Robert Neumann (Ultra Tendency)
Average rating: **...
(2.65, 20 ratings)
What happens when you combine near-limitless data with on-demand access to powerful analytics and compute? For Deutsche Telekom, the results have been transformative. Mick Hollison, Sven Löffler, and Robert Neumann explain how Deutsche Telekom is harnessing machine learning and analytics in the cloud to build Europe’s largest and most powerful IoT data marketplace. Read more.

9:20

9:20–9:35 Wednesday, 23/05/2018
Location: Auditorium
Alison Howard (Microsoft)
Average rating: **...
(2.62, 16 ratings)
May 25, the day the GDPR goes into effect, is an important milestone for data protection in the EU and elsewhere, but the journey to GDPR compliance neither begins nor ends there. Alison Howard explains how Microsoft, one of the world’s largest companies, with operations across the EU and around the globe, has prepared for May 25 and beyond. Read more.

9:35

9:35–9:45 Wednesday, 23/05/2018
Location: Auditorium
Jean-François Puget (IBM Analytics)
Average rating: ***..
(3.50, 16 ratings)
On the way to active analytics for business, we have to answer two big questions: What must happen to data before running machine learning algorithms, and how should machine learning output be used to generate actual business value? Jean-François Puget demonstrates the vital role of human context in answering those questions. Read more.

9:45

9:45–9:55 Wednesday, 23/05/2018
Location: Auditorium Level: Non-technical
Ben Lorica (O'Reilly)
Average rating: **...
(2.91, 11 ratings)
To enable the machine learning applications of the future, there remain many interesting and challenging data problems we need to tackle as a community. Ben Lorica discusses some of the pressing problems we're facing as we collect and store data, particularly in an era when our machine learning models require huge amounts of labeled data. Read more.

9:55

9:55–10:10 Wednesday, 23/05/2018
Location: Auditorium
Pierre Romera (International Consortium of Investigative Journalists (ICIJ))
Average rating: ****.
(4.73, 26 ratings)
Last November, the International Consortium of Investigative Journalists (ICIJ) published the Paradise Papers, a yearlong investigation on the offshore dealings of multinational companies and the wealthy. Pierre Romera offers a behind-the-scenes look into the process and explores the challenges in handling 1.4 TB of data and making it available securely to journalists all over the world. Read more.

10:15

10:15–10:30 Wednesday, 23/05/2018
Location: Auditorium
Eva Kaili (European Parliament | The Science and Technology Options Assessment Panel)
Average rating: ***..
(3.19, 21 ratings)
Keynote with Eva Kaili Read more.

10:45

10:45–11:15 Wednesday, 23/05/2018
Location: Expo Hall (Capital Hall 24)
Morning break (30m)

11:15

11:15–11:55 Wednesday, 23/05/2018
Data engineering and architecture
Location: Capital Suite 7 Level: Intermediate
Secondary topics:  Security and Privacy
Charaka Goonatilake (Panaseer)
Average rating: ****.
(4.50, 2 ratings)
Data is becoming a crucial weapon to secure an organization against cyber threats. Charaka Goonatilake shares strategies for designing effective data platforms for cybersecurity using big data technologies, such as Spark and Hadoop, and explains how these platforms are being used in real-world examples of data-driven security. Read more.
11:15–11:55 Wednesday, 23/05/2018
Executive Briefing, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 17 Level: Intermediate
Secondary topics:  Financial Services, Security and Privacy
Mark Donsky (Okera), Syed Rafice (Cloudera)
Average rating: ****.
(4.00, 1 rating)
In May 2018, the General Data Protection Regulation (GDPR) goes into effect for firms doing business in the EU, but many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Mark Donsky and Syed Rafice outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations. Read more.
11:15–11:55 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 2/3
Average rating: ****.
(4.50, 2 ratings)
What was once science fiction has now become reality as multiple consumer-based AI solutions have hit the market over the last few years. In turn, consumers have become more comfortable interacting with AI. But has AI really lived up to the hype? For consumers, perhaps not yet. However, AI for business is a different (and more valuable) animal. Carlo Appugliese details how business can put AI to work. Read more.
11:15–11:55 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 4
Miha Pelko (BMW Group), Aleksandr Melkonyan (BMW AG)
Average rating: *****
(5.00, 4 ratings)
The development of autonomous driving cars requires the handling of huge amounts of data produced by test vehicles and solving a number of critical challenges specific to the automotive industry. Miha Pelko and Aleksandr Melkonyan outline these challenges and explain how BMW is overcoming them by adapting and reinventing existing big data solutions for autonomous driving. Read more.
11:15–11:55 Wednesday, 23/05/2018
Data science and machine learning, Law, ethics, and governance
Location: Capital Suite 12 Level: Non-technical
Secondary topics:  Security and Privacy
Steven Touw (Immuta)
Average rating: ****.
(4.25, 4 ratings)
The Strata Data conference in London takes place during one of the most important weeks in the history of data regulation, as GDPR begins to be enforced. Steve Touw explores the effects of the GDPR on deploying machine learning models in the EU. Read more.
11:15–11:55 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 13 Level: Advanced
Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Ilia Karmanov (Microsoft)
Average rating: ****.
(4.00, 5 ratings)
Mathew Salvaris, Miguel Gonzalez-Fierro, and Ilia Karmanov offer a comparison of two platforms for running distributed deep learning training in the cloud, using a ResNet network trained on the ImageNet dataset as an example. You'll examine the performance of each as the number of nodes scales and learn some tips and tricks as well as some pitfalls to watch out for. Read more.
11:15–11:55 Wednesday, 23/05/2018 Secondary topics:  Media, Advertising, Entertainment, Security and Privacy
Guillaume Chaslot (AlgoTransparency)
Average rating: ****.
(4.17, 6 ratings)
An increasing number of ex-Google and ex-Facebook employees state that social media is starting to control us rather than the other way around. How can we determine if social media is a pure reflection of people's interests or if it pushes us toward specific narratives? Guillaume Chaslot explores methodologies to find out which narratives are favored by social media recommendation engines. Read more.
11:15–11:55 Wednesday, 23/05/2018
Stuart Pook (Criteo)
Average rating: ****.
(4.40, 5 ratings)
Criteo has a production cluster of 2K nodes running over 300K jobs a day in the company's own data centers. These clusters were meant to provide a redundant solution to Criteo's storage and compute needs. Stuart Pook offers an overview of the project, shares challenges and lessons learned, and discusses Criteo's progress in building another cluster to survive the loss of a full DC. Read more.
11:15–11:55 Wednesday, 23/05/2018
Data engineering and architecture
Location: S11B Level: Intermediate
Secondary topics:  Data Platforms, Media, Advertising, Entertainment
Jason Heo (Naver), Dooyong Kim (Navercorp)
Average rating: ***..
(3.00, 1 rating)
Naver.com is the largest search engine in Korea, with a 70% share of the Korean search market, and it handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver's web analytics system, built with Druid. Read more.
11:15–11:55 Wednesday, 23/05/2018
Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Intermediate
Gerard Maas (Lightbend)
Average rating: ****.
(4.00, 13 ratings)
Apache Spark has two streaming APIs: Spark Streaming and Structured Streaming. Gerard Maas offers a critical overview of their differences in key aspects of a streaming application, from the API user experience to dealing with time and with state and machine learning capabilities, and shares practical guidance on picking one or combining both to implement resilient streaming pipelines. Read more.
11:15–11:55 Wednesday, 23/05/2018
Data science and machine learning, Data-driven business management
Location: Capital Suite 10/11 Level: Beginner
Secondary topics:  Transportation and Logistics
Average rating: ****.
(4.45, 11 ratings)
Because in-house data science teams work with a range of business functions, traditional data science processes are often too abstract to cope with the complexity of these environments. Alberto Rey Villaverde and Grigorios Mingas share case studies from easyJet that highlight some unpredictable hurdles related to requirements, data, infrastructure, and deployment and explain how they solved them. Read more.
11:15–11:55 Wednesday, 23/05/2018
Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical
Secondary topics:  Financial Services
Audrey Lobo-Pulo (Phoensight), Nicholas O'Donnell (LinkedIn)
In October 2017, LinkedIn and the Australian Treasury teamed up to gain a deeper understanding of the Australian labor market through new data insights, which may inform economic policy and directly benefit society. Audrey Lobo-Pulo and Nick O'Donnell share some of the discoveries from this collaboration as well as the practicalities of working in a public-private partnership. Read more.
11:15–11:55 Wednesday, 23/05/2018 Secondary topics:  Media, Advertising, Entertainment
Dan Gilbert (News UK), Jonathan Leslie (Pivigo)
Average rating: ***..
(3.75, 4 ratings)
In the era of 24-hour news and online newspapers, editors in the newsroom must quickly and efficiently make sense of the enormous amounts of data that they encounter and make decisions about their content. Daniel Gilbert and Jonathan Leslie discuss an ongoing partnership between News UK and Pivigo in which a team of data science trainees helped develop an AI platform to assist in this task. Read more.

12:05

12:05–12:45 Wednesday, 23/05/2018 Secondary topics:  Security and Privacy
Federico Leven (ReactoData)
Average rating: **...
(2.67, 3 ratings)
The apparent difficulty of managing Hadoop compared to more traditional and proprietary data products makes some companies wary of the Hadoop ecosystem, but managing security is becoming more accessible in the Hadoop space, particularly in the Cloudera stack. Federico Leven offers an overview of an end-to-end security deployment on Hadoop and the data and security governance policies implemented. Read more.
12:05–12:45 Wednesday, 23/05/2018
Data-driven business management, Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Non-technical
Teresa Tung (Accenture), Jean-Luc Chatelain (Accenture)
Average rating: ***..
(3.12, 8 ratings)
A data-driven enterprise maximizes the value of its data. But how do enterprises emerging from technology and organization silos get there? Teresa Tung and Jean-Luc Chatelain explain how to create a data-driven enterprise maturity model that spans technology and business requirements and walk you through use cases that bring the model to life. Read more.
12:05–12:45 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 4
Mathew Lodge (Anaconda)
Average rating: ****.
(4.50, 4 ratings)
The days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. Mathew Lodge demonstrates that it's just as easy to deploy Python as it is Java, using containers and Kubernetes. Welcome to the future. Read more.
12:05–12:45 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 12
Secondary topics:  Media, Advertising, Entertainment, Security and Privacy
Elisa Celis (EPFL)
Average rating: ****.
(4.25, 4 ratings)
There is a pressing need to design new algorithms that are socially responsible in how they learn and socially optimal in the manner in which they use information. Elisa Celis explores the emergence of bias in algorithmic decision making and presents first steps toward developing a systematic framework to control biases in classical problems, such as data summarization and personalization. Read more.
12:05–12:45 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  E-commerce and Retail, Media, Advertising, Entertainment
Average rating: ****.
(4.43, 7 ratings)
In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice. Read more.
12:05–12:45 Wednesday, 23/05/2018
Data science and machine learning, Visualization and user experience
Location: Capital Suite 14 Level: Intermediate
Secondary topics:  Visualization, Design, and UX
Jeff Fletcher (Cloudera)
Average rating: ****.
(4.73, 11 ratings)
As big data adoption grows, Apache Hadoop, Apache Spark, and machine learning technologies are increasingly being used to analyze ever-larger datasets, but we still have to keep telling stories about the data and making sure the message is clear. Jeff Fletcher details the tools and techniques that are relevant to data visualization practitioners working with large datasets and predictive models. Read more.
12:05–12:45 Wednesday, 23/05/2018
Jim Scott (NVIDIA)
Average rating: ****.
(4.00, 2 ratings)
Creating a business solution is a lot of work. Instead of building to run on a single cloud provider, it is far more cost-effective to leverage the cloud as infrastructure as a service (IaaS). Jim Scott explains why a global data fabric is a requirement for running on all cloud providers simultaneously. Read more.
12:05–12:45 Wednesday, 23/05/2018 Secondary topics:  Data Platforms, E-commerce and Retail, Transportation and Logistics
Baolong Mao (JD.com), Yiran Wu (JD.com), Yupeng Fu (Alluxio)
Mao Baolong, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average. Read more.
12:05–12:45 Wednesday, 23/05/2018
Data engineering and architecture
Location: Capital Suite 8/9 Level: Beginner
Secondary topics:  Telecom
Average rating: ***..
(3.67, 3 ratings)
In the past year, British Telecom has added a streaming network analytics use case to its multitenant data platform. Phillip Radley demonstrates how the solution works and explains how it delivers better broadband and TV services, using Kafka and Spark on YARN and HDFS encryption. Read more.
12:05–12:45 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate
Secondary topics:  Financial Services
Baiju Devani (Aviva Canada), Étienne Chassé St-Laurent (Aviva Canada)
Average rating: ***..
(3.33, 3 ratings)
Risk-sharing pools allow insurers to get rid of risks they are forced to insure in highly regulated markets. Insurers thus cede both the risk and its premium. But are they ceding the right risk or simply giving up premium? Baiju Devani and Étienne Chassé St-Laurent share an applied machine learning approach that leverages an ensemble of models to gain a distinctive market advantage. Read more.
12:05–12:45 Wednesday, 23/05/2018 Secondary topics:  Telecom, Time Series and Graphs
Ira Cohen (Anodot)
The mobile world has so many moving parts that a simple change to one element can cause havoc somewhere else, resulting in issues that annoy users and cause revenue leaks. Ira Cohen outlines ways to use anomaly detection to track everything mobile, from the service and roaming to specific apps, to fully optimize your mobile offerings. Read more.
12:05–12:45 Wednesday, 23/05/2018
Data science and machine learning, Expo Hall
Location: Expo Hall Level: Intermediate
Konstantinos Georgatzis (QuantumBlack), Martha Imprialou (QuantumBlack)
Konstantinos Georgatzis and Martha Imprialou explain how to interpret the predictions given by your black-box model and how machine learning is helping to drive decision making today. Read more.

12:45

12:45–14:05 Wednesday, 23/05/2018
Location: Expo Hall (Capital Hall 24)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.
12:45–14:05 Wednesday, 23/05/2018
Location: S11A
Average rating: *****
(5.00, 1 rating)
If you’re looking to find like minds and make new professional connections, come to the Women's Networking Lunch on Wednesday. Read more.
12:45–14:05 Wednesday, 23/05/2018
Location: Expo Hall - SBS lunch (Capital Hall 24)
Average rating: ***..
(3.00, 2 ratings)
Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers. Read more.

14:05

14:05–14:45 Wednesday, 23/05/2018 Secondary topics:  Security and Privacy
Joshua Patterson (NVIDIA), Chau Dang (NVIDIA)
Joshua Patterson and Mike Wendt explain how NVIDIA used GPU-accelerated open source technologies to improve its cyberdefense platforms by leveraging software from the GPU Open Analytics Initiative (GOAI) and how the company accelerated anomaly detection with more efficient machine learning models, faster deployment, and more granular data exploration. Read more.
14:05–14:45 Wednesday, 23/05/2018
Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Beginner
Danielle Dean (iRobot)
Average rating: ****.
(4.80, 5 ratings)
Danielle Dean covers the basics of managing data science projects, including the data science lifecycle, and offers an overview of an internal approach at Microsoft called the Team Data Science Process (TDSP). Join in to learn more about the typical priorities of data science teams and the keys to success on engaging and creating value with data science. Read more.
14:05–14:45 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 2/3
Randy Lea (Arcadia Data)
Average rating: ***..
(3.62, 8 ratings)
Business intelligence (BI) and analytics on data lakes have had limited success. Data lakes often fall short because they are mostly used by data scientists and not by business users. Randy Lea explains why existing BI tools work well for data warehouses but not data lakes and why modern BI tools designed for data lakes should represent the second BI standard in enterprises today. Read more.
14:05–14:45 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 4
Steve Kilgore (WANdisco)
Today, every company is a data company. Business success depends on putting large volumes of live data to work to drive competitive advantage. Paul Phillips details how some of the world’s largest companies have achieved 100% uptime while moving massive live datasets and halving their hardware requirements. Read more.
14:05–14:45 Wednesday, 23/05/2018 Secondary topics:  Telecom, Time Series and Graphs
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
Average rating: ***..
(3.00, 1 rating)
Heitor Murilo Gomes and Albert Bifet offer an overview of StreamDM, a real-time analytics open source software library built on top of Spark Streaming, developed at Huawei's Noah’s Ark Lab and Télécom ParisTech. Read more.
14:05–14:45 Wednesday, 23/05/2018 Secondary topics:  Security and Privacy
Eran Avidan (Intel)
Average rating: ****.
(4.50, 2 ratings)
Deep learning is revolutionizing many domains within computer vision, but doing real-time analysis is challenging. Eran Avidan offers an overview of a novel architecture based on Redis, Docker, and TensorFlow that enables real-time analysis of high-resolution streaming video. Read more.
14:05–14:45 Wednesday, 23/05/2018
Jivan Virdee (Fjord), Hollie Lubbock (Fjord)
Average rating: *****
(5.00, 2 ratings)
Artificial intelligence systems are powerful agents of change in our society, but as this technology becomes increasingly prevalent—transforming our understanding of ourselves and our society—issues around ethics and regulation will arise. Jivan Virdee and Hollie Lubbock explore how to address fairness, accountability, and the long-term effects on our society when designing with data. Read more.
14:05–14:45 Wednesday, 23/05/2018
Greg Rahn (Cloudera)
Average rating: ***..
(3.29, 7 ratings)
For many organizations, the next big data warehouse will be in the cloud. Greg Rahn shares considerations for evaluating the cloud for analytics and big data warehousing, including different architectural approaches to optimize price and performance. Read more.
14:05–14:45 Wednesday, 23/05/2018
Data engineering and architecture
Location: S11B Level: Intermediate
Secondary topics:  Data Platforms, Transportation and Logistics
Carsten Herbe (Audi Business Innovation GmbH), Matthias Graunitz (Audi AG)
Average rating: ****.
(4.33, 3 ratings)
Carsten Herbe and Matthias Graunitz detail Audi's journey from a Hadoop proof of concept to a multitenant enterprise platform, sharing lessons learned, the decisions Audi made, and how a number of use cases are implemented using the platform. Read more.
14:05–14:45 Wednesday, 23/05/2018
Michael Noll (Confluent)
Average rating: ****.
(4.67, 6 ratings)
Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL. Read more.
14:05–14:45 Wednesday, 23/05/2018
Manas Ranjan Kar (Episource)
Average rating: ***..
(3.00, 3 ratings)
Episource is building a scalable NLP engine to help summarize medical charts and extract medical coding opportunities and their dependencies to recommend best possible ICD10 codes. Manas Ranjan Kar offers an overview of the wide variety of deep learning algorithms involved and the complex in-house training-data creation exercises that were required to make it work. Read more.
14:05–14:45 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 15/16 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions
Ihab Ilyas (University of Waterloo)
Average rating: ****.
(4.40, 5 ratings)
Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas provides insight into various techniques and discusses how machine learning, human expertise, and problem semantics collectively can deliver a scalable, high-accuracy solution. Read more.
14:05–14:45 Wednesday, 23/05/2018
Data engineering and architecture, Expo Hall
Location: Expo Hall Level: Intermediate
Secondary topics:  Time Series and Graphs
Tags: us
Patrick McFadin (DataStax)
Average rating: *****
(5.00, 2 ratings)
Graph databases are becoming mainstream. Patrick McFadin explains how to use the knowledge you have gained from your years of working with relational databases in this brave new world. There are many similarities but also some significant differences that can open up completely new use cases. If you're deciding whether to take the plunge into graph databases, this is the talk for you. Read more.

14:55

14:55–15:35 Wednesday, 23/05/2018
Lee Blum (Verint Systems)
Lee Blum offers an overview of Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results. Read more.
14:55–15:35 Wednesday, 23/05/2018
Dean Wampler (Anyscale)
Average rating: ****.
(4.00, 2 ratings)
Streaming data systems, so-called fast data, promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler outlines what you need to know to exploit fast data successfully. Read more.
14:55–15:35 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 2/3
Wael Elrifai (Hitachi Vantara)
Wael Elrifai shares his experiences working in the IoT and AI spaces, covering complexities, pitfalls, and opportunities to explain why innovation isn’t just good for business—it's a societal imperative. Read more.
14:55–15:35 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 4
Chiang Yang (Cisco)
Han Yang explains how Cisco is leveraging big data and analytics and details how the company is helping customers to incorporate data sources from the internet of things and deploy machine learning at the edge and at the enterprise. Read more.
14:55–15:35 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  E-commerce and Retail, Financial Services, Time Series and Graphs
Mikio Braun (Zalando)
Average rating: ****.
(4.40, 15 ratings)
Time series data has many applications in industry, in particular predicting the future based on historical data. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn't. Read more.
14:55–15:35 Wednesday, 23/05/2018
Aurélien Géron (Kiwisoft)
Average rating: ***..
(3.67, 3 ratings)
Convolutional neural networks (CNNs) can now complete many computer vision tasks with superhuman ability. This will have a large impact on manufacturing by improving anomaly detection, product classification, analytics, and more. Aurélien Géron details common CNN architectures, explains how they can be applied to manufacturing, and covers potential challenges along the way. Read more.
14:55–15:35 Wednesday, 23/05/2018 Secondary topics:  Visualization, Design, and UX
Brian O'Neill (Designing for Analytics)
Average rating: ****.
(4.00, 2 ratings)
Gartner says 85%+ of big data projects will fail. Your own company may have even spent millions on a recent project that isn’t really delivering the value or UX everyone hoped for. Brian O'Neill explains why CDOs, PMs, and business leaders who leverage design to prioritize utility, usability, and customer value will realize the best ROIs and demonstrates how to start evaluating your UX. Read more.
14:55–15:35 Wednesday, 23/05/2018
Tomer Shiran (Dremio)
Average rating: ***..
(3.50, 2 ratings)
It's often impractical for organizations to physically consolidate all data into one system. Tomer Shiran offers an overview of Apache Arrow, an open source columnar, in-memory data representation that enables analytical systems and data sources to exchange and process data in real time, simplifying and accelerating data access without having to copy all data into one location. Read more.
14:55–15:35 Wednesday, 23/05/2018
Data engineering and architecture
Location: S11B Level: Beginner
Secondary topics:  Transportation and Logistics
Timo Graen (Volkswagen AG), Robert Neumann (Ultra Tendency)
Average rating: ***..
(3.50, 2 ratings)
Map-matching applications exist in almost every telematics use case and are therefore crucial to all car manufacturers. Timo Graen and Robert Neumann detail the architecture behind Volkswagen Commercial Vehicle’s Altus-based map-matching application and lead a live demo featuring a map-matching job in Altus. Read more.
14:55–15:35 Wednesday, 23/05/2018
Ivan Kelly (Streamlio)
Average rating: ***..
(3.00, 2 ratings)
Ivan Kelly offers an overview of Apache Pulsar, a durable, distributed messaging system, underpinned by Apache BookKeeper, that provides the enterprise features necessary to guarantee that your data is where it should be and accessible only by those who should have access. Read more.
14:55–15:35 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 10/11
Harvinder Atwal (Moneysupermarket)
Average rating: *****
(5.00, 4 ratings)
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, and shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, and more. Read more.
14:55–15:35 Wednesday, 23/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical
Kim Nilsson (Pivigo), Phil Harvey (Microsoft)
Average rating: ****.
(4.67, 9 ratings)
Our lives are being transformed by data, changing our understanding of work, play, and health. Every organization can take advantage of this resource, but something is holding us back: us. Kim Nilsson and Phil Harvey explain how to build a successful data culture that embeds data at the heart of every organization through people and delivers success through empathy, communication, and humanity. Read more.
14:55–15:35 Wednesday, 23/05/2018 Secondary topics:  Managing and Deploying Machine Learning
Emre Velipasaoglu (Lightbend)
Average rating: ***..
(3.67, 3 ratings)
Most machine learning algorithms are designed to work on stationary data, but real-life streaming data is rarely stationary. Models lose prediction accuracy over time if they are not retrained. Without model quality monitoring, retraining decisions are suboptimal and costly. Emre Velipasaoglu reviews monitoring methods, focusing on their applicability in fast data and streaming applications. Read more.

15:35

15:35–16:35 Wednesday, 23/05/2018
Location: Expo Hall (Capital Hall 24)
Afternoon break sponsored by Airbus (1h)

16:35

16:35–17:15 Wednesday, 23/05/2018
Data engineering and architecture, Platform security and cybersecurity
Location: Capital Suite 7 Level: Non-technical
Secondary topics:  Security and Privacy
Thomas Phelan (HPE BlueData)
Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE), but TDE can be difficult to configure and manage—issues that are only compounded when running on Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them. Read more.
16:35–17:15 Wednesday, 23/05/2018
Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Beginner
Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)
Average rating: ****.
(4.33, 6 ratings)
If your goal is to provide data to an analyst rather than a data scientist, what’s the best way to deliver analytics? There are 70+ BI tools in the market and a dozen or more SQL- or OLAP-on-Hadoop open source projects. Mark Madsen and Shant Hovsepian discuss the trade-offs between a number of architectures that provide self-service access to data. Read more.
16:35–17:15 Wednesday, 23/05/2018
Data engineering and architecture
Location: Capital Suite 2/3 Level: Intermediate
Enric Biosca Trias (everis), Angel Valencia (everis)
Average rating: **...
(2.00, 2 ratings)
Enric Biosca offers an overview of the eAGLE accelerator, which speeds up migration processes from legacy ETL to big data implementations by enabling auditing, lineage, and translation of legacy code for big data. Along the way, Enric demonstrates how graph and automatic translation technologies help companies reduce their migration times. Read more.
16:35–17:15 Wednesday, 23/05/2018
Sponsored
Location: Capital Suite 4
Ted Orme (Attunity)
Average rating: ****.
(4.00, 3 ratings)
Modern analytics and AI initiatives require an adaptable data lake with a multistage architectural design to effectively ingest, stage, and provision specific datasets in real time. Ted Orme discusses his experience at Attunity creating a real-time data integration solution for Fortune 100 organizations and shares best practices and lessons learned along the way. Read more.
16:35–17:15 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Time Series and Graphs
Arun Kejariwal (Independent), Francois Orsini (MZ)
Average rating: ***..
(3.14, 7 ratings)
The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making. Read more.
16:35–17:15 Wednesday, 23/05/2018
Big data and data science in the cloud, Data science and machine learning
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions, Media, Advertising, Entertainment
Sergey Ermolin (Intel), Olga Ermolin (MLS Listings)
Average rating: ****.
(4.00, 1 rating)
Aggregation of geospecific real estate databases results in duplicate entries for properties located near geographical boundaries. Sergey Ermolin and Olga Ermolin detail an approach for identifying duplicate entries via the analysis of images that accompany real estate listings that leverages a transfer learning Siamese architecture based on VGG-16 CNN topology. Read more.
16:35–17:15 Wednesday, 23/05/2018
Data science and machine learning, Visualization and user experience
Location: Capital Suite 14 Level: Beginner
Secondary topics:  Visualization, Design, and UX
Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)
Average rating: *****
(5.00, 1 rating)
Creating visualizations for data science requires an interactive setup that works at scale. Bargava Subramanian and Amit Kapoor explore the key architectural design considerations for such a system and discuss the four key trade-offs in this design space: rendering for data scale, computation for interaction speed, adapting to data complexity, and being responsive to data velocity. Read more.
16:35–17:15 Wednesday, 23/05/2018
Paul Curtis (Weaveworks)
Average rating: ****.
(4.00, 2 ratings)
The flexibility advantage conferred by containers depends on their ephemeral nature, so it’s useful to keep containers stateless. However, many applications require state—access to a scalable persistence layer that supports real mutable files, tables, and streams. Paul Curtis demonstrates how to make containerized applications reliable, available, and performant, even with stateful applications. Read more.
16:35–17:15 Wednesday, 23/05/2018 Secondary topics:  Text and Language processing and analysis
Ran Taig (Dell), Omer Sagi (Dell)
Average rating: **...
(2.00, 1 rating)
DevOps and QA engineers spend a significant amount of time investigating reoccurring issues. These issues are often represented by large configuration and log files, so the process of investigating whether two issues are duplicates can be a very tedious task. Ran Taig and Omer Sagi outline a solution that leverages NLP and machine learning algorithms to automatically identify duplicate issues. Read more.
16:35–17:15 Wednesday, 23/05/2018
Sean Glover (Lightbend)
Average rating: **...
(2.50, 2 ratings)
Kafka is best suited to run close to the metal on dedicated machines in static clusters, but these clusters are quickly becoming extinct. Companies want mixed-use clusters that take advantage of every resource available. Sean Glover offers an overview of leading Kafka implementations on DC/OS and Kubernetes to explore how reliably they run Kafka in container-orchestrated clusters. Read more.
16:35–17:15 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Non-technical
Secondary topics:  Text and Language processing and analysis
Naveed Ghaffar (Narrative Economics), Rashed Iqbal (UCLA)
Average rating: ***..
(3.67, 3 ratings)
Narratives are significant vectors of rapid change in culture, economic behavior, and the Zeitgeist of a society. Narrative economics studies the impact of popular human-interest stories on economic fluctuations. Naveed Ghaffar and Rashed Iqbal outline a framework that uses natural language understanding to extract and analyze narratives in human communication. Read more.
16:35–17:15 Wednesday, 23/05/2018
Jude McCorry (The Data Lab), Mahmood Adil (NHS National Services Scotland)
Average rating: *****
(5.00, 2 ratings)
Jude McCorry and Mahmood Adil offer an overview of Data Collaboratives, a new form of collaboration beyond the public-private partnership model, in which participants from different sectors exchange data, skills, leadership, and knowledge to solve complex problems facing children in Scotland and worldwide. Read more.
16:35–17:15 Wednesday, 23/05/2018
Data engineering and architecture, Expo Hall
Location: Expo Hall Level: Intermediate
Tobias Burger (BMW Group), Joshua Goerner (BMW AG)
Average rating: *****
(5.00, 1 rating)
The BMW Group IT team drives the usage of data-driven technologies and forms the nucleus of a data-centric culture inside of the organization. Tobias Bürger and Joshua Görner discuss the E-to-E relationship of data and models and share best practices for scaling applications in real-world environments. Read more.

17:25

17:25–18:05 Wednesday, 23/05/2018 Secondary topics:  Security and Privacy
Nikki Rouda (Cloudera), Nick Curcuru (Mastercard)
Average rating: ****.
(4.00, 2 ratings)
Having so many cloud-based analytics services available is a dream come true. However, it's a nightmare to manage proper security and governance across all those different services. Nikki Rouda and Nick Curcuru share advice on how to minimize the risk and effort in protecting and managing data for multidisciplinary analytics and explain how to avoid the hassle and extra cost of siloed approaches. Read more.
17:25–18:05 Wednesday, 23/05/2018
Data-driven business management, Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Intermediate
Secondary topics:  Managing and Deploying Machine Learning
David Talby (Pacific AI)
Average rating: ****.
(4.00, 1 rating)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries. Read more.
17:25–18:05 Wednesday, 23/05/2018
Data engineering and architecture
Location: Capital Suite 2/3 Level: Beginner
Wataru Yukawa (LINE)
LINE—one of the most popular messaging applications in Asia—offers many services, such as its news application. These services sometimes depend on real-time processing. Wataru Yukawa offers an overview of LINE's web tracking system, which consists of the JavaScript SDK, NGINX, Fluentd, Kafka, Elasticsearch, and Hadoop, and explains how it helps with batch and real-time processing. Read more.
17:25–18:05 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Security and Privacy, Time Series and Graphs
Fabian Yamaguchi (ShiftLeft)
Average rating: ****.
(4.33, 3 ratings)
Fabian Yamaguchi offers an overview of Code Property Graph (CPG), a unique approach that allows the functional elements of code to be represented in an interconnected graph of data and control flows, which enables semantic information about code to be stored scalably on distributed graph databases over the web while allowing them to be rapidly accessed. Read more.
17:25–18:05 Wednesday, 23/05/2018
Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
Darren Cook (QQ Trend)
Darren Cook demonstrates how to use LSTMs, state-of-the-art tokenizers, dictionaries, and other data sources to tackle translation, focusing on one of the most difficult language pairs: Japanese to English. Read more.
17:25–18:05 Wednesday, 23/05/2018
Data science and machine learning
Location: Capital Suite 14 Level: Intermediate
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Average rating: ***..
(3.83, 6 ratings)
Sure, you’ve got the best and fastest running SQL engine, but you’ve still got some problems: Users don’t know which tables exist or what they contain; sometimes bad things happen to your data, and you need to regenerate partitions but there is no tool to do so. Mark Grover and Deepak Tiwari explain how to make your team and your larger organization more productive when it comes to consuming data. Read more.
17:25–18:05 Wednesday, 23/05/2018
Christopher Royles (Cloudera)
Average rating: ****.
(4.00, 1 rating)
Big data and cloud deployments return huge benefits in flexibility and economics but can also result in runaway costs and failed projects. Drawing on his production experience, Christopher Royles shares tips and best practices for determining initial sizing, strategic planning, and longer-term operation, helping you deliver an efficient platform, reduce costs, and implement a successful project. Read more.
17:25–18:05 Wednesday, 23/05/2018
Holden Karau (Independent), Rachel Warren (Salesforce Einstein)
Average rating: ****.
(4.00, 2 ratings)
Apache Spark is an amazing distributed system, but part of the bargain we've made with the infrastructure daemons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. Holden Karau, Rachel Warren, and Anya Bida explore auto-tuning jobs using systems like Apache Beam, Mahout, and internal Spark ML jobs as workloads. Read more.
17:25–18:05 Wednesday, 23/05/2018
Aljoscha Krettek (Ververica)
Average rating: ****.
(4.67, 3 ratings)
Aljoscha Krettek offers an overview of the modern stream processing space, details the challenges posed by stateful and event-time-aware stream processing, and shares core archetypes ("application blueprints") for stream processing drawn from real-world use cases with Apache Flink. Read more.
17:25–18:05 Wednesday, 23/05/2018
Jorie Koster-Hale (Dataiku)
Average rating: *****
(5.00, 3 ratings)
Because crime is affected by a number of geospatial and temporal features, predicting crime poses a unique technical challenge. Jorie Koster-Hale shares an approach using a combination of open source data, machine learning, time series modeling, and geostatistics to determine where crime will occur, what predicts it, and what we can do to prevent it in the future. Read more.
17:25–18:05 Wednesday, 23/05/2018
Richard Goyder (IMC Business Architecture | Scaled Insights), Barry Singleton (IMC Business Architecture)
Average rating: ***..
(3.60, 5 ratings)
Big data analytics tends to focus on what is easily available, which is by and large data about what has already happened, the implicit assumption being that past behavior will predict future behavior. Organizations already possess data they aren’t exploiting. Barry Singleton and Richard Goyder explain how, with the right tools, it can be used to develop far more powerful predictive algorithms. Read more.

18:05

18:05–19:05 Wednesday, 23/05/2018
Location: Expo Hall (Capital Hall 24)
Average rating: **...
(2.00, 2 ratings)
Unwind after a long day of sessions with small bites and drinks while networking with Strata attendees, exhibitors, and sponsors. Read more.

19:05

19:05–20:00 Wednesday, 23/05/2018
Location: TBD
TBC

20:00

20:00–22:00 Wednesday, 23/05/2018
Location: Shoreditch
Average rating: ****.
(4.00, 2 ratings)
Enjoy great food and drink at Data After Dark: A Night in Shoreditch. Be sure to take in the street art as you make your way between Zigfrid von Underbelly and Trapeze Bar. Read more.

Thursday, 24/05/2018

8:15

8:15–8:45 Thursday, 24/05/2018
Location: Auditorium Foyer
Gather before keynotes on Thursday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees. Read more.

8:45

8:45–9:00 Thursday, 24/05/2018
Location: Auditorium Foyer
Coffee break sponsored by Data Artisans (8:00 - 9:00) (15m)

9:00

9:00–9:05 Thursday, 24/05/2018
Location: Auditorium
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: *....
(1.00, 1 rating)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

9:05

9:05–9:20 Thursday, 24/05/2018
Location: Auditorium
Louise Beaumont (Publicis Groupe | techUK | NPSO)
Average rating: **...
(2.93, 15 ratings)
Louise Beaumont explores the five characteristics of companies that choose to succeed. Read more.

9:20

9:20–9:35 Thursday, 24/05/2018
Location: Auditorium
Mikio Braun (Zalando)
Average rating: ***..
(3.65, 17 ratings)
Mikio Braun has worked in both research and industry and draws on this experience to share insights on how these two areas are the same (and how they are different). He then details how deep learning might change the game again. Read more.

9:35

9:35–9:45 Thursday, 24/05/2018
Location: Auditorium
Amr Awadallah (Cloudera), Ankit Tharwani (Barclays UK), Bala Chandrasekaran (Barclays)
Average rating: ***..
(3.29, 14 ratings)
Imagine the value you could drive in your business if you could accelerate your journey to machine learning and analytics. Amr Awadallah, Ankit Tharwani, and Bala Chandrasekaran explain how Barclays has driven innovation in real-time analytics and machine learning with Apache Kudu, accelerating the time to value across multiple business initiatives, including marketing, fraud prevention, and more. Read more.

9:50

9:50–10:00 Thursday, 24/05/2018
Location: Auditorium
Zubin Siganporia (QED Analytics)
Average rating: ****.
(4.24, 17 ratings)
The KISS principle tells us to "Keep it simple, stupid." As machine learning techniques become more sophisticated, the need to KISS only becomes greater. Zubin Siganporia discusses the role that simplicity plays in approaching a problem and then convincing end users to adopt data-driven solutions to their challenges. Read more.

10:00

10:00–10:10 Thursday, 24/05/2018
Location: Auditorium
Tom Grey (Google)
Average rating: **...
(2.86, 14 ratings)
The history of data analytics has been marked by an environment of scarcity. The way we approach data analytics is only just catching up. Tom Grey explains why we are on the cusp of a golden age of analytics and machine learning. Read more.

10:10

10:10–10:25 Thursday, 24/05/2018
Location: Auditorium
Christine Foster (The Alan Turing Institute)
Average rating: ***..
(3.31, 13 ratings)
There is a common conception that artificial intelligence will change business. But as researchers at the Alan Turing Institute (the national center for data science and AI) well know, a new algorithm alone does not change the world. Christine Foster explores how businesses and researchers can find common ground and how today’s academic papers turn into tomorrow’s data science. Read more.

10:25

10:25–10:40 Thursday, 24/05/2018
Location: Auditorium
Average rating: ****.
(4.26, 19 ratings)
Keynote with Martha Lane Fox. Read more.

10:45

10:45–11:15 Thursday, 24/05/2018
Location: Expo Hall (Capital Hall 24)
Morning break (30m)

11:15

11:15–11:55 Thursday, 24/05/2018 Secondary topics:  Data Platforms, Managing and Deploying Machine Learning, Media, Advertising, Entertainment
Kinnary Jangla (Pinterest)
Average rating: ***..
(3.00, 5 ratings)
Having trouble coordinating development of your production ML system between a team of developers? Microservices drifting and causing problems debugging? Kinnary Jangla explains how Pinterest dockerized the services powering its home feed and how it impacted the engineering productivity of its ML teams while increasing uptime and ease of deployment. Read more.
11:15–11:55 Thursday, 24/05/2018
Executive Briefing, Strata Business Summit
Location: Capital Suite 17
Mick Hollison (Cloudera)
Average rating: **...
(2.00, 1 rating)
Mick Hollison shares examples of real-world machine learning applications, explores a variety of challenges in putting these capabilities into production—the speed with which technology is moving, cloud versus in-data-center consumption, security and regulatory compliance, and skills and agility in getting data and answers into the right hands—and outlines proven ways to meet them. Read more.
11:15–11:55 Thursday, 24/05/2018
Sponsored
Location: Capital Suite 2/3
Ryan Lippert (Google Cloud)
If your company isn’t good at analytics, it’s not ready for AI. Ryan Lippert explains how the right data strategy can set you up for success in machine learning and artificial intelligence—the new ground for gaining competitive edge and creating business value. Read more.
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 12 Level: Beginner
Jeroen Janssens (Data Science Workshops)
Average rating: ***..
(3.00, 2 ratings)
"Anyone who does not have the command line at their beck and call is really missing something," tweeted Tim O'Reilly when Jeroen Janssens's Data Science at the Command Line was recently made available online for free. Join Jeroen to learn what you're missing out on if you're not applying the command line and many of its power tools to typical data science problems. Read more.
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 13 Level: Beginner
Secondary topics:  Managing and Deploying Machine Learning
Ramesh Sridharan (Captricity)
Average rating: ****.
(4.00, 1 rating)
Most uses of deep learning involve models trained with large datasets. Ramesh Sridharan explains how Captricity uses deep learning with tiny datasets at scale, training thousands of models using tens to hundreds of examples each. These models are dynamically trained using an automatic deployment framework, and carefully chosen metrics further exploit error properties of the resulting models. Read more.
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 14 Level: Intermediate
Secondary topics:  Managing and Deploying Machine Learning
Ted Dunning (MapR, now part of HPE)
Average rating: *****
(5.00, 1 rating)
Ted Dunning offers an overview of the rendezvous architecture, which is geared to deal with much of the complexity involved in deploying models to production, thus allowing more time to be spent thinking and doing real data science. Ted covers the ideas behind the architecture, practical scenarios, and advantages and disadvantages of the architecture. Read more.
11:15–11:55 Thursday, 24/05/2018
Secondary topics:  Data Platforms, E-commerce and Retail
Neelesh Salian (Stitch Fix)
Average rating: *....
(1.00, 1 rating)
Neelesh Srinivas Salian offers an overview of the compute infrastructure used by the data science team at Stitch Fix, covering the architecture, tools within the larger ecosystem, and the challenges that the team overcame along the way. Read more.
11:15–11:55 Thursday, 24/05/2018
Data engineering and architecture
Location: S11B Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions, Data Platforms, Media, Advertising, Entertainment
Irene Gonzálvez (Spotify)
Average rating: ***..
(3.88, 8 ratings)
Irene Gonzálvez shares Spotify's process for ensuring data quality, covering why and how the company became aware of its importance, the products it has developed, and future strategy. Read more.
11:15–11:55 Thursday, 24/05/2018
Secondary topics:  Visualization, Design, and UX
Erin Recachinas (Zoomdata)
Average rating: ****.
(4.00, 2 ratings)
The value of real-time streaming analytics with historical data is immense. Big data application Zoomdata updates historical dashboards in real time without complex reaggregations, but streaming in the age of the IoT requires handling of data in volumes not seen in traditional feeds. Erin Recachinas explains how Zoomdata moved to a scalable microservice architecture for streaming sources. Read more.
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Beginner
Average rating: **...
(2.50, 2 ratings)
Tuning a Spark ML model using cross-validation involves a computationally expensive search over a large parameter space. Nick Pentreath and Bryan Cutler explain how enabling Spark to evaluate models in parallel can significantly reduce the time to complete this process for large workloads and share best practices for choosing the right configuration to achieve optimal resource usage. Read more.
11:15–11:55 Thursday, 24/05/2018
Strata Business Summit
Location: Capital Suite 15/16
Saeed Amen (Cuemacro)
Average rating: ****.
(4.40, 5 ratings)
Saeed Amen explores Python libraries that can be used at the various stages of financial analysis, including time series analysis, visualization, structuring data, and storing market data. Read more.
11:15–11:55 Thursday, 24/05/2018
Data science and machine learning, Expo Hall
Location: Expo Hall Level: Beginner
Secondary topics:  Time Series and Graphs
Jared Lander (Lander Analytics)
Average rating: ****.
(4.00, 2 ratings)
Temporal data is being produced in ever-greater quantity, but fortunately our time series capabilities are keeping pace. Jared Lander explores techniques for modeling time series, from traditional methods such as ARMA to more modern tools such as Prophet and machine learning models like XGBoost and neural nets. Along the way, Jared shares theory and code for training these models. Read more.

12:05

12:05–12:45 Thursday, 24/05/2018
Secondary topics:  Managing and Deploying Machine Learning
Nanda Vijaydev (BlueData), Thomas Phelan (HPE BlueData)
Average rating: ****.
(4.17, 6 ratings)
In the past, you needed a high-end proprietary stack for advanced machine learning, but today, you can use open source machine learning and deep learning algorithms available with distributed computing technologies like Apache Spark and GPUs. Nanda Vijaydev and Thomas Phelan demonstrate how to deploy a TensorFlow and Spark with NVIDIA CUDA stack on Docker containers in a multitenant environment. Read more.
12:05–12:45 Thursday, 24/05/2018
Executive Briefing, Strata Business Summit
Location: Capital Suite 17
Louise Herring (McKinsey & Company)
Average rating: *****
(5.00, 1 rating)
After decades of extravagant promises, artificial intelligence is finally starting to deliver real-life benefits to early adopters. However, we’re still early in the cycle of adoption. Louise Herring explains where investment is going, patterns of AI adoption and value capture by enterprises, and how the value potential of AI across sectors and business functions is beginning to emerge. Read more.
12:05–12:45 Thursday, 24/05/2018
Secondary topics:  Financial Services
Calum Murray (Intuit)
Average rating: *....
(1.50, 2 ratings)
Machine learning-based applications are becoming the new norm. Calum Murray shares five use cases at Intuit that use the data of over 60 million users to create delightful experiences for customers, either by solving repetitive tasks, freeing them up to spend time more productively, or by solving very complex tasks with simplicity and elegance. Read more.
12:05–12:45 Thursday, 24/05/2018
Secondary topics:  Data Platforms, Managing and Deploying Machine Learning
Moty Fania (Intel)
Moty Fania explains how Intel implemented an AI inference platform to enable internal visual inspection use cases and shares lessons learned along the way. The platform is based on open source technologies and was designed for real-time streaming and online actuation. Read more.
12:05–12:45 Thursday, 24/05/2018
Ask Me Anything
Location: Capital Suite 14
Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)
Average rating: ***..
(3.33, 6 ratings)
Join Mark Madsen and Shant Hovsepian to discuss analytics strategy and planning, data architecture, data management, and BI on big data. Read more.
12:05–12:45 Thursday, 24/05/2018
Jacques Nadeau (Dremio)
Average rating: ****.
(4.00, 3 ratings)
Jacques Nadeau offers an overview of a new Apache-licensed lightweight distributed in-memory cache that allows multiple applications to consume Arrow directly using the Arrow RPC and IPC protocols. You'll explore the system design and deployment architecture, learn how data science, analytical, and custom applications can all leverage the cache simultaneously, and see a live demo. Read more.
12:05–12:45 Thursday, 24/05/2018
Secondary topics:  Transportation and Logistics
Mark Grover (Lyft), Ted Malaska (Capital One)
Average rating: *****
(5.00, 6 ratings)
Many details go into building a big data system for speed, from determining a respectable latency before data can be accessed and where to store the data to solving multiregion problems, or even knowing just what data you have and where stream processing fits in. Mark Grover and Ted Malaska share challenges, best practices, and lessons learned doing big data processing and analytics at scale and at speed. Read more.
12:05–12:45 Thursday, 24/05/2018
Big data and data science in the cloud, Data engineering and architecture
Location: Capital Suite 8/9 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions
Adesh Rao (Qubole), Abhishek Somani (Qubole)
Average rating: ***..
(3.00, 2 ratings)
Adesh Rao and Abhishek Somani share a framework for materialized views in SQL-on-Hadoop engines that automatically suggests, creates, uses, invalidates, and refreshes views created on top of data for optimal performance and strict correctness. Read more.
12:05–12:45 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate
Secondary topics:  Financial Services
Mike Lee Williams (Cloudera Fast Forward Labs)
Average rating: *****
(5.00, 2 ratings)
Interpretable models result in more accurate, safer, and more profitable machine learning products, but interpretability can be hard to ensure. Michael Lee Williams examines the growing business case for interpretability, explores concrete applications including churn, finance, and healthcare, and demonstrates the use of LIME, an open source, model-agnostic tool you can apply to your models today. Read more.
12:05–12:45 Thursday, 24/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Intermediate
Martin Goodson (Evolution AI)
Average rating: ****.
(4.25, 4 ratings)
How can AI become part of our business processes? Should we entrust critical decisions to completely autonomous systems? Drawing on projects from businesses and UK government agencies, Martin Goodson explains how to increase confidence in AI systems and manage the transition to an AI-driven organization. Read more.
12:05–12:45 Thursday, 24/05/2018
Secondary topics:  Time Series and Graphs
Erik Nordström (Timescale)
Erik Nordström explains how and why to use PostgreSQL as a Prometheus backend to support complex questions (and get a proper SQL interface), offers an overview of pg_prometheus (a custom Prometheus datatype) and prometheus-postgresql-adapter (a remote storage adapter for PostgreSQL), and shares his experience with TimescaleDB, which enables PostgreSQL to scale for classic monitoring volumes. Read more.

12:45

12:45–14:05 Thursday, 24/05/2018
Location: Expo Hall (Capital Hall 24)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.
12:45–14:05 Thursday, 24/05/2018
Location: Expo Hall - SBS lunch (Capital Hall 24)
Join Strata Business Summit speakers and attendees for a networking lunch on Thursday. Read more.

14:05

14:05–14:45 Thursday, 24/05/2018
Data engineering and architecture
Location: Capital Suite 7 Level: Beginner
Secondary topics:  Managing and Deploying Machine Learning
Average rating: ***..
(3.00, 5 ratings)
Guillaume Salou shares OVH's approach to continuous deployment of machine learning models, which involved building a full stack of automated machine learning. Automated machine learning allows the company to rebuild models efficiently and keep models up to date with fresh data brought by its data convergence tool. Read more.
14:05–14:45 Thursday, 24/05/2018
Executive Briefing, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 17 Level: Beginner
Secondary topics:  Security and Privacy, Telecom
Alasdair Allan (Babilim Light Industries)
The increasing ubiquity of the internet of things has put a new focus on data privacy. Big data is all very well when it's harvested quietly and stealthily, but when your things tattle on you behind your back, it's a very different matter altogether. Alasdair Allan explains why the internet of things brings with it a whole new set of big data problems that can't be ignored. Read more.
14:05–14:45 Thursday, 24/05/2018
Big data and data science in the cloud, Data science and machine learning
Location: Capital Suite 2/3 Level: Intermediate
Secondary topics:  Telecom
Sven Loeffler (Deutsche Telekom)
Average rating: **...
(2.00, 1 rating)
Sven Löffler offers an overview of the Data Intelligence Hub, T-Systems's implementation of the Fraunhofer Industrial Data Space: a reference architecture for standardized and secure data exchange between industries in the context of the internet of things. Read more.
14:05–14:45 Thursday, 24/05/2018
Data science and machine learning, Data-driven business management
Location: Capital Suite 12 Level: Beginner
Kaylea Haynes (Peak)
Deciding how much stock to hold is a challenge for hire businesses. There is a fine balance between holding enough stock to fulfill hires and not holding so much stock that overall utilization is too low to achieve the return on investment. Kaylea Haynes shares a case study on forecasting the demand for thousands of assets across multiple locations. Read more.
14:05–14:45 Thursday, 24/05/2018
Data engineering and architecture
Location: Capital Suite 13 Level: Beginner
Jim Dowling (Logical Clocks)
Average rating: *****
(5.00, 2 ratings)
Distributed deep learning can increase the productivity of AI practitioners and reduce time to market for training models. Hadoop can fulfill a crucial role as a unified feature store and resource management platform for distributed deep learning. Jim Dowling offers an introduction to writing distributed DL applications, covering TensorFlow and Apache Spark frameworks that make distribution easy. Read more.
14:05–14:45 Thursday, 24/05/2018
Ask Me Anything
Location: Capital Suite 14
Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)
Join Dean Wampler and Boris Lublinsky to discuss all things streaming: architecture, implementation, streaming engines and frameworks, techniques for serving machine learning models in production, traditional big data systems (dying or still relevant?), and general software architecture and data systems. Read more.
14:05–14:45 Thursday, 24/05/2018
Data engineering and architecture
Location: S11A Level: Intermediate
Haikal Pribadi (GRAKN.AI)
Average rating: ***..
(3.50, 2 ratings)
Haikal Pribadi explains why knowledge graphs (KGs) are important for AI systems in the finance sector and details how they are being used to detect and uncover new knowledge, specifically for risk analysis, fraud detection, and GDPR use cases. Read more.
14:05–14:45 Thursday, 24/05/2018
Secondary topics:  Data Platforms, Time Series and Graphs
Tony Xing (Microsoft), Bixiong Xu (Microsoft)
Average rating: **...
(2.00, 1 rating)
Tony Xing and Bixiong Xu offer an overview of Project Kensho, Microsoft's one-stop shop for business incident monitoring and automated insights. Tony and Bixiong cover the technology's evolution, the architecture, the algorithms, and the benefits and the trade-offs. Along the way, they share a case study on Bing ads key metrics monitoring and automated diagnostic insights. Read more.
14:05–14:45 Thursday, 24/05/2018
Kostas Kloudas (data Artisans)
Average rating: **...
(2.25, 4 ratings)
Complex event processing (CEP) helps detect patterns over continuous streams of data. DNA sequencing, fraud detection, tracking shipments with specific characteristics (e.g., contaminated goods), and user activity analysis all fall into this category. Kostas Kloudas offers an overview of Flink's CEP library and explains the benefits of its integration with Flink. Read more.
14:05–14:45 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Beginner
Paco Nathan (derwen.ai)
Average rating: ****.
(4.50, 2 ratings)
Human in the loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. Such systems are mostly automated, with exceptions referred to human experts, who help train the machines further. Paco Nathan offers an overview of HITL from the perspective of a business manager, focusing on use cases within O'Reilly Media. Read more.
14:05–14:45 Thursday, 24/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Beginner
Michael Li (The Data Incubator), Philipp Diesinger (Boehringer Ingelheim), Julie Shin (Citigroup)
Average rating: *****
(5.00, 1 rating)
What are the latest initiatives and use cases around data and AI? How are data and AI reshaping industries? How do we foster a culture of data and innovation within a larger enterprise? What are some of the challenges of implementing AI within the enterprise setting? Michael Li moderates a panel of experts in different industries to answer these questions and more. Read more.
14:05–14:45 Thursday, 24/05/2018
Data science and machine learning, Expo Hall
Location: Expo Hall Level: Intermediate
Secondary topics:  Financial Services, Text and Language processing and analysis
David Talby (Pacific AI), Saif Addin Ellafi (John Snow Labs), Paul Parau (UiPath)
Average rating: ****.
(4.50, 4 ratings)
Spark NLP natively extends Spark ML to provide natural language understanding capabilities with performance and scale that were not previously possible. David Talby, Saif Addin Ellafi, and Paul Parau explain how Spark NLP was used to augment the Recognos smart data extraction platform in order to automatically infer fuzzy, implied, and complex facts from long financial documents. Read more.

14:55

14:55–15:35 Thursday, 24/05/2018
Data engineering and architecture, Data-driven business management
Location: Capital Suite 7 Level: Intermediate
Secondary topics:  Financial Services, Managing and Deploying Machine Learning
Hope Wang (Intuit)
Average rating: ****.
(4.00, 3 ratings)
A machine learning platform is not just the sum of its parts; the key is how it supports the model lifecycle end to end. Hope Wang explains how to manage various artifacts and their associations, automate deployment to support the lifecycle of a model, and build a cohesive machine learning platform. Read more.
14:55–15:35 Thursday, 24/05/2018
Executive Briefing, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 17 Level: Non-technical
Secondary topics:  Security and Privacy
Kate Vang (DataKind UK), Christine Henry (DataKind UK)
Not a day goes by without reading headlines about the fear of AI or how technology seems to be dividing us more than bringing us together. DataKind UK is passionate about using machine learning and artificial intelligence for social good. Kate Vang and Christine Henry explain what socially conscious AI looks like and what DataKind is doing to make it a reality. Read more.
14:55–15:35 Thursday, 24/05/2018
Data engineering and architecture
Location: Capital Suite 2/3 Level: Intermediate
Marton Balassi (Cloudera), Mirko Kämpf (Cloudera), Jan Kunigk (Cloudera)
Average rating: *****
(5.00, 2 ratings)
Rigorous improvement of an image recognition model often requires multiple iterations of eyeballing outliers, inspecting statistics of the output labels, then modifying and retraining the model. Marton Balassi, Mirko Kämpf, and Jan Kunigk share a solution that automates the process of running the model on the testing data and populating an index of the labels so they become searchable. Read more.
14:55–15:35 Thursday, 24/05/2018
David Asboth (Cox Automotive Data Solutions), Shaun McGirr (Cox Automotive Data Solutions)
Average rating: ****.
(4.60, 5 ratings)
Cox Automotive is the world’s largest automotive service organization, which means it can combine data from across the entire vehicle lifecycle. Cox is on a journey to turn this data into insights. David Asboth and Shaun McGirr share their experience building up a data science team at Cox and scaling the company's data science process from laptop to Hadoop cluster. Read more.
14:55–15:35 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Financial Services, Time Series and Graphs
Francesca Lazzeri (Microsoft), Jaya Susan Mathew (Microsoft)
Average rating: ****.
(4.00, 2 ratings)
Advancements in computing technologies and ecommerce platforms have amplified the risk of online fraud, which results in billions of dollars in losses for the financial industry. This trend has pushed companies to consider AI techniques, including deep learning, for fraud detection. Francesca Lazzeri and Jaya Mathew explain how to operationalize deep learning models with Azure ML to prevent fraud. Read more.
14:55–15:35 Thursday, 24/05/2018
Data science and machine learning, Data-driven business management
Location: Capital Suite 14 Level: Intermediate
Chen Salomon (Playbuzz)
Average rating: ****.
(4.00, 1 rating)
A/B testing is the foundation of data-driven decision making. In today's world, advertising is crucial to a website's revenue, so it is even more important to measure the effects of changes correctly. Chen Salomon demonstrates how to correctly design and implement an advertisement A/B test and shares pitfalls, potential biases related to advertisement metrics, and possible mitigations. Read more.
14:55–15:35 Thursday, 24/05/2018
Data engineering and architecture
Location: S11A Level: Intermediate
Secondary topics:  Time Series and Graphs
Jim Webber (Neo4j)
Average rating: *****
(5.00, 3 ratings)
Jim Webber details how Neo4j mixes the strongly consistent Raft protocol with async log shipping and provides a strong consistency guarantee, causal consistency, which means you can always at least read your own writes, even in very large multi-data-center clusters. Read more.
14:55–15:35 Thursday, 24/05/2018
Secondary topics:  Data Platforms
Alvin Heib (Cloudera), Guy Le Roux (Atos)
Alvin Heib and Guy Leroux offer an overview of ClickFox, a platform able to cope with high-performance analytical needs, from bits and bytes to solving customer needs, covering the platform's virtualization, big data, and analytical layers. Read more.
14:55–15:35 Thursday, 24/05/2018
Secondary topics:  Data Integration and Data Pipelines sessions
Eugene Kirpichov (Google)
Average rating: ****.
(4.50, 2 ratings)
Apache Beam offers users a novel programming model in which the classic batch-streaming dichotomy is erased and ships with a rich set of I/O connectors to popular storage systems. Eugene Kirpichov explains why Beam has made these connectors flexible and modular—a key component of which is Splittable DoFn, a novel programming model primitive that unifies data ingestion between batch and streaming. Read more.
14:55–15:35 Thursday, 24/05/2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate
Elena Terenzi (Microsoft), Michael Lanzetta (Microsoft)
Average rating: ****.
(4.00, 3 ratings)
Michael Lanzetta and Elena Terenzi offer an overview of a collaboration between Microsoft and the Royal Holloway University that applied deep learning to locate illegal small-scale mines in Ghana using satellite imagery, scaled training using Kubernetes, and investigated the mines' impact on surrounding populations and environment. Read more.
14:55–15:35 Thursday, 24/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical
Secondary topics:  Data Platforms, Managing and Deploying Machine Learning
Simon Chan (Salesforce)
Average rating: ****.
(4.00, 1 rating)
The promises of AI are great, but taking the steps to implement AI within an enterprise is challenging. The secret behind enterprise AI success often traces back to the underlying platform that accelerates AI development at scale. Based on years of experience helping executives establish AI product strategies, Simon Chan helps you discover the AI platform journey that is right for your business. Read more.
14:55–15:35 Thursday, 24/05/2018
Data science and machine learning, Expo Hall
Location: Expo Hall Level: Beginner
Secondary topics:  Data Integration and Data Pipelines sessions, Data Platforms
Stamatis Stefanakos (D ONE AG)
Average rating: ****.
(4.33, 3 ratings)
Switzerland-based startup WinJi capitalizes on two current megatrends: big data and renewable energy. Stamatis Stefanakos offers an overview of WinJi's TruePower Asset Management Platform, covering the overall architecture and the motivation behind it, the physics behind the data, and the business case. Read more.

15:35

15:35–16:35 Thursday, 24/05/2018
Location: Expo Hall (Capital Hall 24)
Afternoon break (1h)

16:35

16:35–17:15 Thursday, 24/05/2018
Giuseppe D'alessio (ING Group)
Average rating: ***..
(3.25, 4 ratings)
Giuseppe D'alessio details ING's DevOps journey, covering its impact on people, processes, and tools as well as best practices and pitfalls. Giuseppe concludes with a concrete example of using analytics and streaming technology on real-time applications. Read more.
16:35–17:15 Thursday, 24/05/2018
Data-driven business management, Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Intermediate
Kevin Sigliano (IE Business School)
Average rating: *****
(5.00, 1 rating)
Financial and consumer ROI demands that business leaders understand the drivers and dynamics of digital transformation and big data. Kevin Sigliano explains why disrupting value propositions and continuous innovation are critical if you wish to dramatically improve the way your company engages customers and creates value and to maximize financial results. Read more.
16:35–17:15 Thursday, 24/05/2018
Location: Capital Suite 12
TBC
16:35–17:15 Thursday, 24/05/2018
Data science and machine learning, Visualization and user experience
Location: Capital Suite 13 Level: Intermediate
Amit Kapoor (narrativeVIZ), Bargava Subramanian (Binaize)
Amit Kapoor and Bargava Subramanian lead three live demos of deep learning (DL) done in the browser (building explorable explanations to aid insight, building model inference applications, and rapidly prototyping and training an ML model) using the emerging client-side JavaScript libraries for DL. Read more.
16:35–17:15 Thursday, 24/05/2018
Pascal Bugnion (ASI Data Science)
Jupyter widgets let you create lightweight, interactive graphical interfaces directly in Jupyter notebooks. Pascal Bugnion demonstrates how to use Jupyter widgets to implement human-in-the-loop machine learning with highly interactive user interfaces. Read more.
16:35–17:15 Thursday, 24/05/2018
Jason Bell (Independent Speaker)
Jason Bell offers an overview of a self-learning knowledge system that uses Apache Kafka and Deeplearning4j to accept data, apply training to a neural network, and output predictions. Jason covers the system design and the rationale behind it and the implications of using streaming data with deep learning and artificial intelligence. Read more.
16:35–17:15 Thursday, 24/05/2018
Secondary topics:  Data Platforms
Naghman Waheed (Bayer Crop Science), Brian Arnold (Bayer)
Average rating: ****.
(4.50, 2 ratings)
There are a number of tools that make it easy to implement a data lake. However, most lack the essential features that prevent your data lake from turning into a data swamp. Naghman Waheed and Brian Arnold offer an overview of Monsanto's Data Historian platform, which can ingest, store, and access datasets without compromising ease of use, governance, or security. Read more.
16:35–17:15 Thursday, 24/05/2018
Flavio Junqueira (Dell EMC)
Stream processing is in the spotlight. Enabling low-latency insights and actions out of continuously generated data is compelling to a number of application domains, and the ability to adapt to workload variations is critical to many applications. Flavio Junqueira explores Pravega, a stream store that scales streams automatically and enables applications to scale downstream by signaling changes. Read more.
16:35–17:15 Thursday, 24/05/2018
Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 10/11 Level: Beginner
Secondary topics:  Financial Services
Jonathan Leslie (Pivigo), Tom Harrison (Hackney Council), Maryam Qurashi (Pivigo)
Average rating: *****
(5.00, 5 ratings)
One major challenge to social housing is determining how best to target interventions when tenants fall behind on rent payments. Jonathan Leslie, Maryam Qurashi, and Tom Harrison discuss a recent project in which a team of data scientist trainees helped Hackney Council devise a more efficient, targeted strategy to detect and prioritize such situations. Read more.
16:35–17:15 Thursday, 24/05/2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Intermediate
Tags: us
Average rating: *****
(5.00, 1 rating)
Quantitative measurement is the key to scaling businesses, processes, and products and making them better. It sounds easy: just pick a number and improve it. However, actually choosing a metric is an exploration of a many-dimensional space with no map and no guide. Until now. Join Ketan Gangatirkar to learn how to choose the right metrics so you can build a better product and a better business. Read more.