Presented By O’Reilly and Cloudera

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Schedule

Monday, 21/05/2018

7:30

7:30–9:00 Monday, 21/05/2018

Location: Capital Suite Foyer

Coffee break (1h 30m)

9:00

Data science and machine learning with Apache Spark (SOLD OUT)

9:00–17:00 Monday, 21/05/2018

Location: Capital Suite 1

Behzad Bordbar (Cloudera)

Behzad Bordbar demonstrates how to implement typical data science workflows using Apache Spark. You'll learn how to wrangle and explore data using Spark SQL DataFrames and how to build, evaluate, and tune machine learning models using Spark MLlib. Read more.

Hands-on data science with Python

9:00–17:00 Monday, 21/05/2018

Location: Capital Suite 7

Zachary Glassman (The Data Incubator)

Zachary Glassman offers a foundation in building intelligent business applications using machine learning, walking you through all the steps of developing a machine learning pipeline, from prototyping to production. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment, and you'll extend these models into two applications using real-world datasets. Read more.

Real-time systems with Spark Streaming and Kafka

9:00–17:00 Monday, 21/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 16

Jesse Anderson (Big Data Institute)

Average rating:

(5.00, 1 rating)

To handle real-time big data, you need to solve two difficult problems: How do you ingest that much data, and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company. Read more.

Machine learning with TensorFlow

9:00–17:00 Monday, 21/05/2018

Location: Capital Suite 17

Dana Mastropole (The Data Incubator)

The TensorFlow library enables the use of data flow graphs for numerical computations, with automatic parallelization across several CPUs or GPUs. This architecture makes it ideal for implementing neural networks and other machine learning algorithms. Dana Mastropole details TensorFlow's capabilities through its Python interface. Read more.

Data science for managers

9:00–17:00 Monday, 21/05/2018

Location: London Suite 2

Jean Innes (ASI Data Science), Matthew Ward (ASI Data Science)

Jean Innes, Matthew Ward, Emanuele Haerens, and Alli Paget lead a condensed introduction to key data science and machine learning concepts and techniques, showing you what is (and isn't) possible with these exciting new tools and how they can benefit your organization. Read more.

10:30

10:30–11:00 Monday, 21/05/2018

Location: Capital Suite Foyer

Coffee break (30m)

12:30

12:30–13:30 Monday, 21/05/2018

Location: Capital Suite Foyer

Lunch (1h)

15:00

15:00–15:30 Monday, 21/05/2018

Location: Capital Suite Foyer

Afternoon break (30m)

Tuesday, 22/05/2018

7:30

7:30–9:00 Tuesday, 22/05/2018

Location: Auditorium Foyer

Coffee break sponsored by Redis Labs (1h 30m)

9:00

Data Case Studies

9:00–17:00 Tuesday, 22/05/2018

Location: Capital Suite 2/3

Dan Jeavons (Shell), Hollie Lubbock (Fjord), Jivan Virdee (Fjord), Fausto Morales (Arundo), Marty Cochrane (Arundo), Jane McConnell (Teradata), Paul Ibberson (Teradata), Michael Troughton (Conduce), Jonathan Genah (DHL Supply Chain), Allison Nau (Cox Automotive UK), Dave Fitch (The Data Lab), Maria Assunta Palmieri (Data Reply), Niranjan Thomas (Dow Jones), Erik Elgersma (FrieslandCampina), Viola Melis (Typeform), Carme Artigas (Synergic Partners), Nuria Bombardo (PepsiCo)

Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions. Read more.

Findata Day

9:00–17:00 Tuesday, 22/05/2018

Location: Capital Suite 4

Paul Lashmet (Arcadia Data), Anthony Culligan (SETL), Konrad Sippel (Deutsche Börse), Paul Lynn (Nordea), Mikheil Nadareishvili (TBC Bank), Olaf Hein (ORDIX AG), Robert Passarella (Alpha Features), Louise Beaumont (Publicis Groupe | techUK | NPSO), Alistair Croll (Solve For Interesting), Christina Erlwein-Sayer (OptiRisk Systems), Angelique Mohring (GainX), Saeed Amen (Cuemacro), Gisele Frederick (Zingr.io)

From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry. Read more.

Modern real-time streaming architectures

9:00–12:30 Tuesday, 22/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8 Level: Intermediate

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)

Average rating:

(3.67, 3 ratings)

The need for instant data-driven insights has led to the proliferation of messaging and streaming frameworks. Karthik Ramasamy, Arun Kejariwal, and Ivan Kelly walk you through state-of-the-art streaming frameworks, algorithms, and architectures, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them. Read more.

General Data Protection Regulation (GDPR) tutorial and ePrivacy introduction

9:00–12:30 Tuesday, 22/05/2018

Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 9 Level: Non-technical

Secondary topics: Security and Privacy

Aurélie Pols (Mind Your Privacy)

Average rating:

(5.00, 1 rating)

Aurélie Pols walks you through a "5+5 pillars" framework for GDPR readiness, explaining what the GDPR means to data-fueled businesses. You'll learn how to attribute responsibility to assure compliance and build toward ethical data practices, minimizing risk for your company while fostering trust with your clients. Read more.

Introduction to natural language processing with Python

9:00–12:30 Tuesday, 22/05/2018

Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 10 Level: Beginner

Secondary topics: Text and Language processing and analysis

Barbara Fusinska (Google)

Average rating:

(4.33, 3 ratings)

Natural language processing techniques help address tasks like text classification, information extraction, and content generation. Barbara Fusinska offers an overview of natural language processing and walks you through building a bag-of-words representation, using Python and its machine learning libraries, and then using it for text classification. Read more.

Serverless machine learning with TensorFlow

9:00–17:00 Tuesday, 22/05/2018

Big data and data science in the cloud
Location: Capital Suite 11 Level: Intermediate

Carl Osipov (Google)

Carl Osipov walks you through building a complete machine learning pipeline from ingest, exploration, training, and evaluation to deployment and prediction. Read more.

Measure what matters: How your measurement strategy can reduce opex

9:00–12:30 Tuesday, 22/05/2018

Data-driven business management, Strata Business Summit
Location: Capital Suite 12 Level: Non-technical

Secondary topics: Visualization, Design, and UX

Radhika Dutt (Radical Product), Geordie Kaytes (Fresh Tilled Soil), Nidhi Aggarwal (Radical Product)

Average rating:

(4.00, 2 ratings)

These days it’s easy for companies to say, “We measure everything!” The problem is, most popular metrics may not be appropriate or relevant for your business. Measurement isn’t free and should be done strategically. Radhika Dutt, Geordie Kaytes, and Nidhi Aggarwal explain how to align measurement with your product strategy so you can measure what matters for your business. Read more.

Running data analytic workloads in the cloud

9:00–12:30 Tuesday, 22/05/2018

Data engineering and architecture
Location: Capital Suite 13 Level: Intermediate

Eugene Fratkin (Cloudera), Vinithra Varadharajan (Cloudera), Mael Ropars (Cloudera), Jason Wang (Cloudera)

Average rating:

(5.00, 1 rating)

Vinithra Varadharajan, Jason Wang, Eugene Fratkin, and Mael Ropars detail new paradigms to effectively run production-level pipelines with minimal operational overhead. Join in to learn how to remove barriers to data discovery, metadata sharing, and access control. Read more.

Architecting a data platform for enterprise use

9:00–12:30 Tuesday, 22/05/2018

Data engineering and architecture
Location: Capital Suite 14 Level: Intermediate

Secondary topics: Data Platforms

Mark Madsen (Teradata), Todd Walter (Archimedata)

Average rating:

(4.29, 7 ratings)

Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.

Leveraging Spark and deep learning frameworks to understand data at scale

9:00–12:30 Tuesday, 22/05/2018

Data science and machine learning
Location: Capital Suite 15 Level: Intermediate

Vartika Singh (Cloudera), Juan Yu (Cloudera), Marton Balassi (Cloudera), Steven Totman (Cloudera)

Average rating:

(3.75, 4 ratings)

Vartika Singh, Marton Balassi, Steven Totman, and Juan Yu outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks. Read more.

10:30

10:30–11:00 Tuesday, 22/05/2018

Location: Capital Suite Foyer

Morning break (30m)

12:30

12:30–13:30 Tuesday, 22/05/2018

Location: N11

Lunch sponsored by IBM (1h)

13:30

Kafka streaming microservices with Akka Streams and Kafka Streams

13:30–17:00 Tuesday, 22/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8 Level: Intermediate

Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)

Average rating:

(3.25, 4 ratings)

Dean Wampler and Boris Lublinsky walk you through building streaming apps as microservices using Akka Streams and Kafka Streams. Along the way, Dean and Boris discuss the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead. Read more.

Making data visual: A practical session on using visualization for insight

13:30–17:00 Tuesday, 22/05/2018

Visualization and user experience
Location: Capital Suite 9 Level: Non-technical

Secondary topics: Visualization, Design, and UX

Danyel Fisher (Honeycomb.io), Miriah Meyer (University of Utah)

Average rating:

(4.00, 4 ratings)

Danyel Fisher and Miriah Meyer explore the human side of data analysis and visualization, covering operationalization, the process of reducing vague problems to specific tasks, and how to choose a visual representation that addresses those tasks. Along the way, they also discuss single views and explain how to link them into multiple views. Read more.

13:30–17:00 Tuesday, 22/05/2018

Location: Capital Suite 10

TBC

Architecting a next-generation data platform

SOLD OUT

13:30–17:00 Tuesday, 22/05/2018

Data engineering and architecture
Location: Capital Suite 12 Level: Advanced

Secondary topics: Data Platforms

Ted Malaska (Capital One), Jonathan Seidman (Cloudera)

Average rating:

(4.33, 3 ratings)

Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Flink, Kudu, Spark Streaming, and Spark SQL and modern storage engines to enable new forms of data processing and analytics. Read more.

Natural language understanding at scale with spaCy and Spark NLP

13:30–17:00 Tuesday, 22/05/2018

Data science and machine learning
Location: Capital Suite 13 Level: Intermediate

Secondary topics: Text and Language processing and analysis

David Talby (Pacific AI), Claudiu Branzan (Accenture)

Average rating:

(4.33, 3 ratings)

Natural language processing is a key component in many data science systems. David Talby and Claudiu Branzan lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, Spark NLP for building distributed natural language machine-learned pipelines, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings. Read more.

Managing data science in the enterprise

13:30–17:00 Tuesday, 22/05/2018

Strata Business Summit
Location: Capital Suite 14 Level: Intermediate

Dan Enthoven (Domino Data Lab)

The honeymoon era of data science is ending, and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders deliver measurable impact on an increasing share of an enterprise's KPIs. Dan Enthoven outlines a holistic approach to people, process, and technology to build a sustainable competitive advantage. Read more.

Securing and governing hybrid, cloud, and on-premises big data deployments, step by step

13:30–17:00 Tuesday, 22/05/2018

Law, ethics, and governance, Platform security and cybersecurity
Location: Capital Suite 15 Level: Intermediate

Secondary topics: Security and Privacy

Mark Donsky (Okera), Steffen Maerkl (Cloudera), Andre Araujo (Cloudera)

Hybrid big data deployments present significant new security risks. Security admins must ensure a consistently secured and governed experience for end users and administrators across multiple workloads. Mark Donsky, Steffen Maerkl, and André Araujo share best practices for meeting these challenges as they walk you through securing a Hadoop cluster. Read more.

15:00

15:00–15:30 Tuesday, 22/05/2018

Location: Capital Suite Foyer

Afternoon break (30m)

17:00

Opening Reception

17:00–18:00 Tuesday, 22/05/2018

Location: Expo Hall (Capital Hall 24)

Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors. Read more.

19:00

Strata Dine-Around

19:00–21:00 Tuesday, 22/05/2018

Location: Various locations

Get to know your fellow attendees over dinner. We've made reservations for you at some of the most sought-after restaurants in town. This is a great chance to make new connections and sample some of the great cuisine London has to offer. Read more.

Wednesday, 23/05/2018

8:15

Speed Networking

8:15–8:45 Wednesday, 23/05/2018

Location: Auditorium Foyer

Average rating:

(5.00, 1 rating)

Gather before keynotes on Wednesday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees. Read more.

8:45

8:45–9:00 Wednesday, 23/05/2018

Location: Auditorium Foyer

Coffee break sponsored by Confluent (7:30–9:00) (15m)

9:00

Wednesday opening welcome

9:00–9:05 Wednesday, 23/05/2018

Location: Auditorium

Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

Average rating:

(1.00, 1 rating)

Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes. Read more.

9:05

Charting a data journey to the cloud

9:05–9:20 Wednesday, 23/05/2018

Location: Auditorium

Mick Hollison (Cloudera), Sven Loeffler (Deutsche Telekom), Robert Neumann (Ultra Tendency)

Average rating:

(2.65, 20 ratings)

What happens when you combine near-limitless data with on-demand access to powerful analytics and compute? For Deutsche Telekom, the results have been transformative. Mick Hollison, Sven Löffler, and Robert Neumann explain how Deutsche Telekom is harnessing machine learning and analytics in the cloud to build Europe’s largest and most powerful IoT data marketplace. Read more.

9:20

Journey to GDPR compliance

9:20–9:35 Wednesday, 23/05/2018

Location: Auditorium

Alison Howard (Microsoft)

Average rating:

(2.62, 16 ratings)

May 25, the day the GDPR goes into effect, is an important milestone for data protection in the EU and elsewhere, but the journey to GDPR compliance neither begins nor ends there. Alison Howard explains how Microsoft, one of the world’s largest companies, with operations across the EU and around the globe, has prepared for May 25 and beyond. Read more.

9:35

Humans and the machine: Machine learning in context (sponsored by IBM)

9:35–9:45 Wednesday, 23/05/2018

Location: Auditorium

Jean-François Puget (IBM Analytics)

Average rating:

(3.50, 16 ratings)

On the way to active analytics for business, we have to answer two big questions: What must happen to data before running machine learning algorithms, and how should machine learning output be used to generate actual business value? Jean-François Puget demonstrates the vital role of human context in answering those questions. Read more.

9:45

Building a stronger data ecosystem

9:45–9:55 Wednesday, 23/05/2018

Location: Auditorium Level: Non-technical

Ben Lorica (O'Reilly)

Average rating:

(2.91, 11 ratings)

To enable the machine learning applications of the future, there remain many interesting and challenging data problems we need to tackle as a community. Ben Lorica discusses some of the pressing problems we're facing as we collect and store data, particularly in an era when our machine learning models require huge amounts of labeled data. Read more.

9:55

The Paradise Papers: Behind the scenes with the ICIJ

9:55–10:10 Wednesday, 23/05/2018

Location: Auditorium

Pierre Romera (International Consortium of Investigative Journalists (ICIJ))

Average rating:

(4.73, 26 ratings)

Last November, the International Consortium of Investigative Journalists (ICIJ) published the Paradise Papers, a yearlong investigation into the offshore dealings of multinational companies and the wealthy. Pierre Romera offers a behind-the-scenes look into the process and explores the challenges in handling 1.4 TB of data and making it available securely to journalists all over the world. Read more.

10:15

Data protection and innovation

10:15–10:30 Wednesday, 23/05/2018

Location: Auditorium

Eva Kaili (European Parliament | The Science and Technology Options Assessment Panel)

Average rating:

(3.19, 21 ratings)

Keynote with Eva Kaili Read more.

10:45

10:45–11:15 Wednesday, 23/05/2018

Location: Expo Hall (Capital Hall 24)

Morning break (30m)

11:15

Architecting data platforms for cybersecurity

11:15–11:55 Wednesday, 23/05/2018

Data engineering and architecture
Location: Capital Suite 7 Level: Intermediate

Secondary topics: Security and Privacy

Charaka Goonatilake (Panaseer)

Average rating:

(4.50, 2 ratings)

Data is becoming a crucial weapon to secure an organization against cyber threats. Charaka Goonatilake shares strategies for designing effective data platforms for cybersecurity using big data technologies, such as Spark and Hadoop, and explains how these platforms are being used in real-world examples of data-driven security. Read more.

Executive Briefing: GDPR—Getting your data ready for heavy, new EU privacy regulations

11:15–11:55 Wednesday, 23/05/2018

Executive Briefing, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 17 Level: Intermediate

Secondary topics: Financial Services, Security and Privacy

Mark Donsky (Okera), Syed Rafice (Cloudera)

Average rating:

(4.00, 1 rating)

In May 2018, the General Data Protection Regulation (GDPR) goes into effect for firms doing business in the EU, but many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Mark Donsky and Syed Rafice outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations. Read more.

Putting AI to work for business: It's a journey. (sponsored by IBM)

11:15–11:55 Wednesday, 23/05/2018

Sponsored
Location: Capital Suite 2/3

Carlo Appugliese (IBM)

Average rating:

(4.50, 2 ratings)

What was once science fiction has now become reality as multiple consumer-based AI solutions have hit the market over the last few years. In turn, consumers have become more comfortable interacting with AI. But has AI really lived up to the hype? For consumers, perhaps not yet. However, AI for business is a different (and more valuable) animal. Carlo Appugliese details how businesses can put AI to work. Read more.

Enabling data-driven development for autonomous driving at BMW (sponsored by BMW)

11:15–11:55 Wednesday, 23/05/2018

Sponsored
Location: Capital Suite 4

Miha Pelko (BMW Group), Aleksandr Melkonyan (BMW AG)

Average rating:

(5.00, 4 ratings)

The development of autonomous driving cars requires the handling of huge amounts of data produced by test vehicles and solving a number of critical challenges specific to the automotive industry. Miha Pelko and Aleksandr Melkonyan outline these challenges and explain how BMW is overcoming them by adapting and reinventing existing big data solutions for autonomous driving. Read more.

How will the GDPR impact machine learning?

11:15–11:55 Wednesday, 23/05/2018

Data science and machine learning, Law, ethics, and governance
Location: Capital Suite 12 Level: Non-technical

Secondary topics: Security and Privacy

Steven Touw (Immuta)

Average rating:

(4.25, 4 ratings)

The Strata Data conference in London takes place during one of the most important weeks in the history of data regulation, as GDPR begins to be enforced. Steve Touw explores the effects of the GDPR on deploying machine learning models in the EU. Read more.

Distributed training of deep learning models

11:15–11:55 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 13 Level: Advanced

Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Ilia Karmanov (Microsoft)

Average rating:

(4.00, 5 ratings)

Mathew Salvaris, Miguel Gonzalez-Fierro, and Ilia Karmanov offer a comparison of two platforms for running distributed deep learning training in the cloud, using a ResNet network trained on the ImageNet dataset as an example. You'll examine the performance of each as the number of nodes scales and learn some tips and tricks as well as some pitfalls to watch out for. Read more.

Finding bias in social media recommendations

11:15–11:55 Wednesday, 23/05/2018

Data science and machine learning, Law, ethics, and governance
Location: Capital Suite 14

Secondary topics: Media, Advertising, Entertainment, Security and Privacy

Guillaume Chaslot (AlgoTransparency)

Average rating:

(4.17, 6 ratings)

An increasing number of ex-Google and ex-Facebook employees state that social media is starting to control us rather than the other way around. How can we determine if social media is a pure reflection of people's interests or if it pushes us toward specific narratives? Guillaume Chaslot explores methodologies to find out which narratives are favored by social media recommendation engines. Read more.

The cloud is expensive, so build your own redundant Hadoop clusters.

11:15–11:55 Wednesday, 23/05/2018

Big data and data science in the cloud, Data engineering and architecture
Location: S11A Level: Intermediate

Stuart Pook (Criteo)

Average rating:

(4.40, 5 ratings)

Criteo has a production cluster of 2K nodes running over 300K jobs a day in the company's own data centers. These clusters were meant to provide a redundant solution to Criteo's storage and compute needs. Stuart Pook offers an overview of the project, shares challenges and lessons learned, and discusses Criteo's progress in building another cluster to survive the loss of a full DC. Read more.

Web analytics at scale with Druid at Naver

11:15–11:55 Wednesday, 23/05/2018

Data engineering and architecture
Location: S11B Level: Intermediate

Secondary topics: Data Platforms, Media, Advertising, Entertainment

Jason Heo (Naver), Dooyong Kim (Navercorp)

Average rating:

(3.00, 1 rating)

Naver.com is the largest search engine in Korea, with a 70% share of the Korean search market, and it handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver's web analytics system, built with Druid. Read more.

Processing fast data with Apache Spark: A tale of two APIs

11:15–11:55 Wednesday, 23/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Intermediate

Gerard Maas (Lightbend)

Average rating:

(4.00, 13 ratings)

Apache Spark has two streaming APIs: Spark Streaming and Structured Streaming. Gerard Maas offers a critical overview of their differences in key aspects of a streaming application, from the API user experience to dealing with time and with state and machine learning capabilities, and shares practical guidance on picking one or combining both to implement resilient streaming pipelines. Read more.

Data science survival and growth within the corporate jungle: An easyJet case study

11:15–11:55 Wednesday, 23/05/2018

Data science and machine learning, Data-driven business management
Location: Capital Suite 10/11 Level: Beginner

Secondary topics: Transportation and Logistics

Alberto Rey Villaverde (easyJet), Grigorios Mingas (easyJet)

Average rating:

(4.45, 11 ratings)

Because in-house data science teams work with a range of business functions, traditional data science processes are often too abstract to cope with the complexity of these environments. Alberto Rey Villaverde and Grigorios Mingas share case studies from easyJet that highlight some unpredictable hurdles related to requirements, data, infrastructure, and deployment and explain how they solved them. Read more.

Leveraging public-private partnerships using data analytics for economic insights

11:15–11:55 Wednesday, 23/05/2018

Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical

Secondary topics: Financial Services

Audrey Lobo-Pulo (Phoensight), Nicholas O'Donnell (LinkedIn)

In October 2017, LinkedIn and the Australian Treasury teamed up to gain a deeper understanding of the Australian labor market through new data insights, which may inform economic policy and directly benefit society. Audrey Lobo-Pulo and Nick O'Donnell share some of the discoveries from this collaboration as well as the practicalities of working in a public-private partnership. Read more.

Revolutionizing the newsroom with artificial intelligence

11:15–11:55 Wednesday, 23/05/2018

Data science and machine learning, Data-driven business management, Emerging technologies and case studies, Expo Hall
Location: Expo Hall Level: Beginner

Secondary topics: Media, Advertising, Entertainment

Dan Gilbert (News UK), Jonathan Leslie (Pivigo)

Average rating:

(3.75, 4 ratings)

In the era of 24-hour news and online newspapers, editors in the newsroom must quickly and efficiently make sense of the enormous amounts of data that they encounter and make decisions about their content. Daniel Gilbert and Jonathan Leslie discuss an ongoing partnership between News UK and Pivigo in which a team of data science trainees helped develop an AI platform to assist in this task. Read more.

12:05

Hadoop under attack: Securing data in a banking domain

12:05–12:45 Wednesday, 23/05/2018

Data engineering and architecture, Law, ethics, and governance, Platform security and cybersecurity
Location: Capital Suite 7 Level: Intermediate

Secondary topics: Security and Privacy

Federico Leven (ReactoData)

Average rating:

(2.67, 3 ratings)

The apparent difficulty of managing Hadoop compared to more traditional and proprietary data products makes some companies wary of the Hadoop ecosystem, but managing security is becoming more accessible in the Hadoop space, particularly in the Cloudera stack. Federico Leven offers an overview of an end-to-end security deployment on Hadoop and the data and security governance policies implemented. Read more.

Executive Briefing: Becoming a data-driven enterprise—A maturity model

12:05–12:45 Wednesday, 23/05/2018

Data-driven business management, Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Non-technical

Teresa Tung (Accenture), Jean-Luc Chatelain (Accenture)

Average rating:

(3.12, 8 ratings)

A data-driven enterprise maximizes the value of its data. But how do enterprises emerging from technology and organization silos get there? Teresa Tung and Jean-Luc Chatelain explain how to create a data-driven enterprise maturity model that spans technology and business requirements and walk you through use cases that bring the model to life. Read more.

Cloud-native data science with Anaconda, Docker, and Kubernetes (sponsored by Anaconda)

12:05–12:45 Wednesday, 23/05/2018

Sponsored
Location: Capital Suite 4

Mathew Lodge (Anaconda)

Average rating:

(4.50, 4 ratings)

The days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. Mathew Lodge demonstrates that it's just as easy to deploy Python as it is Java, using containers and Kubernetes. Welcome to the future. Read more.

Fairness and diversity in online social systems

12:05–12:45 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 12

Secondary topics: Media, Advertising, Entertainment, Security and Privacy

Elisa Celis (EPFL)

Average rating:

(4.25, 4 ratings)

There is a pressing need to design new algorithms that are socially responsible in how they learn and socially optimal in the manner in which they use information. Elisa Celis explores the emergence of bias in algorithmic decision making and presents first steps toward developing a systematic framework to control biases in classical problems, such as data summarization and personalization. Read more.

Deep learning for recommender systems

12:05–12:45 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 13 Level: Intermediate

Secondary topics: E-commerce and Retail, Media, Advertising, Entertainment

Nick Pentreath (IBM)

Average rating:

(4.43, 7 ratings)

In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice. Read more.

Data visualization in a big data world

12:05–12:45 Wednesday, 23/05/2018

Data science and machine learning, Visualization and user experience
Location: Capital Suite 14 Level: Intermediate

Secondary topics: Visualization, Design, and UX

Jeff Fletcher (Cloudera)

Average rating:

(4.73, 11 ratings)

As big data adoption grows, Apache Hadoop, Apache Spark, and machine learning technologies are increasingly being used to analyze ever-larger datasets, but we still have to keep telling stories about the data and making sure the message is clear. Jeff Fletcher details the tools and techniques that are relevant to data visualization practitioners working with large datasets and predictive models. Read more.

Using a global data fabric to run a mixed cloud deployment

12:05–12:45 Wednesday, 23/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: S11A Level: Beginner

Jim Scott (NVIDIA)

Average rating:

(4.00, 2 ratings)

Creating a business solution is a lot of work. Instead of building to run on a single cloud provider, it is far more cost effective to leverage the cloud as infrastructure as a service (IaaS). Jim Scott explains why a global data fabric is a requirement for running on all cloud providers simultaneously. Read more.

Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks

12:05–12:45 Wednesday, 23/05/2018

Data engineering and architecture, Data-driven business management
Location: S11B Level: Beginner

Secondary topics: Data Platforms, E-commerce and Retail, Transportation and Logistics

Baolong Mao (JD.com), Yiran Wu (JD.com), Yupeng Fu (Alluxio)

Baolong Mao, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average. Read more.

How BT delivers better broadband and TV using Spark and Kafka

12:05–12:45 Wednesday, 23/05/2018

Data engineering and architecture
Location: Capital Suite 8/9 Level: Beginner

Secondary topics: Telecom

Phillip Radley (BT)

Average rating:

(3.67, 3 ratings)

In the past year, British Telecom has added a streaming network analytics use case to its multitenant data platform. Phillip Radley demonstrates how the solution works and explains how it delivers better broadband and TV services, using Kafka and Spark on YARN and HDFS encryption. Read more.

Risk-sharing pools: Winning zero-sum games through machine learning

12:05–12:45 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate

Secondary topics: Financial Services

Baiju Devani (Aviva Canada), Etienne Chasse St-Laurent (Aviva Canada)

Average rating:

(3.33, 3 ratings)

Risk-sharing pools allow insurers to get rid of risks they are forced to insure in highly regulated markets. Insurers thus cede both the risk and its premium. But are they ceding the right risk or simply giving up premium? Baiju Devani and Étienne Chassé St-Laurent share an applied machine learning approach that leverages an ensemble of models to gain a distinctive market advantage. Read more.

The app trap: Why every mobile app and mobile operator needs anomaly detection

12:05–12:45 Wednesday, 23/05/2018

Data-driven business management, Strata Business Summit, Streaming systems and real-time applications
Location: Capital Suite 15/16 Level: Intermediate

Secondary topics: Telecom, Time Series and Graphs

Ira Cohen (Anodot)

The mobile world has so many moving parts that a simple change to one element can cause havoc somewhere else, resulting in issues that annoy users and cause revenue leaks. Ira Cohen outlines ways to use anomaly detection to track everything mobile, from the service and roaming to specific apps, to fully optimize your mobile offerings. Read more.

Interpretable AI: Can we trust machine learning?

12:05–12:45 Wednesday, 23/05/2018

Data science and machine learning, Expo Hall
Location: Expo Hall Level: Intermediate

Konstantinos Georgatzis (QuantumBlack), Martha Imprialou (QuantumBlack)

Konstantinos Georgatzis and Martha Imprialou explain how to interpret the predictions given by your black-box model and how machine learning is helping to drive decision making today. Read more.

12:45

Wednesday Topic Tables at lunch

12:45–14:05 Wednesday, 23/05/2018

Location: Expo Hall (Capital Hall 24)

Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

Women's Networking Lunch

12:45–14:05 Wednesday, 23/05/2018

Location: S11A

Average rating:

(5.00, 1 rating)

If you’re looking to find like minds and make new professional connections, come to the Women's Networking Lunch on Wednesday. Read more.

Wednesday Business Summit Lunch

12:45–14:05 Wednesday, 23/05/2018

Location: Expo Hall - SBS lunch (Capital Hall 24)

Average rating:

(3.00, 2 ratings)

Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers. Read more.

14:05

GPU-accelerated threat detection with GOAI

14:05–14:45 Wednesday, 23/05/2018

Big data and data science in the cloud, Data engineering and architecture, Data-driven business management, Emerging technologies and case studies, Platform security and cybersecurity, Streaming systems and real-time applications
Location: Capital Suite 7 Level: Intermediate

Secondary topics: Security and Privacy

Joshua Patterson (NVIDIA), Chau Dang (NVIDIA)

Joshua Patterson and Mike Wendt explain how NVIDIA used GPU-accelerated open source technologies to improve its cyberdefense platforms by leveraging software from the GPU Open Analytics Initiative (GOAI) and how the company accelerated anomaly detection with more efficient machine learning models, faster deployment, and more granular data exploration. Read more.

Executive Briefing: Lessons learned managing data science projects—Adopting a team data science process

14:05–14:45 Wednesday, 23/05/2018

Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Beginner

Danielle Dean (iRobot)

Average rating:

(4.80, 5 ratings)

Danielle Dean covers the basics of managing data science projects, including the data science lifecycle, and offers an overview of an internal approach at Microsoft called the Team Data Science Process (TDSP). Join in to learn more about the typical priorities of data science teams and the keys to success in engaging and creating value with data science. Read more.

A tale of two BI standards: Data warehouses and data lakes (sponsored by Arcadia Data)

14:05–14:45 Wednesday, 23/05/2018

Sponsored
Location: Capital Suite 2/3

Randy Lea (Arcadia Data)

Average rating:

(3.62, 8 ratings)

Business intelligence (BI) and analytics on data lakes have had limited success. Data lakes often fall short because they are mostly used by data scientists and not by business users. Randy Lea explains why existing BI tools work well for data warehouses but not data lakes and why modern BI tools designed for data lakes should represent the second BI standard in enterprises today. Read more.

Operationalizing live data to benefit business (sponsored by WANdisco)

14:05–14:45 Wednesday, 23/05/2018

Sponsored
Location: Capital Suite 4

Steve Kilgore (WANdisco)

Today, every company is a data company. Business success depends on putting large volumes of live data to work to drive competitive advantage. Paul Phillips details how some of the world’s largest companies have achieved 100% uptime while moving massive live datasets and halving their hardware requirements. Read more.

StreamDM: Advanced data science with Spark Streaming

14:05–14:45 Wednesday, 23/05/2018

Data science and machine learning, Streaming systems and real-time applications
Location: Capital Suite 12 Level: Intermediate

Secondary topics: Telecom, Time Series and Graphs

Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)

Average rating:

(3.00, 1 rating)

Heitor Murilo Gomes and Albert Bifet offer an overview of StreamDM, a real-time analytics open source software library built on top of Spark Streaming, developed at Huawei's Noah’s Ark Lab and Télécom ParisTech. Read more.

Real-time deep learning on video streams

14:05–14:45 Wednesday, 23/05/2018

Data science and machine learning, Streaming systems and real-time applications
Location: Capital Suite 13 Level: Intermediate

Secondary topics: Security and Privacy

Eran Avidan (Intel)

Average rating:

(4.50, 2 ratings)

Deep learning is revolutionizing many domains within computer vision, but doing real-time analysis is challenging. Eran Avidan offers an overview of a novel architecture based on Redis, Docker, and TensorFlow that enables real-time analysis of high-resolution streaming video. Read more.

Designing ethical artificial intelligence

14:05–14:45 Wednesday, 23/05/2018

Data science and machine learning, Data-driven business management, Law, ethics, and governance
Location: Capital Suite 14 Level: Non-technical

Jivan Virdee (Fjord), Hollie Lubbock (Fjord)

Average rating:

(5.00, 2 ratings)

Artificial intelligence systems are powerful agents of change in our society, but as this technology becomes increasingly prevalent—transforming our understanding of ourselves and our society—issues around ethics and regulation will arise. Jivan Virdee and Hollie Lubbock explore how to address fairness, accountability, and the long-term effects on our society when designing with data. Read more.

Analytics in the cloud: Building a modern cloud-based big data warehouse

14:05–14:45 Wednesday, 23/05/2018

Big data and data science in the cloud, Data engineering and architecture
Location: S11A Level: Intermediate

Greg Rahn (Cloudera)

Average rating:

(3.29, 7 ratings)

For many organizations, the next big data warehouse will be in the cloud. Greg Rahn shares considerations for evaluating the cloud for analytics and big data warehousing, including different architectural approaches to optimize price and performance. Read more.

Audi's journey to an enterprise big data platform

14:05–14:45 Wednesday, 23/05/2018

Data engineering and architecture
Location: S11B Level: Intermediate

Secondary topics: Data Platforms, Transportation and Logistics

Carsten Herbe (Audi Business Innovation GmbH), Matthias Graunitz (Audi AG)

Average rating:

(4.33, 3 ratings)

Carsten Herbe and Matthias Graunitz detail Audi's journey from a Hadoop proof of concept to a multitenant enterprise platform, sharing lessons learned, the decisions Audi made, and how a number of use cases are implemented using the platform. Read more.

Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka

14:05–14:45 Wednesday, 23/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Beginner

Michael Noll (Confluent)

Average rating:

(4.67, 6 ratings)

Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL. Read more.

Building a healthcare decision support system for ICD10/HCC coding through deep learning

14:05–14:45 Wednesday, 23/05/2018

Data science and machine learning, Data-driven business management, Emerging technologies and case studies
Location: Capital Suite 10/11 Level: Intermediate

Manas Ranjan Kar (Episource)

Average rating:

(3.00, 3 ratings)

Episource is building a scalable NLP engine to help summarize medical charts and extract medical coding opportunities and their dependencies to recommend best possible ICD10 codes. Manas Ranjan Kar offers an overview of the wide variety of deep learning algorithms involved and the complex in-house training-data creation exercises that were required to make it work. Read more.

Solving data cleaning and unification using human-guided machine learning

14:05–14:45 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 15/16 Level: Intermediate

Secondary topics: Data Integration and Data Pipelines sessions

Ihab Ilyas (University of Waterloo)

Average rating:

(4.40, 5 ratings)

Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas provides insight into various techniques and discusses how machine learning, human expertise, and problem semantics collectively can deliver a scalable, high-accuracy solution. Read more.

Time for a new relation: Going from RDBMS to a graph database

14:05–14:45 Wednesday, 23/05/2018

Data engineering and architecture, Expo Hall
Location: Expo Hall Level: Intermediate

Secondary topics: Time Series and Graphs

Patrick McFadin (DataStax)

Average rating:

(5.00, 2 ratings)

Graph databases are becoming mainstream. Patrick McFadin explains how to use the knowledge you have gained from your years of working with relational databases in this brave new world. There are many similarities but also some significant differences that can open up completely new use cases. If you're deciding whether to take the plunge into graph databases, this is the talk for you. Read more.

14:55

The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense

14:55–15:35 Wednesday, 23/05/2018

Data engineering and architecture, Emerging technologies and case studies, Streaming systems and real-time applications
Location: Capital Suite 7 Level: Intermediate

Lee Blum (Verint Systems)

Lee Blum offers an overview of Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results. Read more.

Executive Briefing: What you need to know about fast data

14:55–15:35 Wednesday, 23/05/2018

Data-driven business management, Executive Briefing, Strata Business Summit, Streaming systems and real-time applications
Location: Capital Suite 17 Level: Beginner

Dean Wampler (Anyscale)

Average rating:

(4.00, 2 ratings)

Streaming data systems, so-called fast data, promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler outlines what you need to know to exploit fast data successfully. Read more.

The IoT and AI for good (sponsored by Hitachi Vantara)

14:55–15:35 Wednesday, 23/05/2018

Sponsored
Location: Capital Suite 2/3

Wael Elrifai (Hitachi Vantara)

Wael Elrifai shares his experiences working in the IoT and AI spaces, covering complexities, pitfalls, and opportunities to explain why innovation isn’t just good for business—it's a societal imperative. Read more.

Incorporating data sources inside and outside of the data center (sponsored by Cisco)

14:55–15:35 Wednesday, 23/05/2018

Sponsored
Location: Capital Suite 4

Chiang Yang (Cisco)

Han Yang explains how Cisco is leveraging big data and analytics and details how the company is helping customers to incorporate data sources from the internet of things and deploy machine learning at the edge and at the enterprise. Read more.

Machine learning for time series: What works and what doesn't

14:55–15:35 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 12 Level: Intermediate

Secondary topics: E-commerce and Retail, Financial Services, Time Series and Graphs

Mikio Braun (Zalando)

Average rating:

(4.40, 15 ratings)

Time series data has many applications in industry, in particular predicting the future based on historical data. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn't. Read more.

Deep computer vision for manufacturing

14:55–15:35 Wednesday, 23/05/2018

Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 13 Level: Beginner

Aurélien Géron (Kiwisoft)

Average rating:

(3.67, 3 ratings)

Convolutional neural networks (CNNs) can now complete many computer vision tasks with superhuman ability. This will have a large impact on manufacturing by improving anomaly detection, product classification, analytics, and more. Aurélien Géron details common CNN architectures, explains how they can be applied to manufacturing, and covers potential challenges along the way. Read more.

The business leader’s guide to designing indispensable analytics solutions and data products

14:55–15:35 Wednesday, 23/05/2018

Data science and machine learning, Data-driven business management, Visualization and user experience
Location: Capital Suite 14 Level: Beginner

Secondary topics: Visualization, Design, and UX

Brian O'Neill (Designing for Analytics)

Average rating:

(4.00, 2 ratings)

Gartner says 85%+ of big data projects will fail. Your own company may have even spent millions on a recent project that isn’t really delivering the value or UX everyone hoped for. Brian O'Neill explains why CDOs, PMs, and business leaders who leverage design to prioritize utility, usability, and customer value will realize the best ROIs and demonstrates how to start evaluating your UX. Read more.

Data science across data sources with Apache Arrow

14:55–15:35 Wednesday, 23/05/2018

Big data and data science in the cloud, Data engineering and architecture
Location: S11A Level: Intermediate

Tomer Shiran (Dremio)

Average rating:

(3.50, 2 ratings)

It's often impractical for organizations to physically consolidate all data into one system. Tomer Shiran offers an overview of Apache Arrow, an open source columnar, in-memory data representation that enables analytical systems and data sources to exchange and process data in real time, simplifying and accelerating data access without having to copy all data into one location. Read more.

Elastic map matching using Cloudera Altus and Apache Spark

14:55–15:35 Wednesday, 23/05/2018

Data engineering and architecture
Location: S11B Level: Beginner

Secondary topics: Transportation and Logistics

Timo Graen (Volkswagen AG), Robert Neumann (Ultra Tendency)

Average rating:

(3.50, 2 ratings)

Map-matching applications exist in almost every telematics use case and are therefore crucial to all car manufacturers. Timo Graen and Robert Neumann detail the architecture behind Volkswagen Commercial Vehicle’s Altus-based map-matching application and lead a live demo featuring a map-matching job in Altus. Read more.

Multi-data center and multitenant durable messaging with Apache Pulsar

14:55–15:35 Wednesday, 23/05/2018

Data engineering and architecture, Law, ethics, and governance, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Intermediate

Ivan Kelly (Streamlio)

Average rating:

(3.00, 2 ratings)

Ivan Kelly offers an overview of Apache Pulsar, a durable, distributed messaging system, underpinned by Apache BookKeeper, that provides the enterprise features necessary to guarantee that your data is where it should be and accessible only by those who should have access. Read more.

DataOps: Nine steps to transform your data science impact

14:55–15:35 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 10/11

Harvinder Atwal (Moneysupermarket)

Average rating:

(5.00, 4 ratings)

Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, and shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, and more. Read more.

Successful data cultures: Inclusivity, empathy, retention, and results

14:55–15:35 Wednesday, 23/05/2018

Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical

Kim Nilsson (Pivigo), Phil Harvey (Microsoft)

Average rating:

(4.67, 9 ratings)

Our lives are being transformed by data, changing our understanding of work, play, and health. Every organization can take advantage of this resource, but something is holding us back: us. Kim Nilsson and Phil Harvey explain how to build a successful data culture that embeds data at the heart of every organization through people and delivers success through empathy, communication, and humanity. Read more.

Machine-learned model quality monitoring in fast data and streaming applications

14:55–15:35 Wednesday, 23/05/2018

Data science and machine learning, Expo Hall, Streaming systems and real-time applications
Location: Expo Hall Level: Intermediate

Secondary topics: Managing and Deploying Machine Learning

Emre Velipasaoglu (Lightbend)

Average rating:

(3.67, 3 ratings)

Most machine learning algorithms are designed to work on stationary data, but real-life streaming data is rarely stationary. Models lose prediction accuracy over time if they are not retrained. Without model quality monitoring, retraining decisions are suboptimal and costly. Emre Velipasaoglu reviews monitoring methods, focusing on their applicability in fast data and streaming applications. Read more.

15:35

15:35–16:35 Wednesday, 23/05/2018

Location: Expo Hall (Capital Hall 24)

Afternoon break sponsored by Airbus (1h)

16:35

How to protect big data in a containerized environment

16:35–17:15 Wednesday, 23/05/2018

Data engineering and architecture, Platform security and cybersecurity
Location: Capital Suite 7 Level: Non-technical

Secondary topics: Security and Privacy

Thomas Phelan (HPE BlueData)

Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE), but TDE can be difficult to configure and manage—issues that are only compounded when running on Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them. Read more.

Executive Briefing: BI on big data

16:35–17:15 Wednesday, 23/05/2018

Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Beginner

Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)

Average rating:

(4.33, 6 ratings)

If your goal is to provide data to an analyst rather than a data scientist, what’s the best way to deliver analytics? There are 70+ BI tools in the market and a dozen or more SQL- or OLAP-on-Hadoop open source projects. Mark Madsen and Shant Hovsepian discuss the trade-offs between a number of architectures that provide self-service access to data. Read more.

The eAGLE accelerator: How to speed up migrations from legacy ETL to big data implementations

16:35–17:15 Wednesday, 23/05/2018

Data engineering and architecture
Location: Capital Suite 2/3 Level: Intermediate

Enric Biosca Trias (everis), Angel Valencia (everis)

Average rating:

(2.00, 2 ratings)

Enric Biosca offers an overview of the eAGLE accelerator, which speeds up migration processes from legacy ETL to big data implementations by enabling auditing, lineage, and translation of legacy code for big data. Along the way, Enric demonstrates how graph and automatic translation technologies help companies reduce their migration times. Read more.

Fortune 100 lessons: Architecting data lakes for real-time analytics and AI (sponsored by Attunity)

16:35–17:15 Wednesday, 23/05/2018

Sponsored
Location: Capital Suite 4

Ted Orme (Attunity)

Average rating:

(4.00, 3 ratings)

Modern analytics and AI initiatives require an adaptable data lake with a multistage architectural design to effectively ingest, stage, and provision specific datasets in real time. Ted Orme discusses his experience at Attunity creating a real-time data integration solution for Fortune 100 organizations and shares best practices and lessons learned along the way. Read more.

Correlation analysis on live data streams

16:35–17:15 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 12 Level: Intermediate

Secondary topics: Time Series and Graphs

Arun Kejariwal (Independent), Francois Orsini (MZ)

Average rating:

(3.14, 7 ratings)

The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making. Read more.

Using Siamese CNNs for removing duplicate entries from real estate listing databases

16:35–17:15 Wednesday, 23/05/2018

Big data and data science in the cloud, Data science and machine learning
Location: Capital Suite 13 Level: Intermediate

Secondary topics: Data Integration and Data Pipelines sessions, Media, Advertising, Entertainment

Sergey Ermolin (Intel), Olga Ermolin (MLS Listings)

Average rating:

(4.00, 1 rating)

Aggregation of geospecific real estate databases results in duplicate entries for properties located near geographical boundaries. Sergey Ermolin and Olga Ermolin detail an approach for identifying duplicate entries via the analysis of images that accompany real estate listings that leverages a transfer learning Siamese architecture based on VGG-16 CNN topology. Read more.

Architectural design for interactive visualization

16:35–17:15 Wednesday, 23/05/2018

Data science and machine learning, Visualization and user experience
Location: Capital Suite 14 Level: Beginner

Secondary topics: Visualization, Design, and UX

Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)

Average rating:

(5.00, 1 rating)

Creating visualizations for data science requires an interactive setup that works at scale. Bargava Subramanian and Amit Kapoor explore the key architectural design considerations for such a system and discuss the four key trade-offs in this design space: rendering for data scale, computation for interaction speed, adapting to data complexity, and being responsive to data velocity. Read more.

Making stateless containers reliable and available even with stateful applications

16:35–17:15 Wednesday, 23/05/2018

Big data and data science in the cloud, Data engineering and architecture
Location: S11A Level: Intermediate

Paul Curtis (Weaveworks)

Average rating:

(4.00, 2 ratings)

The flexibility advantage conferred by containers depends on their ephemeral nature, so it’s useful to keep containers stateless. However, many applications require state—access to a scalable persistence layer that supports real mutable files, tables, and streams. Paul Curtis demonstrates how to make containerized applications reliable, available, and performant, even with stateful applications. Read more.

Improving DevOps and QA efficiency using machine learning and NLP methods

16:35–17:15 Wednesday, 23/05/2018

Data engineering and architecture, Data-driven business management, Streaming systems and real-time applications
Location: S11B Level: Intermediate

Secondary topics: Text and Language processing and analysis

Ran Taig (Dell), Omer Sagi (Dell)

Average rating:

(2.00, 1 rating)

DevOps and QA engineers spend a significant amount of time investigating recurring issues. These issues are often represented by large configuration and log files, so investigating whether two issues are duplicates can be a very tedious task. Ran Taig and Omer Sagi outline a solution that leverages NLP and machine learning algorithms to automatically identify duplicate issues. Read more.

Kafka in jail: Running Kafka in container-orchestrated clusters

16:35–17:15 Wednesday, 23/05/2018

Data engineering and architecture, Emerging technologies and case studies, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Intermediate

Sean Glover (Lightbend)

Average rating:

(2.50, 2 ratings)

Kafka is best suited to run close to the metal on dedicated machines in static clusters, but these clusters are quickly becoming extinct. Companies want mixed-use clusters that take advantage of every resource available. Sean Glover offers an overview of leading Kafka implementations on DC/OS and Kubernetes to explore how reliably they run Kafka in container-orchestrated clusters. Read more.

Narrative extraction: Analyzing the world’s narratives through natural language understanding

16:35–17:15 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 10/11 Level: Non-technical

Secondary topics: Text and Language processing and analysis

Naveed Ghaffar (Narrative Economics), Rashed Iqbal (UCLA)

Average rating:

(3.67, 3 ratings)

Narratives are significant vectors of rapid change in culture, economic behavior, and the Zeitgeist of a society. Narrative economics studies the impact of popular human-interest stories on economic fluctuations. Naveed Ghaffar and Rashed Iqbal outline a framework that uses natural language understanding to extract and analyze narratives in human communication. Read more.

Data Collaboratives

16:35–17:15 Wednesday, 23/05/2018

Data-driven business management, Emerging technologies and case studies, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 15/16 Level: Intermediate

Jude McCorry (The Data Lab), Mahmood Adil (NHS National Services Scotland)

Average rating:

(5.00, 2 ratings)

Jude McCorry and Mahmood Adil offer an overview of Data Collaboratives, a new form of collaboration beyond the public-private partnership model, in which participants from different sectors exchange data, skills, leadership, and knowledge to solve complex problems facing children in Scotland and worldwide. Read more.

Data-driven ecosystems in the automotive industry

16:35–17:15 Wednesday, 23/05/2018

Data engineering and architecture, Expo Hall
Location: Expo Hall Level: Intermediate

Tobias Burger (BMW Group), Joshua Goerner (BMW AG)

Average rating:

(5.00, 1 rating)

The BMW Group IT team drives the usage of data-driven technologies and forms the nucleus of a data-centric culture inside of the organization. Tobias Bürger and Joshua Görner discuss the E-to-E relationship of data and models and share best practices for scaling applications in real-world environments. Read more.

17:25

Security, governance, and cloud analytics, oh my!

17:25–18:05 Wednesday, 23/05/2018

Big data and data science in the cloud, Data engineering and architecture, Law, ethics, and governance, Platform security and cybersecurity
Location: Capital Suite 7 Level: Beginner

Secondary topics: Security and Privacy

Nikki Rouda (Cloudera), Nick Curcuru (Mastercard)

Average rating:

(4.00, 2 ratings)

Having so many cloud-based analytics services available is a dream come true. However, it's a nightmare to manage proper security and governance across all those different services. Nikki Rouda and Nick Curcuru share advice on how to minimize the risk and effort in protecting and managing data for multidisciplinary analytics and explain how to avoid the hassle and extra cost of siloed approaches. Read more.

Executive Briefing: Why machine-learned models crash and burn in production and what to do about it

17:25–18:05 Wednesday, 23/05/2018

Data-driven business management, Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Intermediate

Secondary topics: Managing and Deploying Machine Learning

David Talby (Pacific AI)

Average rating:

(4.00, 1 rating)

Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries. Read more.

Batch and real-time processing in LINE's log analysis platform

17:25–18:05 Wednesday, 23/05/2018

Data engineering and architecture
Location: Capital Suite 2/3 Level: Beginner

Wataru Yukawa (LINE)

LINE—one of the most popular messaging applications in Asia—offers many services, such as its news application. These services sometimes depend on real-time processing. Wataru Yukawa offers an overview of LINE's web tracking system, which consists of the JavaScript SDK, NGINX, Fluentd, Kafka, Elasticsearch, and Hadoop, and explains how it helps with batch and real-time processing. Read more.

Code Property Graph: A modern, queryable data storage for source code

17:25–18:05 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 12 Level: Intermediate

Secondary topics: Security and Privacy, Time Series and Graphs

Fabian Yamaguchi (ShiftLeft)

Average rating:

(4.33, 3 ratings)

Fabian Yamaguchi offers an overview of Code Property Graph (CPG), a unique approach that represents the functional elements of code in an interconnected graph of data and control flows. This allows semantic information about code to be stored scalably on distributed graph databases over the web while remaining rapidly accessible. Read more.

Using LSTMs to aid professional translators

17:25–18:05 Wednesday, 23/05/2018

Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 13 Level: Intermediate

Secondary topics: Text and Language processing and analysis

Darren Cook (QQ Trend)

Darren Cook demonstrates how to use LSTMs, state-of-the-art tokenizers, dictionaries, and other data sources to tackle translation, focusing on one of the most difficult language pairs: Japanese to English. Read more.

Democratizing data within your organization

17:25–18:05 Wednesday, 23/05/2018

Data science and machine learning
Location: Capital Suite 14 Level: Intermediate

Mark Grover (Lyft), Deepak Tiwari (Lyft)

Average rating:

(3.83, 6 ratings)

Sure, you’ve got the best and fastest running SQL engine, but you’ve still got some problems: Users don’t know which tables exist or what they contain; sometimes bad things happen to your data, and you need to regenerate partitions but there is no tool to do so. Mark Grover and Deepak Tiwari explain how to make your team and your larger organization more productive when it comes to consuming data. Read more.

Practical advice for driving down the cost of cloud big data platforms

17:25–18:05 Wednesday, 23/05/2018

Big data and data science in the cloud, Data engineering and architecture
Location: S11A Level: Beginner

Christopher Royles (Cloudera)

Average rating:

(4.00, 1 rating)

Big data and cloud deployments return huge benefits in flexibility and economics but can also result in runaway costs and failed projects. Drawing on his production experience, Christopher Royles shares tips and best practices for determining initial sizing, strategic planning, and longer-term operation, helping you deliver an efficient platform, reduce costs, and implement a successful project. Read more.

Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am

17:25–18:05 Wednesday, 23/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: S11B Level: Intermediate

Holden Karau (Independent), Rachel Warren (Salesforce Einstein)

Average rating:

(4.00, 2 ratings)

Apache Spark is an amazing distributed system, but part of the bargain we've made with the infrastructure daemons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. Holden Karau, Rachel Warren, and Anya Bida explore auto-tuning jobs using systems like Apache Beam, Mahout, and internal Spark ML jobs as workloads. Read more.

Stream processing for the practitioner: Blueprints for common stream processing use cases with Apache Flink

17:25–18:05 Wednesday, 23/05/2018

Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Intermediate

Aljoscha Krettek (Ververica)

Average rating:

(4.67, 3 ratings)

Aljoscha Krettek offers an overview of the modern stream processing space, details the challenges posed by stateful and event-time-aware stream processing, and shares core archetypes ("application blueprints") for stream processing drawn from real-world use cases with Apache Flink. Read more.

Rent, rain, and regulations: Leveraging structure in big data to predict criminal activity

17:25–18:05 Wednesday, 23/05/2018

Data science and machine learning, Emerging technologies and case studies, Law, ethics, and governance
Location: Capital Suite 10/11 Level: Intermediate

Jorie Koster-Hale (Dataiku)

Average rating:

(5.00, 3 ratings)

Because crime is affected by a number of geospatial and temporal features, predicting crime poses a unique technical challenge. Jorie Koster-Hale shares an approach using a combination of open source data, machine learning, time series modeling, and geostatistics to determine where crime will occur, what predicts it, and what we can do to prevent it in the future. Read more.

Blind men and elephants: What’s missing from your big data?

17:25–18:05 Wednesday, 23/05/2018

Data-driven business management, Emerging technologies and case studies, Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical

Richard Goyder (IMC Business Architecture | Scaled Insights), Barry Singleton (IMC Business Architecture)

Average rating:

(3.60, 5 ratings)

Big data analytics tends to focus on what is easily available, which is by and large data about what has already happened, the implicit assumption being that past behavior will predict future behavior. Organizations already possess data they aren’t exploiting. Barry Singleton and Richard Goyder explain how, with the right tools, it can be used to develop far more powerful predictive algorithms. Read more.

18:05

Expo Hall Reception

18:05–19:05 Wednesday, 23/05/2018

Location: Expo Hall (Capital Hall 24)

Average rating:

(2.00, 2 ratings)

Unwind after a long day of sessions with small bites and drinks while networking with Strata attendees, exhibitors, and sponsors. Read more.

19:05

19:05–20:00 Wednesday, 23/05/2018

Location: TBD

TBC

20:00

Data After Dark: A Night in Shoreditch (sponsored by Domino and Cloudera)

20:00–22:00 Wednesday, 23/05/2018

Location: Shoreditch

Average rating:

(4.00, 2 ratings)

Enjoy great food and drink at Data After Dark: A Night in Shoreditch. Be sure to take in the street art as you make your way between Zigfrid von Underbelly and Trapeze Bar. Read more.

Thursday, 24/05/2018

8:15

Speed Networking

8:15–8:45 Thursday, 24/05/2018

Location: Auditorium Foyer

Gather before keynotes on Thursday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees. Read more.

8:45

8:45–9:00 Thursday, 24/05/2018

Location: Auditorium Foyer

Coffee break sponsored by Data Artisans (8:00 - 9:00) (15m)

9:00

Thursday opening welcome

9:00–9:05 Thursday, 24/05/2018

Location: Auditorium

Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

Average rating:

(1.00, 1 rating)

Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

9:05

So, you want to be successful in the open future?

9:05–9:20 Thursday, 24/05/2018

Location: Auditorium

Louise Beaumont (Publicis Groupe | techUK | NPSO)

Average rating:

(2.93, 15 ratings)

Louise Beaumont explores the five characteristics of companies that choose to succeed. Read more.

9:20

Machine learning: Research and industry

9:20–9:35 Thursday, 24/05/2018

Location: Auditorium

Mikio Braun (Zalando)

Average rating:

(3.65, 17 ratings)

Mikio Braun has worked in both research and industry and draws on this experience to share insights on how these two areas are the same (and how they are different). He then details how deep learning might change the game again. Read more.

9:35

Moving machine learning and analytics to hyperspeed

9:35–9:45 Thursday, 24/05/2018

Location: Auditorium

Amr Awadallah (Cloudera), Ankit Tharwani (Barclays UK), Bala Chandrasekaran (Barclays)

Average rating:

(3.29, 14 ratings)

Imagine the value you could drive in your business if you could accelerate your journey to machine learning and analytics. Amr Awadallah, Ankit Tharwani, and Bala Chandrasekaran explain how Barclays has driven innovation in real-time analytics and machine learning with Apache Kudu, accelerating the time to value across multiple business initiatives, including marketing, fraud prevention, and more. Read more.

9:50

When to KISS

9:50–10:00 Thursday, 24/05/2018

Location: Auditorium

Zubin Siganporia (QED Analytics)

Average rating:

(4.24, 17 ratings)

The KISS principle tells us to "Keep it simple, stupid." As machine learning techniques become more sophisticated, the need to KISS only becomes greater. Zubin Siganporia discusses the role that simplicity plays in approaching a problem and then convincing end users to adopt data-driven solutions to their challenges. Read more.

10:00

Cloud and the golden age of data analytics (sponsored by Google Cloud)

10:00–10:10 Thursday, 24/05/2018

Location: Auditorium

Tom Grey (Google)

Average rating:

(2.86, 14 ratings)

The history of data analytics has been marked by an environment of scarcity. The way we approach data analytics is only just catching up. Tom Grey explains why we are on the cusp of a golden age of analytics and machine learning. Read more.

10:10

Out of the lab and into real life

10:10–10:25 Thursday, 24/05/2018

Location: Auditorium

Christine Foster (The Alan Turing Institute)

Average rating:

(3.31, 13 ratings)

There is a common conception that artificial intelligence will change business. But as researchers at the Alan Turing Institute (the national center for data science and AI) well know, a new algorithm alone does not change the world. Christine Foster explores how businesses and researchers can find common ground and how today’s academic papers turn into tomorrow’s data science. Read more.

10:25

The good, the bad, and the internet?

10:25–10:40 Thursday, 24/05/2018

Location: Auditorium

Martha Lane Fox (CBE)

Average rating:

(4.26, 19 ratings)

Keynote with Martha Lane Fox. Read more.

10:45

10:45–11:15 Thursday, 24/05/2018

Location: Expo Hall (Capital Hall 24)

Morning break (30m)

11:15

Accelerating development velocity of production ML systems with Docker

11:15–11:55 Thursday, 24/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 7 Level: Intermediate

Secondary topics: Data Platforms, Managing and Deploying Machine Learning, Media, Advertising, Entertainment

Kinnary Jangla (Pinterest)

Average rating:

(3.00, 5 ratings)

Having trouble coordinating development of your production ML system between a team of developers? Microservices drifting and causing problems debugging? Kinnary Jangla explains how Pinterest dockerized the services powering its home feed and how it impacted the engineering productivity of its ML teams while increasing uptime and ease of deployment. Read more.

Executive Briefing: Machine learning—Why you need it, why it's hard, and what to do about it

11:15–11:55 Thursday, 24/05/2018

Executive Briefing, Strata Business Summit
Location: Capital Suite 17

Mick Hollison (Cloudera)

Average rating:

(2.00, 1 rating)

Mick Hollison shares examples of real-world machine learning applications, explores a variety of challenges in putting these capabilities into production (the speed with which technology is moving, cloud versus in-data-center consumption, security and regulatory compliance, and skills and agility in getting data and answers into the right hands), and outlines proven ways to meet them. Read more.

Building the bridge from big data to machine learning and artificial intelligence (sponsored by Google Cloud)

11:15–11:55 Thursday, 24/05/2018

Sponsored
Location: Capital Suite 2/3

Ryan Lippert (Google Cloud)

If your company isn’t good at analytics, it’s not ready for AI. Ryan Lippert explains how the right data strategy can set you up for success in machine learning and artificial intelligence—the new ground for gaining competitive edge and creating business value. Read more.

50 reasons to learn the shell for doing data science

11:15–11:55 Thursday, 24/05/2018

Data science and machine learning
Location: Capital Suite 12 Level: Beginner

Jeroen Janssens (Data Science Workshops)

Average rating:

(3.00, 2 ratings)

"Anyone who does not have the command line at their beck and call is really missing something," tweeted Tim O'Reilly when Jeroen Janssens's Data Science at the Command Line was recently made available online for free. Join Jeroen to learn what you're missing out on if you're not applying the command line and many of its power tools to typical data science problems. Read more.

How Captricity manages 10,000 tiny deep learning models in production

11:15–11:55 Thursday, 24/05/2018

Data science and machine learning
Location: Capital Suite 13 Level: Beginner

Secondary topics: Managing and Deploying Machine Learning

Ramesh Sridharan (Captricity)

Average rating:

(4.00, 1 rating)

Most uses of deep learning involve models trained with large datasets. Ramesh Sridharan explains how Captricity uses deep learning with tiny datasets at scale, training thousands of models using tens to hundreds of examples each. These models are dynamically trained using an automatic deployment framework, and carefully chosen metrics further exploit error properties of the resulting models. Read more.

Rendezvous with AI

11:15–11:55 Thursday, 24/05/2018

Data science and machine learning
Location: Capital Suite 14 Level: Intermediate

Secondary topics: Managing and Deploying Machine Learning

Ted Dunning (MapR, now part of HPE)

Average rating:

(5.00, 1 rating)

Ted Dunning offers an overview of the rendezvous architecture, which is geared to deal with much of the complexity involved in deploying models to production, thus allowing more time to be spent thinking and doing real data science. Ted covers the ideas behind the architecture, practical scenarios, and advantages and disadvantages of the architecture. Read more.

Improving ad hoc and production workflows at Stitch Fix

11:15–11:55 Thursday, 24/05/2018

Big data and data science in the cloud, Data engineering and architecture, Platform security and cybersecurity
Location: S11A Level: Intermediate

Secondary topics: Data Platforms, E-commerce and Retail

Neelesh Salian (Stitch Fix)

Average rating:

(1.00, 1 rating)

Neelesh Srinivas Salian offers an overview of the compute infrastructure used by the data science team at Stitch Fix, covering the architecture, tools within the larger ecosystem, and the challenges that the team overcame along the way. Read more.

Big data, big quality: Data quality at Spotify

11:15–11:55 Thursday, 24/05/2018

Data engineering and architecture
Location: S11B Level: Intermediate

Secondary topics: Data Integration and Data Pipelines sessions, Data Platforms, Media, Advertising, Entertainment

Irene Gonzálvez (Spotify)

Average rating:

(3.88, 8 ratings)

Irene Gonzálvez shares Spotify's process for ensuring data quality, covering why and how the company became aware of its importance, the products it has developed, and future strategy. Read more.

You’re doing it wrong: How Zoomdata rearchitected streaming

11:15–11:55 Thursday, 24/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Beginner

Secondary topics: Visualization, Design, and UX

Erin Recachinas (Zoomdata)

Average rating:

(4.00, 2 ratings)

The value of real-time streaming analytics with historical data is immense. Big data application Zoomdata updates historical dashboards in real time without complex reaggregations, but streaming in the age of the IoT requires handling of data in volumes not seen in traditional feeds. Erin Recachinas explains how Zoomdata moved to a scalable microservice architecture for streaming sources. Read more.

Model parallelism in Spark ML cross-validation

11:15–11:55 Thursday, 24/05/2018

Data science and machine learning
Location: Capital Suite 10/11 Level: Beginner

Nick Pentreath (IBM), Bryan Cutler (IBM)

Average rating:

(2.50, 2 ratings)

Tuning a Spark ML model using cross-validation involves a computationally expensive search over a large parameter space. Nick Pentreath and Bryan Cutler explain how enabling Spark to evaluate models in parallel can significantly reduce the time to complete this process for large workloads and share best practices for choosing the right configuration to achieve optimal resource usage. Read more.

Using Python to analyze financial markets

11:15–11:55 Thursday, 24/05/2018

Strata Business Summit
Location: Capital Suite 15/16

Saeed Amen (Cuemacro)

Average rating:

(4.40, 5 ratings)

Saeed Amen explores Python libraries that can be used at the various stages of financial analysis, including time series analysis, visualization, structuring data, and storing market data. Read more.

Modeling time series in R

11:15–11:55 Thursday, 24/05/2018

Data science and machine learning, Expo Hall
Location: Expo Hall Level: Beginner

Secondary topics: Time Series and Graphs

Jared Lander (Lander Analytics)

Average rating:

(4.00, 2 ratings)

Temporal data is being produced in ever-greater quantity, but fortunately our time series capabilities are keeping pace. Jared Lander explores techniques for modeling time series, from traditional methods such as ARMA to more modern tools such as Prophet and machine learning models like XGBoost and neural nets. Along the way, Jared shares theory and code for training these models. Read more.

12:05

Deep learning with TensorFlow and Spark using GPUs and Docker containers

12:05–12:45 Thursday, 24/05/2018

Big data and data science in the cloud, Data engineering and architecture
Location: Capital Suite 7 Level: Beginner

Secondary topics: Managing and Deploying Machine Learning

Nanda Vijaydev (BlueData), Thomas Phelan (HPE BlueData)

Average rating:

(4.17, 6 ratings)

In the past, you needed a high-end proprietary stack for advanced machine learning, but today, you can use open source machine learning and deep learning algorithms available with distributed computing technologies like Apache Spark and GPUs. Nanda Vijaydev and Thomas Phelan demonstrate how to deploy a TensorFlow and Spark with NVIDIA CUDA stack on Docker containers in a multitenant environment. Read more.

Executive Briefing: Artificial intelligence—The next digital frontier?

12:05–12:45 Thursday, 24/05/2018

Executive Briefing, Strata Business Summit
Location: Capital Suite 17

Louise Herring (McKinsey & Company)

Average rating:

(5.00, 1 rating)

After decades of extravagant promises, artificial intelligence is finally starting to deliver real-life benefits to early adopters. However, we’re still early in the cycle of adoption. Louise Herring explains where investment is going, patterns of AI adoption and value capture by enterprises, and how the value potential of AI across sectors and business functions is beginning to emerge. Read more.

Machine learning at Intuit: Five delightful use cases

12:05–12:45 Thursday, 24/05/2018

Data science and machine learning, Streaming systems and real-time applications
Location: Capital Suite 12 Level: Intermediate

Secondary topics: Financial Services

Calum Murray (Intuit)

Average rating:

(1.50, 2 ratings)

Machine learning-based applications are becoming the new norm. Calum Murray shares five use cases at Intuit that use the data of over 60 million users to create delightful experiences for customers by solving repetitive tasks, freeing them up to spend time more productively or solving very complex tasks with simplicity and elegance. Read more.

A high-performance system for deep learning inference and visual inspection

12:05–12:45 Thursday, 24/05/2018

Data science and machine learning, Streaming systems and real-time applications
Location: Capital Suite 13 Level: Intermediate

Secondary topics: Data Platforms, Managing and Deploying Machine Learning

Moty Fania (Intel)

Moty Fania explains how Intel implemented an AI inference platform to enable internal visual inspection use cases and shares lessons learned along the way. The platform is based on open source technologies and was designed for real-time streaming and online actuation. Read more.

Ask Me Anything: Architecting a data platform for enterprise use

12:05–12:45 Thursday, 24/05/2018

Ask Me Anything
Location: Capital Suite 14

Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)

Average rating:

(3.33, 6 ratings)

Join Mark Madsen and Shant Hovsepian to discuss analytics strategy and planning, data architecture, data management, and BI on big data. Read more.

Setting up a lightweight distributed caching layer using Apache Arrow

12:05–12:45 Thursday, 24/05/2018

Big data and data science in the cloud, Data engineering and architecture
Location: S11A Level: Advanced

Jacques Nadeau (Dremio)

Average rating:

(4.00, 3 ratings)

Jacques Nadeau offers an overview of a new Apache-licensed lightweight distributed in-memory cache that allows multiple applications to consume Arrow directly using the Arrow RPC and IPC protocols. You'll explore the system design and deployment architecture, learn how data science, analytical, and custom applications can all leverage the cache simultaneously, and see a live demo. Read more.

Big data at speed

12:05–12:45 Thursday, 24/05/2018

Data engineering and architecture, Emerging technologies and case studies, Streaming systems and real-time applications
Location: S11B Level: Intermediate

Secondary topics: Transportation and Logistics

Mark Grover (Lyft), Ted Malaska (Capital One)

Average rating:

(5.00, 6 ratings)

Many details go into building a big data system for speed, from determining an acceptable latency for data access and deciding where to store the data to solving multiregion problems, or even knowing just what data you have and where stream processing fits in. Mark Grover and Ted Malaska share challenges, best practices, and lessons learned doing big data processing and analytics at scale and at speed. Read more.

Autonomous ETL with materialized views

12:05–12:45 Thursday, 24/05/2018

Big data and data science in the cloud, Data engineering and architecture
Location: Capital Suite 8/9 Level: Intermediate

Secondary topics: Data Integration and Data Pipelines sessions

Adesh Rao (Qubole), Abhishek Somani (Qubole)

Average rating:

(3.00, 2 ratings)

Adesh Rao and Abhishek Somani share a framework for materialized views in SQL-on-Hadoop engines that automatically suggests, creates, uses, invalidates, and refreshes views created on top of data for optimal performance and strict correctness. Read more.

Interpretable machine learning products

12:05–12:45 Thursday, 24/05/2018

Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate

Secondary topics: Financial Services

Mike Lee Williams (Cloudera Fast Forward Labs)

Average rating:

(5.00, 2 ratings)

Interpretable models result in more accurate, safer, and more profitable machine learning products, but interpretability can be hard to ensure. Michael Lee Williams examines the growing business case for interpretability, explores concrete applications including churn, finance, and healthcare, and demonstrates the use of LIME, an open source, model-agnostic tool you can apply to your models today. Read more.

On the limits of decision making with artificial intelligence

12:05–12:45 Thursday, 24/05/2018

Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Intermediate

Martin Goodson (Evolution AI)

Average rating:

(4.25, 4 ratings)

How can AI become part of our business processes? Should we entrust critical decisions to completely autonomous systems? Drawing on projects from businesses and UK government agencies, Martin Goodson explains how to increase confidence in AI systems and manage the transition to an AI-driven organization. Read more.

A heretical monitoring view: Using PostgreSQL to store Prometheus metrics and visualizing them in Grafana

12:05–12:45 Thursday, 24/05/2018

Data science and machine learning, Expo Hall, Streaming systems and real-time applications, Visualization and user experience
Location: Expo Hall Level: Intermediate

Secondary topics: Time Series and Graphs

Erik Nordström (Timescale)

Erik Nordström explains how and why to use PostgreSQL as a Prometheus backend to support complex questions (and get a proper SQL interface). He offers an overview of pg_prometheus (a custom Prometheus datatype) and prometheus-postgresql-adapter (a remote storage adapter for PostgreSQL) and shares his experience with TimescaleDB, which enables PostgreSQL to scale for classic monitoring volumes. Read more.

12:45

Thursday Topic Tables at Lunch

12:45–14:05 Thursday, 24/05/2018

Location: Expo Hall (Capital Hall 24)

Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

Thursday Business Summit Lunch

12:45–14:05 Thursday, 24/05/2018

Location: Expo Hall - SBS lunch (Capital Hall 24)

Join Strata Business Summit speakers and attendees for a networking lunch on Thursday. Read more.

14:05

Continuous delivery and machine learning

14:05–14:45 Thursday, 24/05/2018

Data engineering and architecture
Location: Capital Suite 7 Level: Beginner

Secondary topics: Managing and Deploying Machine Learning

Guillaume Salou (OVH)

Average rating:

(3.00, 5 ratings)

Guillaume Salou shares OVH's approach to continuous deployment of machine learning models, which involved building a full stack of automated machine learning. Automated machine learning allows the company to rebuild models efficiently and keep models up to date with fresh data brought by its data convergence tool. Read more.

Executive Briefing: Data privacy in the age of the internet of things

14:05–14:45 Thursday, 24/05/2018

Executive Briefing, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 17 Level: Beginner

Secondary topics: Security and Privacy, Telecom

Alasdair Allan (Babilim Light Industries)

The increasing ubiquity of the internet of things has put a new focus on data privacy. Big data is all very well when it's harvested quietly and stealthily, but when your things tattle on you behind your back, it's a very different matter altogether. Alasdair Allan explains why the internet of things brings with it a whole new set of big data problems that can't be ignored. Read more.

The Data Intelligence Hub: On-demand Hadoop resource provisioning in Europe’s Industrial Data Space using Cloudera Altus

14:05–14:45 Thursday, 24/05/2018

Big data and data science in the cloud, Data science and machine learning
Location: Capital Suite 2/3 Level: Intermediate

Secondary topics: Telecom

Sven Loeffler (Deutsche Telekom)

Average rating:

(2.00, 1 rating)

Sven Löffler offers an overview of the Data Intelligence Hub, T-Systems's implementation of the Fraunhofer Industrial Data Space: a reference architecture for the standardized and secure data exchange between industries in the context of the internet of things. Read more.

The ins and outs of forecasting in a hire business

14:05–14:45 Thursday, 24/05/2018

Data science and machine learning, Data-driven business management
Location: Capital Suite 12 Level: Beginner

Kaylea Haynes (Peak)

Deciding how much stock to hold is a challenge for hire businesses. There is a fine balance between holding enough stock to fulfill hires and not holding too much stock so that overall utilization is too low to achieve the return on investment. Kaylea Haynes shares a case study on forecasting the demand for thousands of assets across multiple locations. Read more.

Scaling the AI hierarchy of needs with TensorFlow, Spark, and Hops

14:05–14:45 Thursday, 24/05/2018

Data engineering and architecture
Location: Capital Suite 13 Level: Beginner

Jim Dowling (Logical Clocks)

Average rating:

(5.00, 2 ratings)

Distributed deep learning can increase the productivity of AI practitioners and reduce time to market for training models. Hadoop can fulfill a crucial role as a unified feature store and resource management platform for distributed deep learning. Jim Dowling offers an introduction to writing distributed DL applications, covering TensorFlow and Apache Spark frameworks that make distribution easy. Read more.

Ask Me Anything: Streaming applications and architectures

14:05–14:45 Thursday, 24/05/2018

Ask Me Anything
Location: Capital Suite 14

Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)

Join Dean Wampler and Boris Lublinsky to discuss all things streaming: architecture, implementation, streaming engines and frameworks, techniques for serving machine learning models in production, traditional big data systems (dying or still relevant?), and general software architecture and data systems. Read more.

Why knowledge graphs are important to finance

14:05–14:45 Thursday, 24/05/2018

Data engineering and architecture
Location: S11A Level: Intermediate

Haikal Pribadi (GRAKN.AI)

Average rating:

(3.50, 2 ratings)

Haikal Pribadi explains why knowledge graphs (KGs) are important for AI systems in the finance sector and details how they are being used to detect and uncover new knowledge, specifically for risk analysis, fraud detection, and GDPR use cases. Read more.

Bringing AI to BI: Microsoft's road to automated business incident monitoring and diagnostics with Project Kensho

14:05–14:45 Thursday, 24/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: S11B Level: Intermediate

Secondary topics: Data Platforms, Time Series and Graphs

Tony Xing (Microsoft), Bixiong Xu (Microsoft)

Average rating:

(2.00, 1 rating)

Tony Xing and Bixiong Xu offer an overview of Project Kensho, Microsoft's one-stop shop for business incident monitoring and automated insights. Tony and Bixiong cover the technology's evolution, the architecture, the algorithms, and the benefits and the trade-offs. Along the way, they share a case study on Bing ads key metrics monitoring and automated diagnostic insights. Read more.

Complex event processing with Apache Flink

14:05–14:45 Thursday, 24/05/2018

Data engineering and architecture, Data-driven business management, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Intermediate

Kostas Kloudas (data Artisans)

Average rating:

(2.25, 4 ratings)

Complex event processing (CEP) helps detect patterns over continuous streams of data. DNA sequencing, fraud detection, shipment tracking with specific characteristics (e.g., contaminated goods), and user activity analysis fall into this category. Kostas Kloudas offers an overview of Flink's CEP library and explains the benefits of its integration with Flink. Read more.

Human in the loop: A design pattern for managing teams working with machine learning

14:05–14:45 Thursday, 24/05/2018

Data science and machine learning
Location: Capital Suite 10/11 Level: Beginner

Paco Nathan (derwen.ai)

Average rating:

(4.50, 2 ratings)

Human in the loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. Such systems are mostly automated, with exceptions referred to human experts, who help train the machines further. Paco Nathan offers an overview of HITL from the perspective of a business manager, focusing on use cases within O'Reilly Media. Read more.

Data, AI, and innovation in the enterprise

14:05–14:45 Thursday, 24/05/2018

Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Beginner

Michael Li (The Data Incubator), Philipp Diesinger (Boehringer Ingelheim), Julie Shin (Citigroup)

Average rating:

(5.00, 1 rating)

What are the latest initiatives and use cases around data and AI? How are data and AI reshaping industries? How do we foster a culture of data and innovation within a larger enterprise? What are some of the challenges of implementing AI within the enterprise setting? Michael Li moderates a panel of experts in different industries to answer these questions and more. Read more.

Spark NLP in action: Intelligent, high-accuracy fact extraction from long financial documents

14:05–14:45 Thursday, 24/05/2018

Data science and machine learning, Expo Hall
Location: Expo Hall Level: Intermediate

Secondary topics: Financial Services, Text and Language processing and analysis

David Talby (Pacific AI), Saif Addin Ellafi (John Snow Labs), Paul Parau (UiPath)

Average rating:

(4.50, 4 ratings)

Spark NLP natively extends Spark ML to provide natural language understanding capabilities with performance and scale that were not possible to date. David Talby, Saif Addin Ellafi, and Paul Parau explain how Spark NLP was used to augment the Recognos smart data extraction platform in order to automatically infer fuzzy, implied, and complex facts from long financial documents. Read more.

14:55

Machine learning platform lifecycle management

14:55–15:35 Thursday, 24/05/2018

Data engineering and architecture, Data-driven business management
Location: Capital Suite 7 Level: Intermediate

Secondary topics: Financial Services, Managing and Deploying Machine Learning

Hope Wang (Intuit)

Average rating:

(4.00, 3 ratings)

A machine learning platform is not just the sum of its parts; the key is how it supports the model lifecycle end to end. Hope Wang explains how to manage various artifacts and their associations, automate deployment to support the lifecycle of a model, and build a cohesive machine learning platform. Read more.

Executive Briefings: Killer robots and how not to do data science

14:55–15:35 Thursday, 24/05/2018

Executive Briefing, Law, ethics, and governance, Strata Business Summit
Location: Capital Suite 17 Level: Non-technical

Secondary topics: Security and Privacy

Kate Vang (DataKind UK), Christine Henry (DataKind UK)

Not a day goes by without headlines about the fear of AI or how technology seems to be dividing us more than bringing us together. DataKind UK is passionate about using machine learning and artificial intelligence for social good. Kate Vang and Christine Henry explain what socially conscious AI looks like and what DataKind is doing to make it a reality. Read more.

Improving computer vision models at scale

14:55–15:35 Thursday, 24/05/2018

Data engineering and architecture
Location: Capital Suite 2/3 Level: Intermediate

Marton Balassi (Cloudera), Mirko Kämpf (Cloudera), Jan Kunigk (Cloudera)

Average rating:

(5.00, 2 ratings)

Rigorous improvement of an image recognition model often requires multiple iterations of eyeballing outliers, inspecting statistics of the output labels, then modifying and retraining the model. Marton Balassi, Mirko Kämpf, and Jan Kunigk share a solution that automates the process of running the model on the testing data and populating an index of the labels so they become searchable. Read more.

Scaling data science (teams and technologies)

14:55–15:35 Thursday, 24/05/2018

Data science and machine learning, Data-driven business management, Emerging technologies and case studies
Location: Capital Suite 12 Level: Non-technical

David Asboth (Cox Automotive Data Solutions), Shaun McGirr (Cox Automotive Data Solutions)

Average rating:

(4.60, 5 ratings)

Cox Automotive is the world’s largest automotive service organization, which means it can combine data from across the entire vehicle lifecycle. Cox is on a journey to turn this data into insights. David Asboth and Shaun McGirr share their experience building up a data science team at Cox and scaling the company's data science process from laptop to Hadoop cluster. Read more.

Operationalize deep learning models for fraud detection with Azure Machine Learning Workbench

14:55–15:35 Thursday, 24/05/2018

Data science and machine learning
Location: Capital Suite 13 Level: Intermediate

Secondary topics: Financial Services, Time Series and Graphs

Francesca Lazzeri (Microsoft), Jaya Susan Mathew (Microsoft)

Average rating:

(4.00, 2 ratings)

Advancements in computing technologies and ecommerce platforms have amplified the risk of online fraud, which results in billions of dollars in losses for the financial industry. This trend has pushed companies to consider AI techniques, including deep learning, for fraud detection. Francesca Lazzeri and Jaya Mathew explain how to operationalize deep learning models with Azure ML to prevent fraud. Read more.

Are we doing this wrong? Advertisement features A/B testing

14:55–15:35 Thursday, 24/05/2018

Data science and machine learning, Data-driven business management
Location: Capital Suite 14 Level: Intermediate

Chen Salomon (Playbuzz)

Average rating:

(4.00, 1 rating)

A/B testing is the foundation of data-driven decision making. In today's world, advertising is crucial to a website's revenue, so it is even more important to measure the effects of changes correctly. Chen Salomon demonstrates how to correctly design and implement an advertisement A/B test and shares pitfalls, potential biases related to advertisement metrics, and possible mitigations. Read more.

Mixing causal consistency and asynchronous replication for large Neo4j clusters

14:55–15:35 Thursday, 24/05/2018

Data engineering and architecture
Location: S11A Level: Intermediate

Secondary topics: Time Series and Graphs

Jim Webber (Neo4j)

Average rating:

(5.00, 3 ratings)

Jim Webber details how Neo4j mixes the strongly consistent Raft protocol with async log shipping to provide a strong guarantee, causal consistency, which means you can always at least read your own writes, even in very large multi-data-center clusters. Read more.

ClickFox: Customer journey analytics powered by OpenStack and Cloudera

14:55–15:35 Thursday, 24/05/2018

Big data and data science in the cloud, Data engineering and architecture
Location: S11B Level: Intermediate

Secondary topics: Data Platforms

Alvin Heib (Cloudera), Guy Le Roux (Atos)

Alvin Heib and Guy Leroux offer an overview of ClickFox, a platform able to cope with high-performance analytical needs, from bits and bytes to solving customer needs, covering the platform's virtualization, big data, and analytical layers. Read more.

Radically modular data ingestion APIs in Apache Beam

14:55–15:35 Thursday, 24/05/2018

Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Advanced

Secondary topics: Data Integration and Data Pipelines sessions

Eugene Kirpichov (Google)

Average rating:

(4.50, 2 ratings)

Apache Beam offers users a novel programming model in which the classic batch-streaming dichotomy is erased and ships with a rich set of I/O connectors to popular storage systems. Eugene Kirpichov explains why Beam has made these connectors flexible and modular—a key component of which is Splittable DoFn, a novel programming model primitive that unifies data ingestion between batch and streaming. Read more.

Detecting small-scale mines in Ghana

14:55–15:35 Thursday, 24/05/2018

Data science and machine learning
Location: Capital Suite 10/11 Level: Intermediate

Elena Terenzi (Microsoft), Michael Lanzetta (Microsoft)

Average rating:

(4.00, 3 ratings)

Michael Lanzetta and Elena Terenzi offer an overview of a collaboration between Microsoft and Royal Holloway, University of London, that applied deep learning to locate illegal small-scale mines in Ghana using satellite imagery, scaled training using Kubernetes, and investigated the mines' impact on surrounding populations and environment. Read more.

The journey of machine learning platform adoption in enterprise

14:55–15:35 Thursday, 24/05/2018

Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical

Secondary topics: Data Platforms, Managing and Deploying Machine Learning

Simon Chan (Salesforce)

Average rating:

(4.00, 1 rating)

The promises of AI are great, but taking the steps to implement AI within an enterprise is challenging. The secret behind enterprise AI success often traces back to the underlying platform that accelerates AI development at scale. Based on years of experience helping executives establish AI product strategies, Simon Chan helps you discover the AI platform journey that is right for your business. Read more.

Big data meets renewable energy: Building a real-time asset management platform for renewable energy

14:55–15:35 Thursday, 24/05/2018

Data science and machine learning, Expo Hall
Location: Expo Hall Level: Beginner

Secondary topics: Data Integration and Data Pipelines sessions, Data Platforms

Stamatis Stefanakos (D ONE AG)

Average rating:

(4.33, 3 ratings)

Switzerland-based startup WinJi capitalizes on two current megatrends: big data and renewable energy. Stamatis Stefanakos offers an overview of WinJi's TruePower Asset Management Platform, covering the overall architecture and the motivation behind it, the physics behind the data, and the business case. Read more.

15:35

15:35–16:35 Thursday, 24/05/2018

Location: Expo Hall (Capital Hall 24)

Afternoon break (1h)

16:35

DevOps at ING Analytics: Combining data engineering with data operations

16:35–17:15 Thursday, 24/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 7 Level: Intermediate

Giuseppe D'alessio (ING Group)

Average rating:

(3.25, 4 ratings)

Giuseppe D'alessio details ING's DevOps journey, covering its impact on people, processes, and tools as well as best practices and pitfalls. Giuseppe concludes with a concrete example of using analytics and streaming technology on real-time applications. Read more.

Executive Briefing: The ROI of data-driven digital transformation

16:35–17:15 Thursday, 24/05/2018

Data-driven business management, Executive Briefing, Strata Business Summit
Location: Capital Suite 17 Level: Intermediate

Kevin Sigliano (IE Business School)

Average rating:

(5.00, 1 rating)

Financial and consumer ROI demands that business leaders understand the drivers and dynamics of digital transformation and big data. Kevin Sigliano explains why disrupting value propositions and continuous innovation are critical if you wish to dramatically improve the way your company engages customers and creates value and maximize financial results. Read more.

16:35–17:15 Thursday, 24/05/2018

Location: Capital Suite 12

TBC

Deep learning in the browser: Explorable explanations, model inference, and rapid prototyping

16:35–17:15 Thursday, 24/05/2018

Data science and machine learning, Visualization and user experience
Location: Capital Suite 13 Level: Intermediate

Amit Kapoor (narrativeVIZ), Bargava Subramanian (Binaize)

Amit Kapoor and Bargava Subramanian lead three live demos of deep learning (DL) done in the browser—building explorable explanations to aid insight, building model inference applications, and rapid prototyping and training an ML model—using the emerging client-side JavaScript libraries for DL. Read more.

Human-in-the-loop data science with Jupyter widgets

16:35–17:15 Thursday, 24/05/2018

Data engineering and architecture, Data science and machine learning, Visualization and user experience
Location: Capital Suite 14 Level: Intermediate

Pascal Bugnion (ASI Data Science)

Jupyter widgets let you create lightweight, interactive graphical interfaces directly in Jupyter notebooks. Pascal Bugnion demonstrates how to use Jupyter widgets to implement human-in-the-loop machine learning with highly interactive user interfaces. Read more.

Learning how to design automatically updating AI with Apache Kafka and Deeplearning4j

16:35–17:15 Thursday, 24/05/2018

Data engineering and architecture, Streaming systems and real-time applications
Location: S11A Level: Beginner

Jason Bell (Independent Speaker)

Jason Bell offers an overview of a self-learning knowledge system that uses Apache Kafka and Deeplearning4j to accept data, apply training to a neural network, and output predictions. Jason covers the system design, the rationale behind it, and the implications of using streaming data with deep learning and artificial intelligence. Read more.

You call it data lake; we call it Data Historian.

16:35–17:15 Thursday, 24/05/2018

Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications
Location: S11B Level: Intermediate

Secondary topics: Data Platforms

Naghman Waheed (Bayer Crop Science), Brian Arnold (Bayer)

Average rating:

(4.50, 2 ratings)

There are a number of tools that make it easy to implement a data lake. However, most lack the essential features that prevent your data lake from turning into a data swamp. Naghman Waheed and Brian Arnold offer an overview of Monsanto's Data Historian platform, which can ingest, store, and access datasets without compromising ease of use, governance, or security. Read more.

Stream scaling in Pravega

16:35–17:15 Thursday, 24/05/2018

Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Intermediate

Flavio Junqueira (Dell EMC)

Stream processing is in the spotlight. Enabling low-latency insights and actions out of continuously generated data is compelling to a number of application domains, and the ability to adapt to workload variations is critical to many applications. Flavio Junqueira explores Pravega, a stream store that scales streams automatically and enables applications to scale downstream by signaling changes. Read more.

Predicting rent arrears: Leveraging data science in the public sector

16:35–17:15 Thursday, 24/05/2018

Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 10/11 Level: Beginner

Secondary topics: Financial Services

Jonathan Leslie (Pivigo), Tom Harrison (Hackney Council), Maryam Qurashi (Pivigo)

Average rating:

(5.00, 5 ratings)

One major challenge to social housing is determining how best to target interventions when tenants fall behind on rent payments. Jonathan Leslie, Maryam Qurashi, and Tom Harrison discuss a recent project in which a team of data scientist trainees helped Hackney Council devise a more efficient, targeted strategy to detect and prioritize such situations. Read more.

The artful science of metrics: Measurements that work

16:35–17:15 Thursday, 24/05/2018

Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Intermediate

Ketan Gangatirkar (Indeed)

Average rating:

(5.00, 1 rating)

Quantitative measurement is the key to scaling businesses, processes, and products and making them better. It sounds easy: just pick a number and improve it. However, actually choosing a metric is an exploration of a many-dimensional space with no map and no guide. Until now. Join Ketan Gangatirkar to learn how to choose the right metrics so you can build a better product and a better business. Read more.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
