Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Schedule

S11A

11:15 Improving ad hoc and production workflows at Stitch Fix Neelesh Salian (Stitch Fix)

12:05 Setting up a lightweight distributed caching layer using Apache Arrow Jacques Nadeau (Dremio)

14:05 Why knowledge graphs are important to finance Haikal Pribadi (GRAKN.AI)

14:55 Mixing causal consistency and asynchronous replication for large Neo4j clusters Jim Webber (Neo4j)

16:35 Learning how to design automatically updating AI with Apache Kafka and Deeplearning4j Jason Bell (Independent Speaker)

S11B

11:15 Big data, big quality: Data quality at Spotify Irene Gonzálvez (Spotify)

12:05 Big data at speed Mark Grover (Lyft), Ted Malaska (Capital One)

14:05 Bringing AI to BI: Microsoft's road to automated business incident monitoring and diagnostics with Project Kensho Tony Xing (Microsoft), Bixiong Xu (Microsoft)

14:55 ClickFox: Customer journey analytics powered by OpenStack and Cloudera Alvin Heib (Cloudera), Guy Le Roux (Atos)

16:35 You call it data lake; we call it Data Historian. Naghman Waheed (Bayer Crop Science), Brian Arnold (Bayer)

Capital Suite 7

11:15 Accelerating development velocity of production ML systems with Docker Kinnary Jangla (Pinterest)

12:05 Deep learning with TensorFlow and Spark using GPUs and Docker containers Nanda Vijaydev (BlueData), Thomas Phelan (HPE BlueData)

14:05 Continuous delivery and machine learning Guillaume Salou (OVH)

14:55 Machine learning platform lifecycle management Hope Wang (Intuit)

16:35 DevOps at ING Analytics: Combining data engineering with data operations Giuseppe D'alessio (ING Group)

Capital Suite 8/9

11:15 You’re doing it wrong: How Zoomdata rearchitected streaming Erin Recachinas (Zoomdata)

12:05 Autonomous ETL with materialized views Adesh Rao (Qubole), Abhishek Somani (Qubole)

14:05 Complex event processing with Apache Flink Kostas Kloudas (data Artisans)

14:55 Radically modular data ingestion APIs in Apache Beam Eugene Kirpichov (Google)

16:35 Stream scaling in Pravega Flavio Junqueira (Dell EMC)

Capital Suite 10/11

11:15 Model parallelism in Spark ML cross-validation Nick Pentreath (IBM), Bryan Cutler (IBM)

12:05 Interpretable machine learning products Mike Lee Williams (Cloudera Fast Forward Labs)

14:05 Human in the loop: A design pattern for managing teams working with machine learning Paco Nathan (derwen.ai)

14:55 Detecting small-scale mines in Ghana Elena Terenzi (Microsoft), Michael Lanzetta (Microsoft)

16:35 Predicting rent arrears: Leveraging data science in the public sector Jonathan Leslie (Pivigo), Tom Harrison (Hackney Council), Maryam Qurashi (Pivigo)

Capital Suite 12

11:15 50 reasons to learn the shell for doing data science Jeroen Janssens (Data Science Workshops)

12:05 Machine learning at Intuit: Five delightful use cases Calum Murray (Intuit)

14:05 The ins and outs of forecasting in a hire business Kaylea Haynes (Peak)

14:55 Scaling data science (teams and technologies) David Asboth (Cox Automotive Data Solutions), Shaun McGirr (Cox Automotive Data Solutions)

Capital Suite 13

11:15 How Captricity manages 10,000 tiny deep learning models in production Ramesh Sridharan (Captricity)

12:05 A high-performance system for deep learning inference and visual inspection Moty Fania (Intel)

14:05 Scaling the AI hierarchy of needs with TensorFlow, Spark, and Hops Jim Dowling (Logical Clocks)

14:55 Operationalize deep learning models for fraud detection with Azure Machine Learning Workbench Francesca Lazzeri (Microsoft), Jaya Susan Mathew (Microsoft)

16:35 Deep learning in the browser: Explorable explanations, model inference, and rapid prototyping Amit Kapoor (narrativeVIZ), Bargava Subramanian (Binaize)

Capital Suite 14

11:15 Rendezvous with AI Ted Dunning (MapR, now part of HPE)

12:05 Ask Me Anything: Architecting a data platform for enterprise use Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)

14:05 Ask Me Anything: Streaming applications and architectures Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)

14:55 Are we doing this wrong? Advertisement features A/B testing Chen Salomon (Playbuzz)

16:35 Human-in-the-loop data science with Jupyter widgets Pascal Bugnion (ASI Data Science)

Capital Suite 15/16

11:15 Using Python to analyze financial markets Saeed Amen (Cuemacro)

12:05 On the limits of decision making with artificial intelligence Martin Goodson (Evolution AI)

14:05 Data, AI, and innovation in the enterprise Michael Li (The Data Incubator), Philipp Diesinger (Boehringer Ingelheim), Julie Shin (Citigroup)

14:55 The journey of machine learning platform adoption in enterprise Simon Chan (Salesforce)

16:35 The artful science of metrics: Measurements that work Ketan Gangatirkar (Indeed)

Capital Suite 17

11:15 Executive Briefing: Machine learning—Why you need it, why it's hard, and what to do about it Mick Hollison (Cloudera)

12:05 Executive Briefing: Artificial intelligence—The next digital frontier? Louise Herring (McKinsey & Company)

14:05 Executive Briefing: Data privacy in the age of the internet of things Alasdair Allan (Babilim Light Industries)

14:55 Executive Briefings: Killer robots and how not to do data science Kate Vang (DataKind UK), Christine Henry (DataKind UK)

16:35 Executive Briefing: The ROI of data-driven digital transformation Kevin Sigliano (IE Business School)

Expo Hall

11:15 Modeling time series in R Jared Lander (Lander Analytics)

12:05 A heretical monitoring view: Using PostgreSQL to store Prometheus metrics and visualizing them in Grafana Erik Nordström (Timescale)

14:05 Spark NLP in action: Intelligent, high-accuracy fact extraction from long financial documents David Talby (Pacific AI), Saif Addin Ellafi (John Snow Labs), Paul Parau (UiPath)

14:55 Big data meets renewable energy: Building a real-time asset management platform for renewable energy Stamatis Stefanakos (D ONE AG)

Capital Suite 2/3

11:15 Building the bridge from big data to machine learning and artificial intelligence (sponsored by Google Cloud) Ryan Lippert (Google Cloud)

14:05 The Data Intelligence Hub: On-demand Hadoop resource provisioning in Europe’s Industrial Data Space using Cloudera Altus Sven Loeffler (Deutsche Telekom)

14:55 Improving computer vision models at scale Marton Balassi (Cloudera), Mirko Kämpf (Cloudera), Jan Kunigk (Cloudera)

Auditorium

9:00 Thursday opening welcome Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

9:05 So, you want to be successful in the open future? Louise Beaumont (Publicis Groupe | techUK | NPSO)

9:20 Machine learning: Research and industry Mikio Braun (Zalando)

9:35 Moving machine learning and analytics to hyperspeed Amr Awadallah (Cloudera), Ankit Tharwani (Barclays UK), Bala Chandrasekaran (Barclays)

9:50 When to KISS Zubin Siganporia (QED Analytics)

10:00 Cloud and the golden age of data analytics (sponsored by Google Cloud) Tom Grey (Google)

10:10 Out of the lab and into real life Christine Foster (The Alan Turing Institute)

10:25 The good, the bad, and the internet? Martha Lane Fox (CBE)

8:15 Speed Networking | Room: Auditorium Foyer

8:45 Coffee break sponsored by Data Artisans (8:00 - 9:00) | Room: Auditorium Foyer

10:45 Morning break | Room: Expo Hall (Capital Hall 24)

12:45 Lunch sponsored by Google Cloud; Thursday Topic Tables at Lunch | Room: Expo Hall (Capital Hall 24)

12:45 Thursday Business Summit Lunch | Room: Expo Hall - SBS lunch (Capital Hall 24)

15:35 Afternoon break | Room: Expo Hall (Capital Hall 24)

11:15-11:55 (40m) Big data and data science in the cloud, Data engineering and architecture, Platform security and cybersecurity | Data Platforms, E-commerce and Retail

Improving ad hoc and production workflows at Stitch Fix

Neelesh Salian (Stitch Fix)

Neelesh Srinivas Salian offers an overview of the compute infrastructure used by the data science team at Stitch Fix, covering the architecture, tools within the larger ecosystem, and the challenges that the team overcame along the way.

12:05-12:45 (40m) Big data and data science in the cloud, Data engineering and architecture

Setting up a lightweight distributed caching layer using Apache Arrow

Jacques Nadeau (Dremio)

Jacques Nadeau offers an overview of a new Apache-licensed lightweight distributed in-memory cache that allows multiple applications to consume Arrow directly using the Arrow RPC and IPC protocols. You'll explore the system design and deployment architecture, learn how data science, analytical, and custom applications can all leverage the cache simultaneously, and see a live demo.

14:05-14:45 (40m) Data engineering and architecture

Why knowledge graphs are important to finance

Haikal Pribadi (GRAKN.AI)

Haikal Pribadi explains why knowledge graphs (KGs) are important for AI systems in the finance sector and details how they are being used to detect and uncover new knowledge, specifically for risk analysis, fraud detection, and GDPR use cases.

14:55-15:35 (40m) Data engineering and architecture | Time Series and Graphs

Mixing causal consistency and asynchronous replication for large Neo4j clusters

Jim Webber (Neo4j)

Jim Webber details how Neo4j mixes the strongly consistent Raft protocol with asynchronous log shipping and still provides a strong guarantee, causal consistency, which means you can always at least read your own writes, even in very large multi-data-center clusters.

16:35-17:15 (40m) Data engineering and architecture, Streaming systems and real-time applications

Learning how to design automatically updating AI with Apache Kafka and Deeplearning4j

Jason Bell (Independent Speaker)

Jason Bell offers an overview of a self-learning knowledge system that uses Apache Kafka and Deeplearning4j to accept data, apply training to a neural network, and output predictions. Jason covers the system design, the rationale behind it, and the implications of using streaming data with deep learning and artificial intelligence.

11:15-11:55 (40m) Data engineering and architecture | Data Integration and Data Pipelines sessions, Data Platforms, Media, Advertising, Entertainment

Big data, big quality: Data quality at Spotify

Irene Gonzálvez (Spotify)

Irene Gonzálvez shares Spotify's process for ensuring data quality, covering why and how the company became aware of its importance, the products it has developed, and future strategy.

12:05-12:45 (40m) Data engineering and architecture, Emerging technologies and case studies, Streaming systems and real-time applications | Transportation and Logistics

Big data at speed

Mark Grover (Lyft), Ted Malaska (Capital One)

Many details go into building a big data system for speed, from determining an acceptable latency for data access and deciding where to store the data to solving multiregion problems, or even just knowing what data you have and where stream processing fits in. Mark Grover and Ted Malaska share challenges, best practices, and lessons learned doing big data processing and analytics at scale and at speed.

14:05-14:45 (40m) Data engineering and architecture, Streaming systems and real-time applications | Data Platforms, Time Series and Graphs

Bringing AI to BI: Microsoft's road to automated business incident monitoring and diagnostics with Project Kensho

Tony Xing (Microsoft), Bixiong Xu (Microsoft)

Tony Xing and Bixiong Xu offer an overview of Project Kensho, Microsoft's one-stop shop for business incident monitoring and automated insights. Tony and Bixiong cover the technology's evolution, the architecture, the algorithms, and the benefits and the trade-offs. Along the way, they share a case study on Bing ads key metrics monitoring and automated diagnostic insights.

14:55-15:35 (40m) Big data and data science in the cloud, Data engineering and architecture | Data Platforms

ClickFox: Customer journey analytics powered by OpenStack and Cloudera

Alvin Heib (Cloudera), Guy Le Roux (Atos)

Alvin Heib and Guy Leroux offer an overview of ClickFox, a platform able to cope with high-performance analytical needs, from bits and bytes to solving customer needs, covering the platform's virtualization, big data, and analytical layers.

16:35-17:15 (40m) Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications | Data Platforms

You call it data lake; we call it Data Historian.

Naghman Waheed (Bayer Crop Science), Brian Arnold (Bayer)

There are a number of tools that make it easy to implement a data lake. However, most lack the essential features that prevent your data lake from turning into a data swamp. Naghman Waheed and Brian Arnold offer an overview of Monsanto's Data Historian platform, which can ingest, store, and access datasets without compromising ease of use, governance, or security.

11:15-11:55 (40m) Data engineering and architecture, Streaming systems and real-time applications | Data Platforms, Managing and Deploying Machine Learning, Media, Advertising, Entertainment

Accelerating development velocity of production ML systems with Docker

Kinnary Jangla (Pinterest)

Having trouble coordinating development of your production ML system between a team of developers? Microservices drifting and causing problems debugging? Kinnary Jangla explains how Pinterest dockerized the services powering its home feed and how it impacted the engineering productivity of its ML teams while increasing uptime and ease of deployment.

12:05-12:45 (40m) Big data and data science in the cloud, Data engineering and architecture | Managing and Deploying Machine Learning

Deep learning with TensorFlow and Spark using GPUs and Docker containers

Nanda Vijaydev (BlueData), Thomas Phelan (HPE BlueData)

In the past, you needed a high-end proprietary stack for advanced machine learning, but today, you can use open source machine learning and deep learning algorithms available with distributed computing technologies like Apache Spark and GPUs. Nanda Vijaydev and Thomas Phelan demonstrate how to deploy a TensorFlow and Spark with NVIDIA CUDA stack on Docker containers in a multitenant environment.

14:05-14:45 (40m) Data engineering and architecture | Managing and Deploying Machine Learning

Continuous delivery and machine learning

Guillaume Salou (OVH)

Guillaume Salou shares OVH's approach to continuous deployment of machine learning models, which involved building a full stack of automated machine learning. Automated machine learning allows the company to rebuild models efficiently and keep models up to date with fresh data brought by its data convergence tool.

14:55-15:35 (40m) Data engineering and architecture, Data-driven business management | Financial Services, Managing and Deploying Machine Learning

Machine learning platform lifecycle management

Hope Wang (Intuit)

A machine learning platform is not just the sum of its parts; the key is how it supports the model lifecycle end to end. Hope Wang explains how to manage various artifacts and their associations, automate deployment to support the lifecycle of a model, and build a cohesive machine learning platform.

16:35-17:15 (40m) Data engineering and architecture, Streaming systems and real-time applications

DevOps at ING Analytics: Combining data engineering with data operations

Giuseppe D'alessio (ING Group)

Giuseppe D'alessio details ING's DevOps journey, covering its impact on people, processes, and tools, as well as best practices and pitfalls. Giuseppe concludes with a concrete example of using analytics and streaming technology in real-time applications.

11:15-11:55 (40m) Data engineering and architecture, Streaming systems and real-time applications | Visualization, Design, and UX

You’re doing it wrong: How Zoomdata rearchitected streaming

Erin Recachinas (Zoomdata)

The value of real-time streaming analytics with historical data is immense. Big data application Zoomdata updates historical dashboards in real time without complex reaggregations, but streaming in the age of the IoT requires handling of data in volumes not seen in traditional feeds. Erin Recachinas explains how Zoomdata moved to a scalable microservice architecture for streaming sources.

12:05-12:45 (40m) Big data and data science in the cloud, Data engineering and architecture | Data Integration and Data Pipelines sessions

Autonomous ETL with materialized views

Adesh Rao (Qubole), Abhishek Somani (Qubole)

Adesh Rao and Abhishek Somani share a framework for materialized views in SQL-on-Hadoop engines that automatically suggests, creates, uses, invalidates, and refreshes views created on top of data for optimal performance and strict correctness.

14:05-14:45 (40m) Data engineering and architecture, Data-driven business management, Streaming systems and real-time applications

Complex event processing with Apache Flink

Kostas Kloudas (data Artisans)

Complex event processing (CEP) helps detect patterns over continuous streams of data. DNA sequencing, fraud detection, shipment tracking with specific characteristics (e.g., contaminated goods), and user activity analysis fall into this category. Kostas Kloudas offers an overview of Flink's CEP library and explains the benefits of its integration with Flink.

14:55-15:35 (40m) Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications | Data Integration and Data Pipelines sessions

Radically modular data ingestion APIs in Apache Beam

Eugene Kirpichov (Google)

Apache Beam offers users a novel programming model in which the classic batch-streaming dichotomy is erased and ships with a rich set of I/O connectors to popular storage systems. Eugene Kirpichov explains why Beam has made these connectors flexible and modular—a key component of which is Splittable DoFn, a novel programming model primitive that unifies data ingestion between batch and streaming.

16:35-17:15 (40m) Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications

Stream scaling in Pravega

Flavio Junqueira (Dell EMC)

Stream processing is in the spotlight. Enabling low-latency insights and actions out of continuously generated data is compelling to a number of application domains, and the ability to adapt to workload variations is critical to many applications. Flavio Junqueira explores Pravega, a stream store that scales streams automatically and enables applications to scale downstream by signaling changes.

11:15-11:55 (40m) Data science and machine learning

Model parallelism in Spark ML cross-validation

Nick Pentreath (IBM), Bryan Cutler (IBM)

Tuning a Spark ML model using cross-validation involves a computationally expensive search over a large parameter space. Nick Pentreath and Bryan Cutler explain how enabling Spark to evaluate models in parallel can significantly reduce the time to complete this process for large workloads and share best practices for choosing the right configuration to achieve optimal resource usage.

12:05-12:45 (40m) Data science and machine learning | Financial Services

Interpretable machine learning products

Mike Lee Williams (Cloudera Fast Forward Labs)

Interpretable models result in more accurate, safer, and more profitable machine learning products, but interpretability can be hard to ensure. Michael Lee Williams examines the growing business case for interpretability, explores concrete applications including churn, finance, and healthcare, and demonstrates the use of LIME, an open source, model-agnostic tool you can apply to your models today.

14:05-14:45 (40m) Data science and machine learning

Human in the loop: A design pattern for managing teams working with machine learning

Paco Nathan (derwen.ai)

Human in the loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. Such systems are mostly automated, with exceptions referred to human experts, who help train the machines further. Paco Nathan offers an overview of HITL from the perspective of a business manager, focusing on use cases within O'Reilly Media.

14:55-15:35 (40m) Data science and machine learning

Detecting small-scale mines in Ghana

Elena Terenzi (Microsoft), Michael Lanzetta (Microsoft)

Michael Lanzetta and Elena Terenzi offer an overview of a collaboration between Microsoft and Royal Holloway, University of London that applied deep learning to locate illegal small-scale mines in Ghana using satellite imagery, scaled training using Kubernetes, and investigated the mines' impact on surrounding populations and the environment.

16:35-17:15 (40m) Data science and machine learning, Emerging technologies and case studies | Financial Services

Predicting rent arrears: Leveraging data science in the public sector

Jonathan Leslie (Pivigo), Tom Harrison (Hackney Council), Maryam Qurashi (Pivigo)

One major challenge to social housing is determining how best to target interventions when tenants fall behind on rent payments. Jonathan Leslie, Maryam Qurashi, and Tom Harrison discuss a recent project in which a team of data scientist trainees helped Hackney Council devise a more efficient, targeted strategy to detect and prioritize such situations.

11:15-11:55 (40m) Data science and machine learning

50 reasons to learn the shell for doing data science

Jeroen Janssens (Data Science Workshops)

"Anyone who does not have the command line at their beck and call is really missing something," tweeted Tim O'Reilly when Jeroen Janssens's Data Science at the Command Line was recently made available online for free. Join Jeroen to learn what you're missing out on if you're not applying the command line and many of its power tools to typical data science problems.

12:05-12:45 (40m) Data science and machine learning, Streaming systems and real-time applications | Financial Services

Machine learning at Intuit: Five delightful use cases

Calum Murray (Intuit)

Machine learning-based applications are becoming the new norm. Calum Murray shares five use cases at Intuit that use the data of over 60 million users to create delightful experiences for customers, either by taking over repetitive tasks so that customers can spend their time more productively or by solving very complex tasks with simplicity and elegance.

14:05-14:45 (40m) Data science and machine learning, Data-driven business management

The ins and outs of forecasting in a hire business

Kaylea Haynes (Peak)

Deciding how much stock to hold is a challenge for hire businesses. There is a fine balance between holding enough stock to fulfill hires and not holding too much stock so that overall utilization is too low to achieve the return on investment. Kaylea Haynes shares a case study on forecasting the demand for thousands of assets across multiple locations.

14:55-15:35 (40m) Data science and machine learning, Data-driven business management, Emerging technologies and case studies

Scaling data science (teams and technologies)

David Asboth (Cox Automotive Data Solutions), Shaun McGirr (Cox Automotive Data Solutions)

Cox Automotive is the world’s largest automotive service organization, which means it can combine data from across the entire vehicle lifecycle. Cox is on a journey to turn this data into insights. David Asboth and Shaun McGirr share their experience building up a data science team at Cox and scaling the company's data science process from laptop to Hadoop cluster.

11:15-11:55 (40m) Data science and machine learning | Managing and Deploying Machine Learning

How Captricity manages 10,000 tiny deep learning models in production

Ramesh Sridharan (Captricity)

Most uses of deep learning involve models trained with large datasets. Ramesh Sridharan explains how Captricity uses deep learning with tiny datasets at scale, training thousands of models using tens to hundreds of examples each. These models are dynamically trained using an automatic deployment framework, and carefully chosen metrics further exploit error properties of the resulting models.

12:05-12:45 (40m) Data science and machine learning, Streaming systems and real-time applications | Data Platforms, Managing and Deploying Machine Learning

A high-performance system for deep learning inference and visual inspection

Moty Fania (Intel)

Moty Fania explains how Intel implemented an AI inference platform to enable internal visual inspection use cases and shares lessons learned along the way. The platform is based on open source technologies and was designed for real-time streaming and online actuation.

14:05-14:45 (40m) Data engineering and architecture

Scaling the AI hierarchy of needs with TensorFlow, Spark, and Hops

Jim Dowling (Logical Clocks)

Distributed deep learning can increase the productivity of AI practitioners and reduce time to market for training models. Hadoop can fulfill a crucial role as a unified feature store and resource management platform for distributed deep learning. Jim Dowling offers an introduction to writing distributed DL applications, covering TensorFlow and Apache Spark frameworks that make distribution easy.

14:55-15:35 (40m) Data science and machine learning | Financial Services, Time Series and Graphs

Operationalize deep learning models for fraud detection with Azure Machine Learning Workbench

Francesca Lazzeri (Microsoft), Jaya Susan Mathew (Microsoft)

Advancements in computing technologies and ecommerce platforms have amplified the risk of online fraud, which results in billions of dollars of loss for the financial industry. This trend has urged companies to consider AI techniques, including deep learning, for fraud detection. Francesca Lazzeri and Jaya Mathew explain how to operationalize deep learning models with Azure ML to prevent fraud.

16:35-17:15 (40m) Data science and machine learning, Visualization and user experience

Deep learning in the browser: Explorable explanations, model inference, and rapid prototyping

Amit Kapoor (narrativeVIZ), Bargava Subramanian (Binaize)

Amit Kapoor and Bargava Subramanian lead three live demos of deep learning (DL) done in the browser—building explorable explanations to aid insight, building model inference applications, and rapid prototyping and training an ML model—using the emerging client-side JavaScript libraries for DL.

11:15-11:55 (40m) Data science and machine learning | Managing and Deploying Machine Learning

Rendezvous with AI

Ted Dunning (MapR, now part of HPE)

Ted Dunning offers an overview of the rendezvous architecture, which is geared to deal with much of the complexity involved in deploying models to production, thus allowing more time to be spent thinking and doing real data science. Ted covers the ideas behind the architecture, practical scenarios, and advantages and disadvantages of the architecture.

12:05-12:45 (40m) Ask Me Anything

Ask Me Anything: Architecting a data platform for enterprise use

Mark Madsen (Teradata), Shant Hovsepian (Arcadia Data)

Join Mark Madsen and Shant Hovsepian to discuss analytics strategy and planning, data architecture, data management, and BI on big data.

14:05-14:45 (40m) Ask Me Anything

Ask Me Anything: Streaming applications and architectures

Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)

Join Dean Wampler and Boris Lublinsky to discuss all things streaming: architecture, implementation, streaming engines and frameworks, techniques for serving machine learning models in production, traditional big data systems (dying or still relevant?), and general software architecture and data systems.

14:55-15:35 (40m) Data science and machine learning, Data-driven business management

Are we doing this wrong? Advertisement features A/B testing

Chen Salomon (Playbuzz)

A/B testing is the foundation of data-driven decision making. In today's world, advertising is crucial to a website's revenue, so it is even more important to measure the effects of changes correctly. Chen Salomon demonstrates how to correctly design and implement an advertisement A/B test and shares pitfalls, potential biases related to advertisement metrics, and possible mitigations.

16:35-17:15 (40m) Data engineering and architecture, Data science and machine learning, Visualization and user experience

Human-in-the-loop data science with Jupyter widgets

Pascal Bugnion (ASI Data Science)

Jupyter widgets let you create lightweight, interactive graphical interfaces directly in Jupyter notebooks. Pascal Bugnion demonstrates how to use Jupyter widgets to implement human-in-the-loop machine learning with highly interactive user interfaces.

11:15-11:55 (40m) Strata Business Summit

Using Python to analyze financial markets

Saeed Amen (Cuemacro)

Saeed Amen explores Python libraries that can be used at the various stages of financial analysis, including time series analysis, visualization, structuring data, and storing market data.

12:05-12:45 (40m) Data-driven business management, Strata Business Summit

On the limits of decision making with artificial intelligence

Martin Goodson (Evolution AI)

How can AI become part of our business processes? Should we entrust critical decisions to completely autonomous systems? Drawing on projects from businesses and UK government agencies, Martin Goodson explains how to increase confidence in AI systems and manage the transition to an AI-driven organization.

14:05-14:45 (40m) Data-driven business management, Strata Business Summit

Data, AI, and innovation in the enterprise

Michael Li (The Data Incubator), Philipp Diesinger (Boehringer Ingelheim), Julie Shin (Citigroup)

What are the latest initiatives and use cases around data and AI? How are data and AI reshaping industries? How do we foster a culture of data and innovation within a larger enterprise? What are some of the challenges of implementing AI within the enterprise setting? Michael Li moderates a panel of experts in different industries to answer these questions and more.

14:55-15:35 (40m) Data-driven business management, Strata Business Summit | Data Platforms, Managing and Deploying Machine Learning

The journey of machine learning platform adoption in enterprise

Simon Chan (Salesforce)

The promises of AI are great, but taking the steps to implement AI within an enterprise is challenging. The secret behind enterprise AI success often traces back to the underlying platform that accelerates AI development at scale. Based on years of experience helping executives establish AI product strategies, Simon Chan helps you discover the AI platform journey that is right for your business.

16:35-17:15 (40m) Data-driven business management, Strata Business Summit

The artful science of metrics: Measurements that work

Ketan Gangatirkar (Indeed)

Quantitative measurement is the key to scaling businesses, processes, and products and making them better. It sounds easy: just pick a number and improve it. However, actually choosing a metric is an exploration of a many-dimensional space with no map and no guide. Until now. Join Ketan Gangatirkar to learn how to choose the right metrics so you can build a better product and a better business.

11:15-11:55 (40m) Executive Briefing, Strata Business Summit

Executive Briefing: Machine learning—Why you need it, why it's hard, and what to do about it

Mick Hollison (Cloudera)

Mick Hollison shares examples of real-world machine learning applications, explores a variety of challenges in putting these capabilities into production (the speed with which technology is moving, cloud versus in-data-center consumption, security and regulatory compliance, and skills and agility in getting data and answers into the right hands), and outlines proven ways to meet them.

12:05-12:45 (40m) Executive Briefing, Strata Business Summit

Executive Briefing: Artificial intelligence—The next digital frontier?

Louise Herring (McKinsey & Company)

After decades of extravagant promises, artificial intelligence is finally starting to deliver real-life benefits to early adopters. However, we’re still early in the cycle of adoption. Louise Herring explains where investment is going, patterns of AI adoption and value capture by enterprises, and how the value potential of AI across sectors and business functions is beginning to emerge.

14:05-14:45 (40m) Executive Briefing, Law, ethics, and governance, Strata Business Summit Security and Privacy, Telecom

Executive Briefing: Data privacy in the age of the internet of things

Alasdair Allan (Babilim Light Industries)

The increasing ubiquity of the internet of things has put a new focus on data privacy. Big data is all very well when it's harvested quietly and stealthily, but when your things tattle on you behind your back, it's a very different matter altogether. Alasdair Allan explains why the internet of things brings with it a whole new set of big data problems that can't be ignored.

14:55-15:35 (40m) Executive Briefing, Law, ethics, and governance, Strata Business Summit Security and Privacy

Executive Briefing: Killer robots and how not to do data science

Kate Vang (DataKind UK), Christine Henry (DataKind UK)

Not a day goes by without reading headlines about the fear of AI or how technology seems to be dividing us more than bringing us together. DataKind UK is passionate about using machine learning and artificial intelligence for social good. Kate Vang and Christine Henry explain what socially conscious AI looks like and what DataKind is doing to make it a reality.

16:35-17:15 (40m) Data-driven business management, Executive Briefing, Strata Business Summit

Executive Briefing: The ROI of data-driven digital transformation

Kevin Sigliano (IE Business School)

Financial and consumer ROI demands that business leaders understand the drivers and dynamics of digital transformation and big data. Kevin Sigliano explains why disrupting value propositions and continuous innovation are critical if you wish to dramatically improve the way your company engages customers and creates value and maximize financial results.

11:15-11:55 (40m) Data science and machine learning, Expo Hall Time Series and Graphs

Modeling time series in R

Jared Lander (Lander Analytics)

Temporal data is being produced in ever-greater quantity, but fortunately our time series capabilities are keeping pace. Jared Lander explores techniques for modeling time series, from traditional methods such as ARMA to more modern tools such as Prophet and machine learning models like XGBoost and neural nets. Along the way, Jared shares theory and code for training these models.

12:05-12:45 (40m) Data science and machine learning, Expo Hall, Streaming systems and real-time applications, Visualization and user experience Time Series and Graphs

A heretical monitoring view: Using PostgreSQL to store Prometheus metrics and visualizing them in Grafana

Erik Nordström (Timescale)

Erik Nordström explains how and why to use PostgreSQL as a Prometheus backend to support complex questions (and get a proper SQL interface), offers an overview of pg_prometheus, a custom Prometheus datatype, and prometheus-postgresql-adapter, a remote storage adaptor for PostgreSQL, and shares his experience with TimescaleDB, which enables PostgreSQL to scale for classic monitoring volumes.

14:05-14:45 (40m) Data science and machine learning, Expo Hall Financial Services, Text and Language processing and analysis

Spark NLP in action: Intelligent, high-accuracy fact extraction from long financial documents

David Talby (Pacific AI), Saif Addin Ellafi (John Snow Labs), Paul Parau (UiPath)

Spark NLP natively extends Spark ML to provide natural language understanding capabilities with performance and scale that were not previously possible. David Talby, Saif Addin Ellafi, and Paul Parau explain how Spark NLP was used to augment the Recognos smart data extraction platform in order to automatically infer fuzzy, implied, and complex facts from long financial documents.

14:55-15:35 (40m) Data science and machine learning, Expo Hall Data Integration and Data Pipelines sessions, Data Platforms

Big data meets renewable energy: Building a real-time asset management platform for renewable energy

Stamatis Stefanakos (D ONE AG)

Switzerland-based startup WinJi capitalizes on two current megatrends: big data and renewable energy. Stamatis Stefanakos offers an overview of WinJi's TruePower Asset Management Platform, covering the overall architecture and the motivation behind it, the physics behind the data, and the business case.

11:15-11:55 (40m) Sponsored

Building the bridge from big data to machine learning and artificial intelligence (sponsored by Google Cloud)

Ryan Lippert (Google Cloud)

If your company isn’t good at analytics, it’s not ready for AI. Ryan Lippert explains how the right data strategy can set you up for success in machine learning and artificial intelligence—the new ground for gaining competitive edge and creating business value.

14:05-14:45 (40m) Big data and data science in the cloud, Data science and machine learning Telecom

The Data Intelligence Hub: On-demand Hadoop resource provisioning in Europe’s Industrial Data Space using Cloudera Altus

Sven Loeffler (Deutsche Telekom)

Sven Löffler offers an overview of the Data Intelligence Hub, T-Systems's implementation of the Fraunhofer Industrial Data Space: a reference architecture for the standardized and secure data exchange between industries in the context of the internet of things.

14:55-15:35 (40m) Data engineering and architecture

Improving computer vision models at scale

Marton Balassi (Cloudera), Mirko Kämpf (Cloudera), Jan Kunigk (Cloudera)

Rigorous improvement of an image recognition model often requires multiple iterations of eyeballing outliers, inspecting statistics of the output labels, then modifying and retraining the model. Marton Balassi, Mirko Kämpf, and Jan Kunigk share a solution that automates the process of running the model on the testing data and populating an index of the labels so they become searchable.

9:00-9:05 (5m)

Thursday opening welcome

Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.

9:05-9:20 (15m)

So, you want to be successful in the open future?

Louise Beaumont (Publicis Groupe | techUK | NPSO)

Louise Beaumont explores the five characteristics of companies that choose to succeed.

9:20-9:35 (15m)

Machine learning: Research and industry

Mikio Braun (Zalando)

Mikio Braun has worked in both research and industry and draws on this experience to share insights on how these two areas are the same (and how they are different). He then details how deep learning might change the game again.

9:35-9:45 (10m)

Moving machine learning and analytics to hyperspeed

Amr Awadallah (Cloudera), Ankit Tharwani (Barclays UK), Bala Chandrasekaran (Barclays)

Imagine the value you could drive in your business if you could accelerate your journey to machine learning and analytics. Amr Awadallah, Ankit Tharwani, and Bala Chandrasekaran explain how Barclays has driven innovation in real-time analytics and machine learning with Apache Kudu, accelerating the time to value across multiple business initiatives, including marketing, fraud prevention, and more.

9:50-10:00 (10m)

When to KISS

Zubin Siganporia (QED Analytics)

The KISS principle tells us to "Keep it simple, stupid." As machine learning techniques become more sophisticated, the need to KISS only becomes greater. Zubin Siganporia discusses the role that simplicity plays in approaching a problem and then convincing end users to adopt data-driven solutions to their challenges.

10:00-10:10 (10m) Sponsored keynote

Cloud and the golden age of data analytics (sponsored by Google Cloud)

Tom Grey (Google)

The history of data analytics has been marked by an environment of scarcity. The way we approach data analytics is only just catching up. Tom Grey explains why we are on the cusp of a golden age of analytics and machine learning.

10:10-10:25 (15m)

Out of the lab and into real life

Christine Foster (The Alan Turing Institute)

There is a common conception that artificial intelligence will change business. But as researchers at the Alan Turing Institute (the national center for data science and AI) well know, a new algorithm alone does not change the world. Christine Foster explores how businesses and researchers can find common ground and how today’s academic papers turn into tomorrow’s data science.

10:25-10:40 (15m)

The good, the bad, and the internet?

Martha Lane Fox (CBE)

Keynote with Martha Lane Fox

8:15-8:45 (30m)

Speed Networking

Gather before keynotes on Thursday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees.

8:45-9:00 (15m)

Break: Coffee break sponsored by Data Artisans (8:00 - 9:00)

10:45-11:15 (30m)

Break: Morning break

12:45-14:05 (1h 20m)

Thursday Topic Tables at Lunch

Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.

12:45-14:05 (1h 20m)

Thursday Business Summit Lunch

Join Strata Business Summit speakers and attendees for a networking lunch on Thursday.

15:35-16:35 (1h)

Break: Afternoon break

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
