Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

Expo Hall (Capital Hall N24)

11:15 How to keep ethical with machine learning Jerry Overton (DXC)

12:05 Deep learning for recommender systems Oliver Gindele (Datatonic)

14:05 AI for good at scale in real time: Challenges in machine learning and deep learning Alex Jaimes (Dataminr)

Expo Hall 2 (Capital Hall N24)

11:15 Streaming at Lyft Thomas Weise (Lyft)

12:05 Unleashing Apache Kafka and TensorFlow in hybrid architectures Kai Wähner (Confluent)

14:05 Autoscaling Spark on Kubernetes Holden Karau (Independent), Kris Nova (Independent)

14:55 Performant time series data management and analytics with PostgreSQL Michael Freedman (TimescaleDB | Princeton University)

S11 A

11:15 Scaling Impala: Common mistakes and best practices Manish Maheshwari (Cloudera)

12:05 Schema on read and the new logging way David Josephsen (Sparkpost)

14:05 Mutant tests too: The SQL Elliot West (Hotels.com), Jaydene Green (Hotels.com)

14:55 The future of cloud native data warehousing: Emerging trends and technologies Greg Rahn (Cloudera)

16:35 Deep learning with TensorFlow and Spark using GPUs and Docker containers Thomas Phelan (HPE BlueData)

S11 B

11:15 Big data analytics in the public cloud: Challenges and opportunities Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel)

12:05 Herding elephants: Seamless data access in multicluster clouds Pradeep Bhadani (Hotels.com), Elliot West (Hotels.com)

14:05 The vegan data diet: How Wikipedia cuts down privacy issues while keeping data fit Marcel Ruiz Forns (Wikimedia Foundation)

14:55 Mastering data with Spark and machine learning Sonal Goyal (Nube)

16:35 Migrating Apache Oozie workflows to Apache Airflow Feng Lu (Google Cloud), James Malone (Google), Apurva Desai (Google Cloud), Cameron Moberg (Truman State University | Google Cloud)

Capital Suite 8/9

11:15 Half-correct and half-wrong collective data wisdom: 3 patterns to sanity Sandeep U (Intuit)

12:05 Data science at Deutsche Telekom: Predicting global travel patterns and network demand Vaclav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)

14:05 Unlocking insights in AI by building a feature store Willem Pienaar (GOJEK), Zhi Ling Chen (GOJEK)

14:55 Architecting a data platform to support analytic workflows for scientific data Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)

16:35 From legacy to cloud: An end-to-end data integration journey Max Schultze (Zalando SE)

Capital Suite 10/11

11:15 Transforming a financial services data infrastructure for the modern era by building a PCI DSS-compliant data platform from the ground up on AWS Eoin O'Flanagan (NewDay), Darragh McConville (Kainos)

12:05 Application intelligence: Bridging the gap between human expertise and machine learning Rebecca Simmonds (Red Hat), Michael McCune (Red Hat)

14:05 Simplicity at scale: How Cloudflare analyses some of the world’s largest DDoS attacks Tom Walwyn (Cloudflare)

14:55 Learning how to perform ETL data migrations with open source tool Embulk Jason Bell (Independent Speaker)

Capital Suite 12

11:15 Insightful health: Amplifying intelligence in healthcare patient flow execution Fabio Ferraretto, Claudia Regina Laselva (Albert Einstein Jewish Hospital)

12:05 Starting with the end in mind: Lessons learned from data strategies that work Vidya Raman (Cloudera)

14:05 Practicing data science: A collection of case studies Rosaria Silipo (KNIME)

14:55 Data-driven digital transformation and jobs: The new software hierarchy and ML Robert Cohen (Economic Strategy Institute)

Capital Suite 13

11:15 Executive Briefing: Analytics for executives Brandy Freitas (Pitney Bowes)

12:05 Executive Briefing: The intelligent edge and the demise of big data? Alasdair Allan (Babilim Light Industries)

14:05 Executive Briefing: The hidden data scientists lurking in your company Jack Norris (MapR Technologies)

14:55 Executive Briefing: AWS technology trends—Data lakes and analytics Nikki Rouda (Amazon Web Services)

16:35 Executive Briefing: What it takes to use machine learning in fast data pipelines Dean Wampler (Anyscale)

Capital Suite 14

11:15 Fraud detection at a financial institution using unsupervised learning and text mining David Dogon (Van Lanschot Kempen)

12:05 NLP Architect by Intel's AI Lab Moshe Wasserblat (Intel)

14:05 8 prerequisites of a graph query language Mingxi Wu (TigerGraph)

14:55 Learning with limited labeled data Shioulin Sam (Cloudera Fast Forward Labs)

16:35 Evaluating cybersecurity defenses with a data science approach Brennan Lodge (Goldman Sachs), Jay Kesavan (Bowery Analytics LLC)

Capital Suite 15/16

11:15 Learning "learning to rank" Sophie Watson (Red Hat)

12:05 How to mitigate mobile fraud risk by data analytics Seonmin Kim (LINE)

14:05 Reinforcement learning: A gentle introduction and an industrial application Christian Hidber (bSquare)

14:55 Early incident detection using fusion analytics of commuter-centric data sources Christopher Hooi (Land Transport Authority of Singapore)

16:35 Improving infrastructure efficiency with unsupervised algorithms Alexandre Hubert (Dataiku)

Capital Suite 17

11:15 Deep learning for speech synthesis: The good news, the bad news, and the fake news Scott Stevenson (Faculty)

12:05 Inclusive design: Deep learning on audio in Azure, identifying sounds in real time Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)

14:05 Deep learning for fonts Raghotham Sripadraj (Ericsson), Nischal Harohalli Padmanabha (Omnius)

14:55 A deep learning approach to automatic call routing Tal Doron (GigaSpaces)

Capital Suite 2/3

11:15 Oracle's second-generation cloud: Optimized for the partner ecosystem (sponsored by Oracle Cloud Infrastructure) Ben Lackey (Oracle)

Auditorium

9:00 Thursday keynote welcome Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

9:05 The unstoppable rise of white box data Chris Taggart (OpenCorporates)

9:20 Building data science capacity in your organization Shingai Manjengwa (Fireside Analytics Inc.)

9:35 Combining creativity and analytics David Boyle (Audience Strategies)

9:50 BMW’s journey to the data-driven enterprise from the edge to AI Amr Awadallah (Cloudera), Tobias Burger (BMW Group)

10:00 Rise of the (advertising) machines Michael Tidmarsh (Ogilvy)

10:15 Privacy, identity, and autonomy in the age of big data and AI Sandra Wachter (University of Oxford)

10:45 Morning break | Room: Expo Hall

12:45 Thursday Topic Tables at Lunch | Room: Expo Hall

15:35 Afternoon break | Room: Expo Hall

8:00 Early morning coffee sponsored by AXA | Room: Level 0 - Blvd

8:15 Speed Networking | Room: Level 0 - Blvd

11:15-11:55 (40m) Data Science, Machine Learning & AI, Expo Hall Ethics

How to keep ethical with machine learning

Jerry Overton (DXC)

Machine learning (ML) algorithms are good at learning new behaviors but bad at identifying when those behaviors are harmful or don’t make sense. Bias, ethics, and fairness are big risk factors in ML. However, we creators have a lot of experience dealing with intelligent beings—one another. Jerry Overton uses this common sense to build a checklist for protecting against ethical violations with ML.

12:05-12:45 (40m) Data Science, Machine Learning & AI, Expo Hall Deep Learning, Media, Marketing, Advertising, Retail and e-commerce

Deep learning for recommender systems

Oliver Gindele (Datatonic)

The success of deep learning has reached the realm of structured data in the past few years, where neural networks have been shown to improve the effectiveness and predictability of recommendation engines. Oliver Gindele offers a brief overview of such deep recommender systems and explains how they can be implemented in TensorFlow.

14:05-14:45 (40m) Data Science, Machine Learning & AI, Expo Hall Data Integration and Data Pipelines, Deep Learning

AI for good at scale in real time: Challenges in machine learning and deep learning

Alex Jaimes (Dataminr)

When emergency events occur, social signals and sensor data are generated. Alex Jaimes explains how to apply machine learning and deep learning to process large amounts of heterogeneous data from various sources in real time, with a particular focus on how such information can be used for emergencies and in critical events for first responders and for other social good use cases.

11:15-11:55 (40m) Data Engineering and Architecture, Expo Hall, Streaming and IoT Data Platforms, Streaming and realtime analytics, Transportation and Logistics

Streaming at Lyft

Thomas Weise (Lyft)

Fast data and stream processing are essential for making Lyft rides a good experience for passengers and drivers. Lyft's systems need to track and react to event streams in real time to update locations, compute routes and estimates, balance prices, and more. Thomas Weise offers an overview of the streaming platform that powers these use cases.

12:05-12:45 (40m) Data Engineering and Architecture, Expo Hall AI and Data technologies in the cloud, Model lifecycle management

Unleashing Apache Kafka and TensorFlow in hybrid architectures

Kai Wähner (Confluent)

How do you leverage the flexibility and extreme scale of the public cloud and the Apache Kafka ecosystem to build scalable, mission-critical machine learning infrastructures that span multiple public clouds—or bridge your on-premises data center to the cloud? Join Kai Wähner to learn how to use technologies such as TensorFlow with Kafka’s open source ecosystem for machine learning infrastructures.

14:05-14:45 (40m) Data Engineering and Architecture, Expo Hall AI and Data technologies in the cloud

Autoscaling Spark on Kubernetes

Holden Karau (Independent), Kris Nova (Independent)

In the Kubernetes world, declarative resources are first-class citizens, running complicated workloads across distributed infrastructure is easy, and processing big data workloads using Spark is common practice, so we can finally look at constructing a hybrid system that runs Spark in a distributed, cloud native way. Join experts Kris Nova and Holden Karau for a fun adventure.

14:55-15:35 (40m) Data Engineering and Architecture, Expo Hall Streaming and realtime analytics, Temporal data and time-series

Performant time series data management and analytics with PostgreSQL

Michael Freedman (TimescaleDB | Princeton University)

Time series databases require ingesting high volumes of structured data, answering complex, performant queries for recent and historical time intervals, and performing specialized time-centric analysis and data management. Michael Freedman explains how to avoid these operational problems by reengineering Postgres to serve as a general data platform, including high-volume time series workloads.

11:15-11:55 (40m) Data Engineering and Architecture

Scaling Impala: Common mistakes and best practices

Manish Maheshwari (Cloudera)

Apache Impala is an MPP SQL query engine for planet-scale queries. When set up and used properly, Impala is able to handle hundreds of nodes and tens of thousands of queries hourly. Manish Maheshwari explains how to avoid pitfalls in Impala configuration (memory limits, admission pools, metadata management, statistics), along with best practices and anti-patterns for end users or BI applications.

12:05-12:45 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Integration and Data Pipelines, Data Platforms, Streaming and realtime analytics

Schema on read and the new logging way

David Josephsen (Sparkpost)

David Josephsen tells the story of how Sparkpost's reliability engineering team abandoned ELK for a DIY schema-on-read logging infrastructure. Join in to learn the architectural details, trials, and tribulations from the company's Internal Event Hose data ingestion pipeline project, which uses Fluentd, Kinesis, Parquet, and AWS Athena to make logging sane.

14:05-14:45 (40m) Data Engineering and Architecture

Mutant tests too: The SQL

Elliot West (Hotels.com), Jaydene Green (Hotels.com)

Elliot West and Jay Green share approaches for applying software engineering best practices to SQL-based data applications to improve maintainability and data quality. Using open source tools, Elliot and Jay show how to build effective test suites for Apache Hive code bases and offer an overview of Mutant Swarm, a tool to identify weaknesses in tests and to measure SQL code coverage.

14:55-15:35 (40m) Data Engineering and Architecture AI and Data technologies in the cloud

The future of cloud native data warehousing: Emerging trends and technologies

Greg Rahn (Cloudera)

Data warehouses have traditionally run in the data center, and in recent years, they've been adapted to be more cloud native. Greg Rahn discusses a number of emerging trends and technologies that will impact how data warehouses are run both in the cloud and on-premises and explains what that means for architects, administrators, and end users.

16:35-17:15 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Platforms

Deep learning with TensorFlow and Spark using GPUs and Docker containers

Thomas Phelan (HPE BlueData)

Organizations need to keep ahead of their competition by using the latest AI, ML, and DL technologies such as Spark, TensorFlow, and H2O. The challenge is in how to deploy these tools and keep them running in a consistent manner while maximizing the use of scarce hardware resources, such as GPUs. Thomas Phelan discusses the effective deployment of such applications in a container environment.

11:15-11:55 (40m) Data Engineering and Architecture AI and Data technologies in the cloud

Big data analytics in the public cloud: Challenges and opportunities

Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel)

Jian Zhang, Chendi Xue, and Yuan Zhou explore the challenges of migrating big data analytics workloads to the public cloud (e.g., performance loss and missing features) and demonstrate how to use a new in-memory data accelerator leveraging persistent memory and RDMA NICs to resolve these issues and enable new opportunities for big data workloads on the cloud.

12:05-12:45 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Platforms

Herding elephants: Seamless data access in multicluster clouds

Pradeep Bhadani (Hotels.com), Elliot West (Hotels.com)

Travel platform Expedia Group likes to give its data teams flexibility and autonomy to work with different technologies. However, this approach generates challenges that cannot be solved by existing tools. Pradeep Bhadani and Elliot West explain how the company built a unified virtual data lake on top of its many heterogeneous and distributed data platforms.

14:05-14:45 (40m) Data Engineering and Architecture Security and Privacy

The vegan data diet: How Wikipedia cuts down privacy issues while keeping data fit

Marcel Ruiz Forns (Wikimedia Foundation)

Analysts and researchers studying Wikipedia are hungry for long-term data to build experiments and feed data-driven decisions. But Wikipedia has a strict privacy policy that prevents storing privacy-sensitive data over 90 days. Marcel Ruiz Forns explains how the Wikimedia Foundation's analytics team is working on a vegan data diet to satisfy both.

14:55-15:35 (40m) Data Engineering and Architecture Automation in data science and big data, Data preparation, data governance, and data lineage

Mastering data with Spark and machine learning

Sonal Goyal (Nube)

Enterprise data on customers, vendors, and products is often siloed and represented differently in diverse systems, hurting analytics, compliance, regulatory reporting, and 360 views. Traditional rule-based MDM systems with legacy architectures struggle to unify this growing data. Sonal Goyal offers an overview of a modern master data application using Spark, Cassandra, ML, and Elastic.

16:35-17:15 (40m) Data Engineering and Architecture Data Integration and Data Pipelines

Migrating Apache Oozie workflows to Apache Airflow

Feng Lu (Google Cloud), James Malone (Google), Apurva Desai (Google Cloud), Cameron Moberg (Truman State University | Google Cloud)

Apache Oozie and Apache Airflow (incubating) are both widely used workflow orchestration systems, the former focusing on Apache Hadoop jobs. Feng Lu, James Malone, Apurva Desai, and Cameron Moberg explore an open source Oozie-to-Airflow migration tool developed at Google as a part of creating an effective cross-cloud and cross-system solution.

11:15-11:55 (40m) Data Engineering and Architecture Data preparation, data governance, and data lineage, Financial Services

Half-correct and half-wrong collective data wisdom: 3 patterns to sanity

Sandeep U (Intuit)

Teams today rely on dictionaries of collective wisdom—a mixed bag with regard to correctness: some datasets have accurate attribute details, while others are incorrect and outdated. This significantly impacts productivity of analysts and scientists. Sandeep Uttamchandani outlines three patterns to better manage data dictionaries.

12:05-12:45 (40m) Data Engineering and Architecture Data Platforms, Security and Privacy, Transportation and Logistics

Data science at Deutsche Telekom: Predicting global travel patterns and network demand

Vaclav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)

Knowledge of customers' location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management.

14:05-14:45 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, AI and machine learning in the enterprise, Data Platforms, Transportation and Logistics

Unlocking insights in AI by building a feature store

Willem Pienaar (GOJEK), Zhi Ling Chen (GOJEK)

Features are key to driving impact with AI at all scales, allowing organizations to dramatically accelerate innovation and time to market. Willem Pienaar and Zhiling Chen explain how GOJEK, Indonesia's first billion-dollar startup, unlocked insights in AI by building a feature store called Feast, and the lessons they learned along the way.

14:55-15:35 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Integration and Data Pipelines, IoT and its applications

Architecting a data platform to support analytic workflows for scientific data

Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)

In upstream oil and gas, a vast amount of the data requested for analytics projects is scientific data: physical measurements about the real world. Historically, this data has been managed library style, but a new system was needed to best provide this data. Sun Maria Lehmann and Jane McConnell share architectural best practices learned from their work with subsurface data.

16:35-17:15 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Integration and Data Pipelines, Retail and e-commerce

From legacy to cloud: An end-to-end data integration journey

Max Schultze (Zalando SE)

Max Schultze details Zalando's end-to-end data integration platform to serve analytical use cases and machine learning throughout the company, covering raw data collection, standardized data preparation (binary conversion, partitioning, etc.), user-driven analytics, and machine learning.

11:15-11:55 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Integration and Data Pipelines, Financial Services, Security and Privacy

Transforming a financial services data infrastructure for the modern era by building a PCI DSS-compliant data platform from the ground up on AWS

Eoin O'Flanagan (NewDay), Darragh McConville (Kainos)

Eoin O'Flanagan and Darragh McConville explain how NewDay built a high-performance contemporary data processing platform from the ground up on AWS. Join in to explore the company's journey from a traditional legacy onsite data estate to an entirely cloud-based PCI DSS-compliant platform.

12:05-12:45 (40m) Data Engineering and Architecture AI and machine learning in the enterprise

Application intelligence: Bridging the gap between human expertise and machine learning

Rebecca Simmonds (Red Hat), Michael McCune (Red Hat)

Artificial intelligence and machine learning are now popularly used terms, but how do you make use of these techniques without throwing away the valuable knowledge of experienced employees? Rebecca Simmonds and Michael McCune delve into this idea with examples of how distributed machine learning frameworks fit together naturally with business rules management systems.

14:05-14:45 (40m) Data Engineering and Architecture Security and Privacy, Streaming and realtime analytics

Simplicity at scale: How Cloudflare analyses some of the world’s largest DDoS attacks

Tom Walwyn (Cloudflare)

Cloudflare powers nearly 10 percent of all Internet requests worldwide, absorbing some of the largest DDoS attacks. Learn how we use ClickHouse and SQL to simplify our data pipelines on a global scale while experiencing over 10 million events per second.

14:55-15:35 (40m) Data Engineering and Architecture Data Integration and Data Pipelines

Learning how to perform ETL data migrations with open source tool Embulk

Jason Bell (Independent Speaker)

The Embulk data migration tool offers a convenient way to load data into a variety of systems with basic configuration. Jason Bell offers an overview of the Embulk tool and outlines some common data migration scenarios that a data engineer could employ using the tool.

11:15-11:55 (40m) Case studies, Strata Business Summit Health and Medicine

Insightful health: Amplifying intelligence in healthcare patient flow execution

Fabio Ferraretto, Claudia Regina Laselva (Albert Einstein Jewish Hospital)

Fabio Ferraretto and Claudia Regina Laselva explain how Hospital Albert Einstein and Accenture evolved patient flow experience and efficiency with the use of applied AI, statistics, and combinatorial math, allowing the hospital to anticipate E2E visibility within patient flow operations, from admission of emergency and elective demands to assignment and medical releases.

12:05-12:45 (40m) Executive Briefing and best practices, Strata Business Summit AI and machine learning in the enterprise

Starting with the end in mind: Lessons learned from data strategies that work

Vidya Raman (Cloudera)

Not surprisingly, there's no single approach to embracing data-driven innovations within any industry vertical. However, some enterprises are doing a better job than others when it comes to establishing a culture, process, and infrastructure that lends itself to data-driven innovations. Vidya Raman explores some key foundational ingredients that span multiple industries.

14:05-14:45 (40m) Case studies, Strata Business Summit AI and machine learning in the enterprise

Practicing data science: A collection of case studies

Rosaria Silipo (KNIME)

Rosaria Silipo shares a collection of past data science projects. While the structure is often similar—data collection, data transformation, model training, deployment—each required its own special trick, whether a change in perspective or a particular technique to deal with special cases and special business questions.

14:55-15:35 (40m) Culture and organization, Strata Business Summit AI and machine learning in the enterprise

Data-driven digital transformation and jobs: The new software hierarchy and ML

Robert Cohen (Economic Strategy Institute)

Robert Cohen discusses the skills that employers are seeking from employees in digital jobs, linked to the new software hierarchy driving digital transformation. Robert describes this software hierarchy as one that ranges from DevOps, CI/CD, and microservices to Kubernetes and Istio. This hierarchy is used to define the jobs that are central to data-driven digital transformation.

11:15-11:55 (40m) Executive Briefing and best practices, Strata Business Summit AI and machine learning in the enterprise, Transportation and Logistics

Executive Briefing: Analytics for executives

Brandy Freitas (Pitney Bowes)

Data science is an approachable field given the right framing. Often, though, practitioners and executives are describing opportunities using completely different languages. Brandy Freitas walks you through developing context and vocabulary around data science topics to help build a culture of data within your organization.

12:05-12:45 (40m) Executive Briefing and best practices, Strata Business Summit IoT and its applications, Security and Privacy

Executive Briefing: The intelligent edge and the demise of big data?

Alasdair Allan (Babilim Light Industries)

Alasdair Allan explains why the current age, where privacy is no longer "a social norm," may not long survive the coming of the internet of things, as new smart embedded hardware may cause the demise of large-scale data harvesting. Smart devices will process data at the edge, allowing us to extract insights from the data without storing potentially privacy- and GDPR-infringing data.

14:05-14:45 (40m) Executive Briefing and best practices, Strata Business Summit AI and machine learning in the enterprise

Executive Briefing: The hidden data scientists lurking in your company

Jack Norris (MapR Technologies)

Many companies delay addressing core improvements in increasing revenues, reducing costs and risk exposure by tying changes to a to-be-hired data scientist. Drawing on three customer examples, Jack Norris explains how to achieve excellent results faster by starting with domain experience and helping developers and analysts better leverage data with available and understandable analytics.

14:55-15:35 (40m) Executive Briefing and best practices, Strata Business Summit AI and Data technologies in the cloud

Executive Briefing: AWS technology trends—Data lakes and analytics

Nikki Rouda (Amazon Web Services)

Nikki Rouda shares key trends in data lakes and analytics and explains how they shape the services offered by AWS. Specific topics include the rise of machine-generated data and semistructured and unstructured data as dominant sources of new data, the move toward serverless, API-centric computing, and the growing need for local access to data from users around the world.

16:35-17:15 (40m) Executive Briefing and best practices, Strata Business Summit Data Integration and Data Pipelines, Streaming and realtime analytics

Executive Briefing: What it takes to use machine learning in fast data pipelines

Dean Wampler (Anyscale)

Your team is building machine learning capabilities. Dean Wampler demonstrates how to integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed and covers challenges such as how to build long-running services that are very reliable and scalable and how to combine a spectrum of very different tools, from data science to operations.

11:15-11:55 (40m) Data Science, Machine Learning & AI AI and machine learning in the enterprise, Financial Services, Security and Privacy, Text and Language processing and analysis

Fraud detection at a financial institution using unsupervised learning and text mining

David Dogon (Van Lanschot Kempen)

David Dogon dives into a best practice use case for detecting fraud at a financial institution and details a dynamic and robust monitoring system that successfully detects unwanted client behavior. Join in to learn how machine learning models can provide a solution in cases where traditional systems fall short.

12:05-12:45 (40m) Data Science, Machine Learning & AI Deep Learning, Text and Language processing and analysis

NLP Architect by Intel's AI Lab

Moshe Wasserblat (Intel)

Moshe Wasserblat offers an overview of NLP Architect, an open source DL NLP library that provides SOTA NLP models, making it easy for researchers to implement NLP algorithms and for data scientists to build NLP-based solutions for extracting insight from textual data to improve business operations.

14:05-14:45 (40m) Data Science, Machine Learning & AI Graph technologies and analytics

8 prerequisites of a graph query language

Mingxi Wu (TigerGraph)

A graph query language is the key to unleashing the value of connected data. Mingxi Wu outlines the eight prerequisites of a practical graph query language, drawn from six years' experience dealing with real-world graph analytical use cases. Along the way, Mingxi compares GSQL, Gremlin, Cypher, and SPARQL, pointing out their respective pros and cons.

14:55-15:35 (40m) Data Science, Machine Learning & AI

Learning with limited labeled data

Shioulin Sam (Cloudera Fast Forward Labs)

Supervised machine learning requires large labeled datasets—a prohibitive limitation in many real-world applications. What if machines could learn with fewer labeled examples? Shioulin Sam shares an algorithmic solution that relies on collaboration between humans and machines to label smartly and discusses product possibilities.

16:35-17:15 (40m) Data Science, Machine Learning & AI AI and machine learning in the enterprise, Financial Services, Security and Privacy

Evaluating cybersecurity defenses with a data science approach

Brennan Lodge (Goldman Sachs), Jay Kesavan (Bowery Analytics LLC)

Cybersecurity analysts are under siege to keep pace with the ever-changing threat landscape. The analysts are overworked as they are bombarded with and burned out by the sheer number of alerts that they must carefully investigate. Brennan Lodge and Jay Kesavan explain how to use a data science model for alert evaluations to empower your cybersecurity analysts.

11:15-11:55 (40m) Data Science, Machine Learning & AI Media, Marketing, Advertising, Retail and e-commerce

Learning "learning to rank"

Sophie Watson (Red Hat)

Identifying relevant documents quickly and efficiently enhances both user experience and business revenue every day. Sophie Watson demonstrates how to implement learning-to-rank algorithms and provides you with the information you need to implement your own successful ranking system.

12:05-12:45 (40m) Data Science, Machine Learning & AI

How to mitigate mobile fraud risk by data analytics

Seonmin Kim (LINE)

Seonmin Kim offers an introduction to activities that mitigate the risk of mobile payments through various data analysis techniques, drawn from actual case studies of mobile fraud, along with tree-based machine learning, graph analytics, and statistical approaches.

14:05-14:45 (40m) Data Science, Machine Learning & AI IoT and its applications, Temporal data and time-series

Reinforcement learning: A gentle introduction and an industrial application

Christian Hidber (bSquare)

Reinforcement learning (RL) learns complex processes autonomously, such as walking, beating the world champion in Go, or flying a helicopter. No big datasets with the “right” answers are needed: the algorithms learn by experimenting. Christian Hidber shows how and why RL works and demonstrates how to apply it to an industrial hydraulics application with 7,000 clients in 42 countries.

14:55-15:35 (40m) Data Science, Machine Learning & AI IoT and its applications, Temporal data and time-series, Transportation and Logistics

Early incident detection using fusion analytics of commuter-centric data sources

Christopher Hooi (Land Transport Authority of Singapore)

Christopher Hooi offers an overview of the Fusion Analytics for Public Transport Event Response (FASTER) system, a real-time advanced analytics solution for early warning of potential train incidents. FASTER uses engineering and commuter-centric IoT data sources to activate contingency plans at the earliest possible time and reduce impact to commuters.

16:35-17:15 (40m) Data Science, Machine Learning & AI IoT and its applications, Transportation and Logistics

Improving infrastructure efficiency with unsupervised algorithms

Alexandre Hubert (Dataiku)

GRDF helps bring natural gas to nearly 11 million customers every day. Alexandre Hubert explains how, in partnership with GRDF, Dataiku worked to optimize the manual process of qualifying addresses to visit and ultimately save GRDF time and money. This solution was the culmination of a yearlong adventure in the land of maintenance experts, legacy IT systems, and Agile development.

11:15-11:55 (40m) Data Science, Machine Learning & AI Deep Learning, Graph technologies and analytics, Security and Privacy

Deep learning for speech synthesis: The good news, the bad news, and the fake news

Scott Stevenson (Faculty)

Modern deep learning systems allow us to build speech synthesis systems with the naturalness of a human speaker. While there are myriad benevolent applications, this also ushers in a new era of fake news. Scott Stevenson explores the danger of such systems and details how deep learning can also be used to build countermeasures to protect against political disinformation.

12:05-12:45 (40m) Data Science, Machine Learning & AI

Inclusive design: Deep learning on audio in Azure, identifying sounds in real time

Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)

In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, there are over 360 million people in the world who are deaf or hard of hearing. Swetha Machanavajhala and Xiaoyong Zhu explain how to make the auditory world inclusive and meet the great demand in other sectors by applying deep learning to audio in Azure.

14:05-14:45 (40m) Data Science, Machine Learning & AI Deep Learning

Deep learning for fonts

Raghotham Sripadraj (Ericsson), Nischal Harohalli Padmanabha (Omnius)

Deep learning has enabled massive breakthroughs in offbeat tracks and has enabled better understanding of how an artist paints, how an artist composes music, and so on. Nischal Harohalli Padmanabha and Raghotham Sripadraj discuss their project Deep Learning for Humans and their plans to build a font classifier.

14:55-15:35 (40m) Data Science, Machine Learning & AI AI and machine learning in the enterprise, Deep Learning

A deep learning approach to automatic call routing

Tal Doron (GigaSpaces)

Technological advancements are transforming customer experience, and businesses are beginning to benefit from deep learning innovations that automate call center routing to the most appropriate agent. Tal Doron explains how to run deep learning models with Intel BigDL and Spark frameworks colocated on an in-memory computing platform to enhance the customer experience without the need for GPUs.

11:15-11:55 (40m) Sponsored

Oracle's second-generation cloud: Optimized for the partner ecosystem (sponsored by Oracle Cloud Infrastructure)

Ben Lackey (Oracle)

Join Ben Lackey to learn how Oracle Cloud Infrastructure's architecture makes it the right place to run compute-intensive partner applications like H2O.ai, Cloudera, DataStax, and more.

9:00-9:05 (5m)

Thursday keynote welcome

Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.

9:05-9:20 (15m)

The unstoppable rise of white box data

Chris Taggart (OpenCorporates)

Chris Taggart explains the benefits of white box data and outlines the structural shifts that are moving the data world toward it.

9:20-9:35 (15m)

Building data science capacity in your organization

Shingai Manjengwa (Fireside Analytics Inc.)

Shingai Manjengwa shares insights from teaching data science to 300,000 online learners, second-career college graduates, and grade 12/6th form high school students, explaining how business leaders can increase data science skill sets across different levels and functions in an organization to create real and measurable value from data.

9:35-9:50 (15m)

Combining creativity and analytics

David Boyle (Audience Strategies)

Companies that harness creativity and data in tandem have growth rates twice as high as companies that don’t. David Boyle shares lessons from his successes and failures in trying to do just that across presidential politics, with pop stars, and with power brands in the world of luxury goods. Join in to find out how analysts can work differently to build these partnerships and unlock this growth.

9:50-10:00 (10m)

BMW’s journey to the data-driven enterprise from the edge to AI

Amr Awadallah (Cloudera), Tobias Burger (BMW Group)

BMW Group is an extraordinary company. As a technology pioneer, it's an enterprise that recognizes the value that data offers to the business. The company's global platform draws data from over 150 different systems and delivers governed data to various divisions. Join Amr Awadallah and Tobias Burger to discover some of BMW's most important use cases leveraging data from the edge to AI.

10:00-10:15 (15m)

Rise of the (advertising) machines

Michael Tidmarsh (Ogilvy)

Ogilvy's Mike Tidmarsh looks at how data and AI are radically reshaping the world of marketing communications and explores the impacts—good and bad—for professionals and consumers alike.

10:15-10:35 (20m) Security and Privacy

Privacy, identity, and autonomy in the age of big data and AI

Sandra Wachter (University of Oxford)

Big data analytics and AI draw nonintuitive and unverifiable inferences about the behaviors, preferences, and lives of individuals. These inferences draw on diverse and feature-rich data of unpredictable value and create new opportunities for discriminatory, biased, and invasive decision making. Sandra Wachter discusses how this expands potential victims of discrimination and potential harm.

10:45-11:15 (30m)

Break: Morning break

12:45-14:05 (1h 20m)

Thursday Topic Tables at Lunch

Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.

15:35-16:35 (1h)

Break: Afternoon break

8:00-9:00 (1h)

Break: Early morning coffee sponsored by AXA

8:15-8:45 (30m)

Speed Networking

Gather before keynotes on Thursday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
