Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK
 
Expo Hall (Capital Hall N24)
11:15 How to keep ethical with machine learning Jerry Overton (DXC)
12:05 Deep learning for recommender systems Oliver Gindele (Datatonic)
14:55
Expo Hall 2 (Capital Hall N24)
11:15 Streaming at Lyft Thomas Weise (Lyft)
14:05 Autoscaling Spark on Kubernetes Holden Karau (Independent), Kris Nova (Independent)
14:55 Performant time series data management and analytics with PostgreSQL Michael Freedman (TimescaleDB | Princeton University)
S11 A
11:15 Scaling Impala: Common mistakes and best practices Manish Maheshwari (Cloudera)
12:05 Schema on read and the new logging way David Josephsen (Sparkpost)
14:05 Mutant tests too: The SQL Elliot West (Hotels.com), Jaydene Green (Hotels.com)
S11 B
11:15 Big data analytics in the public cloud: Challenges and opportunities Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel)
12:05 Herding elephants: Seamless data access in a multicluster clouds Pradeep Bhadani (Hotels.com), Elliot West (Hotels.com)
16:35 Migrating Apache Oozie workflows to Apache Airflow Feng Lu (Google Cloud), James Malone (Google), Apurva Desai (Google Cloud), Cameron Moberg (Truman State University | Google Cloud)
Capital Suite 8/9
12:05 Data science at Deutsche Telekom: Predicting global travel patterns and network demand Vaclav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)
14:05 Unlocking insights in AI by building a feature store Willem Pienaar (GOJEK), Zhi Ling Chen (GOJEK)
14:55 Architecting a data platform to support analytic workflows for scientific data Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)
Capital Suite 10/11
12:05 Application intelligence: Bridging the gap between human expertise and machine learning Rebecca Simmonds (Red Hat), Michael McCune (Red Hat)
Capital Suite 12
11:15 Insightful health: Amplifying intelligence in healthcare patient flow execution Fabio Ferraretto, Claudia Regina Laselva (Albert Einstein Jewish Hospital)
Capital Suite 13
11:15 Executive Briefing: Analytics for executives Brandy Freitas (Pitney Bowes)
12:05 Executive Briefing: The intelligent edge and the demise of big data? Alasdair Allan (Babilim Light Industries)
Capital Suite 14
12:05 NLP Architect by Intel's AI Lab Moshe Wasserblat (Intel)
14:05 8 prerequisites of a graph query language Mingxi Wu (TigerGraph)
14:55 Learning with limited labeled data Shioulin Sam (Cloudera Fast Forward Labs)
16:35 Evaluating cybersecurity defenses with a data science approach Brennan Lodge (Goldman Sachs), Jay Kesavan (Bowery Analytics LLC)
Capital Suite 15/16
11:15 Learning "learning to rank" Sophie Watson (Red Hat)
14:55 Early incident detection using fusion analytics of commuter-centric data sources Christopher Hooi (Land Transport Authority of Singapore)
Capital Suite 17
12:05 Inclusive design: Deep learning on audio in Azure, identifying sounds in real time Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)
14:05 Deep learning for fonts Raghotham Sripadraj (Ericsson), Nischal Harohalli Padmanabha (Omnius)
Capital Suite 2/3
Auditorium
9:00 Thursday keynote welcome Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
9:05 The unstoppable rise of white box data Chris Taggart (OpenCorporates)
9:20 Building data science capacity in your organization Shingai Manjengwa (Fireside Analytics Inc.)
9:35 Combining creativity and analytics David Boyle (Audience Strategies)
9:50 BMW’s journey to the data-driven enterprise from the edge to AI Amr Awadallah (Cloudera), Tobias Burger (BMW Group)
10:00 Rise of the (advertising) machines Michael Tidmarsh (Ogilvy)
10:15 Privacy, identity, and autonomy in the age of big data and AI Sandra Wachter (University of Oxford)
10:45 Morning break | Room: Expo Hall
12:45 Thursday Topic Tables at Lunch | Room: Expo Hall
15:35 Afternoon break | Room: Expo Hall
8:00 Early morning coffee sponsored by AXA | Room: Level 0 - Blvd
8:15 Speed Networking | Room: Level 0 - Blvd
11:15-11:55 (40m) Data Science, Machine Learning & AI, Expo Hall Ethics
How to keep ethical with machine learning
Jerry Overton (DXC)
Machine learning (ML) algorithms are good at learning new behaviors but bad at identifying when those behaviors are harmful or don’t make sense. Bias, ethics, and fairness are big risk factors in ML. However, we creators have a lot of experience dealing with intelligent beings—one another. Jerry Overton uses this common sense to build a checklist for protecting against ethical violations with ML.
12:05-12:45 (40m) Data Science, Machine Learning & AI, Expo Hall Deep Learning, Media, Marketing, Advertising, Retail and e-commerce
Deep learning for recommender systems
Oliver Gindele (Datatonic)
The success of deep learning has reached the realm of structured data in the past few years, where neural networks have been shown to improve the effectiveness and predictability of recommendation engines. Oliver Gindele offers a brief overview of such deep recommender systems and explains how they can be implemented in TensorFlow.
14:05-14:45 (40m) Data Science, Machine Learning & AI, Expo Hall Data Integration and Data Pipelines, Deep Learning
AI for good at scale in real time: Challenges in machine learning and deep learning
Alex Jaimes (Dataminr)
When emergency events occur, social signals and sensor data are generated. Alex Jaimes explains how to apply machine learning and deep learning to process large amounts of heterogeneous data from various sources in real time, with a particular focus on how such information can be used for emergencies and in critical events for first responders and for other social good use cases.
14:55-15:35 (40m)
Session
11:15-11:55 (40m) Data Engineering and Architecture, Expo Hall, Streaming and IoT Data Platforms, Streaming and realtime analytics, Transportation and Logistics
Streaming at Lyft
Thomas Weise (Lyft)
Fast data and stream processing are essential for making Lyft rides a good experience for passengers and drivers. Lyft's systems need to track and react to event streams in real time to update locations, compute routes and estimates, balance prices, and more. Thomas Weise offers an overview of the streaming platform that powers these use cases.
12:05-12:45 (40m) Data Engineering and Architecture, Expo Hall AI and Data technologies in the cloud, Model lifecycle management
Unleashing Apache Kafka and TensorFlow in hybrid architectures
Kai Wähner (Confluent)
How do you leverage the flexibility and extreme scale of the public cloud and the Apache Kafka ecosystem to build scalable, mission-critical machine learning infrastructures that span multiple public clouds—or bridge your on-premises data center to the cloud? Join Kai Wähner to learn how to use technologies such as TensorFlow with Kafka’s open source ecosystem for machine learning infrastructures.
14:05-14:45 (40m) Data Engineering and Architecture, Expo Hall AI and Data technologies in the cloud
Autoscaling Spark on Kubernetes
Holden Karau (Independent), Kris Nova (Independent)
In the Kubernetes world, declarative resources are first-class citizens and running complicated workloads across distributed infrastructure is easy. With big data processing on Spark now common practice, we can finally look at constructing a hybrid system that runs Spark in a distributed, cloud native way. Join Kris Nova and Holden Karau for a fun adventure.
14:55-15:35 (40m) Data Engineering and Architecture, Expo Hall Streaming and realtime analytics, Temporal data and time-series
Performant time series data management and analytics with PostgreSQL
Michael Freedman (TimescaleDB | Princeton University)
Time series databases require ingesting high volumes of structured data, answering complex, performant queries for recent and historical time intervals, and performing specialized time-centric analysis and data management. Michael Freedman explains how to avoid these operational problems by reengineering Postgres to serve as a general data platform, including high-volume time series workloads.
11:15-11:55 (40m) Data Engineering and Architecture
Scaling Impala: Common mistakes and best practices
Manish Maheshwari (Cloudera)
Apache Impala is an MPP SQL query engine for planet-scale queries. When set up and used properly, Impala is able to handle hundreds of nodes and tens of thousands of queries hourly. Manish Maheshwari explains how to avoid pitfalls in Impala configuration (memory limits, admission pools, metadata management, statistics), along with best practices and anti-patterns for end users or BI applications.
12:05-12:45 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Integration and Data Pipelines, Data Platforms, Streaming and realtime analytics
Schema on read and the new logging way
David Josephsen (Sparkpost)
David Josephsen tells the story of how Sparkpost's reliability engineering team abandoned ELK for a DIY schema-on-read logging infrastructure. Join in to learn the architectural details, trials, and tribulations from the company's Internal Event Hose data ingestion pipeline project, which uses Fluentd, Kinesis, Parquet, and AWS Athena to make logging sane.
14:05-14:45 (40m) Data Engineering and Architecture
Mutant tests too: The SQL
Elliot West (Hotels.com), Jaydene Green (Hotels.com)
Elliot West and Jay Green share approaches for applying software engineering best practices to SQL-based data applications to improve maintainability and data quality. Using open source tools, Elliot and Jay show how to build effective test suites for Apache Hive code bases and offer an overview of Mutant Swarm, a tool to identify weaknesses in tests and to measure SQL code coverage.
14:55-15:35 (40m) Data Engineering and Architecture AI and Data technologies in the cloud
The future of cloud native data warehousing: Emerging trends and technologies
Greg Rahn (Cloudera)
Data warehouses have traditionally run in the data center, and in recent years, they've been adapted to be more cloud native. Greg Rahn discusses a number of emerging trends and technologies that will impact how data warehouses are run both in the cloud and on-premises and explains what that means for architects, administrators, and end users.
16:35-17:15 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Platforms
Deep learning with TensorFlow and Spark using GPUs and Docker containers
Thomas Phelan (HPE BlueData)
Organizations need to keep ahead of their competition by using the latest AI, ML, and DL technologies such as Spark, TensorFlow, and H2O. The challenge is in how to deploy these tools and keep them running in a consistent manner while maximizing the use of scarce hardware resources, such as GPUs. Thomas Phelan discusses the effective deployment of such applications in a container environment.
11:15-11:55 (40m) Data Engineering and Architecture AI and Data technologies in the cloud
Big data analytics in the public cloud: Challenges and opportunities
Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel)
Jian Zhang, Chendi Xue, and Yuan Zhou explore the challenges of migrating big data analytics workloads to the public cloud (e.g., performance loss and missing features) and demonstrate how to use a new in-memory data accelerator leveraging persistent memory and RDMA NICs to resolve these issues and enable new opportunities for big data workloads on the cloud.
12:05-12:45 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Platforms
Herding elephants: Seamless data access in a multicluster clouds
Pradeep Bhadani (Hotels.com), Elliot West (Hotels.com)
Travel platform Expedia Group likes to give its data teams flexibility and autonomy to work with different technologies. However, this approach generates challenges that cannot be solved by existing tools. Pradeep Bhadani and Elliot West explain how the company built a unified virtual data lake on top of its many heterogeneous and distributed data platforms.
14:05-14:45 (40m) Data Engineering and Architecture Security and Privacy
The vegan data diet: How Wikipedia cuts down privacy issues while keeping data fit
Marcel Ruiz Forns (Wikimedia Foundation)
Analysts and researchers studying Wikipedia are hungry for long-term data to build experiments and feed data-driven decisions. But Wikipedia has a strict privacy policy that prevents storing privacy-sensitive data over 90 days. Marcel Ruiz Forns explains how the Wikimedia Foundation's analytics team is working on a vegan data diet to satisfy both.
14:55-15:35 (40m) Data Engineering and Architecture Automation in data science and big data, Data preparation, data governance, and data lineage
Mastering data with Spark and machine learning
Sonal Goyal (Nube)
Enterprise data on customers, vendors, and products is often siloed and represented differently in diverse systems, hurting analytics, compliance, regulatory reporting, and 360 views. Traditional rule-based MDM systems with legacy architectures struggle to unify this growing data. Sonal Goyal offers an overview of a modern master data application using Spark, Cassandra, ML, and Elastic.
16:35-17:15 (40m) Data Engineering and Architecture Data Integration and Data Pipelines
Migrating Apache Oozie workflows to Apache Airflow
Feng Lu (Google Cloud), James Malone (Google), Apurva Desai (Google Cloud), Cameron Moberg (Truman State University | Google Cloud)
Apache Oozie and Apache Airflow (incubating) are both widely used workflow orchestration systems, the former focusing on Apache Hadoop jobs. Feng Lu, James Malone, Apurva Desai, and Cameron Moberg explore an open source Oozie-to-Airflow migration tool developed at Google as a part of creating an effective cross-cloud and cross-system solution.
11:15-11:55 (40m) Data Engineering and Architecture Data preparation, data governance, and data lineage, Financial Services
Half-correct and half-wrong collective data wisdom: 3 patterns to sanity
Sandeep Uttamchandani (Intuit)
Teams today rely on dictionaries of collective wisdom—a mixed bag with regard to correctness: some datasets have accurate attribute details, while others are incorrect and outdated. This significantly impacts productivity of analysts and scientists. Sandeep Uttamchandani outlines three patterns to better manage data dictionaries.
12:05-12:45 (40m) Data Engineering and Architecture Data Platforms, Security and Privacy, Transportation and Logistics
Data science at Deutsche Telekom: Predicting global travel patterns and network demand
Vaclav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)
Knowledge of customers' location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management.
14:05-14:45 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, AI and machine learning in the enterprise, Data Platforms, Transportation and Logistics
Unlocking insights in AI by building a feature store
Willem Pienaar (GOJEK), Zhi Ling Chen (GOJEK)
Features are key to driving impact with AI at all scales, allowing organizations to dramatically accelerate innovation and time to market. Willem Pienaar and Zhiling Chen explain how GOJEK, Indonesia's first billion-dollar startup, unlocked insights in AI by building a feature store called Feast, and the lessons they learned along the way.
14:55-15:35 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Integration and Data Pipelines, IoT and its applications
Architecting a data platform to support analytic workflows for scientific data
Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)
In upstream oil and gas, a vast amount of the data requested for analytics projects is scientific data: physical measurements about the real world. Historically, this data has been managed library style, but a new system was needed to best provide this data. Sun Maria Lehmann and Jane McConnell share architectural best practices learned from their work with subsurface data.
16:35-17:15 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Integration and Data Pipelines, Retail and e-commerce
From legacy to cloud: An end-to-end data integration journey
Max Schultze (Zalando SE)
Max Schultze details Zalando's end-to-end data integration platform to serve analytical use cases and machine learning throughout the company, covering raw data collection, standardized data preparation (binary conversion, partitioning, etc.), user-driven analytics, and machine learning.
11:15-11:55 (40m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Integration and Data Pipelines, Financial Services, Security and Privacy
Transforming a financial services data infrastructure for the modern era by building a PCI DSS-compliant data platform from the ground up on AWS
Eoin O'Flanagan (NewDay), Darragh McConville (Kainos)
Eoin O'Flanagan and Darragh McConville explain how NewDay built a high-performance contemporary data processing platform from the ground up on AWS. Join in to explore the company's journey from a traditional legacy onsite data estate to an entirely cloud-based PCI DSS-compliant platform.
12:05-12:45 (40m) Data Engineering and Architecture AI and machine learning in the enterprise
Application intelligence: Bridging the gap between human expertise and machine learning
Rebecca Simmonds (Red Hat), Michael McCune (Red Hat)
Artificial intelligence and machine learning are now popularly used terms, but how do you make use of these techniques without throwing away the valuable knowledge of experienced employees? Rebecca Simmonds and Michael McCune delve into this idea with examples of how distributed machine learning frameworks fit together naturally with business rules management systems.
14:05-14:45 (40m) Data Engineering and Architecture Security and Privacy, Streaming and realtime analytics
Simplicity at scale: How Cloudflare analyses some of the world’s largest DDoS attacks
Tom Walwyn (Cloudflare)
Cloudflare powers nearly 10 percent of all Internet requests worldwide, absorbing some of the largest DDoS attacks. Tom Walwyn explains how Cloudflare uses ClickHouse and SQL to simplify its data pipelines on a global scale while handling over 10 million events per second.
14:55-15:35 (40m) Data Engineering and Architecture Data Integration and Data Pipelines
Learning how to perform ETL data migrations with open source tool Embulk
Jason Bell (Independent Speaker)
The Embulk data migration tool offers a convenient way to load data into a variety of systems with basic configuration. Jason Bell offers an overview of the Embulk tool and outlines some common data migration scenarios that a data engineer could employ using the tool.
11:15-11:55 (40m) Case studies, Strata Business Summit Health and Medicine
Insightful health: Amplifying intelligence in healthcare patient flow execution
Fabio Ferraretto, Claudia Regina Laselva (Albert Einstein Jewish Hospital)
Fabio Ferraretto and Claudia Regina Laselva explain how Hospital Albert Einstein and Accenture evolved patient flow experience and efficiency with the use of applied AI, statistics, and combinatorial math, allowing the hospital to anticipate E2E visibility within patient flow operations, from admission of emergency and elective demands to assignment and medical releases.
12:05-12:45 (40m) Executive Briefing and best practices, Strata Business Summit AI and machine learning in the enterprise
Starting with the end in mind: Lessons learned from data strategies that work
Vidya Raman (Cloudera)
Not surprisingly, there's no single approach to embracing data-driven innovations within any industry vertical. However, some enterprises are doing a better job than others when it comes to establishing a culture, process, and infrastructure that lends itself to data-driven innovations. Vidya Raman explores some key foundational ingredients that span multiple industries.
14:05-14:45 (40m) Case studies, Strata Business Summit AI and machine learning in the enterprise
Practicing data science: A collection of case studies
Rosaria Silipo (KNIME)
Rosaria Silipo shares a collection of past data science projects. While the structure is often similar—data collection, data transformation, model training, deployment—each required its own special trick, whether a change in perspective or a particular technique to deal with special cases and special business questions.
14:55-15:35 (40m) Culture and organization, Strata Business Summit AI and machine learning in the enterprise
Data-driven digital transformation and jobs: The new software hierarchy and ML
Robert Cohen (Economic Strategy Institute)
Robert Cohen discusses the skills that employers are seeking from employees in digital jobs, linked to the new software hierarchy driving digital transformation. Robert describes this software hierarchy as one that ranges from DevOps, CI/CD, and microservices to Kubernetes and Istio. This hierarchy is used to define the jobs that are central to data-driven digital transformation.
11:15-11:55 (40m) Executive Briefing and best practices, Strata Business Summit AI and machine learning in the enterprise, Transportation and Logistics
Executive Briefing: Analytics for executives
Brandy Freitas (Pitney Bowes)
Data science is an approachable field given the right framing. Often, though, practitioners and executives are describing opportunities using completely different languages. Brandy Freitas walks you through developing context and vocabulary around data science topics to help build a culture of data within your organization.
12:05-12:45 (40m) Executive Briefing and best practices, Strata Business Summit IoT and its applications, Security and Privacy
Executive Briefing: The intelligent edge and the demise of big data?
Alasdair Allan (Babilim Light Industries)
Alasdair Allan explains why the current age, where privacy is no longer "a social norm," may not long survive the coming of the internet of things, as new smart embedded hardware may cause the demise of large-scale data harvesting. Smart devices will process data at the edge, allowing us to extract insights from the data without storing potentially privacy- and GDPR-infringing data.
14:05-14:45 (40m) Executive Briefing and best practices, Strata Business Summit AI and machine learning in the enterprise
Executive Briefing: The hidden data scientists lurking in your company
Jack Norris (MapR Technologies)
Many companies delay addressing core improvements in increasing revenues, reducing costs and risk exposure by tying changes to a to-be-hired data scientist. Drawing on three customer examples, Jack Norris explains how to achieve excellent results faster by starting with domain experience and helping developers and analysts better leverage data with available and understandable analytics.
14:55-15:35 (40m) Executive Briefing and best practices, Strata Business Summit AI and Data technologies in the cloud
Executive Briefing: AWS technology trends—Data lakes and analytics
Nikki Rouda (Amazon Web Services)
Nikki Rouda shares key trends in data lakes and analytics and explains how they shape the services offered by AWS. Specific topics include the rise of machine-generated data and semistructured and unstructured data as dominant sources of new data, the move toward serverless, API-centric computing, and the growing need for local access to data from users around the world.
16:35-17:15 (40m) Executive Briefing and best practices, Strata Business Summit Data Integration and Data Pipelines, Streaming and realtime analytics
Executive Briefing: What it takes to use machine learning in fast data pipelines
Dean Wampler (Anyscale)
Your team is building machine learning capabilities. Dean Wampler demonstrates how to integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed and covers challenges such as how to build long-running services that are very reliable and scalable and how to combine a spectrum of very different tools, from data science to operations.
11:15-11:55 (40m) Data Science, Machine Learning & AI AI and machine learning in the enterprise, Financial Services, Security and Privacy, Text and Language processing and analysis
Fraud detection at a financial institution using unsupervised learning and text mining
David Dogon (Van Lanschot Kempen)
David Dogon dives into a best practice use case for detecting fraud at a financial institution and details a dynamic and robust monitoring system that successfully detects unwanted client behavior. Join in to learn how machine learning models can provide a solution in cases where traditional systems fall short.
12:05-12:45 (40m) Data Science, Machine Learning & AI Deep Learning, Text and Language processing and analysis
NLP Architect by Intel's AI Lab
Moshe Wasserblat (Intel)
Moshe Wasserblat offers an overview of NLP Architect, an open source DL NLP library that provides SOTA NLP models, making it easy for researchers to implement NLP algorithms and for data scientists to build NLP-based solutions for extracting insight from textual data to improve business operations.
14:05-14:45 (40m) Data Science, Machine Learning & AI Graph technologies and analytics
8 prerequisites of a graph query language
Mingxi Wu (TigerGraph)
A graph query language is the key to unleashing the value of connected data. Mingxi Wu outlines the eight prerequisites of a practical graph query language, drawn from six years' experience dealing with real-world graph analytical use cases. Along the way, Mingxi compares GSQL, Gremlin, Cypher, and SPARQL, pointing out their respective pros and cons.
14:55-15:35 (40m) Data Science, Machine Learning & AI
Learning with limited labeled data
Shioulin Sam (Cloudera Fast Forward Labs)
Supervised machine learning requires large labeled datasets—a prohibitive limitation in many real-world applications. What if machines could learn with fewer labeled examples? Shioulin Sam shares an algorithmic solution that relies on collaboration between humans and machines to label smartly and discusses product possibilities.
16:35-17:15 (40m) Data Science, Machine Learning & AI AI and machine learning in the enterprise, Financial Services, Security and Privacy
Evaluating cybersecurity defenses with a data science approach
Brennan Lodge (Goldman Sachs), Jay Kesavan (Bowery Analytics LLC)
Cybersecurity analysts are under siege to keep pace with the ever-changing threat landscape. The analysts are overworked as they are bombarded with and burned out by the sheer number of alerts that they must carefully investigate. Brennan Lodge and Jay Kesavan explain how to use a data science model for alert evaluations to empower your cybersecurity analysts.
11:15-11:55 (40m) Data Science, Machine Learning & AI Media, Marketing, Advertising, Retail and e-commerce
Learning "learning to rank"
Sophie Watson (Red Hat)
Identifying relevant documents quickly and efficiently enhances both user experience and business revenue every day. Sophie Watson demonstrates how to implement learning-to-rank algorithms and provides you with the information you need to implement your own successful ranking system.
12:05-12:45 (40m) Data Science, Machine Learning & AI
How to mitigate mobile fraud risk by data analytics
Seonmin Kim (LINE)
Seonmin Kim offers an introduction to activities that mitigate the risk of mobile payments through various data analysis techniques, drawn from actual case studies of mobile fraud, along with tree-based machine learning, graph analytics, and statistical approaches.
14:05-14:45 (40m) Data Science, Machine Learning & AI IoT and its applications, Temporal data and time-series
Reinforcement learning: A gentle introduction and an industrial application
Christian Hidber (bSquare)
Reinforcement learning (RL) autonomously learns complex processes such as walking, beating the world champion in Go, or flying a helicopter. No big datasets with the “right” answers are needed: the algorithms learn by experimenting. Christian Hidber shows how and why RL works and demonstrates how to apply it to an industrial hydraulics application with 7,000 clients in 42 countries.
14:55-15:35 (40m) Data Science, Machine Learning & AI IoT and its applications, Temporal data and time-series, Transportation and Logistics
Early incident detection using fusion analytics of commuter-centric data sources
Christopher Hooi (Land Transport Authority of Singapore)
Christopher Hooi offers an overview of the Fusion Analytics for Public Transport Event Response (FASTER) system, a real-time advanced analytics solution for early warning of potential train incidents. FASTER uses engineering and commuter-centric IoT data sources to activate contingency plans at the earliest possible time and reduce impact to commuters.
16:35-17:15 (40m) Data Science, Machine Learning & AI IoT and its applications, Transportation and Logistics
Improving infrastructure efficiency with unsupervised algorithms
Alexandre Hubert (Dataiku)
GRDF helps bring natural gas to nearly 11 million customers every day. Alexandre Hubert explains how, in partnership with GRDF, Dataiku worked to optimize the manual process of qualifying addresses to visit and ultimately save GRDF time and money. This solution was the culmination of a yearlong adventure in the land of maintenance experts, legacy IT systems, and Agile development.
11:15-11:55 (40m) Data Science, Machine Learning & AI Deep Learning, Graph technologies and analytics, Security and Privacy
Deep learning for speech synthesis: The good news, the bad news, and the fake news
Scott Stevenson (Faculty)
Modern deep learning systems allow us to build speech synthesis systems with the naturalness of a human speaker. While there are myriad benevolent applications, this also ushers in a new era of fake news. Scott Stevenson explores the danger of such systems and details how deep learning can also be used to build countermeasures to protect against political disinformation.
12:05-12:45 (40m) Data Science, Machine Learning & AI
Inclusive design: Deep learning on audio in Azure, identifying sounds in real time
Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)
In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, there are over 360 million people in this world who are deaf or hard of hearing. Swetha Machanavajhala and Xiaoyong Zhu explain how to make the auditory world inclusive and meet the great demand in other sectors by applying deep learning on audio in Azure.
14:05-14:45 (40m) Data Science, Machine Learning & AI Deep Learning
Deep learning for fonts
Raghotham Sripadraj (Ericsson), Nischal Harohalli Padmanabha (Omnius)
Deep learning has enabled massive breakthroughs in offbeat tracks and a better understanding of how an artist paints, how an artist composes music, and so on. Nischal Harohalli Padmanabha and Raghotham Sripadraj discuss their project Deep Learning for Humans and their plans to build a font classifier.
14:55-15:35 (40m) Data Science, Machine Learning & AI AI and machine learning in the enterprise, Deep Learning
A deep learning approach to automatic call routing
Tal Doron (GigaSpaces)
Technological advancements are transforming customer experience, and businesses are beginning to benefit from deep learning innovations that automate call center routing to the most appropriate agent. Tal Doron explains how to run deep learning models with Intel BigDL and Spark frameworks colocated on an in-memory computing platform to enhance the customer experience without the need for GPUs.
11:15-11:55 (40m) Sponsored
Oracle's second-generation cloud: Optimized for the partner ecosystem (sponsored by Oracle Cloud Infrastructure)
Ben Lackey (Oracle)
Join Ben Lackey to learn how Oracle Cloud Infrastructure's architecture makes it the right place to run compute-intensive partner applications like H2O.ai, Cloudera, DataStax, and more.
9:00-9:05 (5m)
Thursday keynote welcome
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.
9:05-9:20 (15m)
The unstoppable rise of white box data
Chris Taggart (OpenCorporates)
Chris Taggart explains the benefits of white box data and outlines the structural shifts that are moving the data world toward it.
9:20-9:35 (15m)
Building data science capacity in your organization
Shingai Manjengwa (Fireside Analytics Inc.)
Shingai Manjengwa shares insights from teaching data science to 300,000 online learners, second-career college graduates, and grade 12/6th form high school students, explaining how business leaders can increase data science skill sets across different levels and functions in an organization to create real and measurable value from data.
9:35-9:50 (15m)
Combining creativity and analytics
David Boyle (Audience Strategies)
Companies that harness creativity and data in tandem have growth rates twice as high as companies that don’t. David Boyle shares lessons from his successes and failures in trying to do just that across presidential politics, with pop stars, and with power brands in the world of luxury goods. Join in to find out how analysts can work differently to build these partnerships and unlock this growth.
9:50-10:00 (10m)
BMW’s journey to the data-driven enterprise from the edge to AI
Amr Awadallah (Cloudera), Tobias Burger (BMW Group)
BMW Group is an extraordinary company. As a technology pioneer, it's an enterprise that recognizes the value that data offers to the business. The company's global platform draws data from over 150 different systems and delivers governed data to various divisions. Join Amr Awadallah and Tobias Burger to discover some of BMW's most important use cases leveraging data from the edge to AI.
10:00-10:15 (15m)
Rise of the (advertising) machines
Michael Tidmarsh (Ogilvy)
Ogilvy's Mike Tidmarsh looks at how data and AI are radically reshaping the world of marketing communications and explores the impacts—good and bad—for professionals and consumers alike.
10:15-10:35 (20m) Security and Privacy
Privacy, identity, and autonomy in the age of big data and AI
Sandra Wachter (University of Oxford)
Big data analytics and AI draw nonintuitive and unverifiable inferences about the behaviors, preferences, and lives of individuals. These inferences draw on diverse and feature-rich data of unpredictable value and create new opportunities for discriminatory, biased, and invasive decision making. Sandra Wachter discusses how this expands potential victims of discrimination and potential harm.
10:45-11:15 (30m)
Break: Morning break
12:45-14:05 (1h 20m)
Thursday Topic Tables at Lunch
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
15:35-16:35 (1h)
Break: Afternoon break
8:00-9:00 (1h)
Break: Early morning coffee sponsored by AXA
8:15-8:45 (30m)
Speed Networking
Gather before keynotes on Thursday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees.