Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule

1A 06/07

11:20am Applying petabyte-scale analytics and machine learning to billions of news reading sessions Andrew Montalenti (Parse.ly)

1:10pm Spark NLP in action: How SelectData uses AI to better understand home health patients David Talby (Pacific AI), Alberto Andreotti (John Snow Labs), Stacy Ashworth (SelectData), Tawny Nichols (SelectData)

2:00pm Big data at speed Ted Malaska (Capital One), Mark Grover (Lyft)

3:30pm Modeling time series in R Jared Lander (Lander Analytics)

4:20pm Analytics maturity: Industry trends and financial impacts Bill Franks (International Institute For Analytics)

1A 08

11:20am Predicting residential occupancy and hot water usage from high-frequency, multivector utilities data Cris Lowery (Baringa Partners), Marc Warner (ASI)

1:10pm Scalable machine learning for data cleaning Ihab Ilyas (University of Waterloo)

2:00pm Let the machines learn to improve data quality Archana Anandakrishnan (American Express)

3:30pm InnerSource for reproducible and extensible business analysis Emily Riederer (Capital One)

4:20pm Infrastructure for deploying machine learning to production in large financial institutions: Lessons learned and best practices Harish Doddi (Datatron), Jerry Xu (Datatron Technologies)

1A 12/14

11:20am The Vega project: Building an ecosystem of tools for interactive visualization Jeffrey Heer (Trifacta | University of Washington)

1:10pm Augmented reality: Going beyond plots in 3D Bob Levy (Virtual Cove, Inc.)

2:00pm Stories beat statistics: How to master the art and science of data storytelling Brent Dykes (Domo)

3:30pm Data visualization in mixed reality with Python Anna Nicanorova (Annalect)

4:20pm UX strategies for underperforming analytics services and data products Brian O'Neill (Designing for Analytics)

1A 15/16

11:20am Democratizing deep learning with transfer learning Lars Hulstaert (Microsoft)

1:10pm A high-performance system for deep learning inference and visual inspection Moty Fania (Intel), Sergei Kom (Intel)

2:00pm Job recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)

3:30pm Classifying job execution using deep learning Ash Munshi (Pepperdata)

4:20pm Deep learning on audio in Azure to detect sounds in real time Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)

1A 10

11:20am TonY: Native support of TensorFlow on Hadoop Jonathan Hung (LinkedIn), Keqiu Hu (LinkedIn), Zhe Zhang (LinkedIn)

1:10pm Deep learning on YARN: Running distributed TensorFlow, MXNet, Caffe, and XGBoost on Hadoop clusters Wangda Tan (Cloudera)

2:00pm Kubeflow explained: Portable machine learning on Kubernetes Michelle Casbon (Google)

3:30pm Managing data chaos in the world of microservices Oleksii Kachaiev (Attendify)

4:20pm The move to a modern data platform in the cloud: Pitfalls to avoid and best practices to follow Amandeep Khurana (Okera)

1A 21/22

11:20am Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am Holden Karau (Independent), Rachel Warren (Salesforce Einstein), Anya Bida (Salesforce)

1:10pm A/B testing at Uber: How we built a BYOM (bring your own metrics) platform Milene Darnis (Uber)

2:00pm Aetna's advanced analytics platform, Data Fabric Occhio Orsini (Aetna)

3:30pm Self-service modern analytics on the GovCloud Ramesh Krishnan (Lockheed Martin), Steven Morgan (Lockheed Martin)

4:20pm Building turnkey recommendations for 5% of internet video Nir Yungster (JW Player), Kamil Sindi (JW Player)

1A 23/24

11:20am Progress for big data in Kubernetes Ted Dunning (MapR, now part of HPE)

1:10pm Case study: A Spark-based distributed simulation optimization architecture for portfolio optimization in retail banking Kaushik Deka (Novantas), Ted Gibson (Novantas)

2:00pm Using big data to unlock the delivery of personalized, multilingual real-time chat services for global financial service organizations Timothy Walpole (BJSS)

3:30pm Cassandra versus cloud databases Jonathan Ellis (DataStax)

4:20pm Best practices for developing an enterprise data hub to collect and analyze 1 TB of data a day from multiple services with Apache Kafka and Google Cloud Platform Kenji Hayashida (Recruit Lifestyle co., ltd.), Toru Sasaki (NTT DATA Corporation)

1E 07/08

11:20am Near-real-time anomaly detection at Lyft Thomas Weise (Lyft), Mark Grover (Lyft)

1:10pm A deep dive into Kafka controller Jun Rao (Confluent)

2:00pm High-performance messaging with Apache Pulsar Karthik Ramasamy (Streamlio), Matteo Merli (Streamlio)

3:30pm Machine learning for nonstationary streaming data using Structured Streaming and StreamDM Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)

4:20pm IoT edge processing with Apache NiFi, Apache MiniFi, and multiple deep learning libraries Timothy Spann (Cloudera)

1E 09

11:20am Data discovery and lineage: Integrating streaming data in the public cloud with on-prem, classic data stores, and heterogeneous schema types Barbara Eckman (Comcast)

1:10pm How Komatsu is improving mining efficiencies using the IoT and machine learning Shawn Terry (Komatsu Mining Corp)

2:00pm Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks Tao Huang (JD.com), Mang Zhang (JD.com), Bing Bai (JD.com)

3:30pm Kafka at PayPal: Enabling 400 billion messages a day Kevin Lu (PayPal), Maulin Vasavada (PayPal), Na Yang (PayPal)

4:20pm TuneIn: How to get your jobs tuned while you are sleeping Manoj Kumar (LinkedIn), Pralabh Kumar (LinkedIn), Arpan Agrawal (LinkedIn)

1E 10/11

11:20am The care and feeding of data scientists: Concrete tips for retaining your data science team Michelangelo D'Agostino (ShopRunner)

1:10pm Best practices for migrating big data workloads to Amazon Web Services (sponsored by Amazon Web Services) Faria Bruno (Amazon Web Services)

2:00pm Building it beautiful: Analyzing the effectiveness of platform products and marketing at scale Josh Laurito (Squarespace)

3:30pm Scaling data infrastructure in the fashion world; or, “What is this? Business intelligence for ants?” Francesco Mucio (Francescomuc.io)

4:20pm Real-time machine intelligence in IndyCar and Tour de France Yasuyuki Kataoka (NTT Innovation Institute, Inc.)

1E 12/13

11:20am Data and privacy at scale at Wikipedia Nuria Ruiz (Wikimedia)

1:10pm Enacting Data Subject Access Rights for GDPR with data services and data management Jean-Michel Franco (Talend)

2:00pm Digging for gold: Developing AI in healthcare against unstructured text data Chiny Driscoll (MetiStream), Jawad Khan (Rush University Medical Center)

3:30pm Balancing stakeholder interests in personal data governance technology LaVonne Reimer, JD (Lumenous)

4:20pm A day in the life of a data scientist: How do we train our teams to get started with AI? Francesca Lazzeri (Microsoft), Jaya Susan Mathew (Microsoft)

1E 14

11:20am Executive Briefing: From Business to AI—The missing pieces in becoming "AI ready" Mikio Braun (Zalando)

1:10pm Executive Briefing: Analytics for executives—Building an approachable language to drive data science in your organization Brandy Freitas (Pitney Bowes)

2:00pm Executive Briefing: What you need to know about fast data Dean Wampler (Anyscale)

3:30pm Executive Briefing: Best practices for human in the loop—The business case for active learning Paco Nathan (derwen.ai)

4:20pm Conda, Docker, and Kubernetes: The cloud-native future of data science (sponsored by Anaconda) Mathew Lodge (Anaconda)

Expo Hall

11:20am Data at Netflix: See what’s next Michelle Ufford (Netflix)

1:10pm The state of Postgres Umur Cubukcu (Citus Data)

2:00pm Building a high-performance model serving engine from scratch using Kubernetes, GPUs, Docker, Istio, and TensorFlow Chris Fregly (Amazon Web Services)

1A 01/02

11:20am Assumptions, constraints, and risks: How the wrong assumptions can jeopardize any model (sponsored by IBM) Jennifer Shin (8 Path Solutions | NYU Stern | IBM)

1:10pm Quick, reliable, and cost-effective ways to operationalize big data apps (sponsored by Unravel) Shivnath Babu (Unravel Data Systems | Duke University), Madhusudan Tumma (TIAA)

2:00pm Getting the most out of advanced analytics with people (sponsored by Alteryx) Patrick Nussbaumer (Alteryx)

3:30pm Why the internet of things doesn’t exist but will still reshape your business Ajay Kulkarni (TimescaleDB)

4:20pm Assumptions, constraints, and risks: How the wrong assumptions can jeopardize any model (sponsored by IBM) Jennifer Shin (8 Path Solutions | NYU Stern | IBM)

1A 03/04/05

11:20am Building the bridge from big data to ML, featuring Geotab (sponsored by Google Cloud) Bob Bradley (Geotab), Chad W. Jennings (Google)

1:10pm On the road to digital transformation, AI is a team sport (sponsored by Oracle + DataScience.com) Ian Swanson (Oracle)

2:00pm Redis for velocity and volume: Fast data ingest and probabilistic data structures (sponsored by Redis Labs) Kyle Davis (Redis Labs)

3:30pm Stochastic field theory for time series Revant Nayar (FMI Technologies LLC)

1E 06

11:20am Augmented data engineering: Leveraging machine learning in data profiling and discovery (sponsored by Io-Tahoe) Arun Murugan (GE Digital), Jeff Miller (GE)

1:10pm From two weeks in Python to two hours in Pentaho: Building modern big data pipelines for machine learning (sponsored by Hitachi Vantara) Dave Huh (Hitachi Vantara), Kevin Haas (Hitachi Vantara)

2:00pm From analytic silos to analytic democratization: How (and why) companies make the shift (sponsored by Dataiku) Deborah Reynolds (Pfizer), Kurt Muehmel (Dataiku)

3:30pm The importance of experimental iteration: A data-centric approach to an AI project (sponsored by Globant) Antonio Fragoso (Globant)

3E

8:50am Thursday keynotes Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

9:00am Sound design and the future of experience Amber Case (MIT Media Lab)

9:15am Wait. . .pizza is a vegetable? Decoding regulations using machine learning (sponsored by IBM) Dinesh Nirmal (IBM)

9:20am Practical ML today and tomorrow Hilary Mason (Cloudera Fast Forward Labs)

9:30am Derive value from analytics and AI at scale (sponsored by Intel) 马子雅 (Ziya Ma) (Intel)

9:35am Quantifying forgiveness Julia Angwin (ProPublica)

9:55am Smarter cities through Geotab with BigQuery ML and geospatial analytics (sponsored by Google Cloud) Chad W. Jennings (Google)

10:05am Brain-based human-machine interfaces: New developments, legal and ethical issues, and potential uses Amanda Pustilnik (University of Maryland School of Law | Center for Law, Brain & Behavior, Mass. General Hospital)

10:20am The data imperative (sponsored by Zaloni) Ben Sharma (Zaloni)

10:25am Black box: How AI will amplify the best and worst of humanity Jacob Ward (CNN | Al Jazeera | PBS)

10:50am Morning break sponsored by IBM | Room: 3B | Expo Hall

2:30pm Afternoon break sponsored by Google Cloud | Room: 3B | Expo Hall

8:00am Morning Coffee | Room: 3E Foyer

8:00am Speed Networking | Room: Crystal Palace

12:00pm Lunch sponsored by MemSQL Thursday Business Summit Lunch | Room: 3D 09

12:00pm Thursday Topic Tables at Lunch | Room: Expo Hall (Hall 3B)

11:20am-12:00pm (40m) Data science and machine learning Media, Marketing, Advertising, Text and Language processing and analysis

Applying petabyte-scale analytics and machine learning to billions of news reading sessions

Andrew Montalenti (Parse.ly)

What can we learn from a one-billion-person live poll of the internet? Andrew Montalenti explains how Parse.ly has gathered a unique dataset of news reading sessions of billions of devices, peaking at over two million sessions per minute on thousands of high-traffic news and information websites, and how the company uses this data to unearth the secrets behind online content.

1:10pm-1:50pm (40m) Data science and machine learning Health and Medicine, Text and Language processing and analysis

Spark NLP in action: How SelectData uses AI to better understand home health patients

David Talby (Pacific AI), Alberto Andreotti (John Snow Labs), Stacy Ashworth (SelectData), Tawny Nichols (SelectData)

David Talby, Alberto Andreotti, Stacy Ashworth, and Tawny Nichols outline a question-answering system for accurately extracting facts from free-text patient records and share best practices for training domain-specific deep learning NLP models. The solution is based on Spark NLP, an extension of Spark ML that provides state-of-the-art performance and accuracy for natural language understanding.

2:00pm-2:40pm (40m) Data engineering and architecture Transportation and Logistics

Big data at speed

Ted Malaska (Capital One), Mark Grover (Lyft)

Many details go into building a big data system for speed, from determining an acceptable latency for data access and deciding where to store the data to solving multiregion problems, or even just knowing what data you have and where stream processing fits in. Mark Grover and Ted Malaska share challenges, best practices, and lessons learned doing big data processing and analytics at scale and at speed.

3:30pm-4:10pm (40m) Data science and machine learning Temporal data and time-series analytics

Modeling time series in R

Jared Lander (Lander Analytics)

Temporal data is being produced in ever-greater quantity, but fortunately our time series capabilities are keeping pace. Jared Lander explores techniques for modeling time series, from traditional methods such as ARMA to more modern tools such as Prophet and machine learning models like XGBoost and neural nets. Along the way, Jared shares theory and code for training these models.

4:20pm-5:00pm (40m) Data science and machine learning, Data-driven business management, Strata Business Summit Machine Learning in the enterprise

Analytics maturity: Industry trends and financial impacts

Bill Franks (International Institute For Analytics)

Drawing on a recent study of the analytics maturity level of large enterprises by the International Institute for Analytics, Bill Franks discusses how maturity varies by industry, shares key steps organizations can take to move up the maturity scale, and explains how the research correlates analytics maturity with a wide range of success metrics, including financial and reputational measures.

11:20am-12:00pm (40m) Data science and machine learning Temporal data and time-series analytics

Predicting residential occupancy and hot water usage from high-frequency, multivector utilities data

Cris Lowery (Baringa Partners), Marc Warner (ASI)

In EU households, heating and hot water alone account for 80% of energy usage. Cristobal Lowery and Marc Warner explain how future home energy management systems could improve their energy efficiency by predicting resident needs through utilities data, with a particular focus on the key data features, the need for data compression, and the data quality challenges.

1:10pm-1:50pm (40m) Data science and machine learning Data preparation, governance and privacy

Scalable machine learning for data cleaning

Ihab Ilyas (University of Waterloo)

Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas explains why leveraging data semantics and domain-specific knowledge is key in delivering the optimizations necessary for truly scalable ML curation solutions.

2:00pm-2:40pm (40m) Data science and machine learning Data preparation, governance and privacy, Financial Services

Let the machines learn to improve data quality

Archana Anandakrishnan (American Express)

Building accurate machine learning models hinges on the quality of the data. Errors and anomalies get in the way of data scientists doing their best work. Archana Anandakrishnan explains how American Express created an automated, scalable system for measurement and management of data quality. The methods are modular and adaptable to any domain where accurate decisions from ML models are critical.

3:30pm-4:10pm (40m) Data-driven business management, Strata Business Summit Financial Services

InnerSource for reproducible and extensible business analysis

Emily Riederer (Capital One)

Emily Riederer explains how best practices from data science, open source, and open science can solve common business pain points. Using a case example from Capital One, Emily illustrates how designing empathetic analytical tools and fostering a vibrant InnerSource community are keys to developing reproducible and extensible business analysis.

4:20pm-5:00pm (40m) Data engineering and architecture, Data science and machine learning Financial Services, Model lifecycle management

Infrastructure for deploying machine learning to production in large financial institutions: Lessons learned and best practices

Harish Doddi (Datatron), Jerry Xu (Datatron Technologies)

Large financial institutions have many data science teams (e.g., those for fraud, credit risk, and marketing), each often using a diverse set of tools to build predictive models. There are many challenges involved in productionizing these predictive AI models. Harish Doddi and Jerry Xu share challenges and lessons learned deploying AI models to production in large financial institutions.

11:20am-12:00pm (40m) Data science and machine learning, Visualization and user experience

The Vega project: Building an ecosystem of tools for interactive visualization

Jeffrey Heer (Trifacta | University of Washington)

Jeffrey Heer offers an overview of Vega and Vega-Lite—high-level declarative languages for interactive visualization that support exploratory data analysis, communication, and the development of new visualization tools.

1:10pm-1:50pm (40m) Data science and machine learning, Visualization and user experience Ethics and Privacy, Financial Services, Media, Marketing, Advertising

Augmented reality: Going beyond plots in 3D

Bob Levy (Virtual Cove, Inc.)

Augmented reality opens a completely new lens on your data through which you see and accomplish amazing things. Bob Levy explains how to use simple Python scripts to leverage completely new plot types. You'll explore use cases revealing new insight into financial markets data as well as new ways of interacting with data that build trust in otherwise “black box” machine learning solutions.

2:00pm-2:40pm (40m) Data science and machine learning, Visualization and user experience

Stories beat statistics: How to master the art and science of data storytelling

Brent Dykes (Domo)

Companies collect all kinds of data and use advanced tools and techniques to find insights, but they often fail in the last mile: communicating insights effectively to drive change. Brent Dykes discusses the power that stories wield over statistics and explores the art and science of data storytelling—an essential skill in today’s data economy.

3:30pm-4:10pm (40m) Data science and machine learning, Visualization and user experience

Data visualization in mixed reality with Python

Anna Nicanorova (Annalect)

Data visualization is supposed to be our map to information. However, contemporary charting techniques have a few shortcomings, including context reduction, hard numeric grasp, and perceptual dehumanization. Anna Nicanorova explains how augmented reality can solve these issues by presenting an intuitive and interactive environment for data exploration.

4:20pm-5:00pm (40m) Data science and machine learning, Visualization and user experience Machine Learning in the enterprise

UX strategies for underperforming analytics services and data products

Brian O'Neill (Designing for Analytics)

Gartner says 85%+ of big data projects will fail, despite the fact that your company may have invested millions in engineering implementation. Why are customers and employees not engaging with these products and services? Brian O'Neill explains why a "people first, technology second" mission (a design strategy, in other words) enables the best UX and business outcomes possible.

11:20am-12:00pm (40m) Data science and machine learning Deep Learning

Democratizing deep learning with transfer learning

Lars Hulstaert (Microsoft)

Transfer learning allows data scientists to leverage insights from large labeled datasets. The general idea of transfer learning is to use knowledge learned from tasks for which a lot of labeled data is available in settings where little labeled data is available. Lars Hulstaert explains what transfer learning is and how it can boost your NLP or CV pipelines.

1:10pm-1:50pm (40m) Data science and machine learning Data Platforms, Deep Learning

A high-performance system for deep learning inference and visual inspection

Moty Fania (Intel), Sergei Kom (Intel)

Moty Fania and Sergei Kom share their experience and lessons learned implementing an AI inference platform to enable internal visual inspection use cases. The platform is based on open source technologies and was designed for real-time, streaming, and online actuation.

2:00pm-2:40pm (40m) Big data and data science in the cloud, Data science and machine learning Deep Learning, Media, Marketing, Advertising

Job recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL

Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)

Can the talent industry make the job search/match more relevant and personalized for a candidate by leveraging deep learning techniques? Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to leverage distributed deep learning framework BigDL on Apache Spark to predict a candidate’s probability of applying to specific jobs based on their résumé.

3:30pm-4:10pm (40m) Data science and machine learning Deep Learning

Classifying job execution using deep learning

Ash Munshi (Pepperdata)

Ash Munshi outlines a technique for labeling applications using runtime measurements of CPU, memory, and network I/O along with a deep neural network. This labeling groups the applications into buckets that have understandable characteristics, which can then be used to reason about the cluster and its performance.

4:20pm-5:00pm (40m) Big data and data science in the cloud, Data science and machine learning Deep Learning

Deep learning on audio in Azure to detect sounds in real time

Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)

In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, there are over 360 million people in the world who are deaf or hard of hearing. Swetha Machanavajhala and Xiaoyong Zhu explain how to make the auditory world inclusive and meet the great demand in other sectors by applying deep learning on audio in Azure.

11:20am-12:00pm (40m) Data engineering and architecture Data Platforms, Deep Learning

TonY: Native support of TensorFlow on Hadoop

Jonathan Hung (LinkedIn), Keqiu Hu (LinkedIn), Zhe Zhang (LinkedIn)

Jonathan Hung, Keqiu Hu, and Zhe Zhang offer an overview of TensorFlow on YARN (TonY), a framework to natively run TensorFlow on Hadoop. TonY enables running TensorFlow distributed training as a new type of Hadoop application. Its native Hadoop connector, together with other features, aims to run TensorFlow jobs as reliably and flexibly as other first-class citizens on Hadoop.

1:10pm-1:50pm (40m) Data engineering and architecture Data Platforms, Deep Learning, Model lifecycle management

Deep learning on YARN: Running distributed TensorFlow, MXNet, Caffe, and XGBoost on Hadoop clusters

Wangda Tan (Cloudera)

In order to train deep learning and machine learning models, you must leverage applications such as TensorFlow, MXNet, Caffe, and XGBoost. Wangda Tan discusses new features in Apache Hadoop 3.x to better support deep learning workloads and demonstrates how to run these applications on YARN.

2:00pm-2:40pm (40m) Data engineering and architecture Model lifecycle management

Kubeflow explained: Portable machine learning on Kubernetes

Michelle Casbon (Google)

Michelle Casbon demonstrates how to build a machine learning application with Kubeflow. Kubeflow makes it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere and supports the full lifecycle of an ML product, including iteration via Jupyter notebooks. Join Michelle to find out what Kubeflow currently supports and the long-term vision for the project.

3:30pm-4:10pm (40m) Data engineering and architecture

Managing data chaos in the world of microservices

Oleksii Kachaiev (Attendify)

When we talk about microservices, we usually focus on the communication layer. In practice, data is the much harder and often overlooked problem. Splitting applications into independent units leads to increased complexity, such as structural and semantic changes, knowledge sharing, and data discovery. Join Oleksii Kachaiev to explore emerging technologies created to tackle these challenges.

4:20pm-5:00pm (40m) Data engineering and architecture

The move to a modern data platform in the cloud: Pitfalls to avoid and best practices to follow

Amandeep Khurana (Okera)

Amandeep Khurana shares critical data management practices for easy and unified data access that meets security and regulatory compliance requirements, helping you avoid the pitfalls that could lead to complex, expensive architectures.

11:20am-12:00pm (40m) Data engineering and architecture

Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am

Holden Karau (Independent), Rachel Warren (Salesforce Einstein), Anya Bida (Salesforce)

Apache Spark is an amazing distributed system, but part of the bargain we've made with the infrastructure daemons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. Holden Karau, Rachel Warren, and Anya Bida explore auto-tuning jobs using systems like Apache Beam, Mahout, and internal Spark ML jobs as workloads.

1:10pm-1:50pm (40m) Data engineering and architecture Data Platforms, Transportation and Logistics

A/B testing at Uber: How we built a BYOM (bring your own metrics) platform

Milene Darnis (Uber)

Every new launch at Uber is vetted via robust A/B testing. Given the pace at which Uber operates, the metrics needed to assess the impact of experiments constantly evolve. Milene Darnis explains how the team built a scalable and self-serve platform that lets users plug in any metric to analyze.

2:00pm-2:40pm (40m) Data engineering and architecture Data Platforms, Health and Medicine

Aetna's advanced analytics platform, Data Fabric

Occhio Orsini (Aetna)

Occhio Orsini offers an overview of Aetna's Data Fabric platform. Join in to learn the needs and desires that led to the creation of the advanced analytics platform, explore the platform's architecture, technology, and capabilities, and understand the key technologies and capabilities that made it possible to build a hybrid solution across on-premises and cloud-hosted data centers.

3:30pm-4:10pm (40m) Big data and data science in the cloud, Data engineering and architecture

Self-service modern analytics on the GovCloud

Ramesh Krishnan (Lockheed Martin), Steven Morgan (Lockheed Martin)

Lockheed Martin is a data-driven company with a massive variety and volume of data. To extract the most value from its information assets, the company is constantly exploring ways to enable effective self-service scenarios. Ramesh Krishnan and Steve Morgan discuss Lockheed Martin's journey into modern analytics and explore its analytics platform focused on leveraging AWS GovCloud.

4:20pm-5:00pm (40m) Big data and data science in the cloud, Data engineering and architecture Deep Learning, Media, Marketing, Advertising, Recommendation Systems

Building turnkey recommendations for 5% of internet video

Nir Yungster (JW Player), Kamil Sindi (JW Player)

JW Player—the world’s largest network-independent video platform, representing 5% of global internet video—provides on-demand recommendations as a service to thousands of media publishers. Nir Yungster and Kamil Sindi explain how the company is systematically improving model performance while navigating the many engineering challenges and unique needs of the diverse publishers it serves.

11:20am-12:00pm (40m) Emerging technologies & case studies

Progress for big data in Kubernetes

Ted Dunning (MapR, now part of HPE)

Stateful containers are a well-known anti-pattern, but the standard solution—managing state in a separate storage tier—is costly and complex. Recent developments have changed things dramatically for the better. In particular, you can now manage a high-performance software-defined-storage tier entirely in Kubernetes. Ted Dunning describes what's new and how it makes big data easier on Kubernetes.

1:10pm-1:50pm (40m) Data engineering and architecture

Case study: A Spark-based distributed simulation optimization architecture for portfolio optimization in retail banking

Kaushik Deka (Novantas), Ted Gibson (Novantas)

Kaushik Deka and Ted Gibson share a large-scale optimization architecture in Spark for a consumer product portfolio optimization use case in retail banking. The architecture combines a simulator that distributes computation of complex real-world scenarios and a constraint optimizer that uses business rules as constraints to meet growth targets.

2:00pm-2:40pm (40m) Data engineering and architecture Data Platforms, Financial Services

Using big data to unlock the delivery of personalized, multilingual real-time chat services for global financial service organizations

Timothy Walpole (BJSS)

Financial service clients demand increased data-driven personalization, faster insight-based decisions, and multichannel real-time access. Tim Walpole details how organizations can deliver real-time, vendor-agnostic, personalized chat services and explores issues around security, privacy, legal sign-off, data compliance, and how the internet of things can be used as a delivery platform.

3:30pm-4:10pm (40m) Big data and data science in the cloud

Cassandra versus cloud databases

Jonathan Ellis (DataStax)

Is open source Apache Cassandra still relevant in an era of hosted cloud databases? Jonathan Ellis discusses Cassandra’s strengths and weaknesses relative to Amazon DynamoDB, Microsoft CosmosDB, and Google Cloud Spanner.

4:20pm-5:00pm (40m) Big data and data science in the cloud, Data engineering and architecture Data Integration and Data Pipelines

Best practices for developing an enterprise data hub to collect and analyze 1 TB of data a day from multiple services with Apache Kafka and Google Cloud Platform

Kenji Hayashida (Recruit Lifestyle co., ltd.), Toru Sasaki (NTT DATA Corporation)

Recruit Group and NTT DATA Corporation have developed a platform based on a data hub, utilizing Apache Kafka. This platform can handle around 1 TB/day of application logs generated by a number of services in Recruit Group. Kenji Hayashida and Toru Sasaki share best practices for and lessons learned about topics such as schema evolution and network architecture.

11:20am-12:00pm (40m) Data engineering and architecture, Streaming systems & real-time applications Temporal data and time-series analytics, Transportation and Logistics

Near-real-time anomaly detection at Lyft

Thomas Weise (Lyft), Mark Grover (Lyft)

Thomas Weise and Mark Grover explain how Lyft uses its streaming platform to detect and respond to anomalous events, using data science tools for machine learning and a process that allows for fast and predictable deployment.

1:10pm-1:50pm (40m) Streaming systems & real-time applications

A deep dive into Kafka controller

Jun Rao (Confluent)

The controller is the brain of Apache Kafka and is responsible for maintaining the consistency of the replicas. Jun Rao outlines the main data flow in the controller, then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.

2:00pm-2:40pm (40m) Emerging technologies & case studies

High-performance messaging with Apache Pulsar

Karthik Ramasamy (Streamlio), Matteo Merli (Streamlio)

Apache Pulsar is being used for an increasingly broad array of data ingestion tasks. When operating at scale, it's very important to ensure that the system can make use of all the available resources. Karthik Ramasamy and Matteo Merli share insights into the design decisions and the implementation techniques that allow Pulsar to achieve high performance with strong durability guarantees.

3:30pm-4:10pm (40m) Data engineering and architecture Temporal data and time-series analytics

Machine learning for nonstationary streaming data using Structured Streaming and StreamDM

Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)

The StreamDM library provides the largest collection of data stream mining algorithms for Spark. Heitor Murilo Gomes and Albert Bifet explain how to use StreamDM and Structured Streaming to develop, apply, and evaluate learning models specifically for nonstationary streams (i.e., those with concept drifts).

4:20pm-5:00pm (40m) Data engineering and architecture

IoT edge processing with Apache NiFi, Apache MiniFi, and multiple deep learning libraries

Timothy Spann (Cloudera)

Timothy Spann leads a hands-on deep dive into using Apache MiniFi with Apache MXNet and other deep learning libraries on edge devices.

11:20am-12:00pm (40m) Data engineering and architecture, Law, ethics, governance Data Integration and Data Pipelines, Data preparation, governance and privacy, Media, Marketing, Advertising

Data discovery and lineage: Integrating streaming data in the public cloud with on-prem, classic data stores, and heterogeneous schema types

Barbara Eckman (Comcast)

Comcast’s streaming data platform comprises ingest, transformation, and storage services in the public cloud, with Apache Atlas for data discovery and lineage. Barbara Eckman explains how Comcast recently integrated on-prem data sources, including traditional data warehouses and RDBMSs, which required its data governance strategy to include relational and JSON schemas in addition to Apache Avro.

1:10pm-1:50pm (40m) Data engineering and architecture Transportation and Logistics

How Komatsu is improving mining efficiencies using the IoT and machine learning

Shawn Terry (Komatsu Mining Corp)

Global heavy equipment manufacturer Komatsu is using IoT data to continuously monitor some of the largest mining equipment to ultimately improve mine performance and efficiencies. Shawn Terry details the company's data journey and explains how it is using advanced analytics and predictive modeling to drive insights on terabytes of IoT data from connected mining equipment.

2:00pm-2:40pm (40m) Data engineering and architecture Data Platforms, Retail and e-commerce, Transportation and Logistics

Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks

Tao Huang (JD.com), Mang Zhang (JD.com), Bing Bai (JD.com)

Tao Huang, Mang Zhang, and Bing Bai explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average.

3:30pm-4:10pm (40m) Streaming systems & real-time applications Data Integration and Data Pipelines, Data Platforms, Financial Services

Kafka at PayPal: Enabling 400 billion messages a day

Kevin Lu (PayPal), Maulin Vasavada (PayPal), Na Yang (PayPal)

PayPal is one of the biggest Kafka users in the industry; it manages and maintains over 40 production Kafka clusters in three geodistributed data centers and supports 400 billion Kafka messages a day. Kevin Lu, Maulin Vasavada, and Na Yang explore the management and monitoring PayPal applies to Kafka, from client-perceived statistics to configuration management, failover, and data loss auditing.

4:20pm-5:00pm (40m) Data engineering and architecture

TuneIn: How to get your jobs tuned while you are sleeping

Manoj Kumar (LinkedIn), Pralabh Kumar (LinkedIn), Arpan Agrawal (LinkedIn)

Have you ever tuned a Spark or MR job? If the answer is yes, you already know how difficult it is to tune more than a hundred parameters to optimize the resources used. Manoj Kumar, Pralabh Kumar, and Arpan Agrawal offer an overview of TuneIn, an auto-tuning tool developed to minimize the resource usage of jobs. Experiments have shown up to a 50% reduction in resource usage.

11:20am-12:00pm (40m) Data-driven business management, Strata Business Summit Machine Learning in the enterprise, Retail and e-commerce, Transportation and Logistics

The care and feeding of data scientists: Concrete tips for retaining your data science team

Michelangelo D'Agostino (ShopRunner)

Data scientists are hard to hire. But too often, companies struggle to find the right talent only to make avoidable mistakes that cause their best data scientists to leave. From org structure and leadership to tooling, infrastructure, and more, Michelangelo D'Agostino shares concrete (and inexpensive) tips for keeping your data scientists engaged, productive, and adding business value.

1:10pm-1:50pm (40m) Sponsored

Best practices for migrating big data workloads to Amazon Web Services (sponsored by Amazon Web Services)

Bruno Faria (Amazon Web Services)

Bruno Faria explains how to identify the components and workflows in your current environment and shares best practices to migrate these workloads to AWS.

2:00pm-2:40pm (40m) Data-driven business management, Strata Business Summit

Building it beautiful: Analyzing the effectiveness of platform products and marketing at scale

Josh Laurito (Squarespace)

Joshua Laurito explores systems Squarespace built for acquiring and enforcing consistency on obtained data and for inferring conclusions from a company’s marketing and product initiatives. Joshua discusses the intricacies of gathering and evaluating marketing and user data, from raising awareness to driving purchases, and shares results of previous analyses.

3:30pm-4:10pm (40m) Data engineering and architecture, Strata Business Summit Data Platforms, Media, Marketing, Advertising, Retail and e-commerce

Scaling data infrastructure in the fashion world; or, “What is this? Business intelligence for ants?”

Francesco Mucio (Francescomuc.io)

Francesco Mucio tells the story of how Zalando went from an old-school BI company to an AI-driven company built on a solid data platform. Along the way, he shares what Zalando learned in the process and the challenges that still lie ahead.

4:20pm-5:00pm (40m) Data-driven business management, Strata Business Summit Transportation and Logistics

Real-time machine intelligence in IndyCar and Tour de France

Yasuyuki Kataoka (NTT Innovation Institute, Inc.)

One of the challenges of sports data analytics is how to deliver machine intelligence beyond a mere real-time monitoring tool. Yasuyuki Kataoka highlights various real-time machine learning models in both IndyCar and Tour de France, sharing real-time data processing architectures, machine learning models, and demonstrations that deliver meaningful insights for players and fans.

11:20am-12:00pm (40m) Law, ethics, governance, Strata Business Summit Ethics and Privacy

Data and privacy at scale at Wikipedia

Nuria Ruiz (Wikimedia)

The Wikipedia community feels strongly that you shouldn’t have to provide personal information to participate in the free knowledge movement. Nuria Ruiz discusses the challenges that this strong privacy stance poses for the Wikimedia Foundation, including how it affects data collection, and details some creative workarounds that allow WMF to calculate metrics in a privacy-conscious way.

1:10pm-1:50pm (40m) Law, ethics, governance, Strata Business Summit Data preparation, governance and privacy, Ethics and Privacy

Enacting Data Subject Access Rights for GDPR with data services and data management

Jean-Michel Franco (Talend)

GDPR is more than another regulation to be handled by your back office. Enacting the GDPR's Data Subject Access Rights (DSAR) requires practical actions. Jean-Michel Franco outlines the practical steps to deploy governed data services.

2:00pm-2:40pm (40m) Strata Business Summit Health and Medicine, Text and Language processing and analysis

Digging for gold: Developing AI in healthcare against unstructured text data

Chiny Driscoll (MetiStream), Jawad Khan (Rush University Medical Center)

Chiny Driscoll and Jawad Khan offer an overview of a solution by Cloudera and MetiStream that lets healthcare providers automate the extraction, processing, and analysis of clinical notes within an electronic health record in batch or real time, improving care, identifying errors, and recognizing efficiencies in billing and diagnoses.

3:30pm-4:10pm (40m) Law, ethics, governance Data preparation, governance and privacy, Ethics and Privacy

Balancing stakeholder interests in personal data governance technology

LaVonne Reimer, JD (Lumenous)

GDPR asks us to rethink personal data systems, viewing UI/UX, consent management, and value-add data services through the eyes of the data subjects. LaVonne Reimer explains why the opportunity in the $150B credit and risk industry is to deploy data governance technologies that balance individuals' interest in controlling their own data with requirements for trusted data.

4:20pm-5:00pm (40m) Data-driven business management, Strata Business Summit Machine Learning in the enterprise

A day in the life of a data scientist: How do we train our teams to get started with AI?

Francesca Lazzeri (Microsoft), Jaya Susan Mathew (Microsoft)

With the growing buzz around data science, many professionals want to learn how to become a data scientist—the role Harvard Business Review called the "sexiest job of the 21st century." Francesca Lazzeri and Jaya Mathew explain what it takes to become a data scientist and how artificial intelligence solutions have started to reinvent businesses.

11:20am-12:00pm (40m) Data-driven business management, Strata Business Summit Machine Learning in the enterprise, Retail and e-commerce

Executive Briefing: From Business to AI—The missing pieces in becoming "AI ready"

Mikio Braun (Zalando)

In order to become "AI ready," an organization not only has to provide the right technical infrastructure for data collection and processing but also must learn new skills. Mikio Braun highlights three pieces companies often miss when trying to become AI ready: making the connection between business problems and AI technology, implementing AI-driven development, and running AI-based projects.

1:10pm-1:50pm (40m) Data-driven business management, Strata Business Summit Machine Learning in the enterprise, Transportation and Logistics

Executive Briefing: Analytics for executives—Building an approachable language to drive data science in your organization

Brandy Freitas (Pitney Bowes)

Data science is an approachable field given the right framing. Often, though, practitioners and executives are describing opportunities using completely different languages. Join Brandy Freitas to develop context and vocabulary around data science topics to help build a culture of data within your organization.

2:00pm-2:40pm (40m) Strata Business Summit, Streaming systems & real-time applications

Executive Briefing: What you need to know about fast data

Dean Wampler (Anyscale)

Streaming data systems, so-called "fast data," promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler shares what you need to know to exploit fast data successfully.

3:30pm-4:10pm (40m) Data-driven business management, Strata Business Summit

Executive Briefing: Best practices for human in the loop—The business case for active learning

Paco Nathan (derwen.ai)

Deep learning works well when you have large labeled datasets, but not every team has those assets. Paco Nathan offers an overview of active learning, an ML variant that incorporates human-in-the-loop computing. Active learning focuses input from human experts, leveraging intelligence already in the system, and provides systematic ways to explore and exploit uncertainty in your data.

4:20pm-5:00pm (40m) Sponsored

Conda, Docker, and Kubernetes: The cloud-native future of data science (sponsored by Anaconda)

Mathew Lodge (Anaconda)

The days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. Welcome to the future. Containers and Kubernetes make great language-agnostic distributed computing clusters: it's just as easy to deploy Python as it is Java. Mathew Lodge shows you how.

11:20am-12:00pm (40m) Data engineering and architecture, Expo Hall Data Platforms

Data at Netflix: See what’s next

Michelle Ufford (Netflix)

Michelle Ufford shares some of the cool things Netflix is doing with data and the big bets the company is making on data infrastructure, covering workflow orchestration, machine learning, interactive notebooks, centralized alerting, event-based processing, platform intelligence, and more.

1:10pm-1:50pm (40m) Data engineering and architecture, Expo Hall

The state of Postgres

Umur Cubukcu (Citus Data)

PostgreSQL is often regarded as the world’s most advanced open source database—and it’s on fire. Umur Cubukcu moves beyond the typical list of features in the next release to explore why so many new projects “just use Postgres” as their system of record (or system of engagement) at scale. Along the way, you’ll learn how PostgreSQL’s extension APIs are fueling innovations in relational databases.

2:00pm-2:40pm (40m) Data engineering and architecture, Expo Hall Model lifecycle management

Building a high-performance model serving engine from scratch using Kubernetes, GPUs, Docker, Istio, and TensorFlow

Chris Fregly (Amazon Web Services)

Chris Fregly details a full-featured, open source end-to-end TensorFlow model training and deployment system, using the latest advancements with Kubernetes, TensorFlow, and GPUs.

11:20am-12:00pm (40m) Sponsored

Assumptions, constraints, and risks: How the wrong assumptions can jeopardize any model (sponsored by IBM)

Jennifer Shin (8 Path Solutions | NYU Stern | IBM)

Common wisdom dictates that we should never make assumptions, but assumptions are essential in the creation of statistical models. Jennifer Shin explores how assumptions fit into the creation of a statistical model, the pitfalls of applying a model to data without taking the underlying assumptions into account, and how to identify datasets where the model and its assumptions are applicable.

1:10pm-1:50pm (40m) Sponsored

Quick, reliable, and cost-effective ways to operationalize big data apps (sponsored by Unravel)

Shivnath Babu (Unravel Data Systems | Duke University), Madhusudan Tumma (TIAA)

Operationalizing big data apps in a quick, reliable, and cost-effective manner remains a daunting task. Shivnath Babu and Madhusudan Tumma outline common problems and their causes and share best practices to find and fix these problems quickly and prevent such problems from happening in the first place.

2:00pm-2:40pm (40m) Sponsored

Getting the most out of advanced analytics with people (sponsored by Alteryx)

Patrick Nussbaumer (Alteryx)

There is a lot of buzz around data science and machine learning in the world today. Unfortunately, to truly innovate with data and advanced capabilities, organizations need to expand their focus beyond just a few specialists. Patrick Nussbaumer details how focusing on people can help improve analytic value and drive innovation.

3:30pm-4:10pm (40m) Data-driven business management

Why the internet of things doesn’t exist but will still reshape your business

Ajay Kulkarni (TimescaleDB)

Ajay Kulkarni explores the underlying changes that are characterizing the next wave of computing and shares several ways in which individual businesses and overall industries will be transformed.

4:20pm-5:00pm (40m) Sponsored

Assumptions, constraints, and risks: How the wrong assumptions can jeopardize any model (sponsored by IBM)

Jennifer Shin (8 Path Solutions | NYU Stern | IBM)

11:20am-12:00pm (40m) Sponsored

Building the bridge from big data to ML, featuring Geotab (sponsored by Google Cloud)

Bob Bradley (Geotab), Chad W. Jennings (Google)

If your company isn’t good at analytics, it’s not ready for AI. Bob Bradley and Chad W. Jennings explain how the right data strategy can set you up for success in machine learning and artificial intelligence—the new ground for gaining competitive edge and creating business value. You'll then see an in-depth demonstration of Google technology from smart cities innovator Geotab.

1:10pm-1:50pm (40m) Sponsored

On the road to digital transformation, AI is a team sport (sponsored by Oracle + DataScience.com)

Ian Swanson (Oracle)

Ian Swanson explores why and how data scientists and line-of-business leaders must treat AI as a team sport and explains what tools are needed to deploy models and applications that truly inform decision making.

2:00pm-2:40pm (40m) Sponsored

Redis for velocity and volume: Fast data ingest and probabilistic data structures (sponsored by Redis Labs)

Kyle Davis (Redis Labs)

Kyle Davis explains how Redis can be used for ingesting high-velocity data from large-scale platforms and IoT data collections as well as for storing and querying data using probabilistic data structures that trade some precision for both higher speed and lower storage requirements. Along the way, Kyle shares examples and a demo of the solution.

3:30pm-4:10pm (40m) Financial Services, Temporal data and time-series analytics

Stochastic field theory for time series

Revant Nayar (FMI Technologies LLC)

Machine learning has so far underperformed in time series prediction (slowness and overfitting), and classical methods are ineffective at capturing nonlinearity. Revant Nayar shares an alternative approach that is faster and more transparent and does not overfit. It can also pick up regime changes in the time series and systematically captures all the nonlinearity of a given dataset.

11:20am-12:00pm (40m) Sponsored

Augmented data engineering: Leveraging machine learning in data profiling and discovery (sponsored by Io-Tahoe)

Arun Murugan (GE Digital), Jeff Miller (GE)

Arun Murugan and Jeff Miller detail how complex relationships are discovered and modeled to simplify analytics while keeping an Agile architecture for data acquisition. You’ll see how GE uses machine learning (powered by Io-Tahoe) in data discovery and profiling to engineer a standard data model essential to enterprise use cases.

1:10pm-1:50pm (40m) Sponsored

From two weeks in Python to two hours in Pentaho: Building modern big data pipelines for machine learning (sponsored by Hitachi Vantara)

Dave Huh (Hitachi Vantara), Kevin Haas (Hitachi Vantara)

Data in most organizations today is massive, messy, and often found in silos. With so many sources to analyze, data engineers need to construct robust data pipelines using automation and minimize duplicate processes, as computation is costly for big data. David Huh shares strategies to construct data pipelines for machine learning, including one to reduce time to insight from weeks to hours.

2:00pm-2:40pm (40m) Sponsored

From analytic silos to analytic democratization: How (and why) companies make the shift (sponsored by Dataiku)

Deborah Reynolds (Pfizer), Kurt Muehmel (Dataiku)

By creating a collaborative and interactive analytic environment, a forward-thinking company may harness the best capabilities of its business analysts and data scientists to answer the company’s most pressing business questions. Deborah Reynolds and Kurt Muehmel explain how large enterprises can successfully put data at the core of everyday business decisions.

3:30pm-4:10pm (40m) Sponsored

The importance of experimental iteration: A data-centric approach to an AI project (sponsored by Globant)

Antonio Fragoso (Globant)

Antonio Fragoso explores the key aspects of implementing a natural language processing project within your organization and reveals the necessary steps for making it a success. Antonio focuses on how to leverage an iterative process that can pave the way toward building a successful product.

8:50am-9:00am (10m)

Thursday keynotes

Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.

9:00am-9:15am (15m)

Sound design and the future of experience

Amber Case (MIT Media Lab)

Amber Case outlines several methods that product designers and managers can use to improve everyday interactions through an understanding and application of sound design.

9:15am-9:20am (5m) Sponsored

Wait...pizza is a vegetable? Decoding regulations using machine learning (sponsored by IBM)

Dinesh Nirmal (IBM)

IBM Analytics’s Dinesh Nirmal solves school lunch and the struggle to keep ahead of regulations. With AI tech like deep learning and NLG, supplying meals to California’s kids leaps from enriching metadata for compliance to actionable insights for the business.

9:20am-9:30am (10m)

Practical ML today and tomorrow

Hilary Mason (Cloudera Fast Forward Labs)

Machine learning and artificial intelligence are exciting technologies, but real value comes from marrying those capabilities with the right business problems. Hilary Mason explores the current state of these technologies, investigates what's coming next in applied machine learning, and explains how to identify and execute on the right business opportunities at the right time.

9:30am-9:35am (5m) Sponsored

Derive value from analytics and AI at scale (sponsored by Intel)

马子雅 (Ziya Ma) (Intel)

Data is the fuel for analytics and AI workloads, but the challenges in using it are constant. Ziya Ma discusses how recent innovations from Intel in high-capacity persistent memory and open source software are accelerating production-scale deployments, delivering breakthrough optimizations and faster insights to a wide range of opportunities in the digital enterprise.

9:35am-9:55am (20m)

Quantifying forgiveness

Julia Angwin (ProPublica)

Algorithms are increasingly arbiters of forgiveness. Julia Angwin discusses what she has learned about forgiveness in her series of articles on algorithmic accountability and the lessons we all need to learn for the coming AI future.

9:55am-10:00am (5m) Sponsored

Smarter cities through Geotab with BigQuery ML and geospatial analytics (sponsored by Google Cloud)

Chad W. Jennings (Google)

Cities all over the world are using data and analytics to optimize infrastructure, but city planners are often held back by outdated data gathering methods and legacy analysis tools. Chad Jennings details how Geotab, a leader in IoT fleet logistics, brought BigQuery's unique machine learning and geospatial capabilities to its existing datasets to deliver a more capable solution to city planners.

10:05am-10:20am (15m) Ethics and Privacy

Brain-based human-machine interfaces: New developments, legal and ethical issues, and potential uses

Amanda Pustilnik (University of Maryland School of Law | Center for Law, Brain & Behavior, Mass. General Hospital)

Have you ever dreamed you could read minds? Do telekinesis? Maybe fly a magic carpet by thought alone? Until now, these powers have existed only in the realm of imagination or, more recently, video, AR, and VR games. Join Amanda Pustilnik to learn how brain-based human-machine interfaces are beginning to offer these powers in near-commercially-viable forms.

10:20am-10:25am (5m) Sponsored

The data imperative (sponsored by Zaloni)

Ben Sharma (Zaloni)

Once, a company could live 60-70 years on the S&P 500. Now it averages 15 years. If companies were people, this would be an epidemic on par with the Black Plague. But the same things that dragged humanity out of that dark age can drag companies out of this one.

10:25am-10:45am (20m) Ethics and Privacy

Black box: How AI will amplify the best and worst of humanity

Jacob Ward (CNN | Al Jazeera | PBS)

For most of us, our own mind is a black box—an all-powerful and utterly mysterious device that runs our lives for us, using rules and shortcuts of which we aren’t even aware. Jacob Ward reveals the relationship between the unconscious habits of our minds and the way that AI is poised to amplify them, alter them, maybe even reprogram them.

10:50am-11:20am (30m)

Break: Morning break sponsored by IBM

2:30pm-3:30pm (1h)

Break: Afternoon break sponsored by Google Cloud

8:00am-8:45am (45m)

Break: Morning Coffee

8:00am-8:30am (30m)

Speed Networking

Gather before keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees.

12:00pm-1:10pm (1h 10m)

Thursday Business Summit Lunch

Join Strata Business Summit speakers and attendees for a networking lunch on Thursday.

12:00pm-1:10pm (1h 10m)

Thursday Topic Tables at Lunch

Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.


Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
