Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY
 

Schedule

      1A 06/07
      1:15pm Griffin: Fast-tracking model development in Hadoop Steven Totman (Cloudera), Faraz Rasheed (TD Bank)
      2:05pm Tensor abuse in the workplace Ted Dunning (MapR Technologies)
      4:35pm Anomaly detection on live data Arun Kejariwal (MZ), Francois Orsini (MZ), Dhruv Choudhary (MZ)
      1A 08/10
      11:20am Leveraging open source automated data science tools Eduardo Arino de la Rubia (Domino Data Lab)
      2:05pm Julia and Spark, better together Viral Shah (Julia Computing), Stefan Karpinski (The Julia Language)
      1A 12/14
      11:20am Deep learning for recommender systems Nick Pentreath (IBM)
      1:15pm AI for business analytics Richard Tibbetts (MIT)
      2:05pm AI-driven next-generation developer tools Bargava Subramanian (Independent), Harjindersingh Mistry (Ola)
      4:35pm Deploying deep learning to assist the digital pathologist Jon Fuller (KNIME), Olivia Klose (Microsoft)
      1A 15/16/17
      11:20am Lessons from an AWS migration Chris Mills (The Meet Group)
      2:55pm How to successfully run data pipelines in the cloud Jennifer Wu (Cloudera), Philip Langdale (Cloudera), Kostas Sakellis (Cloudera)
      1A 18
      1:15pm The EOI framework for big data analytics to drive business impact at scale Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn)
      2:55pm Executive panel: Big data use cases around the world Steven Totman (Cloudera), Siew Choo Soh (DBS Bank), Meena Ram (CIBC), David Leach (Qrious)
      4:35pm The data lake: Improving the role of Hadoop in data-driven business management Philip Russom (TDWI: The Data Warehousing Institute)
      1A 21/22
      1:15pm Analytics at Wikipedia Andrew Otto (Wikimedia Foundation), Fangjin Yang (Imply)
      2:05pm Managing core data entities for internal customers at Spotify Sneha Rao (Spotify), Joel Östlund (Spotify)
      2:55pm HDFS on Kubernetes: Lessons learned Kimoon Kim (Pepperdata)
      1A 23/24
      1:15pm Implementing Hadoop to save lives Tony McAllister (Be the Match (National Marrow Donor Program))
      2:05pm The columnar roadmap: Apache Parquet and Apache Arrow Julien Le Dem (Apache Parquet)
      4:35pm Creating a DevOps practice for analytics Bob Eilbacher (Caserta)
      1E 07/08
      1E 09
      11:20am The sunset of lambda: New architectures amplify IoT impact Michael Crutcher (Cloudera), Ryan Lippert (Cloudera)
      1:15pm Messaging, storage, or both: The real-time story of Pulsar and Apache DistributedLog Matteo Merli (Streamlio), Sijie Guo (Streamlio)
      2:05pm Seeing everything so managers can act on anything: The IoT in DHL Supply Chain operations Javier Esplugas (DHL Supply Chain), Kevin Parent (Conduce)
      2:55pm IIoT data fusion: Bridging the gap from data to value Alexandra Gunderson (Arundo Analytics)
      4:35pm How to build a digital twin Lloyd Palum (Vnomics)
      1E 10/11
      2:55pm The five dysfunctions of a data engineering team Jesse Anderson (Big Data Institute)
      1E 12/13
      11:20am Executive Briefing: Talking to machines—Natural language today Hilary Mason (Fast Forward Labs)
      4:35pm Executive Briefing: Managing successful data projects—Technology selection and team building Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera)
      1E 14
      11:20am Exactly once, more than once: Apache Kafka, Heron, and Apache Apex Dean Wampler (Lightbend), Jun Rao (Confluent), Karthik Ramasamy (Streamlio), Pramod Immaneni (DataTorrent)
      1:15pm Ask me anything: Hadoop application architectures Mark Grover (Lyft), Jonathan Seidman (Cloudera), Ted Malaska (Blizzard Entertainment), Gwen Shapira (Confluent)
      2:05pm Ask me anything: Data & Society danah boyd (Microsoft Research | Data & Society), Madeleine Elish (Data & Society)
      2:55pm Ask me anything: Running data science in the enterprise and architecting data platforms John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), Heather Nelson (Silicon Valley Data Science)
      1E 15/16
      11:20am Data futures: Exploring the everyday implications of increasing access to our personal data Daniel Goddemeyer (OFFC NYC), Dominikus Baur (Freelance)
      1:15pm GDPR: Getting your data ready for heavy, new EU privacy regulations Steven Ross (Cloudera), Mark Donsky (Cloudera)
      2:05pm Show me my data, and I’ll tell you who I am. Majken Sander (TimeXtender)
      2:55pm MacroBase: A search engine for fast data streams Sahaana Suri (Stanford University)
      4:35pm Topic modeling openNASA data Noemi Derzsy (Rensselaer Polytechnic Institute)
      1A 04/05
      1E 17
      1A 01/02
      2:55pm Taming the ever-evolving compliance beast: Lessons learned at LinkedIn Shirshanka Das (LinkedIn), Tushar Shanbhag (LinkedIn)
      1A 03
      11:20am Streamline Data Science Pipeline with GPU Data Frame (sponsored by NVIDIA) Jim McHugh (NVIDIA), Todd Mostak (MapD), Srisatish Ambati (0xdata Inc), Stanley Seibert (Anaconda)
      1E 06
      2:05pm Automated data pipelines in hybrid environments: Myth or reality? (sponsored by BMC) Basil Faruqui (BMC Software), Jon Ouimet (BMC Software)
      3E
      8:50am Thursday keynotes Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
      8:55am The US EPA: Digital transformation through data science Robin Thottungal (US Environmental Protection Agency)
      9:10am A tale of two cafeterias: Focus on the line of business Tanvi Singh (Credit Suisse)
      9:25am How the IoT and machine learning keep America truckin' Mike Olson (Cloudera), Terry Kline (Navistar)
      9:35am The real project of AI ethics Joanna Bryson (University of Bath | Princeton Center for Information Technology Policy)
      10:00am Human-AI interaction: Autonomous service robots Manuela Veloso (Carnegie Mellon University)
      10:20am Your data is being manipulated. danah boyd (Microsoft Research | Data & Society)
      10:35am WTF? What's the future and why it's up to us Tim O'Reilly (O'Reilly Media)
      12:00pm Thursday Topic Tables at Lunch (lunch sponsored by Microsoft) | Room: 3A
      10:50am Morning break sponsored by Google | Room: Expo Hall
      3:35pm Afternoon break sponsored by Cisco | Room: Expo Hall
      8:00am Speed Networking | Room: Crystal Palace
      11:20am-12:00pm (40m) Machine Learning & Data Science Text
      PyTextRank: Graph algorithms for enhanced natural language processing
      Paco Nathan (O'Reilly Media)
      Paco Nathan demonstrates how to use PyTextRank—an open source Python implementation of TextRank that builds atop spaCy, datasketch, NetworkX, and other popular libraries to prepare raw text for AI applications in media and learning—to move beyond outdated techniques such as stemming, n-grams, or bag-of-words while performing advanced NLP on single-server solutions.
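      For readers who want to experiment with the approach described above, here is a minimal keyphrase-extraction sketch using spaCy and PyTextRank. It assumes a recent pytextrank release that registers itself as a spaCy pipeline component (the library's interface has changed since this session) and that the en_core_web_sm model is installed; the sample text is invented for illustration.

      # Minimal sketch: ranked keyphrases with spaCy + PyTextRank
      import spacy
      import pytextrank  # importing registers the "textrank" pipeline component

      nlp = spacy.load("en_core_web_sm")
      nlp.add_pipe("textrank")  # append TextRank after the default pipeline stages

      doc = nlp("Graph algorithms such as TextRank rank candidate phrases "
                "by running PageRank over a lemma co-occurrence graph.")

      # Print the top-ranked phrases in descending order of TextRank score
      for phrase in doc._.phrases[:5]:
          print(f"{phrase.rank:.4f}  {phrase.text}")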
      1:15pm-1:55pm (40m) Data engineering, Machine Learning & Data Science Architecture, Financial services
      Griffin: Fast-tracking model development in Hadoop
      Steven Totman (Cloudera), Faraz Rasheed (TD Bank)
      Steven Totman and Faraz Rasheed offer an overview of Griffin, a high-level, easy-to-use framework built on top of Spark, which encapsulates the complexities of common model development tasks within four phases: data understanding, feature extraction, model development, and serving modeling results.
      2:05pm-2:45pm (40m) Data science & advanced analytics, Machine Learning & Data Science
      Tensor abuse in the workplace
      Ted Dunning (MapR Technologies)
      Ted Dunning offers an overview of tensor computing—covering, in practical terms, the high-level principles behind tensor computing systems—and explains how it can be put to good use in a variety of settings beyond training deep neural networks (the most common use case).
      2:55pm-3:35pm (40m) Data science & advanced analytics, Machine Learning & Data Science Text
      How machine learning with open source tools helps everyone build better products
      Michelle Casbon (Qordoba)
      Michelle Casbon explores the machine learning and natural language processing that enables teams to build products that feel native to every user and explains how Qordoba is tackling the underserved domain of localization using open source tools, including Kubernetes, Docker, Scala, Apache Spark, Apache Cassandra, and Apache PredictionIO (incubating).
      4:35pm-5:15pm (40m) Data science & advanced analytics, Machine Learning & Data Science, Real-time applications IoT, Streaming
      Anomaly detection on live data
      Arun Kejariwal (MZ), Francois Orsini (MZ), Dhruv Choudhary (MZ)
      Services such as YouTube, Netflix, and Spotify popularized streaming in different industry segments, but these services do not center around live data—best exemplified by sensor data—which will be increasingly important in the future. Arun Kejariwal, Francois Orsini, and Dhruv Choudhary demonstrate how to leverage Satori to collect, discover, and react to live data feeds at ultralow latencies.
      11:20am-12:00pm (40m) Data science & advanced analytics, Machine Learning & Data Science
      Leveraging open source automated data science tools
      Eduardo Arino de la Rubia (Domino Data Lab)
      The promise of the automated statistician is as old as statistics itself. Eduardo Arino de la Rubia explores the tools created by the open source community to free data scientists from tedium, enabling them to work on the high-value aspects of insight creation. Along the way, Eduardo compares open source tools such as TPOT and auto-sklearn and discusses their place in the DS workflow.
      1:15pm-1:55pm (40m) Big data and the Cloud, Machine Learning & Data Science Cloud, R
      Using R and Spark to analyze data on Amazon S3
      Edgar Ruiz (RStudio)
      With R and sparklyr, a Spark standalone cluster can be used to analyze large datasets found in S3 buckets. Edgar Ruiz walks you through setting up a Spark standalone cluster using EC2 and offers an overview of S3 bucket folder and file setup, connecting R to Spark, the settings needed to read S3 data into Spark, and a data import and wrangle approach.
      2:05pm-2:45pm (40m) Spark & beyond
      Julia and Spark, better together
      Viral Shah (Julia Computing), Stefan Karpinski (The Julia Language)
      Spark is a fast and general engine for large-scale data. Julia is a fast and general engine for large-scale compute. Viral Shah and Stefan Karpinski explain how combining Julia's compute and Spark's data processing capabilities makes amazing things possible.
      2:55pm-3:35pm (40m) Emerging Technologies, Machine Learning & Data Science Deep learning
      Considerations for hardware-accelerated machine learning platforms
      Mike Pittaro (Dell EMC)
      The advances we see in machine learning would be impossible without hardware improvements, but building a high-performance hardware platform is tricky. It involves hardware choices, an understanding of software frameworks and algorithms, and how they interact. Mike Pittaro shares the secrets of matching the right hardware and tools to the right algorithms for optimal performance.
      4:35pm-5:15pm (40m) Emerging Technologies, Machine Learning & Data Science Cloud
      rosettaHUB: A global hub for reproducible and collaborative data science
      Karim Chine (RosettaHUB)
      Karim Chine offers an overview of rosettaHUB—which aims to establish a global open data science metacloud centered on usability, reproducibility, auditability, and shareability—and shares the results of the rosettaHUB/AWS Educate initiative, which involved 30 higher education institutions and research labs and over 3,000 researchers, educators, and students.
      11:20am-12:00pm (40m) Data science & advanced analytics, Machine Learning & Data Science ecommerce, Streaming
      Deep learning for recommender systems
      Nick Pentreath (IBM)
      In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice.
      1:15pm-1:55pm (40m) Artificial Intelligence, Machine Learning & Data Science AI
      AI for business analytics
      Richard Tibbetts (MIT)
      Businesses have spent decades trying to make better decisions by collecting and analyzing structured data. New AI technologies are beginning to transform this process. Richard Tibbetts explores AI that guides business analysts to ask statistically sensible questions and lets junior data scientists answer questions in minutes that previously took trained statisticians hours.
      2:05pm-2:45pm (40m) Data science & advanced analytics, Machine Learning & Data Science Deep learning, Platform
      AI-driven next-generation developer tools
      Bargava Subramanian (Independent), Harjindersingh Mistry (Ola)
      Bargava Subramanian and Harjinder Mistry explain how machine learning and deep learning techniques are helping Red Hat build smart developer tools that make software developers more efficient.
      2:55pm-3:35pm (40m) Data science & advanced analytics, Machine Learning & Data Science Deep learning, Streaming
      Real-time image classification: Using convolutional neural networks on real-time streaming data
      Josh Patterson (Skymind), Kirit Basu (StreamSets )
      Enterprises building data lakes often have to deal with very large volumes of image data that they have collected over the years. Josh Patterson and Kirit Basu explain how some of the most sophisticated big data deployments are using convolutional neural nets to automatically classify images and add rich context about the content of the image, in real time, while ingesting data at scale.
      4:35pm-5:15pm (40m) Big data and the Cloud, Machine Learning & Data Science Deep learning, Healthcare
      Deploying deep learning to assist the digital pathologist
      Jon Fuller (KNIME), Olivia Klose (Microsoft)
      Jon Fuller and Olivia Klose explain how KNIME, Apache Spark, and Microsoft Azure enable fast and cheap automated classification of malignant lymphoma type in digital pathology images. The trained model is deployed to end users as a web application using the KNIME WebPortal.
      11:20am-12:00pm (40m) Big data and the Cloud, Data Engineering & Architecture Cloud
      Lessons from an AWS migration
      Chris Mills (The Meet Group)
      if(we)'s batch event processing pipeline is different from yours, but the process of migrating it from running in a data center to running in AWS is likely pretty similar. Chris Mills explains what was easier than expected, what was harder, and what the company wished it had known before starting the migration.
      1:15pm-1:55pm (40m) Big data and the Cloud, Data Engineering & Architecture Cloud
      Automating cloud cluster deployment: Beyond the book
      Bill Havanki (Cloudera)
      Speed and reliability in deploying big data clusters are key to effectiveness in the cloud. Drawing on ideas from his book Moving Hadoop to the Cloud, which covers essential practices like baking images and automating cluster configuration, Bill Havanki explains how you can automate the creation of new clusters from scratch and use metrics gathered from the cloud provider to scale up.
      2:05pm-2:45pm (40m) Big data and the Cloud, Data Engineering & Architecture Cloud
      From notebooks to cloud native: A modern path for data-driven applications
      Michael McCune (Red Hat)
      Notebook interfaces like Apache Zeppelin and Project Jupyter are excellent starting points for sketching out ideas and exploring data-driven algorithms, but where does the process lead after the notebook work has been completed? Michael McCune offers some answers as they relate to cloud-native platforms.
      2:55pm-3:35pm (40m) Big data and the Cloud, Data Engineering & Architecture Architecture
      How to successfully run data pipelines in the cloud
      Jennifer Wu (Cloudera), Philip Langdale (Cloudera), Kostas Sakellis (Cloudera)
      With its scalable data store, elastic compute, and pay-as-you-go cost model, cloud infrastructure is well-suited for large-scale data engineering workloads. Jennifer Wu, Philip Langdale, and Kostas Sakellis explore the latest cloud technologies, focusing on data engineering workloads, cost, security, and ease-of-use implications for data engineers.
      4:35pm-5:15pm (40m) Data Engineering & Architecture, Data-driven business management Cloud
      What can we learn from 750 billion GitHub events and 42 TB of code?
      Felipe Hoffa (Google)
      With Google BigQuery, anyone can easily analyze more than five years of GitHub metadata and 42+ terabytes of open source code. Felipe Hoffa explains how to leverage this data to understand the community and code related to any language or project. Relevant for open source creators, users, and choosers, this is data that you can leverage to make better choices.
      11:20am-12:00pm (40m) Big data and the Cloud, Data Engineering & Architecture, Strata Business Summit Cloud
      Performance tuning your Hadoop/Spark clusters to use cloud storage
      Stephen Wu (Microsoft)
      Remote storage in the cloud provides an infinitely scalable, cost-effective, and performant solution for big data customers. Adoption is rapid due to the flexibility and cost savings associated with unlimited storage capacity when separating compute and storage. Stephen Wu demonstrates how to correctly performance tune your workloads when your data is stored in remote storage in the cloud.
      1:15pm-1:55pm (40m) Data-driven business management, Strata Business Summit Media
      The EOI framework for big data analytics to drive business impact at scale
      Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn)
      Michael Li and Chi-Yi Kuan offer an overview of the EOI (enable-optimize-innovate) framework for big data analytics and explain how to leverage this framework to drive and grow business in key corporate functions, such as product, marketing, and sales.
      2:05pm-2:45pm (40m) Business case studies, Strata Business Summit
      Putting data to work: How to optimize workforce staffing to improve organization profitability
      Francesca Lazzeri (Microsoft), Hong Lu (Microsoft)
      New machine learning technologies allow companies to apply better staffing strategies by taking advantage of historical data. Francesca Lazzeri and Hong Lu share a workforce placement recommendation solution that recommends staff with the best professional profile for new projects.
      2:55pm-3:35pm (40m) Business case studies, Strata Business Summit Cloud, Financial services
      Executive panel: Big data use cases around the world
      Steven Totman (Cloudera), Siew Choo Soh (DBS Bank), Meena Ram (CIBC), David Leach (Qrious)
      Big data and the cloud have spread around the world, and Singapore, New Zealand, Australia, and Canada are already seeing dramatic investments and returns. In a panel moderated by Steven Totman, senior executives from a variety of leading companies, including DBS, CIBC, and Qrious, share use cases, challenges, and how to be successful.
      4:35pm-5:15pm (40m) Data-driven business management, Strata Business Summit Architecture
      The data lake: Improving the role of Hadoop in data-driven business management
      Philip Russom (TDWI: The Data Warehousing Institute)
      Philip Russom explains how a data lake can improve the role of Hadoop in data-driven business management. With the right end-user tools, a data lake can enable self-service data practices that wring business value from big data and modernize and extend programs for data warehousing, analytics, data integration, and other data-driven solutions.
      11:20am-12:00pm (40m) Data-driven business management, Strata Business Summit
      The five components of a data strategy
      Evan Levy (SAS)
      While it's clear organizations need to have a comprehensive data strategy, few have actually developed a plan to improve the access, sharing, and usage of data. Evan Levy discusses the five essential components that make up a data strategy and explores the individual attributes of each.
      1:15pm-1:55pm (40m) Big data and the Cloud, Data Engineering & Architecture Data for good, Media, Platform
      Analytics at Wikipedia
      Andrew Otto (Wikimedia Foundation), Fangjin Yang (Imply)
      The Wikimedia Foundation (WMF) is a nonprofit charitable organization. As the parent company of Wikipedia, one of the most visited websites in the world, WMF faces many unique challenges around its ecosystem of editors, readers, and content. Andrew Otto and Fangjin Yang explain how the WMF does analytics and offer an overview of the technology it uses to do so.
      2:05pm-2:45pm (40m) Data engineering, Data Engineering & Architecture, Law, ethics, governance
      Managing core data entities for internal customers at Spotify
      Sneha Rao (Spotify), Joel Östlund (Spotify)
      Spotify makes data-driven product decisions. As the company grows, the magnitude and complexity of the data it cares about most are rapidly increasing. Sneha Rao and Joel Östlund walk you through how Spotify stores and exposes audience data created by multiple internal producers within Spotify.
      2:55pm-3:35pm (40m) Data Engineering & Architecture, Spark & beyond
      HDFS on Kubernetes: Lessons learned
      Kimoon Kim (Pepperdata)
      There is growing interest in running Spark natively on Kubernetes. Spark applications often access data in HDFS, and Spark supports HDFS locality by scheduling tasks on nodes that have the task input data on their local disks. Kimoon Kim demonstrates how to run HDFS inside Kubernetes to speed up Spark.
      4:35pm-5:15pm (40m) Data Engineering & Architecture, Spark & beyond
      SETL: An efficient and predictable way to do Spark ETL
      Thiruvalluvan M G (Aqfer)
      Common ETL jobs used for importing log data into Hadoop clusters require a considerable amount of resources, which varies based on the input size. Thiruvalluvan M G shares a set of techniques—involving an innovative use of Spark processing and exploiting features of Hadoop file formats—that not only make these jobs much more efficient but also work well with fixed amounts of resources.
      11:20am-12:00pm (40m) Data Engineering & Architecture, Stream processing and analytics Architecture, Cloud, Streaming
      The three realities of modern programming: The cloud, microservices, and the explosion of data
      Gwen Shapira (Confluent)
      Gwen Shapira explains how the three realities of modern programming—the explosion of data and data systems, building business processes as microservices instead of monolithic applications, and the rise of the public cloud—affect how developers and companies operate today and why companies across all industries are turning to streaming data and Apache Kafka for mission-critical applications.
      1:15pm-1:55pm (40m) Data Engineering & Architecture, Hadoop platform & applications
      Implementing Hadoop to save lives
      Tony McAllister (Be the Match (National Marrow Donor Program))
      The National Marrow Donor Program (Be the Match) recently moved its core transplant matching platform onto Cloudera Hadoop. Tony McAllister explains why the program chose Cloudera Hadoop and shares its big data goals: to increase the number of donors and matches, make the process more efficient, and make transplants more effective.
      2:05pm-2:45pm (40m) Data Engineering & Architecture, Emerging Technologies
      The columnar roadmap: Apache Parquet and Apache Arrow
      Julien Le Dem (Apache Parquet)
      Julien Le Dem explains how Parquet is improving at the storage level, with metadata and statistics that will facilitate more optimizations in query engines in the future, how the new vectorized reader from Parquet to Arrow enables much faster reads by removing abstractions, and how standard Arrow-based APIs are paving the way to breaking the silos of big data.
      2:55pm-3:35pm (40m) Data engineering, Data Engineering & Architecture Architecture, Media, Platform
      Introducing Venice: A derived datastore for batch, streaming, and lambda architectures
      Felix GV (LinkedIn), Yan Yan (LinkedIn)
      Companies with batch and stream processing pipelines need to serve the insights they glean back to their users, an often-overlooked problem that can be hard to achieve reliably and at scale. Felix GV and Yan Yan offer an overview of Venice, a new data store capable of ingesting data from Hadoop and Kafka, merging it together, replicating it globally, and serving it online at low latency.
      4:35pm-5:15pm (40m) Data Engineering & Architecture, Enterprise adoption
      Creating a DevOps practice for analytics
      Bob Eilbacher (Caserta)
      Building an efficient analytics environment requires a strong infrastructure. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains.
      11:20am-12:00pm (40m) Data Engineering & Architecture, Stream processing and analytics Streaming
      Realizing the promise of portability with Apache Beam
      Reuven Lax (Google)
      Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. Reuven Lax offers an overview of Beam basic concepts and demonstrates that portability in action.
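      As a concrete reference for the portability claim, a minimal Beam pipeline in the Python SDK looks roughly like the sketch below; the same code can target another runner (Dataflow, Flink, Spark) simply by changing the pipeline options. The word-count example and the DirectRunner choice are illustrative assumptions, not taken from the talk.

      # Minimal Apache Beam word count (Python SDK); the runner is chosen via options
      import apache_beam as beam
      from apache_beam.options.pipeline_options import PipelineOptions

      options = PipelineOptions(["--runner=DirectRunner"])  # swap the runner to change platforms

      with beam.Pipeline(options=options) as p:
          (
              p
              | "Create" >> beam.Create(["to be or not to be"])
              | "Split" >> beam.FlatMap(lambda line: line.split())
              | "PairWithOne" >> beam.Map(lambda word: (word, 1))
              | "CountPerWord" >> beam.CombinePerKey(sum)
              | "Print" >> beam.Map(print)
          )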
      1:15pm-1:55pm (40m) Data Engineering & Architecture, Stream processing and analytics Streaming
      Foundations of streaming SQL; or, How I learned to love stream and table theory
      Tyler Akidau (Google)
      What does it mean to execute streaming queries in SQL? What is the relationship of streaming queries to classic relational queries? Are streams and tables the same thing? And how does all of this relate to the programmatic frameworks we’re all familiar with? Tyler Akidau answers these questions and more as he walks you through key concepts underpinning data processing in general.
      2:05pm-2:45pm (40m) Data engineering, Data Engineering & Architecture
      One cluster does not fit all: Architecture patterns for multicluster Apache Kafka deployments
      Gwen Shapira (Confluent)
      There are many good reasons to run more than one Kafka cluster…and a few bad reasons too. Great architectures are driven by use cases, and multicluster deployments are no exception. Gwen Shapira offers an overview of several use cases, including real-time analytics and payment processing, that may require multicluster solutions, so you can better choose the right architecture for your needs.
      2:55pm-3:35pm (40m) Big data and the Cloud, Data Engineering & Architecture Streaming
      Heraclitus, the metaphysics of change, and Kafka Streams
      Tim Berglund (Confluent)
      Tim Berglund offers a thorough introduction to the Streams API, an important recent addition to Kafka that lets us build sophisticated stream processing systems that are as scalable and fault tolerant as Kafka itself—and also happen to align quite well with the microservices sensibilities that are so common in contemporary architectural thinking.
      4:35pm-5:15pm (40m) Stream processing and analytics
      Streaming visual analytics: What's possible today and what's coming tomorrow
      Shant Hovsepian (Arcadia Data)
      Streaming visual analytics is a technique for visualizing and interacting with streaming data in near real time. Shant Hovsepian explains how lambda- and polling-based architectures are being disrupted by reactive visualization systems, as streaming engines embrace the CQRS pattern, and offers analysis of visualizing streams from Apache Kafka, Apache Flink, and Apache Spark.
      11:20am-12:00pm (40m) Data Engineering & Architecture, Sensors, IOT & Industrial Internet Architecture, IoT, Streaming
      The sunset of lambda: New architectures amplify IoT impact
      Michael Crutcher (Cloudera), Ryan Lippert (Cloudera)
      A long time ago in a data center far, far away, we deployed complex lambda architectures as the backbone of our IoT solutions. Though hard, they enabled collection of real-time sensor data and slightly delayed analytics. Michael Crutcher and Ryan Lippert explain why Apache Kudu, a relational storage layer for fast analytics on fast data, is the key to unlocking the value in IoT data.
      1:15pm-1:55pm (40m) Data Engineering & Architecture, Real-time applications Architecture, Streaming
      Messaging, storage, or both: The real-time story of Pulsar and Apache DistributedLog
      Matteo Merli (Streamlio), Sijie Guo (Streamlio)
      Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Matteo Merli and Sijie Guo offer an overview of Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper and used heavily in production.
      2:05pm-2:45pm (40m) Data Engineering & Architecture, Sensors, IOT & Industrial Internet, Visualization & user experience ecommerce, Geospatial, IoT, Logistics, Platform, Retail
      Seeing everything so managers can act on anything: The IoT in DHL Supply Chain operations
      Javier Esplugas (DHL Supply Chain), Kevin Parent (Conduce)
      DHL has created an IoT initiative for its supply chain warehouse operations. Javier Esplugas and Kevin Parent explain how DHL has gained unprecedented insight—from the most comprehensive global view across all locations to a unique data feed from a single sensor—to see, understand, and act on everything that occurs in its warehouses with immersive operational data visualization.
      2:55pm-3:35pm (40m) Data Engineering & Architecture, Sensors, IOT & Industrial Internet IoT
      IIoT data fusion: Bridging the gap from data to value
      Alexandra Gunderson (Arundo Analytics)
      One of the main challenges of working with industrial data is linking large amounts of data and extracting value from them. Alexandra Gunderson shares a comprehensive preprocessing methodology that structures and links data from different sources, converting the IIoT analytics process from an unorganized mammoth to one more likely to generate insight.
      4:35pm-5:15pm (40m) Data Engineering & Architecture, Sensors, IOT & Industrial Internet IoT
      How to build a digital twin
      Lloyd Palum (Vnomics)
      A digital twin models a real-world physical asset using mobile data, cloud computing, and machine learning to track chosen characteristics. Lloyd Palum walks you through building a tractor trailer digital twin using Python and TensorFlow. You can then use the example model to track and optimize performance.
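      As a purely hypothetical illustration of the idea (the features, data, and model below are invented and not taken from the talk), a minimal digital-twin-style regressor in TensorFlow might map a few telemetry signals to an expected fuel rate, so observed values can be compared against the model's prediction:

      # Hypothetical sketch: tiny telemetry-to-fuel-rate model in TensorFlow/Keras
      import numpy as np
      import tensorflow as tf

      # Fake telemetry: [speed, engine load, road grade] -> fuel rate (invented data)
      X = np.random.rand(1024, 3).astype("float32")
      y = (0.3 * X[:, 0] + 0.5 * X[:, 1] + 0.2 * X[:, 2]).astype("float32")

      model = tf.keras.Sequential([
          tf.keras.layers.Dense(16, activation="relu", input_shape=(3,)),
          tf.keras.layers.Dense(1),
      ])
      model.compile(optimizer="adam", loss="mse")
      model.fit(X, y, epochs=5, batch_size=64, verbose=0)

      # Compare observed vs. predicted fuel rate to flag under-performing vehicles
      print(model.predict(X[:4]).ravel())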
      11:20am-12:00pm (40m) Data-driven business management, Strata Business Summit
      From the weeds to the stars: How and why to think about bigger problems
      David Boyle (MasterClass)
      Too many brilliant analytical minds are wasted on interesting but ultimately less-impactful problems. They are stuck in the weeds of the data or the challenges of the day-to-day. Too few ask what it means to reach for the stars—the big, shiny, business-changing issues. David Boyle explains why you must start asking bigger questions and making a bigger difference.
      1:15pm-1:55pm (40m) Data engineering, Strata Business Summit Architecture, Media, Platform
      20 Netflix-style principles and practices to get the most out of your data platform
      Kurt Brown (Netflix)
      Kurt Brown explains how to get the most out of your data infrastructure with 20 principles and practices used at Netflix. Kurt covers each in detail and explores how they relate to the technologies used at Netflix, including S3, Spark, Presto, Druid, R, Python, and Jupyter.
      2:05pm-2:45pm (40m) Data-driven business management, Strata Business Summit
      The pitfalls of running a self-service big data platform
      Sander Kieft (Sanoma Media)
      Sanoma has been running big data as a self-service platform for over five years, mainly as a service for business analysts to work directly on the source data. The road to getting business analysts to directly do their analyses on Hadoop was far from smooth. Sander Kieft explores Sanoma's journey and shares some lessons learned along the way.
      2:55pm-3:35pm (40m) Data-driven business management, Strata Business Summit
      The five dysfunctions of a data engineering team
      Jesse Anderson (Big Data Institute)
      Early project success is predicated on management making sure a data engineering team is ready and has all of the skills needed. Jesse Anderson outlines five of the most common nontechnology reasons why data engineering teams fail.
      4:35pm-5:15pm (40m) Data-driven business management, Strata Business Summit
      How to hire and test for data skills: A one-size-fits-all interview kit
      Tanya Cashorali (TCB Analytics)
      Given the recent demand for data analytics and data science skills, adequately testing and qualifying candidates can be a daunting task. Interviewing hundreds of individuals of varying experience and skill levels requires a standardized approach. Tanya Cashorali explores strategies, best practices, and deceptively simple interviewing techniques for data analytics and data science candidates.
      11:20am-12:00pm (40m) Executive Briefing, Strata Business Summit
      Executive Briefing: Talking to machines—Natural language today
      Hilary Mason (Fast Forward Labs)
      Progress in machine learning has led us to believe we might soon be able to build machines that talk to us using the same interfaces that we use to talk to each other: natural language. But how close are we? Hilary Mason explores the current state of natural language technologies and some applications where this technology is thriving today and imagines what we might build in the next few years.
      1:15pm-1:55pm (40m) Executive Briefing, Strata Business Summit
      Executive Briefing: Machine learning—Why you need it, why it's hard, and what to do about it
      Mike Olson (Cloudera)
      Mike Olson shares examples of real-world machine learning applications, explores a variety of challenges in putting these capabilities into production—the speed with which technology is moving, cloud versus in-data-center consumption, security and regulatory compliance, and skills and agility in getting data and answers into the right hands—and outlines proven ways to meet them.
      2:05pm-2:45pm (40m) Data-driven business management, Executive Briefing, Strata Business Summit
      Executive Briefing: Analytics centers of excellence as a way to accelerate big data adoption by business
      Carme Artigas (Synergic Partners)
      Big data technology is mature, but its adoption by business is slow, due in part to challenges like a lack of resources and the need for a cultural change. Carme Artigas explains why an analytics center of excellence (ACoE), whether internal or outsourced, is an effective way to accelerate adoption and shares an approach to implementing an ACoE.
      2:55pm-3:35pm (40m) Executive Briefing, Strata Business Summit
      Executive Briefing: Determining the economic value of your data (EvD)
      Bill Schmarzo (EMC)
      Organizations need a process and supporting frameworks to become more effective at leveraging data and analytics to transform their business models. Using the Big Data Business Model Maturity Index as a guide, William Schmarzo demonstrates how to assess business value and implementation feasibility with respect to the monetization potential of an organization’s business use cases.
      4:35pm-5:15pm (40m) Executive Briefing, Strata Business Summit
      Executive Briefing: Managing successful data projects—Technology selection and team building
      Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera)
      Recent years have seen dramatic advancements in the technologies available for managing and processing data. While these technologies provide powerful tools to build data applications, they also require new skills. Ted Malaska and Jonathan Seidman explain how to evaluate these new technologies and build teams to effectively leverage these technologies and achieve ROI with your data initiatives.
      11:20am-12:00pm (40m) Stream processing and analytics Streaming
      Exactly once, more than once: Apache Kafka, Heron, and Apache Apex
      Dean Wampler (Lightbend), Jun Rao (Confluent), Karthik Ramasamy (Streamlio), Pramod Immaneni (DataTorrent)
      In a series of three 11-minute presentations, key contributors to Apache Kafka, Heron, and Apache Apex discuss their respective implementations of exactly-once delivery and semantics.
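      For context on the Kafka portion of the discussion, exactly-once production in Kafka builds on the idempotent producer introduced in Kafka 0.11. A rough sketch of enabling it from Python with confluent-kafka follows; the broker address and topic name are placeholders, and the enable.idempotence client setting requires a librdkafka release newer than this session.

      # Sketch: enabling Kafka's idempotent producer via confluent-kafka (Python)
      from confluent_kafka import Producer

      producer = Producer({
          "bootstrap.servers": "localhost:9092",  # placeholder broker address
          "enable.idempotence": True,             # broker deduplicates retried sends per partition
          "acks": "all",                          # wait for the full in-sync replica set
      })

      producer.produce("events", key="user-42", value="page_view")
      producer.flush()  # block until delivery reports are received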
      1:15pm-1:55pm (40m)
      Ask me anything: Hadoop application architectures
      Mark Grover (Lyft), Jonathan Seidman (Cloudera), Ted Malaska (Blizzard Entertainment), Gwen Shapira (Confluent)
      Mark Grover, Ted Malaska, Gwen Shapira, and Jonathan Seidman, the authors of Hadoop Application Architectures, share considerations and recommendations for the architecture and design of applications using Hadoop. Come with questions about your use case and its big data architecture or just listen in on the conversation.
      2:05pm-2:45pm (40m)
      Ask me anything: Data & Society
      danah boyd (Microsoft Research | Data & Society), Madeleine Elish (Data & Society)
      Data & Society's danah boyd and Madeleine Elish answer your questions and discuss topics such as the manipulation of data-driven and AI technologies, humans in the loop in automated systems, and the future of work.
      2:55pm-3:35pm (40m)
      Ask me anything: Running data science in the enterprise and architecting data platforms
      John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), Heather Nelson (Silicon Valley Data Science)
      John Akred, Stephen O'Sullivan, and Heather Nelson field a wide range of detailed questions on topics such as managing data science in the enterprise, architecting a data platform, and creating a modern enterprise data strategy. Even if you don’t have a specific question, join in to hear what others are asking.
      4:35pm-5:15pm (40m)
      Ask me anything: Apache Kafka as a streaming platform
      Tim Berglund (Confluent)
      Tim Berglund answers your burning questions about Kafka architecture, the Streams API, KSQL, and message-based microservices integration. Even if you don't have a question of your own, stop by to hear what other people are asking.
      11:20am-12:00pm (40m) Law, ethics, governance Data for good, Smart cities
      Data futures: Exploring the everyday implications of increasing access to our personal data
      Daniel Goddemeyer (OFFC NYC), Dominikus Baur (Freelance)
      Increasing access to our personal data raises profound moral and ethical questions. Daniel Goddemeyer and Dominikus Baur share the findings from Data Futures, an MFA class in which students observed each other through their own data, and demonstrate the results with a live experiment with the audience that showcases some of the effects when personal data becomes accessible.
      1:15pm-1:55pm (40m) Law, ethics, governance
      GDPR: Getting your data ready for heavy, new EU privacy regulations
      Steven Ross (Cloudera), Mark Donsky (Cloudera)
      In May 2018, the General Data Protection Regulation (GDPR) goes into effect for firms doing business in the EU, but many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Steven Ross and Mark Donsky outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations.
      2:05pm-2:45pm (40m) Law, ethics, governance
      Show me my data, and I’ll tell you who I am.
      Majken Sander (TimeXtender)
      Personal data is increasingly spread across various services globally. But what do companies know about us? And how do we collect that knowledge, get ahold of our own data, and maybe even correct faulty perceptions by putting the right answers out there as a service? Majken Sander explains why we desperately need a personal Discovery Hub: a go-to place for knowledge about ourselves.
      2:55pm-3:35pm (40m) Stream processing and analytics Streaming
      MacroBase: A search engine for fast data streams
      Sahaana Suri (Stanford University)
      Sahaana Suri offers an overview of MacroBase, a new analytics engine from Stanford designed to prioritize the scarcest resource in large-scale, fast-moving data streams: human attention. MacroBase allows reconfigurable, real-time root-cause analyses that have already diagnosed issues in production streams in mobile, data center, and industrial applications.
      4:35pm-5:15pm (40m) Data science & advanced analytics Text
      Topic modeling openNASA data
      Noemi Derzsy (Rensselaer Polytechnic Institute)
      Open source data has enabled society to engage in community-based research and has provided government agencies with more visibility and trust from individuals. Noemi Derzsy offers an overview of the openNASA platform and discusses openNASA metadata analysis and tools for applying NLP and topic modeling techniques to understand open government dataset associations.
      11:20am-12:00pm (40m) Sponsored
      Emotional arithmetic: A deep dive into how machine learning and big data help you understand customers in real time (sponsored by Google)
      Chad W. Jennings (Google), Eric Schmidt (Google)
      Doing “algebra” with emotions can lead to new insights about customer behavior. Chad Jennings presents a serverless big data analytics platform that allows you to capture and analyze raw data and train machine learning models that can process text to discern not just the sentiment but also the underlying emotion driving that sentiment.
      1:15pm-1:55pm (40m) Sponsored
      Key big data architectural considerations for deploying in the cloud and on-premises (sponsored by NetApp)
      Karthikeyan Nagalingam (NetApp)
      When analytics applications become business critical, balancing cost with SLAs for performance, backup, dev, test, and recovery is difficult. Karthikeyan Nagalingam discusses big data architectural challenges and how to address them and explains how to create a cost-optimized solution for the rapid deployment of business-critical applications that meet corporate SLAs today and into the future.
      2:05pm-2:45pm (40m) Sponsored
      Meeting the challenges of the analytics economy (sponsored by SAS)
      Fiona McNeill (SAS)
      Much is being written about the economy of everything, but where does the analytics economy fit in? Fiona McNeill shares SAS's vision and roadmap for meeting the unique challenges of the analytics economy, including thoughts on intersections with related technologies like machine learning, deep learning, cognitive computing, and more.
      11:20am-12:00pm (40m) Sponsored
      Adaptive analytics: Transitioning from legacy systems to a modern platform with MicroStrategy and Cloudera (sponsored by MicroStrategy)
      Alex Gutow (Cloudera), David Harsh (MicroStrategy)
      Alex Gutow discusses the importance of adaptive analytics and shares everything you need to know while transitioning from legacy data warehouses to Hadoop-based platforms. Join in to find out why you need modern platforms to move, host, and analyze your data with MicroStrategy and Cloudera.
      1:15pm-1:55pm (40m) Sponsored
      The unspoken truths of deploying and scaling ML in production (sponsored by ParallelM)
      Nisha Talagala (ParallelM)
      Deploying ML in production is challenging. Nisha Talagala shares solutions and techniques for effectively managing machine learning and deep learning in production with popular analytic engines such as Apache Spark, TensorFlow, and Apache Flink.
      2:05pm-2:45pm (40m) Sponsored
      A comprehensive, enterprise-grade, open Hadoop solution from Hewlett Packard Enterprise (sponsored by Hewlett Packard Enterprise)
      Bob Patterson (Hewlett Packard Enterprise (HPE))
      Bob Patterson offers an overview of Hewlett Packard Enterprise's enterprise-grade Hadoop solution, which has everything you need to accelerate your big data journey: innovative hardware architectures for diverse workloads certified for all leading distros, infrastructure software, services from HPE and partners, and add-ons like object storage.
      11:20am-12:00pm (40m) Sponsored
      Deploying to the edge, bringing AI everywhere (sponsored by Microsoft)
      Matt Winkler (Microsoft)
      Matt Winkler shares real-world case studies on how healthcare, agriculture, and manufacturing companies are creating, training, deploying, and managing AI models faster with Microsoft Azure and deploying them to the cloud, on-premises, and to the edge.
      1:15pm-1:55pm (40m) Sponsored
      The future of data science and machine learning (sponsored by IBM)
      Carlo Appugliese (IBM)
      A changing market landscape and open source innovations are having a dramatic impact on the consumability and ease of use of data science tools. Carlo Appugliese examines the impact these trends and changes will have on the future of data science and how machine learning is making data science available to all.
      2:05pm-2:45pm (40m) Sponsored
      Finally, an interactive experience for your data lake (sponsored by Datameer)
      John Morrell (Datameer)
      While companies have flooded data lakes with billions of records, the technical limitations of Hadoop have kept analysts from interactively exploring this data and delivering real value—until now. John Morrell explores a solution helping analysts interactively and rapidly explore billions of records in Hadoop, offering a truly interactive experience and ushering in the era of Data Lake 2.0.
      2:55pm-3:35pm (40m) Data Engineering & Architecture, Law, ethics, governance Media
      Taming the ever-evolving compliance beast: Lessons learned at LinkedIn
      Shirshanka Das (LinkedIn), Tushar Shanbhag (LinkedIn)
      Shirshanka Das and Tushar Shanbhag explore the big data ecosystem at LinkedIn and share its journey to preserve member privacy while providing data democracy. Shirshanka and Tushar focus on three foundational building blocks for scalable data management that can meet data compliance regulations: a central metadata system, an integrated data movement platform, and a unified data access layer.
      11:20am-12:00pm (40m) Sponsored
      Streamline Data Science Pipeline with GPU Data Frame (sponsored by NVIDIA)
      Jim McHugh (NVIDIA), Todd Mostak (MapD), Srisatish Ambati (0xdata Inc), Stanley Seibert (Anaconda)
      Joining Jim McHugh are founders of GOAI: Todd Mostak, CEO of MapD; SriSatish Ambati, CEO and cofounder of H2O; and Stan Seibert, director of community innovation at Anaconda. The speakers provide an update on the latest advancements and customer use cases leveraging GOAI.
      1:15pm-1:55pm (40m) Sponsored
      Extend on-premises Hadoop and Spark deployments across data centers and the cloud, including Microsoft Azure (sponsored by Microsoft and WANdisco)
      Jagane Sundar (WANdisco), Pranav Rastogi (Microsoft)
      Jagane Sundar and Pranav Rastogi explain how to meet your enterprise SLAs while making full use of resources with patented active data replication technology—something computer science still says is impossible.
      2:05pm-2:45pm (40m) Sponsored
      A governance checklist for making your big data into trusted data (sponsored by Syncsort)
      Keith Kohl (Syncsort)
      If users get conflicting analytics results, wild predictions, and crazy reports from the data in your data lake, they will lose trust. From the beginning of your data lake project, you need to build in solid business rules, data quality checking, and enhancement. Keith Kohl shares an actionable checklist that shows everyone in your enterprise that your big data can be trusted.
      11:20am-12:00pm (40m) Sponsored
      Continuous integration at scale: Streaming 50 billion events per day for real-time feedback with Kafka and Spark (sponsored by Pure Storage)
      Ivan Jibaja (Pure Storage)
      Ivan Jibaja offers an overview of Pure Storage's streaming big data analytics pipeline, which uses open source technologies like Spark and Kafka to process over 30 billion events per day and provide real-time feedback in under five seconds.
      1:15pm-1:55pm (40m) Sponsored
      Deploying an automated data platform, from data ingestion to consumption: A real-world enterprise example (sponsored by Infoworks)
      Ramesh Menon (Infoworks)
      Enterprises want to implement analytics use cases at the speed of business yet spend more time on complicated data management than on creating business value. The solution is automation. Ramesh Menon explains how a large enterprise automated data ingestion, data synchronization, and the building of data models and cubes to create a big data warehouse for the rapid deployment of analytics.
      2:05pm-2:45pm (40m) Sponsored
      Automated data pipelines in hybrid environments: Myth or reality? (sponsored by BMC)
      Basil Faruqui (BMC Software), Jon Ouimet (BMC Software)
      Are you building, running, or managing complex data pipelines across hybrid environments spanning multiple applications and data sources? Doing this successfully requires automating dataflows across the entire pipeline, ideally controlled through a single source. Basil Faruqui and Jon Ouimet walk you through a customer journey to automate data pipelines across a hybrid environment.
      8:50am-8:55am (5m)
      Thursday keynotes
      Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
      Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.
      8:55am-9:05am (10m)
      The US EPA: Digital transformation through data science
      Robin Thottungal (US Environmental Protection Agency)
      Data science is key to addressing national challenges with greater agility. At the EPA, the prime challenge is to provide the best value to American citizens in an ever-changing world. Robin Thottungal explains how the EPA addresses this challenge through digital and analytical services.
      9:05am-9:10am (5m) Sponsored keynote
      Emotional arithmetic: How machine learning helps you understand customers in real time (sponsored by Google)
      Chad W. Jennings (Google)
      Chad W. Jennings walks you through a serverless big data architecture on Google Cloud that helps unravel the mysteries of human emotion.
      9:10am-9:20am (10m) Strata Business Summit
      A tale of two cafeterias: Focus on the line of business
      Tanvi Singh (Credit Suisse)
      Tanvi Singh explores whether long-standing non-internet-based companies possess the evidence-driven culture and platforms required to derive benefit from big data tools and impact their line of business.
      9:20am-9:25am (5m) Sponsored keynote
      Harness the Power of AI and Deep Learning for Business (sponsored by NVIDIA)
      Jim McHugh (NVIDIA)
      AI is transforming industry and society. Accelerated computing, deep learning platforms, and intelligent machines supercharge digital transformation to harness the power of AI. Jim McHugh shares examples of AI-accelerated businesses and dives into specific approaches enterprises are taking to adopt AI and accelerated analytics.
      9:25am-9:35am (10m)
      How the IoT and machine learning keep America truckin'
      Mike Olson (Cloudera), Terry Kline (Navistar)
      Data is powering the largest trucks on America’s interstates, the buses that take our children to school, and the military vehicles that help protect our country. Terry Kline and Mike Olson look at how machine learning and predictive analytics keep more than 300,000 connected vehicles rolling.
      9:35am-9:50am (15m) Strata Business Summit
      The real project of AI ethics
      Joanna Bryson (University of Bath | Princeton Center for Information Technology Policy)
      AI has been with us for hundreds of years; there's no "singularity" step change. Joanna Bryson explains that the main threat of AI is not that it will do anything to us but what we are already doing to each other with it—predicting and manipulating our own and others' behavior.
      9:50am-10:00am (10m) Sponsored keynote
      Will AI help save the snow leopard? (sponsored by Microsoft)
      Joseph Sirosh (Microsoft)
      Join Microsoft’s Joseph Sirosh for a surprising conversation about a volunteer’s dilemma, an engineer’s ingenuity, and how AI, the cloud, data, and devices came together to help save snow leopards.
      10:00am-10:15am (15m)
      Human-AI interaction: Autonomous service robots
      Manuela Veloso (Carnegie Mellon University)
      Manuela Veloso explores human-AI collaboration, particularly in terms of robots learning from human sources and robot explanation generation to respond to language-based requests about their autonomous experience. Manuela concludes with a further discussion of general human-AI interaction and the opportunities for transparency and trust building of AI systems.
      10:15am-10:20am (5m) Sponsored keynote
      Analytics everywhere, from things to cities (sponsored by Cisco)
      Raghunath Nambiar (Cisco)
      There are endless possibilities when we connect the unconnected. Raghunath Nambiar discusses the magnitude of new challenges and new opportunities across industry segments.
      10:20am-10:35am (15m) Strata Business Summit
      Your data is being manipulated.
      danah boyd (Microsoft Research | Data & Society)
      The more that we rely on data to train our models and inform our systems, the more that this data becomes a target for those seeking to manipulate algorithmic systems and undermine trust in data. danah boyd explores how systems are being gamed, how data is vulnerable, and what we need to do to build technical antibodies.
      10:35am-10:45am (10m) Strata Business Summit
      WTF? What's the future and why it's up to us
      Tim O'Reilly (O'Reilly Media)
      Robots are going to take our jobs, they say. Tim O'Reilly says, "Only if that's what we ask them to do!" Tim has had his fill of technological determinism. He explains why technology is the solution to human problems and why we won't run out of work till we run out of problems.
      12:00pm-1:15pm (1h 15m)
      Thursday Topic Tables at Lunch
      Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
      10:50am-11:20am (30m)
      Break: Morning break sponsored by Google
      3:35pm-4:35pm (1h)
      Break: Afternoon break sponsored by Cisco
      8:00am-8:30am (30m)
      Speed Networking
      Gather before keynotes on Thursday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees.