Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY
 

Schedule

      1A 06/07
      11:20am Differentiating by data science Eric Colson (Stitch Fix)
      1:15pm Building a Rosetta Stone for business data Matthew Roche (Microsoft), Jennifer Marie Stevens (Microsoft)
      2:05pm Learning location: Real-time feature extraction for mobile analytics Sander Pick (Set), Andrew Hill (Set), Carson Farmer (Set)
      4:35pm Interpretable AI: Not just for regulators Patrick Hall (H2O.ai | George Washington University), Sri Satish (H2O.ai)
      1A 08/10
      11:20am Probabilistic programming in finance using Prophet Justin Bleich (Coatue Management)
      1:15pm Data science at team scale: Considerations for sharing, collaborating, and getting to production Tristan Zajonc (Cloudera), Thomas Dinsmore (Cloudera), Lucas Glass (QuintilesIMS)
      2:05pm JupyterLab: Building blocks for interactive computing Jason Grout (Bloomberg LP), Jessica Forde (Jupyter)
      4:35pm Weld: Accelerating data science by 100x Shoumik Palkar (Stanford University), Matei Zaharia (Stanford University)
      5:25pm Boosting Spark MLlib performance with rich optimization algorithms Seth Hendrickson (Cloudera), DB Tsai (Netflix)
      1A 12/14
      11:20am Deep learning in practice Mikio Braun (Zalando SE)
      2:05pm Automatic comments moderation with ModBot at the Washington Post Eui-Hong Han (The Washington Post), Ling Jiang (The Washington Post)
      2:55pm GPU-accelerating a deep learning anomaly detection platform Joshua Patterson (NVIDIA), Michael Balint (NVIDIA), Satish Varma Dandu (NVIDIA)
      4:35pm Fighting financial fraud at Danske Bank with artificial intelligence Nadeem Gulzar (Danske Bank Group), Sune Askjær (Think Big Analytics, a Teradata Company)
      5:25pm Practical deep learning for understanding images Leo Dirac (Amazon Web Services)
      1A 15/16/17
      11:20am Apache Spark in the hands of data scientists Neelesh Srinivas Salian (Stitch Fix)
      1:15pm Best practices for using Alluxio with Spark Cheng Chang (Alluxio), Haoyuan Li (Alluxio)
      2:05pm Rethinking data marts in the cloud: Common architectural patterns for analytics Henry Robinson (Cloudera), Greg Rahn (Cloudera)
      2:55pm Creating a serverless real-time analytics platform powered by machine learning in the cloud Roy Ben-Alta (Amazon Web Services), Allan MacInnis (Amazon Web Services)
      4:35pm Why containers and microservices need streaming data Paul Curtis (MapR Technologies)
      5:25pm Spotify in the cloud: The next evolution of data at Spotify Josh Baer (Spotify), Alison Gilles (Spotify)
      1A 18
      11:20am Enterprise digital transformation using big data Atul Dalmia (American Express)
      1:15pm Next-generation data management Milind Nagnur (Citigroup)
      2:05pm Optimizing the data warehouse at Visa Nandu Jayakumar (Visa), Justin Erickson (Cloudera)
      2:55pm Big data analysis of futures trades Tobi Bosede (Johns Hopkins)
      4:35pm Learning from higher education: How Ivy Tech is using predictive analytics and data democracy to reverse decades of entrenched practices Brendan Aldrich (Ivy Tech Community College), Lige Hensley (Ivy Tech Community College)
      5:25pm What I learned from teaching 1,500 analytics students Jerrard Gaertner (University of Toronto School of Continuing Studies)
      1A 21/22
      2:55pm Extending Spark ML: Adding your own tools and algorithms Holden Karau (Google), Seth Hendrickson (Cloudera)
      5:25pm Using ML to solve failure problems with ML and AI apps in Spark Adrian Popescu (Unravel Data Systems), Shivnath Babu (Unravel Data Systems)
      1A 23/24
      11:20am Geospatial big data analysis at Uber Zhenxiao Luo (Uber), Wei Yan (Uber)
      5:25pm Solving data cleaning and unification using human-guided machine learning Ihab Ilyas (University of Waterloo | Tamr)
      1E 07/08
      11:20am Stream all the things! Dean Wampler (Lightbend)
      1:15pm When boring is awesome: Making PostgreSQL scale for time series data Michael Freedman (TimescaleDB | Princeton)
      4:35pm Stream analytics with SQL on Apache Flink Fabian Hueske (data Artisans)
      5:25pm Low-latency streaming: Twitter Heron on Infiniband Karthik Ramasamy (Streamlio), Supun Kamburugamuve (Indiana University)
      1E 09
      11:20am State-of-the-art robot predictive maintenance with real-time sensor data Mateusz Dymczyk (H2O.ai), Mathieu Dumoulin (MapR Technologies)
      1:15pm How an Italian company rules the world of insurance: Facing the technological challenges of turning data into value Riccardo Corbella (Data Reply IT), Beniamino Del Pizzo (Data Reply IT)
      2:05pm Predicting tantrums with wearable data and real-time analytics Julie Lockner (17 Minds Corporation)
      2:55pm Project Rainier: Saving lives one insight at a time Marc Carlson (Seattle Children's Research Institute), Sean Taylor (Seattle Children's Research Institute)
      4:35pm Working within the Hadoop ecosystem to build a live-streaming data pipeline Stephen Devine (Big Fish Games), Kalah Brown (Big Fish Games)
      5:25pm An open source architecture for the IoT Dave Shuman (Cloudera), James Kirkland (Red Hat)
      1E 10/11
      1:15pm Accelerating the next generation of data companies Chris Neumann (500 Startups), Carla Holtze (Parrable), Bradford Cross (DCVC), Kyle Wild (Keen IO), Tasso Argyros (ActionIQ)
      2:05pm Where the puck is headed: A VC panel discussion Michael Dauber (Amplify Partners), Sarah Catanzaro (Canvas Ventures), Katherine Boyle (General Catalyst), Lisha Li (Amplify Partners), Sandeep Bhadra (Vertex Ventures)
      2:55pm Learning from customers, keeping humans in the loop Elsie Kenyon (Nara Logics)
      5:25pm Retail's panacea: How machine learning is driving product development Hilary Milnes (Glossy), Karen Moon (Trendalytics), Jared Schiffman (Perch Interactive), Eric Colson (Stitch Fix), Catherine Twist (Xcel Brands (Isaac Mizrahi, C. Wonder, Halston, Judith Ripka))
      1E 12/13
      11:20am Executive Briefing: Artificial intelligence—The next digital frontier? Michael Chui (McKinsey Global Institute)
      1:15pm Executive Briefing: Legal best practices for making data work Alysa Z. Hutnik (Kelley Drye & Warren LLP)
      5:25pm Executive Briefing: Data ecosystem strategy Jason McIntyre (Accenture), Mark Milazzo (Accenture)
      1E 14
      1:15pm An authenticated journey through big data security at Walmart Matt Bolte (Walmart), Toni LeTempt (Walmart)
      4:35pm Anonymized data fusion: Privacy versus utility Behrooz Hashemian (Massachusetts Institute of Technology)
      5:25pm Interactive data exploration and analysis at enterprise scale Sean Kandel (Trifacta), Kaushal Gandhi (Trifacta)
      1E 15/16
      11:20am The cognitive design principles of interactive analytics Mike Driscoll (Metamarkets)
      1:15pm Improve business decision making with the science of human perception Sebastian Gutierrez (DashingD3js.com)
      2:55pm Expanding data literacy with data visualizations Julie Rodriguez (Eagle Investment Systems)
      4:35pm Text analytics and new visualization techniques Richard Brath (Uncharted Software), Scott Langevin (Uncharted Software)
      1A 04/05
      11:20am Building a real-time feedback loop for education (sponsored by MemSQL) David Mellor (Curriculum Associates)
      1:15pm The essentials for digital growth (sponsored by MapR) Jack Norris (MapR Technologies)
      2:55pm Using an AI-driven approach to managing data lakes in the cloud or on-premises (sponsored by Informatica) Murthy Mathiprakasam (Informatica), Sravan Kasarla (Fidelity Investments)
      1E 17
      11:20am Building the IoT data lifecycle (sponsored by Cisco) Han Yang (Cisco Systems)
      1A 01/02
      2:55pm Tracking the opioid-fueled HIV outbreak with big data (sponsored by Trifacta) Ells Campbell (CDC), Connor Carreras (Trifacta), Ryan Weil (Leidos)
      1A 03
      1E 06
      11:20am Accelerating insight with analytics and AI (sponsored by Intel) Kevin Huiskes (Intel), Radhika Rangarajan (Intel)
      3E
      8:50am Wednesday keynotes Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
      9:00am Journey to consolidation Mike Olson (Cloudera), Cesar Delgado (Apple)
      9:15am White Collar Crime Risk Zones Sam Lavigne (The New Inquiry)
      9:30am The age of machine learning Ben Lorica (O'Reilly Media)
      9:50am Music, the window into your soul Christine Hung (Spotify)
      10:10am Data science for the most vulnerable at UNICEF Innovation Manuel García-Herranz (UNICEF Office of Innovation)
      10:20am Weapons of math destruction Cathy O'Neil (Weapons of Math Destruction)
      10:50am Morning break sponsored by MemSQL | Room: Break
      12:00pm Lunch sponsored by MapR, with Wednesday Topic Tables at Lunch | Room: 3A
      3:35pm Afternoon break sponsored by Intel | Room: Expo Hall
      6:05pm Booth Crawl | Room: Expo Hall
      8:00am Speed Networking | Room: Crystal Palace
      7:30pm Data After Dark: City View | Room: 230 Fifth Penthouse
      7:05pm Dinner | Room: On your own
      11:20am-12:00pm (40m) Data-driven business management, Machine Learning & Data Science
      Differentiating by data science
      Eric Colson (Stitch Fix)
      While companies often use data science as a supportive function, the emergence of new business models has made it possible for some companies to differentiate via data science. Eric Colson explores what it means to differentiate by data science and explains why companies must now think very differently about the role and placement of data science in the organization.
      1:15pm-1:55pm (40m) Machine Learning & Data Science
      Building a Rosetta Stone for business data
      Matthew Roche (Microsoft), Jennifer Marie Stevens (Microsoft)
      The data-driven business must bridge the language gap between data scientists and business users. Matthew Roche and Jennifer Stevens walk you through building a business glossary that codifies your semantic layer and enables greater conversational fluency between business users and data scientists.
      2:05pm-2:45pm (40m) Machine Learning & Data Science
      Learning location: Real-time feature extraction for mobile analytics
      Sander Pick (Set), Andrew Hill (Set), Carson Farmer (Set)
      Location-based data is full of information about our everyday lives, but GPS and WiFi signals create extremely noisy mobile location data, making it hard to extract features, especially when working with real-time data. Andrew Hill and Sander Pick explore new strategies for extracting information from location data while remaining scalable, privacy focused, and contextually aware.
      2:55pm-3:35pm (40m) Data science & advanced analytics, Machine Learning & Data Science Data for good, ecommerce, Healthcare
      Challenges in using machine learning to direct healthcare services
      Brian Dalessandro (Zocdoc)
      Zocdoc is an online marketplace that allows easy doctor discovery and instant online booking. However, dealing with healthcare involves many constraints and challenges that render standard approaches to common problems infeasible. Brian Dalessandro surveys the various machine learning problems Zocdoc has faced and shares the data, legal, and ethical constraints that shape its solution space.
      4:35pm-5:15pm (40m) Law, ethics, governance, Machine Learning & Data Science
      Interpretable AI: Not just for regulators
      Patrick Hall (H2O.ai | George Washington University), Sri Satish (H2O.ai)
      Interpreting deep learning and machine learning models is not just another regulatory burden to be overcome. People who use these technologies have the right to trust and understand AI. Patrick Hall and Sri Satish share techniques for interpreting deep learning and machine learning models and telling stories from their results.
      5:25pm-6:05pm (40m) Data science & advanced analytics, Machine Learning & Data Science
      When models go rogue: Hard-earned lessons about using machine learning in production
      David Talby (Pacific AI)
      Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.
      11:20am-12:00pm (40m) Machine Learning & Data Science Financial services
      Probabilistic programming in finance using Prophet
      Justin Bleich (Coatue Management)
      Prophet is a Bayesian nonlinear time series forecasting model recently released by Facebook. Justin Bleich explains how Coatue—a hedge fund that uses data science to drive investment decisions—extends Prophet to include exogenous covariates when generating forecasts and applies it to nowcasting macroeconomic series using higher-frequency data available from sources such as Google Trends.
      1:15pm-1:55pm (40m) Machine Learning & Data Science
      Data science at team scale: Considerations for sharing, collaborating, and getting to production
      Tristan Zajonc (Cloudera), Thomas Dinsmore (Cloudera), Lucas Glass (QuintilesIMS)
      Data science alone is easy. Data science with others, whether in the enterprise or on shared distributed systems, requires a bit more work. Tristan Zajonc and Thomas Dinsmore discuss common technology considerations and patterns for collaboration in large teams and for moving machine learning into production at scale.
      2:05pm-2:45pm (40m) Machine Learning & Data Science, Visualization & user experience
      JupyterLab: Building blocks for interactive computing
      Jason Grout (Bloomberg LP), Jessica Forde (Jupyter)
      With JupyterLab, users compute with multiple notebooks, editors, and consoles that work together in a tabbed layout. Jason Grout and Jessica Forde offer an overview of JupyterLab, the next generation of the Jupyter Notebook, demonstrate how to use third-party plugins to extend and customize many aspects of JupyterLab, and explain how it fits within the overall vision of Project Jupyter.
      2:55pm-3:35pm (40m) Data science & advanced analytics, Machine Learning & Data Science Pydata
      Dask: Flexible parallelism in Python for advanced analytics
      Matthew Rocklin (Anaconda)
      Dask parallelizes Python libraries like NumPy, pandas, and scikit-learn, bringing a popular data science stack to the world of distributed computing. Matthew Rocklin discusses the architecture and current applications of Dask used in the wild and explores computational task scheduling and parallel computing within Python generally.
      4:35pm-5:15pm (40m) Data science & advanced analytics, Machine Learning & Data Science Pydata
      Weld: Accelerating data science by 100x
      Shoumik Palkar (Stanford University), Matei Zaharia (Stanford University)
      Modern data applications combine functions from many optimized libraries (e.g., pandas and TensorFlow) and yet do not achieve peak hardware performance due to data movement across functions. Shoumik Palkar and Matei Zaharia offer an overview of Weld, a new interface to implement functions in these libraries while enabling optimizations across them.
      5:25pm-6:05pm (40m) Machine Learning & Data Science, Spark & beyond Media
      Boosting Spark MLlib performance with rich optimization algorithms
      Seth Hendrickson (Cloudera), DB Tsai (Netflix)
      Recent developments in Spark MLlib have given users the power to express a wider class of ML models and decrease model training times via the use of custom parameter optimization algorithms. Seth Hendrickson and DB Tsai explain when and how to use this new API and walk you through creating your own Spark ML optimizer. Along the way, they also share performance benefits and real-world use cases.
      11:20am-12:00pm (40m) Machine Learning & Data Science AI, Deep learning, ecommerce
      Deep learning in practice
      Mikio Braun (Zalando SE)
      Deep learning has become the go-to solution for many application areas, such as image classification or speech processing, but does it work for all application areas? Mikio Braun offers background on deep learning and shares his practical experience working with these exciting technologies.
      1:15pm-1:55pm (40m) Artificial Intelligence, Machine Learning & Data Science Deep learning
      Building advanced analytics and deep learning on Apache Spark with BigDL
      Yuhao Yang (Intel), Zhichao Li (Intel)
      Yuhao Yang and Zhichao Li discuss building end-to-end analytics and deep learning applications, such as speech recognition and object detection, on top of BigDL and Spark and explore recent developments in BigDL, including Python APIs, notebook and TensorBoard support, TensorFlow model read/write support, better recurrent and recursive net support, and 3D image convolutions.
      2:05pm-2:45pm (40m) Data science & advanced analytics, Machine Learning & Data Science Media, Text
      Automatic comments moderation with ModBot at the Washington Post
      Eui-Hong Han (The Washington Post), Ling Jiang (The Washington Post)
      The quality of online comments is critical to the Washington Post. However, the quality management of the comment section currently requires costly manual resources. Eui-Hong Han and Ling Jiang discuss ModBot, a machine learning-based tool developed for automatic comments moderation, and share the challenges they faced in developing and deploying ModBot into production.
      2:55pm-3:35pm (40m) Data science & advanced analytics, Machine Learning & Data Science Deep learning
      GPU-accelerating a deep learning anomaly detection platform
      Joshua Patterson (NVIDIA), Michael Balint (NVIDIA), Satish Varma Dandu (NVIDIA)
      How can deep learning be employed to create a system that monitors network traffic, operations data, and system logs to reliably flag risk and unearth potential threats? Satish Dandu, Joshua Patterson, and Michael Balint explain how to bootstrap a deep learning framework to detect risk and threats in operational production systems, using best-of-breed GPU-accelerated open source tools.
      4:35pm-5:15pm (40m) Artificial Intelligence, Machine Learning & Data Science Financial services, Platform
      Fighting financial fraud at Danske Bank with artificial intelligence
      Nadeem Gulzar (Danske Bank Group), Sune Askjær (Think Big Analytics, a Teradata Company)
      Fraud in banking is an arms race, and criminals are now using machine learning to improve their attack effectiveness. Sune Askjær and Nadeem Gulzar explore how Danske Bank uses deep learning for better fraud detection, covering model effectiveness, TensorFlow versus boosted decision trees, operational considerations in training and deploying models, and lessons learned along the way.
      5:25pm-6:05pm (40m) Artificial Intelligence, Machine Learning & Data Science Cloud, Deep learning
      Practical deep learning for understanding images
      Leo Dirac (Amazon Web Services)
      Leo Dirac demonstrates how to apply the latest deep learning techniques to semantically understand images. You'll learn what embeddings are, how to extract them from your images using deep convolutional neural networks (CNNs), and how they can be used to cluster and classify large datasets of images.
      11:20am-12:00pm (40m) Data engineering, Data Engineering & Architecture ecommerce
      Apache Spark in the hands of data scientists
      Neelesh Srinivas Salian (Stitch Fix)
      Neelesh Srinivas Salian offers an overview of the data platform used by data scientists at Stitch Fix, based on the Spark ecosystem. Neelesh explains the development process and shares some lessons learned along the way.
      1:15pm-1:55pm (40m) Data Engineering & Architecture, Spark & beyond
      Best practices for using Alluxio with Spark
      Cheng Chang (Alluxio), Haoyuan Li (Alluxio)
      Alluxio (formerly Tachyon) is a memory-speed virtual distributed storage system that leverages memory for managing data across different storage systems. Many deployments use Alluxio with Spark because Alluxio helps Spark further accelerate applications. Haoyuan Li and Cheng Chang explain how Alluxio makes Spark more effective and share production deployments of Alluxio and Spark working together.
      2:05pm-2:45pm (40m) Big data and the Cloud, Data Engineering & Architecture Architecture, Cloud
      Rethinking data marts in the cloud: Common architectural patterns for analytics
      Henry Robinson (Cloudera), Greg Rahn (Cloudera)
      Cloud environments will likely play a key role in your business’s future. Henry Robinson and Greg Rahn explore the workload considerations when evaluating the cloud for analytics and discuss common architectural patterns to optimize price and performance.
      2:55pm-3:35pm (40m) Data Engineering & Architecture, Stream processing and analytics
      Creating a serverless real-time analytics platform powered by machine learning in the cloud
      Roy Ben-Alta (Amazon Web Services), Allan MacInnis (Amazon Web Services)
      Speed matters. Today, decisions are made based on real-time insights, but in order to support the substantial growth of streaming data, companies are required to innovate. Roy Ben-Alta and Allan MacInnis explore AWS solutions powered by machine learning and artificial intelligence.
      4:35pm-5:15pm (40m) Data engineering, Data Engineering & Architecture Architecture, Streaming
      Why containers and microservices need streaming data
      Paul Curtis (MapR Technologies)
      A microservices architecture benefits from the agility of containers for convenient, predictable deployment of applications, while persistent, performant message streaming makes both work better. Paul Curtis explores these infrastructure components and discusses the design of highly scalable real-world systems that take advantage of this powerful triad.
      5:25pm-6:05pm (40m) Big data and the Cloud, Data Engineering & Architecture Cloud, Media, Platform
      Spotify in the cloud: The next evolution of data at Spotify
      Josh Baer (Spotify), Alison Gilles (Spotify)
      In early 2016, Spotify decided that it didn’t want to be in the data center business. The future was the cloud. Josh Baer and Alison Gilles explain what it took to move Spotify to the cloud, covering Spotify's technology choices, challenges faced, and the lessons Spotify learned along the way.
      11:20am-12:00pm (40m) Data-driven business management, Enterprise adoption, Strata Business Summit Financial services
      Enterprise digital transformation using big data
      Atul Dalmia (American Express)
      Big data decisioning is critical to driving real-time business decisions in our digital age. But how do you begin the transformation to big data? The key is enterprise adoption across a variety of end users. Atul Dalmia shares best practices learned from American Express's five-year journey, the biggest challenges you’ll face, and ideas on how to solve them.
      1:15pm-1:55pm (40m) Enterprise adoption, Strata Business Summit
      Next-generation data management
      Milind Nagnur (Citigroup)
      Milind Nagnur explores the requirements for a next-generation platform for data management, covering everything from controlled exploratory sandboxes to hosting transactional applications, and explains how modern, industry-leading data management tools and self-service analytics can address these needs.
      2:05pm-2:45pm (40m) Enterprise adoption, Strata Business Summit Financial services, Platform
      Optimizing the data warehouse at Visa
      Nandu Jayakumar (Visa), Justin Erickson (Cloudera)
      At Visa, the process of optimizing the enterprise data warehouse and consolidating data marts by migrating these analytic workloads to Hadoop has played a key role in the adoption of the platform and how data has transformed Visa as an organization. Nandu Jayakumar and Justin Erickson share Visa’s journey along with some best practices for organizations migrating workloads to Hadoop.
      2:55pm-3:35pm (40m) Data science & advanced analytics, Strata Business Summit Financial services
      Big data analysis of futures trades
      Tobi Bosede (Johns Hopkins)
      Whether an entity seeks to create trading algorithms or mitigate risk, predicting trade volume is an important task. Focusing on futures trading that relies on Apache Spark for processing the large amount of data, Tobi Bosede considers the use of penalized regression splines for trade volume prediction and the relationship between price volatility and trade volume.
      4:35pm-5:15pm (40m) Data-driven business management, Strata Business Summit
      Learning from higher education: How Ivy Tech is using predictive analytics and data democracy to reverse decades of entrenched practices
      Brendan Aldrich (Ivy Tech Community College), Lige Hensley (Ivy Tech Community College)
      As the largest community college in the US, Ivy Tech ingests over 100M rows of data a day. Brendan Aldrich and Lige Hensley explain how Ivy Tech is applying predictive technologies to establish a true data democracy—a self-service data analytics environment empowering thousands of users each day to improve operations, achieve strategic goals, and support student success.
      5:25pm-6:05pm (40m) Enterprise adoption, Strata Business Summit
      What I learned from teaching 1,500 analytics students
      Jerrard Gaertner (University of Toronto School of Continuing Studies)
      Engaging, teaching, mentoring, and advising mature, mostly employed, often enthusiastic and ambitious adult learners at University of Toronto has taught Jerrard Gaertner more about analytics in the real world than he ever imagined. Jerrard shares stories about everything from hyped-up expectations and internal sabotage to organizational streamlining and creating transformative insight.
      11:20am-12:00pm (40m) Data engineering, Data Engineering & Architecture
      Working smarter, not harder: Driving data engineering efficiency at Netflix
      Michelle Ufford (Netflix)
      What if we used the wealth of data and experience at our disposal to drive improvements in data engineering? Michelle Ufford explains how Netflix is using data to find common patterns among the chaos that enable the company to automate repetitive and time-consuming tasks and discover ways to improve data quality, reduce costs, and quickly identify and respond to issues.
      1:15pm-1:55pm (40m) Big data and the Cloud, Data Engineering & Architecture Financial services, Platform
      Cloud data lakes: Analytic data warehouses in the cloud
      John Hitchingham (FINRA)
      John Hitchingham shares insights into the design and operation of FINRA's data lake in the AWS cloud, where FINRA extracts, transforms, and loads over 75B transactions per day. Users can query across petabytes of data in seconds on AWS S3 using Presto and Spark—all while maintaining security and data lineage.
      2:05pm-2:45pm (40m) Data Engineering & Architecture, Spark & beyond
      Exploring real-time capabilities with Spark SQL
      Lucy Yu (MemSQL)
      Lucy Yu demonstrates how to extend the Spark SQL abstraction to support more complex pushdown, such as group by, subqueries, and joins.
      2:55pm-3:35pm (40m) Data Engineering & Architecture, Spark & beyond
      Extending Spark ML: Adding your own tools and algorithms
      Holden Karau (Google), Seth Hendrickson (Cloudera)
      Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. Holden Karau and Seth Hendrickson introduce Spark’s ML pipelines and explain how to extend them with your own custom algorithms. Even if you don't have your own algorithm to add, you'll leave with a deeper understanding of Spark's ML pipelines.
      4:35pm-5:15pm (40m) Data Engineering & Architecture
      Scaling database and analytic workloads with Apache Kudu
      Zbigniew Baranowski (CERN)
      Apache Kudu is a new, innovative distributed storage system that combines low-latency data ingestion, scalable analytics, and fast data lookups. But what does it deliver in practice? Zbigniew Baranowski explains how to use Apache Kudu for scale-out database-like systems, such as those used at CERN, covering the advantages and limitations and measuring performance.
      5:25pm-6:05pm (40m) Data Engineering & Architecture, Spark & beyond
      Using ML to solve failure problems with ML and AI apps in Spark
      Adrian Popescu (Unravel Data Systems), Shivnath Babu (Unravel Data Systems)
      A roadblock in the agility that comes with Spark is that application developers can get stuck with application failures and have a tough time finding and resolving the issue. Adrian Popescu and Shivnath Babu explain how to use the root cause diagnosis algorithm and methodology to solve failure problems with ML and AI apps in Spark.
      11:20am-12:00pm (40m) Data engineering, Data Engineering & Architecture Geospatial, Logistics, Platform
      Geospatial big data analysis at Uber
      Zhenxiao Luo (Uber), Wei Yan (Uber)
      Uber's geospatial data is increasing exponentially as the company grows. As a result, its big data systems must also grow in scalability, reliability, and performance to support business decisions, user recommendations, and experiments for geospatial data. Zhenxiao Luo and Wei Yan explain how Uber runs geospatial analysis efficiently in its big data systems, including Hadoop, Hive, and Presto.
      1:15pm-1:55pm (40m) Data Engineering & Architecture, Hadoop platform & applications Platform, Telecom
      How T-Mobile built a massive-scale network performance management platform on Hadoop
      Travis Bakeman (T-Mobile)
      Travis Bakeman shares how T-Mobile ported its large-scale network performance management platform, T-PIM, from a legacy database to a big data platform with Impala as the main reporting interface, covering the migration journey, including the challenges the team faced, how the team evaluated new technologies, lessons learned along the way, and the efficiencies gained as a result.
      2:05pm-2:45pm (40m) Data Engineering & Architecture, Real-time applications Streaming
      A brave new world in mutable big data: Relational storage
      Todd Lipcon (Cloudera)
      To date, mutable big data storage has primarily been the domain of nonrelational (NoSQL) systems such as Apache HBase. However, demand for real-time analytic architectures has led big data back to a familiar friend: relationally structured data storage systems. Todd Lipcon explores the advantages of relational storage and reviews new developments, including Google Cloud Spanner and Apache Kudu.
      2:55pm-3:35pm (40m) Data Engineering & Architecture, Enterprise adoption Platform, Sales
      The journey to Einstein: Building a multitenancy AI platform that powers hundreds of thousands of businesses
      Simon Chan (Salesforce)
      Salesforce recently released Einstein, which brings AI into its core platform to power every business. The secret behind Einstein is an underlying platform that accelerates AI development at scale for both internal and external data scientists. Simon Chan shares his experience building this unified platform for a multitenancy, multibusiness cloud enterprise.
      4:35pm-5:15pm (40m) Data engineering, Data Engineering & Architecture Architecture, Media, Platform
      End-to-end data discovery and lineage in a heterogeneous big data environment with Apache Atlas and Avro
      Barbara Eckman (Comcast)
      Barbara Eckman offers an overview of Comcast’s streaming data platform, comprising a variety of ingest, transformation, and storage services, which uses Apache Avro schemas to support end-to-end data governance, Apache Atlas for data discovery and lineage, and custom asynchronous messaging libraries to notify Atlas of new data and schema entities and lineage links as they are created.
      5:25pm-6:05pm (40m) Data engineering, Data Engineering & Architecture
      Solving data cleaning and unification using human-guided machine learning
      Ihab Ilyas (University of Waterloo | Tamr)
      Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas provides insight into various techniques and discusses how machine learning, human expertise, and problem semantics collectively can deliver a scalable, high-accuracy solution.
      11:20am-12:00pm (40m) Data Engineering & Architecture, Stream processing and analytics Streaming
      Stream all the things!
      Dean Wampler (Lightbend)
      While stream processing is now popular, streaming architectures must be more reliable and scalable than ever before—more like microservice architectures in fact. Dean Wampler defines "stream" based on characteristics for such systems, using specific tools like Kafka, Spark, Flink, and Akka as examples, and argues that big data and microservices architectures are converging.
      1:15pm-1:55pm (40m) Data engineering, Data Engineering & Architecture, Stream processing and analytics Architecture, IoT, Streaming
      When boring is awesome: Making PostgreSQL scale for time series data
      Michael Freedman (TimescaleDB | Princeton)
      Michael Freedman offers an overview of TimescaleDB, a new open source scale-out database designed for time series workloads and engineered as a plugin to Postgres. Unlike most time series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries.
      2:05pm-2:45pm (40m) Data Engineering & Architecture, Stream processing and analytics Streaming
      Mistakes were made, but not by us: Lessons from a year of supporting Apache Kafka
      Dustin Cote (Confluent)
      Dustin Cote shares his experience troubleshooting Apache Kafka in production environments and explains how to avoid pitfalls like message loss or performance degradation in your environment.
      2:55pm-3:35pm (40m) Big data and the Cloud, Data Engineering & Architecture
      A deep dive into Apache Kafka core internals
      Jun Rao (Confluent)
      Over the last few years, streaming platform Apache Kafka has been used extensively for real-time data collecting, delivering, and processing—particularly in the enterprise. Jun Rao leads a deep dive into some of the key internals that help make Kafka popular and provide strong reliability guarantees.
      4:35pm-5:15pm (40m) Data Engineering & Architecture, Stream processing and analytics Streaming
      Stream analytics with SQL on Apache Flink
      Fabian Hueske (data Artisans)
      Although the most widely used language for data analysis, SQL is only slowly being adopted by open source stream processors. One reason is that SQL's semantics and syntax were not designed with streaming data in mind. Fabian Hueske explores Apache Flink's two relational APIs for streaming analytics—standard SQL and the LINQ-style Table API—discussing their semantics and showcasing their usage.
      5:25pm-6:05pm (40m) Data Engineering & Architecture, Stream processing and analytics Financial services, Media, Streaming
      Low-latency streaming: Twitter Heron on Infiniband
      Karthik Ramasamy (Streamlio), Supun Kamburugamuve (Indiana University)
      Modern enterprises are data driven and want to move at light speed. To achieve real-time performance, financial applications use streaming infrastructures for low latency and high throughput. Twitter Heron is an open source streaming engine with low latency around 14 ms. Karthik Ramasamy and Supun Kamburugamuve explain how they ported Heron to Infiniband to achieve latencies as low as 7 ms.
      11:20am-12:00pm (40m) Data Engineering & Architecture, Sensors, IOT & Industrial Internet IoT
      State-of-the-art robot predictive maintenance with real-time sensor data
      Mateusz Dymczyk (H2O.ai), Mathieu Dumoulin (MapR Technologies)
      Mateusz Dymczyk and Mathieu Dumoulin showcase a working, practical, predictive maintenance pipeline in action and explain how they built a state-of-the-art anomaly detection system using big data frameworks like Spark, H2O, TensorFlow, and Kafka on the MapR Converged Data Platform.
      1:15pm-1:55pm (40m) Data Engineering & Architecture, Real-time applications Financial services, Logistics
      How an Italian company rules the world of insurance: Facing the technological challenges of turning data into value
      Riccardo Corbella (Data Reply IT), Beniamino Del Pizzo (Data Reply IT)
      With more than 4.5 million black boxes, Italian car insurance has the most telematics clients in the world. Riccardo Corbella and Beniamino Del Pizzo explore the data management challenges that occur in a streaming context when the amount of data to process is gigantic and share a data management model capable of providing the scalability and performance needed to support massive growth.
      2:05pm-2:45pm (40m) Data Engineering & Architecture, Sensors, IOT & Industrial Internet Data for good, Healthcare, IoT
      Predicting tantrums with wearable data and real-time analytics
      Julie Lockner (17 Minds Corporation)
      How can we empower individuals with special needs to reach their full potential? Julie Lockner offers an overview of a project to develop collaboration applications that use wearable device data to improve the ability to develop the best possible care and education plans. Join in to learn how real-time IoT data analytics are making this possible.
      2:55pm-3:35pm (40m) Data Engineering & Architecture, Hadoop platform & applications, Spark & beyond
      Project Rainier: Saving lives one insight at a time
      Marc Carlson (Seattle Children's Research Institute), Sean Taylor (Seattle Children's Research Institute)
      Marc Carlson and Sean Taylor offer an overview of Project Rainier, which leverages the power of HDFS and the Hadoop and Spark ecosystem to help scientists at Seattle Children’s Research Institute quickly find new patterns and generate predictions that they can test later, accelerating important pediatric research and increasing scientific collaboration by highlighting where it is needed most.
      4:35pm-5:15pm (40m) Data Engineering & Architecture, Hadoop platform & applications Architecture, Platform, Streaming
      Working within the Hadoop ecosystem to build a live-streaming data pipeline
      Stephen Devine (Big Fish Games), Kalah Brown (Big Fish Games)
      Companies are increasingly interested in processing and analyzing live-streaming data. The Hadoop ecosystem includes platforms and software library frameworks to support this work, but these components require correct architecture, performance tuning, and customization. Stephen Devine and Kalah Brown explain how they used Spark, Flume, and Kafka to build a live-streaming data pipeline.
      5:25pm-6:05pm (40m) Data Engineering & Architecture, Sensors, IOT & Industrial Internet Architecture, IoT
      An open source architecture for the IoT
      Dave Shuman (Cloudera), James Kirkland (Red Hat)
      Eclipse IoT is an ecosystem of organizations that are working together to establish an IoT architecture based on open source technologies and standards. Dave Shuman and James Kirkland showcase an end-to-end architecture for the IoT based on open source standards, highlighting Eclipse Kura, an open source stack for gateways and the edge, and Eclipse Kapua, an open source IoT cloud platform.
      11:20am-12:00pm (40m) Strata Business Summit
      Data science for good: Benefit the world and your business at the same time
      Derek Ruths (CAI)
      Derek Ruths explains how volunteer efforts, when done the right way, can actually improve a data science team’s culture and productivity—motivating data scientists, sharpening their skills, providing exposure to new challenges, reducing turnover, and creating valuable recruiting opportunities.
      1:15pm-1:55pm (40m) Data-driven business management, Strata Business Summit
      Accelerating the next generation of data companies
      Chris Neumann (500 Startups), Carla Holtze (Parrable), Bradford Cross (DCVC), Kyle Wild (Keen IO), Tasso Argyros (ActionIQ)
      This panel brings together partners from some of the world’s leading startup accelerators and founders of up-and-coming enterprise data startups to discuss how we can help create the next generation of successful enterprise data companies.
      2:05pm-2:45pm (40m) Emerging Technologies, Strata Business Summit
      Where the puck is headed: A VC panel discussion
      Michael Dauber (Amplify Partners), Sarah Catanzaro (Canvas Ventures), Katherine Boyle (General Catalyst), Lisha Li (Amplify Partners), Sandeep Bhadra (Vertex Ventures)
      In a panel discussion, top-tier VCs look over the horizon and consider the big trends in big data, explaining what they think the field will look like a few years (or more) down the road.
      2:55pm-3:35pm (40m) Data-driven business management, Enterprise adoption, Strata Business Summit AI, Marketing
      Learning from customers, keeping humans in the loop
      Elsie Kenyon (Nara Logics)
      Enterprises today pursue AI applications to replace logic-based expert systems in order to learn from customer and operational signals. But training data is often limited or nonexistent, and applying or extrapolating the wrong dataset can be costly to a company's business and reputation. Elsie Kenyon explains how to harness institutional human knowledge to augment data in deployed AI solutions.
      4:35pm-5:15pm (40m) Business case studies, Strata Business Summit Healthcare
      Spark clinical surveillance: Saving lives and improving patient care
      Charles Boicey (Clearsense)
      Charles Boicey explains how Clearsense uses Spark Streaming to provide real-time updates to healthcare providers for critical healthcare needs, helping clinicians make timely decisions from the assessment of a patient's risk based on information gathered from streaming physiological monitoring along with streaming diagnostic data and the patient historical record.
      5:25pm-6:05pm (40m) Data-driven business management, Strata Business Summit Marketing, Retail
      Retail's panacea: How machine learning is driving product development
      Hilary Milnes (Glossy), Karen Moon (Trendalytics), Jared Schiffman (Perch Interactive), Eric Colson (Stitch Fix), Catherine Twist (Xcel Brands (Isaac Mizrahi, C. Wonder, Halston, Judith Ripka))
      Karen Moon, Jared Schiffman, Eric Colson, and Catherine Twist explore how the retail industry is embracing data to include consumers in the design and development process, tackling the challenges associated with the wealth of sources and the unstructured nature of the data they handle and process and how the data is turned into insights that are digestible and actionable.
      11:20am-12:00pm (40m) Executive Briefing, Strata Business Summit
      Executive Briefing: Artificial intelligence—The next digital frontier?
      Michael Chui (McKinsey Global Institute)
      After decades of extravagant promises, artificial intelligence is finally starting to deliver real-life benefits to early adopters. However, we're still early in the cycle of adoption. Michael Chui explains where investment is going, patterns of AI adoption and value capture by enterprises, and how the value potential of AI across sectors and business functions is beginning to emerge.
      1:15pm-1:55pm (40m) Executive Briefing, Strata Business Summit
      Executive Briefing: Legal best practices for making data work
      Alysa Z. Hutnik (Kelley Drye & Warren LLP)
      Big data promises enormous benefits for companies. But what about privacy, data protection, and consumer laws? Having a solid understanding of the legal and self-regulatory rules of the road is key to maximizing the value of your data while avoiding data disasters. Alysa Hutnik shares legal best practices and practical tips to avoid becoming a big data “don’t.”
      2:05pm-2:45pm (40m) Executive Briefing, Strata Business Summit
      Executive Briefing: From data insights to action—Developing a data-driven company culture
      Ashish Verma (Deloitte)
      Ashish Verma explores the challenges organizations face after investing in hardware and software to power their analytics projects and the missteps that lead to inadequate data practices. Ashish explains how to course-correct and implement an insight-driven organization (IDO) framework that enables you to derive tangible value from your data faster.
      2:55pm-3:35pm (40m) Executive Briefing, Strata Business Summit
      Executive Briefing: Conversational marketing for brands—Why it's better to talk to your customers than monitor them
      Andy Mauro (Automat)
      Andy Mauro explains why the last 15 years of digital marketing was really about monitoring customers and how recent advancements in artificial intelligence and the dominance of messaging as the primary consumer channel provide an opportunity to achieve every marketer's dream of simply talking to customers—providing a personalized experience that drives engagement, brand loyalty, and conversions.
      4:35pm-5:15pm (40m) Data-driven business management, Executive Briefing, Strata Business Summit
      Executive Briefing: Preparing your infrastructure for AI
      Edd Wilder-James (Google)
      Edd Wilder-James outlines a road map for executives who are beginning to consider their strategies for implementing artificial intelligence in their critical processes.
      5:25pm-6:05pm (40m) Executive Briefing, Strata Business Summit
      Executive Briefing: Data ecosystem strategy
      Jason McIntyre (Accenture), Mark Milazzo (Accenture)
      Whether you are a technology or a services provider, understanding your value in the ecosystem and focusing on the right partners to reach your market goals is critical. Jason McIntyre and Mark Milazzo share examples of teaming models and leading practices for accelerating value from your ecosystem strategy.
      11:20am-12:00pm (40m) Security
      Machine learning to spot cybersecurity incidents at scale
      Eddie Garcia (Cloudera)
      Machine data from firewalls, network switches, DNS servers, and many other devices in your organization may be untapped potential for cybersecurity threat analytics using machine learning. Eddie Garcia explores how companies are using Apache Hadoop-based approaches to protect their organizations and explains how Apache Spot is tackling this challenge head-on.
      1:15pm-1:55pm (40m) Security
      An authenticated journey through big data security at Walmart
      Matt Bolte (Walmart), Toni LeTempt (Walmart)
      In today’s world of data breaches and hackers, security is one of the most important components for big data systems, but unfortunately, it's usually the area least planned and architected. Matt Bolte and Toni LeTempt share Walmart's authentication journey, focusing on how decisions made early can have significant impact throughout the maturation of your big data environment.
      2:05pm-2:45pm (40m) Security
      Confounding factors galore: Using software ecosystem data to risk-rate code
      J. C. Herz (Ion Channel)
      Automating security for DevOps means continuous analysis of open source software dependencies, vulnerabilities, and ecosystem dynamics. But the data is confounding: a flurry of reported vulnerabilities or infrequent commits that could be good or bad, depending on a project's scope and lifecycle. J. C. Herz illuminates nonintuitive insights from the software supply chain.
      2:55pm-3:35pm (40m) Security Financial services
      Architecting security across the enterprise: Instilling confidence and stewardship every step of the way
      Nick Curcuru (Mastercard)
      Cybersecurity is now a topic in the boardroom, as organizations are scrambling to increase their security posture. To decrease breach threats, Mastercard brings data security into its system design process. Nick Curcuru shares best practices and lessons learned protecting 160 million transactions per hour over Mastercard's network and securing 16+ petabytes of data at rest.
      4:35pm-5:15pm (40m) Data science & advanced analytics, Sensors, IOT & Industrial Internet
      Anonymized data fusion: Privacy versus utility
      Behrooz Hashemian (Massachusetts Institute of Technology)
      People are leaving an increasing number of digital traces in their everyday lives. Since these traces are mostly anonymized, the information gained by advanced data analytics is limited to each individual trace. Behrooz Hashemian explains how to fuse various traces and build multidimensional insight by taking advantage of patterns in people's behavior.
      5:25pm-6:05pm (40m) Hadoop platform & applications, Visualization & user experience
      Interactive data exploration and analysis at enterprise scale
      Sean Kandel (Trifacta), Kaushal Gandhi (Trifacta)
      Sean Kandel and Kaushal Gandhi share best practices for building and deploying Hadoop applications to support large-scale data exploration and analysis across an organization.
      11:20am-12:00pm (40m) Data science & advanced analytics, Data-driven business management
      The cognitive design principles of interactive analytics
      Mike Driscoll (Metamarkets)
      Most analytics tools in use today provide static visuals that don’t reveal the full, real-time picture. Mike Driscoll shows how to take an interactive approach to analytics. From design techniques to discovering new forms of data exploration, he demonstrates how to put the full power of big data into the hands of the people who need it to make key business decisions.
      1:15pm-1:55pm (40m) Visualization & user experience
      Improve business decision making with the science of human perception
      Sebastian Gutierrez (DashingD3js.com)
      You likely already use business metrics and analytics to achieve success in your data-driven organization. Sebastian Gutierrez demonstrates how to use the science of human perception to drastically improve your data visualizations, reports, and dashboards to drive better decisions and results.
      2:05pm-2:45pm (40m) Visualization & user experience
      Design for nondesigners: Increasing revenue, usability, and utility within data analytics products
      Brian O'Neill (Designing for Analytics)
      Do you spend a lot of time explaining your data analytics product to your customers? Is your UI/UX or navigation overly complex? Are sales suffering due to complexity, or worse, are customers not using your product? Your design may be the problem. Brian O'Neill shares a secret: you don't have to be a trained designer to recognize design and UX problems and start correcting them today.
      2:55pm-3:35pm (40m) Visualization & user experience Financial services
      Expanding data literacy with data visualizations
      Julie Rodriguez (Eagle Investment Systems)
      While the value of data and its role in informing decisions and communications is well known, its meaning can be incorrectly interpreted without data visualizations that provide context and accurate representation of the underlying numbers. Julie Rodriguez shares new approaches and visual design methods that provide a greater perspective of the data.
      4:35pm-5:15pm (40m) Visualization & user experience Text
      Text analytics and new visualization techniques
      Richard Brath (Uncharted Software), Scott Langevin (Uncharted Software)
      Text analytics are advancing rapidly, and new visualization techniques for text are providing new capabilities. Richard Brath and Scott Langevin offer an overview of these new ways to organize massive volumes of text, characterize subjects, score synopses, and skim through lots of documents.
      5:25pm-6:05pm (40m) Visualization & user experience Financial services
      Discovering insights in financial data with immersive reality
      John Horcher (Virtual Cove)
      Immersive reality enables powerful new information design concepts. Most importantly, the new technology enables the telling of powerful stories using more insightful thinking. John Horcher explores how immersive reality deployments in financial markets have enabled quicker time to insight and therefore better decision making.
      11:20am-12:00pm (40m) Sponsored
      Building a real-time feedback loop for education (sponsored by MemSQL)
      David Mellor (Curriculum Associates)
      Curriculum Associates has a mission to make classrooms better places for teachers and students. To achieve this, the company introduces innovative and exciting new products that give every student the chance to succeed. David Mellor explains how Curriculum Associates developed a real-time data pipeline with MemSQL, which empowered teachers to provide immediate and accurate student feedback.
      1:15pm-1:55pm (40m) Sponsored
      The essentials for digital growth (sponsored by MapR)
      Jack Norris (MapR Technologies)
      Jack Norris shares lessons learned by leading companies leveraging data to transform customer experiences, operational results, and overall growth and details the infrastructure, development, and data management principles used by successful leaders to drive agility regardless of application volume or scale.
      2:05pm-2:45pm (40m) Sponsored
      A winning combination: The power of big data and the democracy of information (sponsored by Paxata)
      Santhosh Mahendiran (Standard Chartered Bank)
      Santhosh Mahendiran explains how financial services company Standard Chartered Bank is using self-service data prep and machine learning technologies to democratize its data lake, offering trusted information to analysts, subject-matter experts, and line-of-business executives across 70 countries to help monitor fraud, track money-laundering activities, and perform regulatory compliance reporting.
      2:55pm-3:35pm (40m) Sponsored
      Using an AI-driven approach to managing data lakes in the cloud or on-premises (sponsored by Informatica)
      Murthy Mathiprakasam (Informatica), Sravan Kasarla (Fidelity Investments)
      In the face of regulatory and competitive pressures, why not use artificial intelligence, along with smart best practices, to manage data lakes? Murthy Mathiprakasam shares a comprehensive approach to data lake management that ensures that you can quickly and flexibly ingest, cleanse, master, govern, secure, and deliver all types of data in the cloud or on-premises.
      4:35pm-5:15pm (40m) Sponsored
      Hybrid data lakes: Unlocking the inevitable (sponsored by Cask)
      Jonathan Gray (Cask)
      To take advantage of the latest big data technology options in the cloud, more and more enterprises are building hybrid, self-service data lakes. Jonathan Gray discusses the importance of a portability strategy, addresses implementation challenges, and shares customer use cases that will inspire enterprises to embark on a multi-environment data lake journey.
      5:25pm-6:05pm (40m) Sponsored
      AIG: Creating a data-driven customer service organization (sponsored by Talend)
      Kevin Stallings (AIG)
      Kevin Stallings provides an inside look at how AIG executed a technological and cultural transformation that had a powerful impact on business outcomes and bottom-line results and explains how to use these lessons to put enterprise-wide big data preparation and self-service analysis to great use within your organization and dramatically increase customer satisfaction and engagement.
      11:20am-12:00pm (40m) Sponsored
      Building the IoT data lifecycle (sponsored by Cisco)
      Han Yang (Cisco Systems)
      For many enterprises, the internet of things represents an opportunity to transform the business by examining its data from a holistic lifecycle perspective and generating, analyzing, and archiving the data to reengineer the enterprise. Han Yang explores the latest trends and the role of infrastructure in enabling such a transformation.
      1:15pm-1:55pm (40m) Sponsored
      Accelerate your analytics with a GPU Data Frame (sponsored by MapD)
      Todd Mostak (MapD)
      For all of the innovation occurring across the GPU software ecosystem, the platforms themselves still remain isolated from each other—until now. Todd Mostak debuts the GPU Open Analytics Initiative’s first project, the GPU Data Frame (GDF), and explains how GDF enables efficient intra-GPU communication between different processes running on the GPUs.
      2:05pm-2:45pm (40m) Sponsored
      Launching a breakthrough data lake platform for the enterprise information fabric (sponsored by Cambridge Semantics)
      Ben Szekely (Cambridge Semantics)
      Only with a rich and interactive semantic layer can the data and analytics stack deliver true on-demand access to data, answers, and insights, weaving data together from across the enterprise into an information fabric. Ben Szekely shares the capabilities of the newly launched Anzo Smart Data Lake 4.0, the only end-to-end platform for semantic layers based on open standards.
      2:55pm-3:35pm (40m) Sponsored
      Building enterprise OLAP on Hadoop in finance with Apache Kylin (sponsored by Kyligence)
      Luke Han (Kyligence)
      Luke Han offers an overview of Apache Kylin and its enterprise version KAP and shares a case study of how a top finance company migrated to Apache Kylin on top of Hadoop from its legacy Cognos and DB2 system.
      4:35pm-5:15pm (40m) Sponsored
      Using real-time machine learning and big data to drive customer engagement and digital transformation (sponsored by RedPoint Global)
      George Corugedo (RedPoint Global)
      Driving digital transformation is a vital component of continued organizational success and more personalized customer engagement. The best results will come from operationalizing data to automate decisions with machine learning. George Corugedo explains how RedPoint’s customers use connected enterprise data, machine learning, and analytics to impact their businesses.
      5:25pm-6:05pm (40m) Sponsored
      Powering business outcomes with data science in a connected world (sponsored by Hortonworks)
      Piet Loubser (Hortonworks)
      Data has become the new fuel for business success. As a result, business intelligence and analytics are among the top priorities for CIOs today. Piet Loubser outlines the tectonic shift currently taking place in the market and explains why next-gen connected architectures are crucial to meet the demands of an intelligent, connected world.
      11:20am-12:00pm (40m) Sponsored
      Data science platforms: Your key to actionable analytics (sponsored by DataScience.com)
      William Merchan (DataScience.com)
      The number of inefficiencies in the data science workflow is staggering. Data science platforms have emerged to combat these inefficiencies. William Merchan outlines the key components of a data science platform and demonstrates how these platforms are enabling organizations to realize the potential of their data science teams.
      1:15pm-1:55pm (40m) Sponsored
      Real-time recommendation engines using SAS technology (sponsored by SAS)
      Juthika Khargharia (SAS)
      How does your favorite website serve up the perfect content just for you? It's all based on machine learning. By continuously adjusting machine learning models based on real-time data, you can visualize changes and take action on the new information in real time. Juthika Khargharia explains how to build a recommendation engine that surfaces recommendations from real-time data.
      2:05pm-2:45pm (40m) Sponsored
      How visual analytics drove data asset success at Procter & Gamble (sponsored by Arcadia Data)
      Michelle Tower (Procter & Gamble)
      The early stages of delivering on your data strategies are daunting. With many claims of failed data lakes or “data swamps,” the journey seems risky, which is why you need help from industry experts to get going. Michelle Tower explains how P&G is using big data, Apache Hadoop, and visual analytics to quickly discover new insights and optimize data models for analytics and data visualization.
      2:55pm-3:35pm (40m) Sponsored
      Tracking the opioid-fueled HIV outbreak with big data (sponsored by Trifacta)
      Ells Campbell (CDC), Connor Carreras (Trifacta), Ryan Weil (Leidos)
      Ells Campbell, Connor Carreras, and Ryan Weil explain how the Microbial Transmission Network Team (MTNT) at the Centers for Disease Control and Prevention (CDC) is leveraging new techniques in data collection, preparation, and visualization to advance the understanding of the spread of HIV/AIDS.
      4:35pm-5:15pm (40m) Sponsored
      Protect IoT data and monetize it with analytics (sponsored by Micro Focus Security and Big Data Analytics)
      Phil Sewell (Micro Focus)
      Phil Sewell discusses standards, options, and use cases for extracting value and delivering business outcomes from data protected at the data level.
      5:25pm-6:05pm (40m) Sponsored
      How JW Player is powering the online video revolution with data analytics (sponsored by Snowflake Computing)
      Rick Okin (JW Player)
      Rick Okin explains how JW Player strategically leverages video data analytics to power industry- and customer-level insights for the evolving online video space.
      11:20am-12:00pm (40m) Sponsored
      How Vivint Smart Home made home security and automation even smarter with Tableau (sponsored by Tableau)
      Brandon Bunker (Vivint)
      Brandon Bunker explains how Vivint delivers fast analytics from big data on a bootstrap budget by leveraging Tableau as a strategic piece of its modern BI architecture. By interactively analyzing data as it lands in its Cloudera Hadoop data lake, Vivint is able to deliver security across homes and data alike, making smart homes even smarter and saving customers money in the process.
      1:15pm-1:55pm (40m) Sponsored
      Enabling data science self-service with the Elastic Data Platform (sponsored by Dell EMC)
      Bala Chandrasekaran (Barclays)
      Barclays and Dell EMC have partnered on the deployment of a solution called the Elastic Data Platform. Ankit Tharwani offers an overview of this platform, which gives data scientists the ability to self-serve sandbox environments, cutting down the time to provision environments from months to hours.
      2:05pm-2:45pm (40m) Sponsored
      (Big) data team productivity: A balancing act (sponsored by Dataiku)
      Kenneth Sanford (Dataiku)
      Fragmented data science and analytics teams result in duplicate work, poor collaboration, a lack of governance, insufficient adoption at scale, and significant key-man risk. Kenneth Sanford explains how to overcome these challenges and build a centralized analytics practice that empowers data-driven decision making.
      2:55pm-3:35pm (40m) Sponsored
      The converging world of big data and the IoT (sponsored by Pentaho)
      Chuck Yarbrough (Pentaho)
      The IoT can deliver real outcomes that can transform organizations—and societies—for the better. But the IoT is not transformative without the power of big data. Chuck Yarbrough shares examples of where the IoT and big data have combined to solve significant business challenges and take advantage of business opportunities.
      4:35pm-5:15pm (40m) Sponsored
      Smarter business apps with a modern GPU database (sponsored by Kinetica)
      Mate' Radalj (Kinetica)
      Infusing business apps with AI isn’t easy. Mate Radalj explains why you need to master the entire AI process from data to models to operationalization so you can build, train, and deploy predictive models that unleash smart business apps and enable data-driven decisions.
      5:25pm-6:05pm (40m) Sponsored
      Big data, location analytics, and geoenrichment to drive better business outcomes (sponsored by Pitney Bowes)
      Tim McKenzie (Pitney Bowes)
      Organizations need to have a data strategy that includes the tools to derive location intelligence, enhance existing data with geographic enrichment (geoenrichment), and perform location analytics to reveal strategic and operational insights. Tim McKenzie shares new data quality and location intelligence approaches that operate natively within Hadoop and Spark environments.
      11:20am-12:00pm (40m) Sponsored
      Accelerating insight with analytics and AI (sponsored by Intel)
      Kevin Huiskes (Intel), Radhika Rangarajan (Intel)
      Kevin Huiskes and Radhika Rangarajan discuss Intel's strategy to lower barriers to advanced analytics and AI, make results faster and more efficient, and enable data scientists and developers to make better use of existing infrastructure, emphasizing solutions based on the latest Intel Xeon Scalable platform and the open source framework BigDL.
      1:15pm-1:55pm (40m) Sponsored
      How the separation of compute and storage impacts your big data analytics way of life (sponsored by Micro Focus Security and Big Data Analytics)
      Deepak Majeti (Vertica)
      Deepak Majeti explains why the separation of compute and storage has become critical to maximizing the benefits of cloud economics.
      2:05pm-2:45pm (40m) Sponsored
      Architect and operationalize your enterprise data lake (sponsored by Zaloni)
      Ben Sharma (Zaloni), Carlos Matos (AIG)
      Envision the next phase of your company’s data future: providing centralized data services for streamlined yet controlled access to data for end users across lines of business. Carlos Matos and Ben Sharma share strategies for developing an enterprise-wide data lake service to drive shared data insights across the organization. Are you ready?
      2:55pm-3:35pm (40m) Sponsored
      Orchestrating your complex data pipeline across your enterprise (sponsored by SAP)
      Michelle Mensing (SAP)
      Evolving big data architectures are creating an increasingly complex landscape. Michelle Mensing explains how to simplify data orchestration across various big data and enterprise sources, demonstrating how to create a complex pipeline and execute it in Kubernetes clusters, covering data acquisition, transformation, cleaning, and algorithm execution.
      4:35pm-5:15pm (40m) Sponsored
      Data science beyond the sandbox (sponsored by Anaconda)
      Peter Wang (Anaconda)
      Peter Wang explores the typical problems data science teams experience when working with other teams and explains how these issues can be overcome through cohesive collaborative efforts among data scientists, business analysts, IT teams, and more.
      5:25pm-6:05pm (40m) Sponsored
      Serverless big data architectures: Design patterns and best practices (sponsored by AWS)
      Ben Snively (Amazon Web Services (AWS))
      How do you incorporate serverless concepts and technologies into your big data architectures? Ben Snively shares use cases, best practices, and a reference architecture to help you streamline data processing and improve analytics through a combination of cloud and open source serverless technologies.
      8:50am-9:00am (10m)
      Wednesday keynotes
      Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
      Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes.
      9:00am-9:15am (15m)
      Journey to consolidation
      Mike Olson (Cloudera), Cesar Delgado (Apple)
      Twenty years ago, a company implored us to “think different” about personal computers. Today, Apple continues to live and breathe that legacy. It’s evident in the machine learning and analytics architectures that power many of the company's most innovative applications. Cesar Delgado joins Mike Olson to discuss how Apple is using its big data stack and expertise to solve non-data problems.
      9:15am-9:20am (5m)
      White Collar Crime Risk Zones
      Sam Lavigne (The New Inquiry)
      Sam Lavigne offers an overview of White Collar Crime Risk Zones, a predictive policing application that uses industry-standard predictive policing methodologies to predict financial crime at the city-block level with an accuracy of 90.12%. Unlike typical predictive policing apps, which criminalize poverty, White Collar Crime Risk Zones criminalizes wealth.
      9:20am-9:30am (10m) Sponsored keynote
      A whole new way to think about your next-gen applications (sponsored by MapR Technologies)
      Anil Gadre (MapR)
      Businesses struggle to build applications that harness all their data. RDBMS cannot handle modern data-intensive workloads, and NoSQL doesn't provide the capabilities for diverse applications. Anil Gadre explains how customers using a converged data platform are succeeding at creating breakthrough new apps for the enterprise.
      9:30am-9:35am (5m)
      The age of machine learning
      Ben Lorica (O'Reilly Media)
      Ben Lorica explores the age of machine learning.
      9:35am-9:45am (10m)
      Wild, wild data: Adventures with big data and the IoT in the Angolan highlands
      Jer Thorp (New York University)
      Keynote with Jer Thorp
      9:45am-9:50am (5m) Sponsored keynote
      Teaching databases to learn in the world of AI (sponsored by MemSQL)
      Nikita Shamgunov (MemSQL)
      Nikita Shamgunov discusses the future of databases for fast-learning adaptable applications.
      9:50am-10:05am (15m)
      Music, the window into your soul
      Christine Hung (Spotify)
      Have you ever wondered why Spotify just seems to know what you want? As a data-first company, Spotify is investing heavily in its analytics and machine learning capabilities to understand and predict user needs. Christine Hung shares how Spotify uses data and algorithms to improve user experience and drive business impact.
      10:05am-10:10am (5m) Sponsored keynote
      Unleashing intelligence and data analytics at scale (sponsored by Intel)
      马子雅 (Ziya Ma) (Intel)
      Advanced data analytics is reshaping the enterprise with new discoveries, better customer experiences, and improved products and services, all enabled by actionable insight. Ziya Ma shares how Intel is driving a holistic approach to powering advanced analytics and artificial intelligence workloads and unleashing intelligent and scalable insights from the edge to the cloud to the enterprise.
      10:10am-10:20am (10m)
      Data science for the most vulnerable at UNICEF Innovation
      Manuel García-Herranz (UNICEF Office of Innovation)
      The growing availability of data, along with advances in fields such as data science and artificial intelligence, has profoundly changed businesses. Manuel García-Herranz explains how to leverage these advances for the most vulnerable and integrate them into humanitarian and development systems, while making sure that the existing data divide does not widen inequality.
      10:20am-10:35am (15m) Strata Business Summit
      Weapons of math destruction
      Cathy O'Neil (Weapons of Math Destruction)
      Cathy O'Neil exposes the mathematical models that are shaping our future, both as individuals and as a society. These “weapons of math destruction” score teachers and students, sort résumés, grant (or deny) loans, evaluate workers, target voters, set parole, and monitor our health.
      10:50am-11:20am (30m)
      Break: Morning break sponsored by MemSQL
      12:00pm-1:15pm (1h 15m)
      Wednesday Topic Tables at Lunch
      Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
      3:35pm-4:35pm (1h)
      Break: Afternoon break sponsored by Intel
      6:05pm-7:05pm (1h)
      Booth Crawl
      Quench your thirst with vendor-hosted libations (plus snacks) while you check out all the exhibitors in the Expo Hall.
      8:00am-8:30am (30m)
      Speed Networking
      Gather before keynotes on Wednesday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees.
      7:30pm-10:30pm (3h)
      Data After Dark: City View
      Join us for Data After Dark at Strata New York. Enjoy breathtaking views of Manhattan from New York's largest outdoor rooftop garden at 230 Fifth.
      7:05pm-7:30pm (25m)
      Break: Dinner
      12:00pm-1:15pm (1h 15m)
      Plenary
      To be confirmed