Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Sessions

Strata Data Conference sessions take place Wednesday, September 27 and Thursday, September 28.

Tuesday, September 26

9:00am–5:00pm Tuesday, September 26, 2017
Location: 1E 09
Rose Winterton (Pitney Bowes), Audrey Spencer-Alvarado (Portland Trail Blazers), Amie Elcan (CenturyLink), Sean Power (Repable), Parisa Foster (Play The Future), Nick Selby (CJX, Inc. | Midlothian Police Department), Salema Rice (Allegis Group), Aneesh Karve (Quilt), Derek Ruths (CAI), Kristina Bergman (Integris Software), Natalia Adler (UNICEF HQ), Brandon O'Brien (Expedia, Inc)
In a series of 12 half-hour talks aimed at a business audience, you’ll hear data-themed case studies from household brands and global companies, explaining the challenges they wanted to tackle, the approaches they took, and the benefits—and drawbacks—of their solutions. If you want practical insights about applied data, look no further. Read more.

Wednesday, September 27

11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 06/07 Level: Intermediate
Eric Colson (Stitch Fix)
Average rating: ****.
(4.67, 3 ratings)
While companies often use data science as a supportive function, the emergence of new business models has made it possible for some companies to differentiate via data science. Eric Colson explores what it means to differentiate by data science and explains why companies must now think very differently about the role and placement of data science in the organization. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 08/10 Level: Intermediate
Secondary topics:  Financial services
Justin Bleich (Coatue Management)
Average rating: ****.
(4.00, 1 rating)
Prophet is a Bayesian nonlinear time series forecasting model recently released by Facebook. Justin Bleich explains how Coatue—a hedge fund that uses data science to drive investment decisions—extends Prophet to include exogenous covariates when generating forecasts and applies it to nowcasting macroeconomic series using higher-frequency data available from sources such as Google Trends. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  AI, Deep learning, ecommerce
Mikio Braun (Zalando SE)
Average rating: ***..
(3.71, 7 ratings)
Deep learning has become the go-to solution for many application areas, such as image classification or speech processing, but does it work for all application areas? Mikio Braun offers background on deep learning and shares his practical experience working with these exciting technologies. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  ecommerce
Average rating: ***..
(3.00, 1 rating)
Neelesh Srinivas Salian offers an overview of the data platform used by data scientists at Stitch Fix, based on the Spark ecosystem. Neelesh explains the development process and shares some lessons learned along the way. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 18 Level: Intermediate
Secondary topics:  Financial services
Atul Dalmia (American Express)
Big data decisioning is critical to driving real-time business decisions in our digital age. But how do you begin the transformation to big data? The key is enterprise adoption across a variety of end users. Atul Dalmia shares best practices learned from American Express's five-year journey, the biggest challenges you’ll face, and ideas on how to solve them. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 21/22 Level: Intermediate
Michelle Ufford (Netflix)
Average rating: ****.
(4.78, 9 ratings)
What if we used the wealth of data and experience at our disposal to drive improvements in data engineering? Michelle Ufford explains how Netflix is using data to find common patterns among the chaos that enable the company to automate repetitive and time-consuming tasks and discover ways to improve data quality, reduce costs, and quickly identify and respond to issues. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Geospatial, Logistics, Platform
Zhenxiao Luo (Uber), Wei Yan (Uber)
Average rating: ****.
(4.43, 7 ratings)
Uber's geospatial data is increasing exponentially as the company grows. As a result, its big data systems must also grow in scalability, reliability, and performance to support business decisions, user recommendations, and experiments for geospatial data. Zhenxiao Luo and Wei Yan explain how Uber runs geospatial analysis efficiently in its big data systems, including Hadoop, Hive, and Presto. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Streaming
Dean Wampler (Lightbend)
Average rating: ***..
(3.00, 3 ratings)
While stream processing is now popular, streaming architectures must be more reliable and scalable than ever before—more like microservice architectures in fact. Dean Wampler defines "stream" based on characteristics for such systems, using specific tools like Kafka, Spark, Flink, and Akka as examples, and argues that big data and microservices architectures are converging. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1E 09 Level: Intermediate
Secondary topics:  IoT
Mateusz Dymczyk (H2O.ai), Mathieu Dumoulin (MapR Technologies)
Average rating: ****.
(4.00, 2 ratings)
Mateusz Dymczyk and Mathieu Dumoulin showcase a working, practical, predictive maintenance pipeline in action and explain how they built a state-of-the-art anomaly detection system using big data frameworks like Spark, H2O, TensorFlow, and Kafka on the MapR Converged Data Platform. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1E 10/11
Derek Ruths (CAI)
Derek Ruths explains how volunteer efforts, when done the right way, can actually improve a data science team’s culture and productivity—motivating data scientists, sharpening their skills, providing exposure to new challenges, reducing turnover, and creating valuable recruiting opportunities. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1E 12/13
Michael Chui (McKinsey Global Institute)
Average rating: ****.
(4.00, 7 ratings)
After decades of extravagant promises, artificial intelligence is finally starting to deliver real-life benefits to early adopters. However, we're still early in the cycle of adoption. Michael Chui explains where investment is going, patterns of AI adoption and value capture by enterprises, and how the value potential of AI across sectors and business functions is beginning to emerge. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1E 14 Level: Intermediate
Eddie Garcia (Cloudera)
Machine data from firewalls, network switches, DNS servers, and many other devices in your organization may be untapped potential for cybersecurity threat analytics using machine learning. Eddie Garcia explores how companies are using Apache Hadoop-based approaches to protect their organizations and explains how Apache Spot is tackling this challenge head-on. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1E 15/16 Level: Intermediate
Mike Driscoll (Metamarkets)
Average rating: ****.
(4.00, 3 ratings)
Most analytics tools in use today provide static visuals that don’t reveal the full, real-time picture. Mike Driscoll shows how to take an interactive approach to analytics. From design techniques to discovering new forms of data exploration, he demonstrates how to put the full power of big data into the hands of the people who need it to make key business decisions. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 03
Brandon Bunker (Vivint)
Average rating: ****.
(4.00, 1 rating)
Brandon Bunker explains how Vivint delivers fast analytics from big data on a bootstrap budget by leveraging Tableau as a strategic piece of its modern BI architecture. By interactively analyzing data as it lands in its Cloudera Hadoop data lake, Vivint is able to deliver security across homes and data alike, making smart homes even smarter and saving customers money in the process. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 01/02
William Merchan (DataScience.com)
Average rating: *****
(5.00, 2 ratings)
The number of inefficiencies in the data science workflow is staggering. Data science platforms have emerged to combat these inefficiencies. William Merchan outlines the key components of a data science platform and demonstrates how these platforms are enabling organizations to realize the potential of their data science teams. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1E 06
Kevin Huiskes and Radhika Rangarajan discuss Intel's strategy to lower barriers to advanced analytics and AI, make results faster and more efficient, and enable data scientists and developers to make better use of existing infrastructure, emphasizing solutions based on the latest Intel Xeon Scalable platform and the open source framework BigDL. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1E 17
Han Yang (Cisco Systems)
For many enterprises, the internet of things represents an opportunity to transform the business by examining its data from a holistic lifecycle perspective and generating, analyzing, and archiving the data to reengineer the enterprise. Han Yang explores the latest trends and the role of infrastructure in enabling such a transformation. Read more.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 04/05
David Mellor (Curriculum Associates)
Curriculum Associates has a mission to make classrooms better places for teachers and students. To achieve this, the company introduces innovative and exciting new products that give every student the chance to succeed. David Mellor explains how Curriculum Associates developed a real-time data pipeline with MemSQL, which empowered teachers to provide immediate and accurate student feedback. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 06/07 Level: Intermediate
Matthew Roche (Microsoft), Jennifer Marie Stevens (Microsoft)
Average rating: *****
(5.00, 1 rating)
The data-driven business must bridge the language gap between data scientists and business users. Matthew Roche and Jennifer Stevens walk you through building a business glossary that codifies your semantic layer and enables greater conversational fluency between business users and data scientists. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 08/10 Level: Intermediate
Tristan Zajonc (Cloudera), Thomas Dinsmore (Cloudera), Lucas Glass (QuintilesIMS)
Average rating: ***..
(3.00, 1 rating)
Data science alone is easy. Data science with others, whether in the enterprise or on shared distributed systems, requires a bit more work. Tristan Zajonc and Thomas Dinsmore discuss common technology considerations and patterns for collaboration in large teams and for moving machine learning into production at scale. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep learning
Yuhao Yang (Intel), Zhichao Li (Intel)
Average rating: ****.
(4.00, 2 ratings)
Yuhao Yang and Zhichao Li discuss building end-to-end analytics and deep learning applications, such as speech recognition and object detection, on top of BigDL and Spark and explore recent developments in BigDL, including Python APIs, notebook and TensorBoard support, TensorFlow model R/W support, better recurrent and recursive net support, and 3D image convolutions. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Cheng Chang (Alluxio), Haoyuan Li (Alluxio)
Alluxio (formerly Tachyon) is a memory-speed virtual distributed storage system that leverages memory for managing data across different storage systems. Many deployments use Alluxio with Spark because Alluxio helps Spark further accelerate applications. Haoyuan Li and Cheng Chang explain how Alluxio makes Spark more effective and share production deployments of Alluxio and Spark working together. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 18 Level: Advanced
Milind Nagnur (Citigroup)
Average rating: **...
(2.00, 2 ratings)
Milind Nagnur explores the requirements for a next-generation platform for data management, covering everything from controlled exploratory sandboxes to hosting transactional applications, and explains how modern, industry-leading data management tools and self-service analytics can address these needs. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 21/22 Level: Advanced
Secondary topics:  Financial services, Platform
Average rating: ****.
(4.57, 7 ratings)
John Hitchingham shares insights into the design and operation of FINRA's data lake in the AWS cloud, where FINRA extracts, transforms, and loads over 75B transactions per day. Users can query across petabytes of data in seconds on AWS S3 using Presto and Spark—all while maintaining security and data lineage. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 23/24 Level: Beginner
Secondary topics:  Platform, Telecom
Travis Bakeman (T-Mobile)
Average rating: **...
(2.00, 1 rating)
Travis Bakeman shares how T-Mobile ported its large-scale network performance management platform, T-PIM, from a legacy database to a big data platform with Impala as the main reporting interface, covering the migration journey, including the challenges the team faced, how the team evaluated new technologies, lessons learned along the way, and the efficiencies gained as a result. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Architecture, IoT, Streaming
Michael Freedman (TimescaleDB | Princeton)
Average rating: ****.
(4.50, 4 ratings)
Michael Freedman offers an overview of TimescaleDB, a new open source scale-out database designed for time series workloads and engineered as a plugin to Postgres. Unlike most time series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1E 09 Level: Intermediate
Secondary topics:  Financial services, Logistics
Riccardo Corbella (Data Reply IT), Beniamino Del Pizzo (Data Reply IT)
Average rating: ****.
(4.00, 2 ratings)
With more than 4.5 million black boxes, Italian car insurance has the most telematics clients in the world. Riccardo Corbella and Beniamino Del Pizzo explore the data management challenges that occur in a streaming context when the amount of data to process is gigantic and share a data management model capable of providing the scalability and performance needed to support massive growth. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1E 10/11
Chris Neumann (500 Startups), Carla Holtze (Parrable), Bradford Cross (DCVC), Kyle Wild (Keen IO), Tasso Argyros (ActionIQ)
This panel brings together partners from some of the world’s leading startup accelerators and founders of up-and-coming enterprise data startups to discuss how we can help create the next generation of successful enterprise data companies. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1E 12/13 Level: Non-technical
Alysa Z. Hutnik (Kelley Drye & Warren LLP)
Average rating: *****
(5.00, 2 ratings)
Big data promises enormous benefits for companies. But what about privacy, data protection, and consumer laws? Having a solid understanding of the legal and self-regulatory rules of the road is key to maximizing the value of your data while avoiding data disasters. Alysa Hutnik shares legal best practices and practical tips to avoid becoming a big data “don’t.” Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1E 14 Level: Beginner
Matt Bolte (Walmart), Toni LeTempt (Walmart)
Average rating: ***..
(3.75, 4 ratings)
In today’s world of data breaches and hackers, security is one of the most important components for big data systems, but unfortunately, it's usually the area least planned and architected. Matt Bolte and Toni LeTempt share Walmart's authentication journey, focusing on how decisions made early can have significant impact throughout the maturation of your big data environment. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1E 15/16 Level: Beginner
Sebastian Gutierrez (DashingD3js.com)
Average rating: ***..
(3.33, 9 ratings)
You likely already use business metrics and analytics to achieve success in your data-driven organization. Sebastian Gutierrez demonstrates how to use the science of human perception to drastically improve your data visualizations, reports, and dashboards to drive better decisions and results. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 03
Bala Chandrasekaran (Barclays)
Average rating: ****.
(4.00, 1 rating)
Barclays and Dell EMC have partnered on the deployment of a solution called the Elastic Data Platform. Ankit Tharwani offers an overview of this platform, which gives data scientists the ability to self-serve sandbox environments, cutting down the time to provision environments from months to hours. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1E 06
Deepak Majeti (Vertica)
Deepak Majeti explains why the separation of compute and storage has become critical to maximizing the benefits of cloud economics. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1E 17
Todd Mostak (MapD)
For all of the innovation occurring across the GPU software ecosystem, the platforms themselves still remain isolated from each other—until now. Todd Mostak debuts the GPU Open Analytics Initiative’s first project, the GPU Data Frame (GDF), and explains how GDF enables efficient intra-GPU communication between different processes running on the GPUs. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 04/05
Jack Norris (MapR Technologies)
Average rating: ****.
(4.00, 1 rating)
Jack Norris shares lessons learned by leading companies leveraging data to transform customer experiences, operational results, and overall growth and details the infrastructure, development, and data management principles used by successful leaders to drive agility regardless of application volume or scale. Read more.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 01/02
Average rating: *****
(5.00, 2 ratings)
How does your favorite website serve up the perfect content just for you? It's all based on machine learning. By continuously adjusting machine learning models based on real-time data, you can visualize changes and take action on the new information in real time. Juthika Khargharia explains how to build a recommendation engine to surface these recommendations on real-time data. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 06/07 Level: Intermediate
Sander Pick (Set), Andrew Hill (Set), Carson Farmer (Set)
Average rating: ****.
(4.00, 1 rating)
Location-based data is full of information about our everyday lives, but GPS and WiFi signals create extremely noisy mobile location data, making it hard to extract features, especially when working with real-time data. Andrew Hill and Sander Pick explore new strategies for extracting information from location data while remaining scalable, privacy focused, and contextually aware. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 08/10 Level: Beginner
Moderated by:
Jason Grout (Bloomberg LP)
Panelists:
Jessica Forde (Jupyter)
Average rating: ****.
(4.80, 5 ratings)
With JupyterLab, users compute with multiple notebooks, editors, and consoles that work together in a tabbed layout. Jason Grout and Jessica Forde offer an overview of JupyterLab, the next generation of the Jupyter Notebook, demonstrate how to use third-party plugins to extend and customize many aspects of JupyterLab, and explain how it fits within the overall vision of Project Jupyter. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 12/14 Level: Advanced
Secondary topics:  Media, Text
Eui-Hong Han (The Washington Post), Ling Jiang (The Washington Post)
Average rating: ****.
(4.50, 2 ratings)
The quality of online comments is critical to the Washington Post. However, the quality management of the comment section currently requires costly manual resources. Eui-Hong Han and Ling Jiang discuss ModBot, a machine learning-based tool developed for automatic comments moderation, and share the challenges they faced in developing and deploying ModBot into production. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  Architecture, Cloud
Henry Robinson (Cloudera), Greg Rahn (Cloudera)
Cloud environments will likely play a key role in your business’s future. Henry Robinson and Greg Rahn explore the workload considerations when evaluating the cloud for analytics and discuss common architectural patterns to optimize price and performance. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 18 Level: Intermediate
Secondary topics:  Financial services, Platform
Nandu Jayakumar (Visa), Justin Erickson (Cloudera)
Average rating: *....
(1.00, 1 rating)
At Visa, the process of optimizing the enterprise data warehouse and consolidating data marts by migrating these analytic workloads to Hadoop has played a key role in the adoption of the platform and how data has transformed Visa as an organization. Nandu Jayakumar and Justin Erickson share Visa’s journey along with some best practices for organizations migrating workloads to Hadoop. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 21/22 Level: Intermediate
Lucy Yu (MemSQL)
Average rating: **...
(2.50, 6 ratings)
Lucy Yu demonstrates how to extend the Spark SQL abstraction to support more complex pushdown, such as group by, subqueries, and joins. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Streaming
Todd Lipcon (Cloudera)
Average rating: *****
(5.00, 3 ratings)
To date, mutable big data storage has primarily been the domain of nonrelational (NoSQL) systems such as Apache HBase. However, demand for real-time analytic architectures has led big data back to a familiar friend: relationally structured data storage systems. Todd Lipcon explores the advantages of relational storage and reviews new developments, including Google Cloud Spanner and Apache Kudu. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Streaming
Dustin Cote (Confluent)
Average rating: ****.
(4.00, 2 ratings)
Dustin Cote shares his experience troubleshooting Apache Kafka in production environments and explains how to avoid pitfalls like message loss or performance degradation in your environment. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1E 09 Level: Non-technical
Secondary topics:  Data for good, Healthcare, IoT
Julie Lockner (17 Minds Corporation)
Average rating: ****.
(4.00, 1 rating)
How can we empower individuals with special needs to reach their full potential? Julie Lockner offers an overview of a project to develop collaboration applications that use wearable device data to improve the ability to develop the best possible care and education plans. Join in to learn how real-time IoT data analytics are making this possible. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1E 10/11
Michael Dauber (Amplify Partners), Sarah Catanzaro (Canvas Ventures), Katherine Boyle (General Catalyst), Lisha Li (Amplify Partners), Sandeep Bhadra (Vertex Ventures)
Average rating: ***..
(3.50, 2 ratings)
In a panel discussion, top-tier VCs look over the horizon and consider the big trends in big data, explaining what they think the field will look like a few years (or more) down the road. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1E 12/13
Ashish Verma (Deloitte)
Average rating: ***..
(3.60, 5 ratings)
Ashish Verma explores the challenges organizations face after investing in hardware and software to power their analytics projects and the missteps that lead to inadequate data practices. Ashish explains how to course-correct and implement an insight-driven organization (IDO) framework that enables you to derive tangible value from your data faster. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1E 14 Level: Beginner
J. C. Herz (Ion Channel)
Average rating: *****
(5.00, 2 ratings)
Automating security for DevOps means continuous analysis of open source software dependencies, vulnerabilities, and ecosystem dynamics. But the data is confounding: a flurry of reported vulnerabilities or infrequent commits that could be good or bad, depending on a project's scope and lifecycle. J. C. Herz illuminates nonintuitive insights from the software supply chain. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1E 15/16 Level: Non-technical
Brian O'Neill (Designing for Analytics)
Average rating: ****.
(4.75, 4 ratings)
Do you spend a lot of time explaining your data analytics product to your customers? Is your UI/UX or navigation overly complex? Are sales suffering due to complexity, or worse, are customers not using your product? Your design may be the problem. Brian O'Neill shares a secret: you don't have to be a trained designer to recognize design and UX problems and start correcting them today. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1E 06
Ben Sharma (Zaloni), Carlos Matos (AIG)
Average rating: ****.
(4.00, 2 ratings)
Envision the next phase of your company’s data future: providing centralized data services for streamlined yet controlled access to data for end users across lines of business. Carlos Matos and Ben Sharma share strategies for developing an enterprise-wide data lake service to drive shared data insights across the organization. Are you ready? Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 01/02
Michelle Tower (Procter & Gamble)
Average rating: ****.
(4.00, 1 rating)
The early stages of delivering on your data strategies are daunting. With many claims of failed data lakes or “data swamps,” the journey seems risky, which is why you need help from industry experts to get going. Michelle Tower explains how P&G is using big data, Apache Hadoop, and visual analytics to quickly discover new insights and optimize data models for analytics and data visualization. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 04/05
Santhosh Mahendiran (Standard Chartered Bank)
Santhosh Mahendiran explains how financial services company Standard Chartered Bank is using self-service data prep and machine learning technologies to democratize its data lake, offering trusted information to analysts, subject-matter experts, and line-of-business executives across 70 countries to help monitor fraud, track money-laundering activities, and perform regulatory compliance reporting. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1E 17
Ben Szekely (Cambridge Semantics)
Average rating: *****
(5.00, 1 rating)
Only with a rich and interactive semantic layer can the data and analytics stack deliver true on-demand access to data, answers, and insights, weaving data together from across the enterprise into an information fabric. Ben Szekely shares the capabilities of the newly launched Anzo Smart Data Lake 4.0, the only end-to-end platform for semantic layers based on open standards. Read more.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 03
Kenneth Sanford (Dataiku)
Fragmented data science and analytics teams result in duplicate work, poor collaboration, a lack of governance, insufficient adoption at scale, and significant key-man risk. Kenneth Sanford explains how to overcome these challenges and build a centralized analytics practice that empowers data-driven decision making. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 06/07 Level: Beginner
Secondary topics:  Data for good, ecommerce, Healthcare
Average rating: ****.
(4.67, 3 ratings)
Zocdoc is an online marketplace that allows easy doctor discovery and instant online booking. However, dealing with healthcare involves many constraints and challenges that render standard approaches to common problems infeasible. Brian Dalessandro surveys the various machine learning problems Zocdoc has faced and shares the data, legal, and ethical constraints that shape its solution space. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 08/10 Level: Intermediate
Secondary topics:  Pydata
Matthew Rocklin (Anaconda)
Average rating: ****.
(4.67, 3 ratings)
Dask parallelizes Python libraries like NumPy, pandas, and scikit-learn, bringing a popular data science stack to the world of distributed computing. Matthew Rocklin discusses the architecture and current applications of Dask used in the wild and explores computational task scheduling and parallel computing within Python generally. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep learning
Joshua Patterson (NVIDIA), Michael Balint (NVIDIA), Satish Varma Dandu (NVIDIA)
Average rating: ****.
(4.00, 1 rating)
How can deep learning be employed to create a system that monitors network traffic, operations data, and system logs to reliably flag risk and unearth potential threats? Satish Dandu, Joshua Patterson, and Michael Balint explain how to bootstrap a deep learning framework to detect risk and threats in operational production systems, using best-of-breed GPU-accelerated open source tools. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Roy Ben-Alta (Amazon Web Services), Allan MacInnis (Amazon Web Services)
Average rating: ****.
(4.33, 3 ratings)
Speed matters. Today, decisions are made based on real-time insights, but in order to support the substantial growth of streaming data, companies are required to innovate. Roy Ben-Alta and Allan MacInnis explore AWS solutions powered by machine learning and artificial intelligence. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 18 Level: Intermediate
Secondary topics:  Financial services
Tobi Bosede (Johns Hopkins)
Whether an entity seeks to create trading algorithms or mitigate risk, predicting trade volume is an important task. Focusing on futures trading that relies on Apache Spark for processing the large amount of data, Tobi Bosede considers the use of penalized regression splines for trade volume prediction and the relationship between price volatility and trade volume. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 21/22 Level: Intermediate
Holden Karau (Google), Seth Hendrickson (Cloudera)
Average rating: *****
(5.00, 1 rating)
Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. Holden Karau and Seth Hendrickson introduce Spark’s ML pipelines and explain how to extend them with your own custom algorithms. Even if you don't have your own algorithm to add, you'll leave with a deeper understanding of Spark's ML pipelines. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 23/24 Level: Beginner
Secondary topics:  Platform, Sales
Simon Chan (Salesforce)
Average rating: *****
(5.00, 1 rating)
Salesforce recently released Einstein, which brings AI into its core platform to power every business. The secret behind Einstein is an underlying platform that accelerates AI development at scale for both internal and external data scientists. Simon Chan shares his experience building this unified platform for a multitenancy, multibusiness cloud enterprise. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1E 07/08 Level: Intermediate
Jun Rao (Confluent)
Average rating: *****
(5.00, 3 ratings)
Over the last few years, streaming platform Apache Kafka has been used extensively for real-time data collecting, delivering, and processing—particularly in the enterprise. Jun Rao leads a deep dive into some of the key internals that help make Kafka popular and provide strong reliability guarantees. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1E 09 Level: Beginner
Marc Carlson (Seattle Children's Research Institute), Sean Taylor (Seattle Children's Research Institute)
Average rating: *****
(5.00, 1 rating)
Marc Carlson and Sean Taylor offer an overview of Project Rainier, which leverages the power of HDFS and the Hadoop and Spark ecosystem to help scientists at Seattle Children’s Research Institute quickly find new patterns and generate predictions that they can test later, accelerating important pediatric research and increasing scientific collaboration by highlighting where it is needed most. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1E 10/11 Level: Beginner
Secondary topics:  AI, Marketing
Elsie Kenyon (Nara Logics)
Average rating: **...
(2.67, 3 ratings)
Enterprises today pursue AI applications to replace logic-based expert systems in order to learn from customer and operational signals. But training data is often limited or nonexistent, and applying or extrapolating the wrong dataset can be costly to a company's business and reputation. Elsie Kenyon explains how to harness institutional human knowledge to augment data in deployed AI solutions. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1E 12/13
Andy Mauro (Automat)
Average rating: ****.
(4.50, 2 ratings)
Andy Mauro explains why the last 15 years of digital marketing were really about monitoring customers and how recent advancements in artificial intelligence and the dominance of messaging as the primary consumer channel provide an opportunity to achieve every marketer's dream of simply talking to customers—providing a personalized experience that drives engagement, brand loyalty, and conversions. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1E 14 Level: Non-technical
Secondary topics:  Financial services
Nick Curcuru (Mastercard)
Average rating: *****
(5.00, 1 rating)
Cybersecurity is now a topic in the boardroom, as organizations are scrambling to increase their security posture. To decrease breach threats, Mastercard brings data security into its system design process. Nick Curcuru shares best practices and lessons learned protecting 160 million transactions per hour over Mastercard's network and securing 16+ petabytes of data at rest. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1E 15/16 Level: Beginner
Secondary topics:  Financial services
Julie Rodriguez (Eagle Investment Systems)
Average rating: ****.
(4.00, 3 ratings)
While the value of data and its role in informing decisions and communications is well known, its meaning can be incorrectly interpreted without data visualizations that provide context and accurate representation of the underlying numbers. Julie Rodriguez shares new approaches and visual design methods that provide a greater perspective of the data. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 04/05
Murthy Mathiprakasam (Informatica), Sravan Kasarla (Fidelity Investments)
In the face of regulatory and competitive pressures, why not use artificial intelligence, along with smart best practices, to manage data lakes? Murthy Mathiprakasam shares a comprehensive approach to data lake management that ensures that you can quickly and flexibly ingest, cleanse, master, govern, secure, and deliver all types of data in the cloud or on-premises. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1E 17
Luke Han (Kyligence)
Luke Han offers an overview of Apache Kylin and its enterprise version KAP and shares a case study of how a top finance company migrated to Apache Kylin on top of Hadoop from its legacy Cognos and DB2 system. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 01/02
Ells Campbell (CDC), Connor Carreras (Trifacta), Ryan Weil (Leidos)
Average rating: *****
(5.00, 1 rating)
Ells Campbell, Connor Carreras, and Ryan Weil explain how the Microbial Transmission Network Team (MTNT) at the Centers for Disease Control (CDC) is leveraging new techniques in data collection, preparation, and visualization to advance the understanding of the spread of HIV/AIDS. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 03
Chuck Yarbrough (Pentaho)
Average rating: **...
(2.00, 2 ratings)
The IoT can deliver real outcomes that can transform organizations—and societies—for the better. But the IoT is not transformative without the power of big data. Chuck Yarbrough shares examples of where the IoT and big data have combined to solve significant business challenges and take advantage of business opportunities. Read more.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1E 06
Average rating: **...
(2.00, 2 ratings)
Evolving big data architectures are creating an increasingly complex landscape. Michelle Mensing explains how to simplify data orchestration across various big data and enterprise sources, demonstrating how to create a complex pipeline and execute the pipeline in Kubernetes clusters, covering data acquisition, transformation, cleaning data, and running the algorithms. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 06/07 Level: Intermediate
Patrick Hall (H2O.ai | George Washington University), Sri Satish (H2O.ai)
Average rating: *****
(5.00, 1 rating)
Interpreting deep learning and machine learning models is not just another regulatory burden to be overcome. People who use these technologies have the right to trust and understand AI. Patrick Hall and Sri Satish share techniques for interpreting deep learning and machine learning models and telling stories from their results. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 08/10 Level: Intermediate
Secondary topics:  Pydata
Shoumik Palkar (Stanford University), Matei Zaharia (Stanford University)
Average rating: *****
(5.00, 2 ratings)
Modern data applications combine functions from many optimized libraries (e.g., pandas and TensorFlow) and yet do not achieve peak hardware performance due to data movement across functions. Shoumik Palkar and Matei Zaharia offer an overview of Weld, a new interface to implement functions in these libraries while enabling optimizations across them. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Financial services, Platform
Nadeem Gulzar (Danske Bank Group), Sune Askjær (Think Big Analytics, a Teradata Company)
Average rating: *****
(5.00, 3 ratings)
Fraud in banking is an arms race, and criminals are now using machine learning to improve their attack effectiveness. Sune Askjær and Nadeem Gulzar explore how Danske Bank uses deep learning for better fraud detection, covering model effectiveness, TensorFlow versus boosted decision trees, operational considerations in training and deploying models, and lessons learned along the way. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  Architecture, Streaming
Paul Curtis (MapR Technologies)
Average rating: ****.
(4.67, 3 ratings)
A microservices architecture benefits from the agility of containers for convenient, predictable deployment of applications, while persistent, performant message streaming makes both work better. Paul Curtis explores these infrastructure components and discusses the design of highly scalable real-world systems that take advantage of this powerful triad. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 18 Level: Intermediate
Brendan Aldrich (Ivy Tech Community College), Lige Hensley (Ivy Tech Community College)
As the largest community college in the US, Ivy Tech ingests over 100M rows of data a day. Brendan Aldrich and Lige Hensley explain how Ivy Tech is applying predictive technologies to establish a true data democracy—a self-service data analytics environment empowering thousands of users each day to improve operations, achieve strategic goals, and support student success. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 21/22 Level: Beginner
Average rating: ****.
(4.00, 2 ratings)
Apache Kudu is a new distributed storage system that combines low-latency data ingestion, scalable analytics, and fast data lookups. But what does it deliver in practice? Zbigniew Baranowski explains how to use Apache Kudu for scale-out database-like systems, such as those used at CERN, covering the advantages and limitations and measuring performance. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 23/24 Level: Advanced
Secondary topics:  Architecture, Media, Platform
Barbara Eckman (Comcast)
Average rating: ***..
(3.00, 2 ratings)
Barbara Eckman offers an overview of Comcast’s streaming data platform, composed of a variety of ingest, transformation, and storage services, which uses Apache Avro schemas to support end-to-end data governance, Apache Atlas for data discovery and lineage, and custom asynchronous messaging libraries to notify Atlas of new data and schema entities and lineage links as they are created. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Streaming
Fabian Hueske (data Artisans)
Average rating: ****.
(4.00, 1 rating)
Although the most widely used language for data analysis, SQL is only slowly being adopted by open source stream processors. One reason is that SQL's semantics and syntax were not designed with streaming data in mind. Fabian Hueske explores Apache Flink's two relational APIs for streaming analytics—standard SQL and the LINQ-style Table API—discussing their semantics and showcasing their usage. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1E 09 Level: Intermediate
Secondary topics:  Architecture, Platform, Streaming
Stephen Devine (Big Fish Games), Kalah Brown (Big Fish Games)
Companies are increasingly interested in processing and analyzing live-streaming data. The Hadoop ecosystem includes platforms and software library frameworks to support this work, but these components require correct architecture, performance tuning, and customization. Stephen Devine and Kalah Brown explain how they used Spark, Flume, and Kafka to build a live-streaming data pipeline. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1E 10/11 Level: Intermediate
Secondary topics:  Healthcare
Charles Boicey (Clearsense)
Charles Boicey explains how Clearsense uses Spark Streaming to provide real-time updates to healthcare providers for critical healthcare needs, helping clinicians make timely decisions from the assessment of a patient's risk based on information gathered from streaming physiological monitoring along with streaming diagnostic data and the patient historical record. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1E 12/13
Edd Wilder-James (Google)
Average rating: ****.
(4.00, 2 ratings)
Edd Wilder-James outlines a road map for executives who are beginning to consider their strategies for implementing artificial intelligence in their critical processes. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1E 14 Level: Non-technical
Behrooz Hashemian (Massachusetts Institute of Technology)
People are leaving an increasing number of digital traces in their everyday lives. Since these traces are mostly anonymized, the information gained by advanced data analytics is limited to each individual trace. Behrooz Hashemian explains how to fuse various traces and build multidimensional insight by taking advantage of patterns in people's behavior. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1E 15/16 Level: Beginner
Secondary topics:  Text
Richard Brath (Uncharted Software), Scott Langevin (Uncharted Software)
Average rating: ****.
(4.50, 2 ratings)
Text analytics are advancing rapidly, and new visualization techniques for text are providing new capabilities. Richard Brath and Scott Langevin offer an overview of these new ways to organize massive volumes of text, characterize subjects, score synopses, and skim through lots of documents. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 04/05
Jonathan Gray (Cask)
Average rating: **...
(2.50, 2 ratings)
To take advantage of the latest big data technology options in the cloud, more and more enterprises are building hybrid, self-service data lakes. Jonathan Gray discusses the importance of a portability strategy, addresses implementation challenges, and shares customer use cases that will inspire enterprises to embark on a multi-environment data lake journey. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1E 17
George Corugedo (RedPoint Global)
Driving digital transformation is a vital component of continued organizational success and more personalized customer engagement. The best results will come from operationalizing data to automate decisions with machine learning. George Corugedo explains how RedPoint’s customers use connected enterprise data, machine learning, and analytics to impact their businesses. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1E 06
Peter Wang (Anaconda)
Average rating: ****.
(4.00, 1 rating)
Peter Wang explores the typical problems data science teams experience when working with other teams and explains how these issues can be overcome through cohesive collaborative efforts among data scientists, business analysts, IT teams, and more. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 01/02
Phil Sewell (Micro Focus)
Phil Sewell discusses standards, options, and use cases for extracting value and delivering business outcomes from data protected at the data level. Read more.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 03
Mate Radalj (Kinetica)
Infusing business apps with AI isn’t easy. Mate Radalj explains why you need to master the entire AI process from data to models to operationalization so you can build, train, and deploy predictive models that unleash smart business apps and enable data-driven decisions. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 06/07 Level: Intermediate
David Talby (Pacific AI)
Average rating: *****
(5.00, 2 ratings)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 08/10 Level: Advanced
Secondary topics:  Media
Seth Hendrickson (Cloudera), DB Tsai (Netflix)
Average rating: *****
(5.00, 1 rating)
Recent developments in Spark MLlib have given users the power to express a wider class of ML models and decrease model training times via the use of custom parameter optimization algorithms. Seth Hendrickson and DB Tsai explain when and how to use this new API and walk you through creating your own Spark ML optimizer. Along the way, they also share performance benefits and real-world use cases. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Cloud, Deep learning
Leo Dirac (Amazon Web Services)
Average rating: *****
(5.00, 6 ratings)
Leo Dirac demonstrates how to apply the latest deep learning techniques to semantically understand images. You'll learn what embeddings are, how to extract them from your images using deep convolutional neural networks (CNNs), and how they can be used to cluster and classify large datasets of images. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  Cloud, Media, Platform
Josh Baer (Spotify), Alison Gilles (Spotify)
Average rating: ****.
(4.00, 1 rating)
In early 2016, Spotify decided that it didn’t want to be in the data center business. The future was the cloud. Josh Baer and Alison Gilles explain what it took to move Spotify to the cloud, covering Spotify's technology choices, challenges faced, and the lessons Spotify learned along the way. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 18 Level: Advanced
Jerrard Gaertner (University of Toronto School of Continuing Studies)
Average rating: *****
(5.00, 1 rating)
Engaging, teaching, mentoring, and advising mature, mostly employed, often enthusiastic and ambitious adult learners at the University of Toronto has taught Jerrard Gaertner more about analytics in the real world than he ever imagined. Jerrard shares stories he learned about everything from hyped-up expectations and internal sabotage to organizational streamlining and creating transformative insight. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 21/22 Level: Advanced
Adrian Popescu (Unravel Data Systems), Shivnath Babu (Unravel Data Systems)
One roadblock to the agility that comes with Spark is that application developers can get stuck on application failures and have a tough time finding and resolving the issue. Adrian Popescu and Shivnath Babu explain how to use the root cause diagnosis algorithm and methodology to solve failure problems with ML and AI apps in Spark. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 23/24 Level: Intermediate
Ihab Ilyas (University of Waterloo | Tamr)
Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas provides insight into various techniques and discusses how machine learning, human expertise, and problem semantics collectively can deliver a scalable, high-accuracy solution. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1E 07/08 Level: Beginner
Secondary topics:  Financial services, Media, Streaming
Karthik Ramasamy (Streamlio), Supun Kamburugamuve (Indiana University)
Modern enterprises are data driven and want to move at light speed. To achieve real-time performance, financial applications use streaming infrastructures for low latency and high throughput. Twitter Heron is an open source streaming engine with low latency around 14 ms. Karthik Ramasamy and Supun Kamburugamuve explain how they ported Heron to InfiniBand to achieve latencies as low as 7 ms. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1E 09 Level: Intermediate
Secondary topics:  Architecture, IoT
Dave Shuman (Cloudera), James Kirkland (Red Hat)
Eclipse IoT is an ecosystem of organizations that are working together to establish an IoT architecture based on open source technologies and standards. Dave Shuman and James Kirkland showcase an end-to-end architecture for the IoT based on open source standards, highlighting Eclipse Kura, an open source stack for gateways and the edge, and Eclipse Kapua, an open source IoT cloud platform. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1E 10/11 Level: Non-technical
Secondary topics:  Marketing, Retail
Moderated by:
Hilary Milnes (Glossy)
Panelists:
Karen Moon (Trendalytics), Jared Schiffman (Perch Interactive), Eric Colson (Stitch Fix), Catherine Twist (Xcel Brands (Isaac Mizrahi, C. Wonder, Halston, Judith Ripka))
Average rating: *....
(1.00, 1 rating)
Karen Moon, Jared Schiffman, Eric Colson, and Catherine Twist explore how the retail industry is embracing data to include consumers in the design and development process. They tackle the challenges posed by the wealth of sources and the unstructured nature of the data and explain how that data is turned into digestible, actionable insights. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1E 12/13
Jason McIntyre (Accenture), Mark Milazzo (Accenture)
Whether you are a technology or a services provider, understanding your value in the ecosystem and focusing on the right partners to reach your market goals is critical. Jason McIntyre and Mark Milazzo share examples of teaming models and leading practices for accelerating value from your ecosystem strategy. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1E 14 Level: Beginner
Sean Kandel (Trifacta), Kaushal Gandhi (Trifacta)
Sean Kandel and Kaushal Gandhi share best practices for building and deploying Hadoop applications to support large-scale data exploration and analysis across an organization. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1E 15/16 Level: Beginner
Secondary topics:  Financial services
John Horcher (Virtual Cove)
Average rating: *****
(5.00, 1 rating)
Immersive reality enables powerful new information design concepts. Most importantly, the new technology enables the telling of powerful stories using more insightful thinking. John Horcher explores how immersive reality deployments in financial markets have enabled quicker time to insight and therefore better decision making. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 01/02
Rick Okin (JW Player)
Rick Okin explains how JW Player strategically leverages video data analytics to power industry- and customer-level insights for the evolving online video space. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 03
Tim McKenzie (Pitney Bowes)
Organizations need to have a data strategy that includes the tools to derive location intelligence, enhance existing data with geographic enrichment (geoenrichment), and perform location analytics to reveal strategic and operational insights. Tim McKenzie shares new data quality and location intelligence approaches that operate natively within Hadoop and Spark environments. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 04/05
Kevin Stallings provides an inside look at how AIG executed a technological and cultural transformation that had a powerful impact on business outcomes and bottom-line results and explains how to use these lessons to put enterprise-wide big data preparation and self-service analysis to great use within your organization and dramatically increase customer satisfaction and engagement. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1E 06
Ben Snively (Amazon Web Services (AWS))
Average rating: ***..
(3.00, 3 ratings)
How do you incorporate serverless concepts and technologies into your big data architectures? Ben Snively shares use cases, best practices, and a reference architecture to help you streamline data processing and improve analytics through a combination of cloud and open source serverless technologies. Read more.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1E 17
Piet Loubser (Hortonworks)
Data has become the new fuel for business success. As a result, business intelligence and analytics are among the top priorities for CIOs today. Piet Loubser outlines the tectonic shift currently taking place in the market and explains why next-gen connected architectures are crucial to meet the demands of an intelligent, connected world. Read more.

Thursday, September 28

11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Text
Paco Nathan (O'Reilly Media)
Average rating: *****
(5.00, 3 ratings)
Paco Nathan demonstrates how to use PyTextRank—an open source Python implementation of TextRank that builds atop spaCy, datasketch, NetworkX, and other popular libraries to prepare raw text for AI applications in media and learning—to move beyond outdated techniques such as stemming, n-grams, or bag-of-words while performing advanced NLP on single-server solutions. Read more.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 08/10 Level: Intermediate
Eduardo Arino de la Rubia (Domino Data Lab)
Average rating: *****
(5.00, 5 ratings)
The promise of the automated statistician is as old as statistics itself. Eduardo Arino de la Rubia explores the tools created by the open source community to free data scientists from tedium, enabling them to work on the high-value aspects of insight creation. Along the way, Eduardo compares open source tools such as TPOT and auto-sklearn and discusses their place in the DS workflow. Read more.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  ecommerce, Streaming
Average rating: *****
(5.00, 1 rating)
In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice. Read more.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  Cloud
Chris Mills (The Meet Group)
if(we)'s batch event processing pipeline is different from yours, but the process of migrating it from running in a data center to running in AWS is likely pretty similar. Chris Mills explains what was easier than expected, what was harder, and what the company wished it had known before starting the migration. Read more.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 18 Level: Intermediate
Secondary topics:  Cloud
Stephen Wu (Microsoft)
Average rating: ****.
(4.00, 1 rating)
Remote storage in the cloud provides an infinitely scalable, cost-effective, and performant solution for big data customers. Adoption is rapid due to the flexibility and cost savings associated with unlimited storage capacity when separating compute and storage. Stephen Wu demonstrates how to tune the performance of your workloads when your data is stored in remote storage in the cloud. Read more.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 21/22 Level: Beginner
Evan Levy (SAS)
Average rating: *****
(5.00, 5 ratings)
While it's clear organizations need to have a comprehensive data strategy, few have actually developed a plan to improve the access, sharing, and usage of data. Evan Levy discusses the five essential components that make up a data strategy and explores the individual attributes of each. Read more.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 23/24 Level: Beginner
Secondary topics:  Architecture, Cloud, Streaming
Gwen Shapira (Confluent)
Average rating: ****.
(4.50, 2 ratings)
Gwen Shapira explains how the three realities of modern programming—the explosion of data and data systems, building business processes as microservices instead of monolithic applications, and the rise of the public cloud—affect how developers and companies operate today and why companies across all industries are turning to streaming data and Apache Kafka for mission-critical applications. Read more.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1E 07/08 Level: Beginner
Secondary topics:  Streaming
Reuven Lax (Google)
Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. Reuven Lax offers an overview of Beam's basic concepts and demonstrates that portability in action. Read more.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1E 09 Level: Intermediate
Secondary topics:  Architecture, IoT, Streaming
Michael Crutcher (Cloudera), Ryan Lippert (Cloudera)
A long time ago in a data center far, far away, we deployed complex lambda architectures as the backbone of our IoT solutions. Though hard, they enabled collection of real-time sensor data and slightly delayed analytics. Michael Crutcher and Ryan Lippert explain why Apache Kudu, a relational storage layer for fast analytics on fast data, is the key to unlocking the value in IoT data. Read more.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1E 10/11 Level: Beginner
David Boyle (MasterClass)
Too many brilliant analytical minds are wasted on interesting but ultimately less-impactful problems. They are stuck in the weeds of the data or the challenges of our day to day. Too few ask what it means to reach for the stars—the big, shiny, business-changing issues. David Boyle explains why you must start asking bigger questions and making a bigger difference. Read more.
Add to your personal schedule
11:20am12:00pm Thursday, September 28, 2017
Location: 1E 12/13
Hilary Mason (Fast Forward Labs)
Average rating: *****
(5.00, 3 ratings)
Progress in machine learning has led us to believe we might soon be able to build machines that talk to us using the same interfaces that we use to talk to each other: natural language. But how close are we? Hilary Mason explores the current state of natural language technologies and some applications where this technology is thriving today and imagines what we might build in the next few years. Read more.
Add to your personal schedule
11:20am12:00pm Thursday, September 28, 2017
Location: 1E 15/16 Level: Beginner
Secondary topics:  Data for good, Smart cities
Daniel Goddemeyer (OFFC NYC), Dominikus Baur (Freelance)
Increasing access to our personal data raises profound moral and ethical questions. Daniel Goddemeyer and Dominikus Baur share the findings from Data Futures, an MFA class in which students observed each other through their own data, and demonstrate the results with a live experiment with the audience that showcases some of the effects when personal data becomes accessible. Read more.
Add to your personal schedule
11:20am12:00pm Thursday, September 28, 2017
Location: 1E 14
Secondary topics:  Streaming
Dean Wampler (Lightbend), Jun Rao (Confluent), Karthik Ramasamy (Streamlio), Pramod Immaneni (DataTorrent)
Average rating: ***..
(3.00, 1 rating)
In a series of three 11-minute presentations, key members of Apache Kafka, Heron, and Apache Apex discuss their respective implementations of exactly-once delivery and semantics. Read more.
Add to your personal schedule
11:20am12:00pm Thursday, September 28, 2017
Location: 1A 03
Jim McHugh (NVIDIA), Todd Mostak (MapD), Srisatish Ambati (0xdata Inc), Stanley Seibert (Anaconda)
Average rating: ***..
(3.50, 2 ratings)
Joining Jim McHugh are founders of GOAI: Todd Mostak, CEO of MapD; SriSatish Ambati, CEO and cofounder of H2O; and Stan Seibert, director of community innovation at Anaconda. In this session, the speakers provide an update on the latest advancements and customer use cases leveraging GOAI. Read more.
Add to your personal schedule
11:20am12:00pm Thursday, September 28, 2017
Location: 1A 01/02
Matt Winkler (Microsoft)
Matt Winkler shares real-world case studies on how healthcare, agriculture, and manufacturing companies are creating, training, and managing AI models faster with Microsoft Azure and deploying them to the cloud, on-premises, and to the edge. Read more.
Add to your personal schedule
11:20am12:00pm Thursday, September 28, 2017
Location: 1A 04/05
Chad W. Jennings (Google), Eric Schmidt (Google)
Average rating: *****
(5.00, 2 ratings)
Doing “algebra” with emotions can lead to new insights about customer behavior. Chad Jennings presents a serverless big data analytics platform that allows you to capture and analyze raw data and train machine learning models that can process text to discern not just the sentiment but also the underlying emotion driving that sentiment. Read more.
Add to your personal schedule
11:20am12:00pm Thursday, September 28, 2017
Location: 1E 06
Ivan Jibaja (Pure Storage)
Ivan Jibaja offers an overview of Pure Storage's streaming big data analytics pipeline, which uses open source technologies like Spark and Kafka to process over 30 billion events per day and provide real-time feedback in under five seconds. Read more.
Add to your personal schedule
11:20am12:00pm Thursday, September 28, 2017
Location: 1E 17
Alex Gutow (Cloudera), David Harsh (MicroStrategy)
Alex Gutow discusses the importance of adaptive analytics and shares everything you need to know while transitioning from legacy data warehouses to Hadoop-based platforms. Join in to find out why you need modern platforms to move, host, and analyze your data with MicroStrategy and Cloudera. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Architecture, Financial services
Steven Totman (Cloudera), Faraz Rasheed (TD Bank)
Average rating: *****
(5.00, 2 ratings)
Steven Totman and Faraz Rasheed offer an overview of Griffin, a high-level, easy-to-use framework built on top of Spark, which encapsulates the complexities of common model development tasks within four phases: data understanding, feature extraction, model development, and serving modeling results. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 08/10 Level: Intermediate
Secondary topics:  Cloud, R
Edgar Ruiz (RStudio)
Average rating: ****.
(4.00, 1 rating)
With R and sparklyr, a Spark standalone cluster can be used to analyze large datasets found in S3 buckets. Edgar Ruiz walks you through setting up a Spark standalone cluster using EC2 and offers an overview of S3 bucket folder and file setup, connecting R to Spark, the settings needed to read S3 data into Spark, and an approach to importing and wrangling data. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 12/14 Level: Beginner
Secondary topics:  AI
Average rating: **...
(2.75, 4 ratings)
Businesses have spent decades trying to make better decisions by collecting and analyzing structured data. New AI technologies are beginning to transform this process. Richard Tibbetts explores AI that guides business analysts to ask statistically sensible questions and lets junior data scientists answer questions in minutes that previously took trained statisticians hours. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  Cloud
Bill Havanki (Cloudera)
Speed and reliability in deploying big data clusters are key for effectiveness in the cloud. Drawing on ideas from his book Moving Hadoop to the Cloud, which covers essential practices like baking images and automating cluster configuration, Bill Havanki explains how you can automate the creation of new clusters from scratch and use metrics gathered using the cloud provider to scale up. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 18 Level: Intermediate
Secondary topics:  Media
Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn)
Average rating: *****
(5.00, 1 rating)
Michael Li and Chi-Yi Kuan offer an overview of the EOI (enable-optimize-innovate) framework for big data analytics and explain how to leverage this framework to drive and grow business in key corporate functions, such as product, marketing, and sales. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Data for good, Media, Platform
Andrew Otto (Wikimedia Foundation), Fangjin Yang (Imply)
The Wikimedia Foundation (WMF) is a nonprofit charitable organization. As the parent company of Wikipedia, one of the most visited websites in the world, WMF faces many unique challenges around its ecosystem of editors, readers, and content. Andrew Otto and Fangjin Yang explain how the WMF does analytics and offer an overview of the technology it uses to do so. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 23/24 Level: Intermediate
Tony McAllister (Be the Match (National Marrow Donor Program))
The National Marrow Donor Program (Be the Match) recently moved its core transplant matching platform onto Cloudera Hadoop. Tony McAllister explains why the program chose Cloudera Hadoop and shares its big data goals: to increase the number of donors and matches, make the process more efficient, and make transplants more effective. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Streaming
Tyler Akidau (Google)
Average rating: ****.
(4.40, 5 ratings)
What does it mean to execute streaming queries in SQL? What is the relationship of streaming queries to classic relational queries? Are streams and tables the same thing? And how does all of this relate to the programmatic frameworks we’re all familiar with? Tyler Akidau answers these questions and more as he walks you through key concepts underpinning data processing in general. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1E 09 Level: Beginner
Secondary topics:  Architecture, Streaming
Matteo Merli (Streamlio), Sijie Guo (Streamlio)
Average rating: *****
(5.00, 2 ratings)
Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Matteo Merli and Sijie Guo offer an overview of Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper and used heavily in production. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1E 10/11 Level: Intermediate
Secondary topics:  Architecture, Media, Platform
Kurt Brown (Netflix)
Average rating: ****.
(4.40, 5 ratings)
Kurt Brown explains how to get the most out of your data infrastructure with 20 principles and practices used at Netflix. Kurt covers each in detail and explores how they relate to the technologies used at Netflix, including S3, Spark, Presto, Druid, R, Python, and Jupyter. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1E 12/13
Mike Olson (Cloudera)
Average rating: ****.
(4.50, 2 ratings)
Mike Olson shares examples of real-world machine learning applications, explores a variety of challenges in putting these capabilities into production (the speed with which technology is moving, cloud versus in-data-center consumption, security and regulatory compliance, and skills and agility in getting data and answers into the right hands), and outlines proven ways to meet them. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1E 15/16 Level: Intermediate
Steven Ross (Cloudera), Mark Donsky (Cloudera)
Average rating: ****.
(4.00, 3 ratings)
In May 2018, the General Data Protection Regulation (GDPR) goes into effect for firms doing business in the EU, but many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Steven Ross and Mark Donsky outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 03
Jagane Sundar (WANdisco), Pranav Rastogi (Microsoft)
Jagane Sundar and Pranav Rastogi explain how to meet your enterprise SLAs while making full use of resources with patented active data replication technology—something computer science still says is impossible. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 04/05
Average rating: *....
(1.00, 1 rating)
When analytics applications become business critical, balancing cost with SLAs for performance, backup, dev, test, and recovery is difficult. Karthikeyan Nagalingam discusses big data architectural challenges and how to address them and explains how to create a cost-optimized solution for the rapid deployment of business-critical applications that meet corporate SLAs today and into the future. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1A 01/02
A changing market landscape and open source innovations are having a dramatic impact on the consumability and ease of use of data science tools. Carlo Appugliese examines the impact these trends and changes will have on the future of data science and how machine learning is making data science available to all. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1E 17
Nisha Talagala (ParallelM)
Deploying ML in production is challenging. Nisha Talagala shares solutions and techniques for effectively managing machine learning and deep learning in production with popular analytic engines such as Apache Spark, TensorFlow, and Apache Flink. Read more.
Add to your personal schedule
1:15pm1:55pm Thursday, September 28, 2017
Location: 1E 06
Ramesh Menon (Infoworks)
Enterprises want to implement analytics use cases at the speed of business yet spend more time on complicated data management than on creating business value. The solution is automation. Ramesh Menon explains how a large enterprise automated data ingestion, data synchronization, and the building of data models and cubes to create a big data warehouse for the rapid deployment of analytics. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 06/07 Level: Intermediate
Ted Dunning (MapR Technologies)
Average rating: ****.
(4.50, 2 ratings)
Ted Dunning offers an overview of tensor computing—covering, in practical terms, the high-level principles behind tensor computing systems—and explains how it can be put to good use in a variety of settings beyond training deep neural networks (the most common use case). Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 08/10 Level: Intermediate
Viral Shah (Julia Computing), Stefan Karpinski (The Julia Language)
Spark is a fast and general engine for large-scale data. Julia is a fast and general engine for large-scale compute. Viral Shah and Stefan Karpinski explain how combining Julia's compute and Spark's data processing capabilities makes amazing things possible. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 12/14 Level: Beginner
Secondary topics:  Deep learning, Platform
Average rating: ***..
(3.00, 1 rating)
Bargava Subramanian and Harjinder Mistry explain how machine learning and deep learning techniques are helping Red Hat build smart developer tools that make software developers more efficient. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 15/16/17 Level: Beginner
Secondary topics:  Cloud
Michael McCune (Red Hat)
Average rating: *****
(5.00, 2 ratings)
Notebook interfaces like Apache Zeppelin and Project Jupyter are excellent starting points for sketching out ideas and exploring data-driven algorithms, but where does the process lead after the notebook work has been completed? Michael McCune offers some answers as they relate to cloud-native platforms. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 18 Level: Intermediate
Francesca Lazzeri (Microsoft), Hong Lu (Microsoft)
Average rating: *****
(5.00, 1 rating)
New machine learning technologies allow companies to apply better staffing strategies by taking advantage of historical data. Francesca Lazzeri and Hong Lu share a workforce placement recommendation solution that recommends staff with the best professional profile for new projects. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 21/22 Level: Intermediate
Sneha Rao (Spotify), Joel Östlund (Spotify)
Spotify makes data-driven product decisions. As the company grows, the magnitude and complexity of the data it cares for the most are rapidly increasing. Sneha Rao and Joel Östlund walk you through how Spotify stores and exposes audience data created from multiple internal producers within Spotify. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 23/24 Level: Advanced
Julien Le Dem (Apache Parquet)
Average rating: ****.
(4.75, 4 ratings)
Julien Le Dem explains how Parquet is improving at the storage level, with metadata and statistics that will facilitate more optimizations in query engines in the future, how the new vectorized reader from Parquet to Arrow enables much faster reads by removing abstractions, and how standard Arrow-based APIs are paving the way to breaking the silos of big data. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1E 07/08 Level: Intermediate
Gwen Shapira (Confluent)
Average rating: ***..
(3.33, 3 ratings)
There are many good reasons to run more than one Kafka cluster…and a few bad reasons too. Great architectures are driven by use cases, and multicluster deployments are no exception. Gwen Shapira offers an overview of several use cases, including real-time analytics and payment processing, that may require multicluster solutions, so you can better choose the right architecture for your needs. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1E 09 Level: Non-technical
Secondary topics:  ecommerce, Geospatial, IoT, Logistics, Platform, Retail
Javier Esplugas (DHL Supply Chain), Kevin Parent (Conduce)
DHL has created an IoT initiative for its supply chain warehouse operations. Javier Esplugas and Kevin Parent explain how DHL has gained unprecedented insight—from the most comprehensive global view across all locations to a unique data feed from a single sensor—to see, understand, and act on everything that occurs in its warehouses with immersive operational data visualization. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1E 10/11 Level: Non-technical
Sander Kieft (Sanoma Media)
Average rating: *****
(5.00, 1 rating)
Sanoma has been running big data as a self-service platform for over five years, mainly as a service for business analysts to work directly on the source data. The road to getting business analysts to directly do their analyses on Hadoop was far from smooth. Sander Kieft explores Sanoma's journey and shares some lessons learned along the way. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1E 12/13
Carme Artigas (Synergic Partners)
Average rating: ****.
(4.80, 5 ratings)
Big data technology is mature, but its adoption by business is slow, due in part to challenges like a lack of resources and the need for a cultural change. Carme Artigas explains why an analytics center of excellence (ACoE), whether internal or outsourced, is an effective way to accelerate adoption and shares an approach to implementing an ACoE. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1E 15/16 Level: Intermediate
Majken Sander (TimeXtender)
Average rating: ***..
(3.00, 3 ratings)
Personal data is increasingly spread across various services globally. But what do companies know about us? And how do we collect that knowledge, get ahold of our own data, and maybe even correct faulty perceptions by putting the right answers out there as a service? Majken Sander explains why we desperately need a personal Discovery Hub: a go-to place for knowledge about ourselves. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1E 06
Basil Faruqui (BMC Software), Jon Ouimet (BMC Software)
Are you building, running, or managing complex data pipelines across hybrid environments spanning multiple applications and data sources? Doing this successfully requires automating dataflows across the entire pipeline, ideally controlled through a single source. Basil Faruqui and Jon Ouimet walk you through a customer journey to automate data pipelines across a hybrid environment. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1E 17
Bob Patterson (Hewlett Packard Enterprise (HPE))
Bob Patterson offers an overview of Hewlett Packard Enterprise's enterprise-grade Hadoop solution, which has everything you need to accelerate your big data journey: innovative hardware architectures for diverse workloads certified for all leading distros, infrastructure software, services from HPE and partners, and add-ons like object storage. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 03
Keith Kohl (Syncsort)
If users get conflicting analytics results, wild predictions, and crazy reports from the data in your data lake, they will lose trust. From the beginning of your data lake project, you need to build in solid business rules, data quality checking, and enhancement. Keith Kohl shares an actionable checklist that shows everyone in your enterprise that your big data can be trusted. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 04/05
Much is being written about the economy of everything, but where does the analytics economy fit in? Fiona McNeill shares SAS's vision and roadmap for meeting the unique challenges of the analytics economy, including thoughts on intersections with related technologies like machine learning, deep learning, cognitive computing, and more. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 28, 2017
Location: 1A 01/02
John Morrell (Datameer)
Average rating: *****
(5.00, 1 rating)
While companies have flooded data lakes with billions of records, the technical limitations of Hadoop have kept analysts from interactively exploring this data and delivering real value—until now. John Morrell explores a solution helping analysts interactively and rapidly explore billions of records in Hadoop, offering a truly interactive experience and ushering in the era of Data Lake 2.0. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Text
Michelle Casbon (Qordoba)
Average rating: ****.
(4.00, 4 ratings)
Michelle Casbon explores the machine learning and natural language processing that enables teams to build products that feel native to every user and explains how Qordoba is tackling the underserved domain of localization using open source tools, including Kubernetes, Docker, Scala, Apache Spark, Apache Cassandra, and Apache PredictionIO (incubating). Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1A 08/10 Level: Intermediate
Secondary topics:  Deep learning
Mike Pittaro (Dell EMC)
The advances we see in machine learning would be impossible without hardware improvements, but building a high-performance hardware platform is tricky. It involves hardware choices, an understanding of software frameworks and algorithms, and how they interact. Mike Pittaro shares the secrets of matching the right hardware and tools to the right algorithms for optimal performance. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep learning, Streaming
Josh Patterson (Skymind), Kirit Basu (StreamSets)
Enterprises building data lakes often have to deal with very large volumes of image data that they have collected over the years. Josh Patterson and Kirit Basu explain how some of the most sophisticated big data deployments are using convolutional neural nets to automatically classify images and add rich context about the content of the image, in real time, while ingesting data at scale. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  Architecture
Jennifer Wu (Cloudera), Philip Langdale (Cloudera), Kostas Sakellis (Cloudera)
With its scalable data store, elastic compute, and pay-as-you-go cost model, cloud infrastructure is well-suited for large-scale data engineering workloads. Jennifer Wu, Philip Langdale, and Kostas Sakellis explore the latest cloud technologies, focusing on data engineering workloads, cost, security, and ease-of-use implications for data engineers. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1A 18 Level: Non-technical
Secondary topics:  Cloud, Financial services
Moderated by:
Steven Totman (Cloudera)
Panelists:
Siew Choo Soh (DBS Bank), Meena Ram (CIBC), David Leach (Qrious)
Big data and the cloud have spread around the world, and Singapore, New Zealand, Australia, and Canada are already seeing dramatic investments and returns. In a panel moderated by Steve Totman, senior executives from a variety of leading companies, including DBS, CIBC, and Qrious, share use cases, challenges, and how to be successful. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1A 21/22 Level: Advanced
Kimoon Kim (Pepperdata)
There is growing interest in running Spark natively on Kubernetes. Spark applications often access data in HDFS, and Spark supports HDFS locality by scheduling tasks on nodes that have the task input data on their local disks. Kimoon Kim demonstrates how to run HDFS inside Kubernetes to speed up Spark. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Architecture, Media, Platform
Felix GV (LinkedIn), Yan Yan (LinkedIn)
Average rating: **...
(2.00, 1 rating)
Companies with batch and stream processing pipelines need to serve the insights they glean back to their users, an often-overlooked problem that can be hard to achieve reliably and at scale. Felix GV and Yan Yan offer an overview of Venice, a new data store capable of ingesting data from Hadoop and Kafka, merging it together, replicating it globally, and serving it online at low latency. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Streaming
Tim Berglund (Confluent)
Average rating: **...
(2.50, 2 ratings)
Tim Berglund offers a thorough introduction to the Streams API, an important recent addition to Kafka that lets us build sophisticated stream processing systems that are as scalable and fault tolerant as Kafka itself—and also happen to align quite well with the microservices sensibilities that are so common in contemporary architectural thinking. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1E 09 Level: Beginner
Secondary topics:  IoT
Alexandra Gunderson (Arundo Analytics)
One of the main challenges when working with industrial data is linking the large amount of data and extracting value. Alexandra Gunderson shares a comprehensive preprocessing methodology that structures and links data from different sources, converting the IIoT analytics process from an unorganized mammoth to one more likely to generate insight. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1E 10/11 Level: Non-technical
Jesse Anderson (Big Data Institute)
Average rating: *****
(5.00, 2 ratings)
Early project success is predicated on management making sure a data engineering team is ready and has all of the skills needed. Jesse Anderson outlines five of the most common nontechnology reasons why data engineering teams fail. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1E 12/13 Level: Non-technical
Average rating: *****
(5.00, 4 ratings)
Organizations need a process and supporting frameworks to become more effective at leveraging data and analytics to transform their business models. Using the Big Data Business Model Maturity Index as a guide, William Schmarzo demonstrates how to assess business value and implementation feasibility with respect to the monetization potential of an organization’s business use cases. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1E 15/16 Level: Intermediate
Secondary topics:  Streaming
Sahaana Suri (Stanford University)
Average rating: *****
(5.00, 1 rating)
Sahaana Suri offers an overview of MacroBase, a new analytics engine from Stanford designed to prioritize the scarcest resource in large-scale, fast-moving data streams: human attention. MacroBase allows reconfigurable, real-time root-cause analyses that have already diagnosed issues in production streams in mobile, data center, and industrial applications. Read more.
Add to your personal schedule
2:55pm3:35pm Thursday, September 28, 2017
Location: 1A 01/02 Level: Intermediate
Secondary topics:  Media
Shirshanka Das (LinkedIn), Tushar Shanbhag (LinkedIn)
Shirshanka Das and Tushar Shanbhag explore the big data ecosystem at LinkedIn and share its journey to preserve member privacy while providing data democracy. Shirshanka and Tushar focus on three foundational building blocks for scalable data management that can meet data compliance regulations: a central metadata system, an integrated data movement platform, and a unified data access layer. Read more.
Add to your personal schedule
4:35pm5:15pm Thursday, September 28, 2017
Location: 1A 06/07 Level: Intermediate
Secondary topics:  IoT, Streaming
Average rating: *****
(5.00, 3 ratings)
Services such as YouTube, Netflix, and Spotify popularized streaming in different industry segments, but these services do not center around live data—best exemplified by sensor data—which will be increasingly important in the future. Arun Kejariwal, Francois Orsini, and Dhruv Choudhary demonstrate how to leverage Satori to collect, discover, and react to live data feeds at ultralow latencies. Read more.
Add to your personal schedule
4:35pm5:15pm Thursday, September 28, 2017
Location: 1A 08/10 Level: Non-technical
Secondary topics:  Cloud
Karim Chine (RosettaHUB)
Karim Chine offers an overview of rosettaHUB—which aims to establish a global open data science metacloud centered on usability, reproducibility, auditability, and shareability—and shares the results of the rosettaHUB/AWS Educate initiative, which involved 30 higher education institutions and research labs and over 3,000 researchers, educators, and students. Read more.
Add to your personal schedule
4:35pm5:15pm Thursday, September 28, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep learning, Healthcare
Jon Fuller (KNIME), Olivia Klose (Microsoft)
Average rating: ***..
(3.00, 1 rating)
Jon Fuller and Olivia Klose explain how KNIME, Apache Spark, and Microsoft Azure enable fast and cheap automated classification of malignant lymphoma type in digital pathology images. The trained model is deployed to end users as a web application using the KNIME WebPortal. Read more.
Add to your personal schedule
4:35pm5:15pm Thursday, September 28, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  Cloud
Felipe Hoffa (Google)
Average rating: *****
(5.00, 1 rating)
With Google BigQuery, anyone can easily analyze the more than five years of GitHub metadata and 42+ terabytes of open source code. Felipe Hoffa explains how to leverage this data to understand the community and code related to any language or project. Relevant for open source creators, users, and choosers, this is data that you can leverage to make better choices. Read more.
Add to your personal schedule
4:35pm5:15pm Thursday, September 28, 2017
Location: 1A 18 Level: Intermediate
Secondary topics:  Architecture
Philip Russom (TDWI: The Data Warehousing Institute)
Philip Russom explains how a data lake can improve the role of Hadoop in data-driven business management. With the right end-user tools, a data lake can enable self-service data practices that wring business value from big data and modernize and extend programs for data warehousing, analytics, data integration, and other data-driven solutions. Read more.
Add to your personal schedule
4:35pm5:15pm Thursday, September 28, 2017
Location: 1A 21/22 Level: Intermediate
Average rating: *****
(5.00, 1 rating)
Common ETL jobs used for importing log data into Hadoop clusters require a considerable amount of resources, which varies based on the input size. Thiruvalluvan M G shares a set of techniques—involving an innovative use of Spark processing and exploiting features of Hadoop file formats—that not only make these jobs much more efficient but also work well with fixed amounts of resources. Read more.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1A 23/24 Level: Non-technical
Bob Eilbacher (Caserta)
Building an efficient analytics environment requires a strong infrastructure. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1E 07/08 Level: Intermediate
Shant Hovsepian (Arcadia Data)
Average rating: 4.00 (1 rating)
Streaming visual analytics is a technique for visualizing and interacting with streaming data in near real time. Shant Hovsepian explains how lambda- and polling-based architectures are being disrupted by reactive visualization systems, as streaming engines embrace the CQRS pattern, and offers analysis of visualizing streams from Apache Kafka, Apache Flink, and Apache Spark.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1E 09 Level: Intermediate
Secondary topics:  IoT
Lloyd Palum (Vnomics)
Average rating: 5.00 (2 ratings)
A digital twin models a real-world physical asset using mobile data, cloud computing, and machine learning to track chosen characteristics. Lloyd Palum walks you through building a tractor-trailer digital twin using Python and TensorFlow. You can then use the example model to track and optimize performance.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1E 10/11 Level: Non-technical
Tanya Cashorali (TCB Analytics)
Average rating: 5.00 (1 rating)
Given the recent demand for data analytics and data science skills, adequately testing and qualifying candidates can be a daunting task. Interviewing hundreds of individuals of varying experience and skill levels requires a standardized approach. Tanya Cashorali explores strategies, best practices, and deceptively simple interviewing techniques for data analytics and data science candidates.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1E 12/13 Level: Intermediate
Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera)
Average rating: 3.00 (2 ratings)
Recent years have seen dramatic advancements in the technologies available for managing and processing data. While these technologies provide powerful tools for building data applications, they also require new skills. Ted Malaska and Jonathan Seidman explain how to evaluate them and build teams that can use them effectively to achieve ROI with your data initiatives.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1E 15/16 Level: Intermediate
Secondary topics:  Text
Noemi Derzsy (Rensselaer Polytechnic Institute)
Open source data has enabled society to engage in community-based research and has provided government agencies with more visibility and trust from individuals. Noemi Derzsy offers an overview of the openNASA platform and discusses openNASA metadata analysis and tools for applying NLP and topic modeling techniques to understand open government dataset associations.