Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Tuesday, 09/11/2018

9:00am

9:00am–5:00pm Tuesday, 09/11/2018
Location: 1A 01/02 Level: Beginner
Ian Cook (Cloudera)
Average rating: ****.
(4.86, 7 ratings)
Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools. Read more.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1A 03
Secondary topics:  Deep Learning
Dylan Bargteil (The Data Incubator)
The TensorFlow library provides for the use of data flow graphs for numerical computations, with automatic parallelization across several CPUs or GPUs. This architecture makes it ideal for implementing neural networks and other machine learning algorithms. Dylan Bargteil introduces TensorFlow's capabilities through its Python interface. Read more.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1A 04/05
Jerry Overton (DXC), Ashim Bose (DXC), Samir Sehovic (DXC)
Average rating: *****
(5.00, 1 rating)
Acquiring machine learning (ML) technology is relatively straightforward, but ML must be applied to be useful. In this one-day boot camp that is equal parts hackathon, presentation, and group participation, Jerry Overton, Ashim Bose, and Samir Sehovic teach you how to apply advanced analytics in ways that reshape the enterprise and improve outcomes. Read more.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1A 15/16 Level: Intermediate
Jesse Anderson (Big Data Institute)
Average rating: *....
(1.00, 1 rating)
To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data and how will you process that much data? Jesse Anderson explores the latest real-time frameworks and explains how to choose the right one for your company. Read more.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1A 17
Kenneth Jones (Databricks, Inc.)
Ken Jones walks you through the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark’s streaming capabilities and machine learning APIs. Read more.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1E 17
Zachary Glassman (The Data Incubator)
Zachary Glassman leads a hands-on dive into building intelligent business applications using machine learning, walking you through all the steps of developing a machine learning pipeline. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and extend these models into two applications using a real-world dataset. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Data Platforms
Mark Madsen (Teradata), Todd Walter (Archimedata)
Average rating: ***..
(3.50, 10 ratings)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1A 08
Alistair Croll (Solve For Interesting), Robert Passarella (Alpha Features), Amro Alkhatib (National Health Insurance Company-Daman), Mridul Mishra (Fidelity Investments), Patrick Angeles (Cloudera), James Psota (Panjiva), Andreas Kohlmaier (Munich Re), Paul Lashmet (Arcadia Data), Nick Curcuru (Mastercard), Robin Way (Corios), Theresa Johnson (Airbnb), Jane Tran (Unqork), Swatee Singh (American Express)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 10 Level: Intermediate
David Arpin (Amazon Web Services)
Average rating: **...
(2.80, 10 ratings)
David Arpin walks you through building a machine learning application, from data manipulation to algorithm training to deployment to a real-time prediction endpoint, using Spark and Amazon SageMaker. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 12/14 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise
Joshua Poduska (Domino Data Lab), Patrick Harrison (S&P Global)
Average rating: ****.
(4.29, 7 ratings)
The honeymoon era of data science is ending, and accountability is coming. Successful data science leaders deliver measurable impact on an increasing share of an enterprise’s KPIs. Joshua Poduska and Patrick Harrison detail how leading organizations have taken a holistic approach to people, process, and technology to build a sustainable competitive advantage. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Deep Learning, Text and Language processing and analysis
Garrett Hoffman (StockTwits)
Average rating: ****.
(4.75, 4 ratings)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include word2vec, recurrent neural networks and variants (LSTM, GRU), and convolutional neural networks. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Ethics and Privacy, Health and Medicine
Patrick Hall (bnh.ai | H2O.ai), Avni Wadhwa (H2O.ai), Mark Chan (H2O.ai)
Average rating: ****.
(4.50, 4 ratings)
Transparency, auditability, and stability are crucial for business adoption and human acceptance of complex machine learning models. Patrick Hall, Avni Wadhwa, and Mark Chan share practical and productizable approaches for explaining, testing, and visualizing machine learning models using open source, Python-friendly tools such as GraphViz, H2O, and XGBoost. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 06 Level: Intermediate
Secondary topics:  Model lifecycle management
Dan Crankshaw (UC Berkeley RISELab)
Average rating: *****
(5.00, 1 rating)
Dan Crankshaw offers an overview of the current challenges in deploying machine learning applications into production and the current state of prediction serving infrastructure. He then leads a deep dive into the Clipper serving system and shows you how to get started. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 07/08 Level: Intermediate
Tim Berglund (Confluent)
Average rating: ****.
(4.33, 3 ratings)
Tim Berglund leads this solid introduction to Apache Kafka as a streaming data platform. You'll cover the internal architecture, APIs, and platform components like Kafka Connect and Kafka Streams, then finish with an exercise processing streaming data using KSQL, the new SQL-like declarative stream processing language for Kafka. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 09 Level: Intermediate
James Bednar (Anaconda)
Average rating: ****.
(4.60, 5 ratings)
Python lets you solve data science problems by stitching together packages from the Python ecosystem, but it can be difficult to assemble the right tools to solve real-world problems. James Bednar walks you through using the 15+ packages covered by the new PyViz.org initiative to make it simple to build interactive plots and dashboards, even for large, streaming, and highly multidimensional data. Read more.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1E 10
Paco Nathan (derwen.ai), Katharina Warzel (EveryMundo), Mike Berger (Mount Sinai Health System), Sam Helmich (Deere & Company), Stephanie Fischer (datanizing GmbH), Maryam Jahanshahi (TapRecruit), Greg Quist (SmartCover Systems), Ann Nguyen (Whole Whale), Steve Otto (Navistar), Jennifer Lim (Cerner), S Anand (Gramener), Ian Brooks (Cloudera)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 11 Level: Intermediate
Secondary topics:  Data preparation, governance and privacy, Ethics and Privacy
Mark Donsky (Okera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera), Ifigeneia Derekli (Cloudera), Camila Hiskey (Cloudera)
Average rating: ****.
(4.50, 2 ratings)
New regulations such as GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads. Mark Donsky, Syed Rafice, Mubashir Kazia, Ifigeneia Derekli, and Camila Hiskey share hands-on best practices for meeting these challenges, with special attention paid to GDPR. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 12/13 Level: Intermediate
Secondary topics:  Data Platforms
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Average rating: ***..
(3.12, 8 ratings)
Arun Kejariwal and Karthik Ramasamy lead a journey through the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline, covering messaging frameworks, streaming computing frameworks, storage frameworks for real-time data, and more. They also share case studies from IoT, gaming, and healthcare, along with their experience operating these systems at internet scale. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 14 Level: Beginner
Viviana Acquaviva (CUNY New York City College of Technology)
Average rating: ****.
(4.75, 4 ratings)
Using interesting, diverse publicly available datasets and actual problems in astronomy research, Viviana Acquaviva leads an intermediate tutorial on machine learning. You'll learn how to customize algorithms and evaluation metrics required by scientific applications and discover best practices for choosing, developing, and evaluating machine learning algorithms in "real-world" datasets. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Recommendation Systems
Vijay Agneeswaran (Walmart Labs), Abhishek Kumar (Publicis Sapient)
Average rating: ****.
(4.40, 5 ratings)
Abhishek Kumar and Vijay Srinivas Agneeswaran offer an introduction to deep learning-based recommendation and learning-to-rank systems using TensorFlow. You'll learn how to build a recommender system based on intent prediction using deep learning that is based on a real-world implementation for an ecommerce client. Read more.

10:30am

10:30am–11:00am Tuesday, 09/11/2018
Location: 1A & 1E Halls
Morning Break (30m)

12:30pm

12:30pm–1:30pm Tuesday, 09/11/2018
Location: 3A
Lunch (1h)

1:30pm

1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 06/07 Level: Advanced
Secondary topics:  Data Platforms
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
Average rating: ***..
(3.12, 8 ratings)
Using Customer 360 and the internet of things as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, including components like Kafka, Flink, Kudu, Spark Streaming, and Spark SQL, as well as modern storage engines, to enable new forms of data processing and analytics. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 10 Level: Intermediate
Jeroen Janssens (Data Science Workshops)
Average rating: ***..
(3.00, 3 ratings)
The Unix command line remains an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools, you can quickly scrub, explore, and model your data as well as hack together prototypes. Join Jeroen Janssens for a hands-on workshop based on his book Data Science at the Command Line. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Bruno Gonçalves (Data For Science)
Average rating: ***..
(3.14, 7 ratings)
Time series are everywhere around us. Understanding them requires taking into account the sequence of values seen in previous steps and even long-term temporal correlations. Join Bruno Gonçalves to learn how to use recurrent neural networks to model and forecast time series and discover the advantages and disadvantages of recurrent neural networks with respect to more traditional approaches. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
David Talby (Pacific AI), Claudiu Branzan (Accenture), Alex Thomas (John Snow Labs)
Average rating: ***..
(3.00, 7 ratings)
David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial for scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 23/24 Level: Intermediate
Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)
Average rating: ***..
(3.67, 3 ratings)
Dean Wampler and Boris Lublinsky walk you through building streaming apps as microservices using Akka Streams and Kafka Streams. Dean and Boris discuss the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead. You'll also discover a few ML model serving ideas along the way. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 06 Level: Intermediate
Carolyn Duby (Cloudera)
Carolyn Duby shows you how to find the cybersecurity threat needle in your event haystack using Apache Metron: a real-time, horizontally scalable open source platform. After this interactive overview of the platform's major features, you'll be ready to analyze your own haystack back at the office. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Deep Learning
Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera)
Average rating: *....
(1.00, 1 rating)
Vartika Singh, Alan Silva, Alex Bleakley, Steven Totman, Mirko Kämpf, and Syed Nasar outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 09 Level: Intermediate
Secondary topics:  Model lifecycle management
Brian Foo (Google), Holden Karau (Independent), Jay Smith (Google)
Average rating: **...
(2.00, 7 ratings)
TensorFlow and Keras are popular libraries for training deep models due to hardware accelerator support. Brian Foo, Jay Smith, and Holden Karau explain how to bring deep learning models from training to serving in a cloud production environment. You'll learn how to unit-test, export, package, deploy, optimize, serve, monitor, and test models using Docker and TensorFlow Serving in Kubernetes. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 11 Level: Intermediate
Secondary topics:  Ethics and Privacy
Aileen Nielsen (Skillman Consulting)
Average rating: ****.
(4.00, 4 ratings)
There is mounting evidence that the widespread deployment of machine learning and artificial intelligence in business and government applications is reproducing or even amplifying existing prejudices and social inequalities. Aileen Nielsen demonstrates how to identify and avoid bias and other unfairness in your analyses. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 12/13 Level: Intermediate
Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Paul Sears (Amazon Web Services), Bruno Faria (Amazon Web Services)
Average rating: **...
(2.86, 7 ratings)
Want to learn how to use Amazon's big data web services to launch your first big data application in the cloud? Jorge Lopez, Radhika Ravirala, Paul Sears, and Bruno Faria walk you through building a big data application using a combination of open source technologies and AWS managed services. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 14 Level: Intermediate
Sudhanshu Arora (Cloudera), Stefan Salandy (Cloudera), Suraj Acharya (Cloudera), Brandon Freeman (Cloudera), Jason Wang (Cloudera), Shravan Pabba (Cloudera)
Attend this tutorial to learn how to successfully run a data analytics pipeline in the cloud and integrate data engineering and data analytic workflows. You'll also explore considerations and best practices for data analytics pipelines in the cloud and see how to share metadata across workloads in a big data PaaS. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 15/16 Level: Beginner
Secondary topics:  Machine Learning in the enterprise
Average rating: **...
(2.67, 9 ratings)
Janet Forbes, Danielle Leighton, and Lindsay Brin lead a primer on crafting well-conceived data science projects that uncover valuable business insights. Using case studies and hands-on skills development, Janet, Danielle, and Lindsay walk you through essential techniques for effecting real business change. Read more.

3:00pm

3:00pm–3:30pm Tuesday, 09/11/2018
Location: 1A & 1E Halls
Afternoon Break (30m)

5:00pm

5:00pm–6:30pm Tuesday, 09/11/2018
Location: 3B | Expo Hall
Enjoy delicious snacks and beverages with fellow Strata attendees, speakers, and sponsors at the Opening Reception, happening immediately after tutorials on Tuesday. Read more.

Wednesday, 09/12/2018

7:30am

7:30am–8:45am Wednesday, 09/12/2018
Location: 3E Foyer
Morning Coffee (1h 15m)

8:00am

8:00am–8:30am Wednesday, 09/12/2018
Location: Crystal Palace
Gather before keynotes on Wednesday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees. Read more.

8:50am

8:50am–9:00am Wednesday, 09/12/2018
Location: 3E
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: **...
(2.88, 8 ratings)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes. Read more.

9:00am

9:00am–9:15am Wednesday, 09/12/2018
Location: 3E
Anupam Singh (Cloudera), Brian Coyne (PNC)
Average rating: ***..
(3.24, 17 ratings)
Data volumes don’t translate to business value. What matters is your data platform’s ability to support unprecedented numbers of business users and use cases. Anupam Singh and Brian Coyne look at some of the challenges posed by data-hungry organizations and share new techniques to extract meaningful insights at the speed of today’s modern business. Read more.

9:15am

9:15am–9:25am Wednesday, 09/12/2018
Location: 3E
Secondary topics:  Ethics and Privacy
Ben Lorica (O'Reilly)
Average rating: ***..
(3.92, 13 ratings)
As companies begin adopting machine learning, important considerations, including fairness, transparency, privacy, and security, need to be accounted for. Ben Lorica offers an overview of recent tools for building privacy-preserving and secure machine learning products and services. Read more.

9:25am

9:25am–9:35am Wednesday, 09/12/2018
Location: 3E
Ted Dunning (MapR, now part of HPE)
Average rating: **...
(2.79, 19 ratings)
There’s real value in big data and more waiting when you add real-time, but to get the payoff, you need successful deployments of your AI and data-intensive applications. You need to be ready with your current applications in production but must have an architecture and infrastructure that are ready for the next ones as well. Ted Dunning explores how others have fared in this journey. Read more.

9:35am

9:35am–9:50am Wednesday, 09/12/2018
Location: 3E
Secondary topics:  Financial Services, Machine Learning in the enterprise
Jeffrey Wecker (Goldman Sachs)
Average rating: ***..
(3.12, 26 ratings)
Jeffrey Wecker leads a deep dive on data in financial services, with perspectives on the evolving landscape of data science, the advent of alternative data, the importance of data centricity, and the future for machine learning and AI. Read more.

9:50am

9:50am–9:55am Wednesday, 09/12/2018
Location: 3E
DD Dasgupta (Cisco)
Average rating: ***..
(3.60, 15 ratings)
DD Dasgupta explores the exciting development of the edge-cloud continuum, which is redefining business models and technology strategies while creating a vast array of new applications that will power the digital age. The continuum is also destroying what we know about the centralized data centers and cloud computing infrastructures that were so vital to the success of the previous computing eras. Read more.

9:55am

9:55am–10:15am Wednesday, 09/12/2018
Location: 3E
Cassie Kozyrkov (Google)
Average rating: ****.
(4.67, 30 ratings)
Why do businesses fail at machine learning despite its tremendous potential and the excitement it generates? Is the answer always in data, algorithms, and infrastructure, or is there a subtler problem? Will things improve in the near future? Let's talk about some lessons learned at Google and what they mean for applied data science. Read more.

10:15am

10:15am–10:25am Wednesday, 09/12/2018
Location: 3E
Drew Paroski (MemSQL), Aatif Din (Fanatics)
Average rating: **...
(2.92, 13 ratings)
Today’s successful businesses utilize data better than their competitors; however, data sprawl and inefficient data infrastructure restrict what’s possible. Blending the best of the past with the software innovations of today will solve future data challenges. Drew Paroski shares how to develop modern database applications without sacrificing cost savings, data familiarity, and flexibility. Read more.

10:25am

10:25am–10:45am Wednesday, 09/12/2018
Location: 3E
Secondary topics:  Blockchain and decentralization, Financial Services
Joseph Lubin (Consensus Systems)
Average rating: ***..
(3.00, 12 ratings)
Ethereum is a world computer on top of a peer-to-peer network that runs smart contracts: applications that run exactly as programmed without the possibility of censorship, fraud, or third-party interference. Until now, businesses had to build their systems on database technologies that resulted in siloed and redundant information in typically adversarial contexts. Read more.

10:50am

10:50am–11:20am Wednesday, 09/12/2018
Location: 3B | Expo Hall
Morning break sponsored by Cisco (30m)

11:20am

11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 01/02
Jim Scott (NVIDIA)
Drawing on his experience working with customers across many industries, including chemical sciences, healthcare, and oil and gas, Jim Scott details the major impediments to successful completion of deep learning projects and solutions while walking you through a customer use case. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 03
Chiang Yang (Cisco)
Data is the lifeblood of an enterprise, and it's being generated everywhere. To overcome the challenges of data gravity, data analytics, including machine learning, is best done where the data is located: ubiquitous machine learning. Han Yang explains how to overcome the challenges of machine learning everywhere. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 04/05
Petrus Smith (PwC)
Peet Smith explains how PwC is using modern database tools with a combination of open source technologies to automate and scale data ingestion and transformation to get data to engagement teams to help them streamline and accelerate client service delivery. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Retail and e-commerce, Temporal data and time-series analytics
Mikio Braun (Zalando)
Average rating: ****.
(4.86, 7 ratings)
Time series data has many applications in industry, from analyzing server metrics to monitoring IoT signals and outlier detection. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn’t, and industry use cases. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 17
Tim Davis (IBM)
Average rating: ***..
(3.33, 3 ratings)
Tim Davis discusses key pain points and solutions to problems many enterprises face with data in silos, poor-quality data that cannot always be trusted, and managing and making large volumes of data available to derive more accurate insights and machine learning models. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Beginner
Secondary topics:  Deep Learning, Recommendation Systems
Shioulin Sam (Cloudera Fast Forward Labs)
Average rating: ***..
(3.25, 4 ratings)
Recent advances in deep learning allow us to use the semantic content of items in recommendation systems, addressing a weakness of traditional methods. Shioulin Sam explores the limitations of classical approaches and explains how using the content of items can help solve common recommendation pitfalls, such as the cold start problem, and open up new product possibilities. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 08 Level: Advanced
Secondary topics:  Media, Marketing, Advertising
Daniel Kang (Stanford University)
Average rating: ****.
(4.00, 2 ratings)
Daniel Kang offers an overview of the exploratory video analytics engine BlazeIt, which offers FrameQL, a declarative SQL-like language for querying video, and a query optimizer for executing these queries. You'll see how FrameQL can capture a large set of real-world queries ranging from aggregation to scrubbing and how BlazeIt can execute certain queries up to 2,000x faster than a naive approach. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 10 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines, Transportation and Logistics
Felix Cheung (Uber)
Average rating: ****.
(4.60, 5 ratings)
Did you know that your Uber rides are powered by Apache Spark? Join Felix Cheung to learn how Uber is building its data platform with Apache Spark at enormous scale and discover the unique challenges the company faced and overcame. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Beginner
Secondary topics:  Health and Medicine
Olga Cuznetova (Optum), Manna Chang (Optum)
Average rating: ***..
(3.33, 3 ratings)
Olga Cuznetova and Manna Chang demonstrate supervised and unsupervised learning methods to work with claims data and explain how the methods complement each other. The supervised method looks at chronic kidney disease (CKD) patients at risk of developing end-stage renal disease (ESRD), while the unsupervised approach looks at the classification of patients that tend to develop this disease faster than others. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Ethics and Privacy
Felipe Hoffa (Google), Damien Desfontaines (Google | ETH Zürich)
Average rating: ****.
(4.00, 1 rating)
Before releasing a public dataset, practitioners need to thread the needle between utility and protection of individuals. Felipe Hoffa and Damien Desfontaines explore how to handle massive public datasets, taking you from theory to real life as they showcase newly available tools that help with PII detection and bring concepts like k-anonymity and l-diversity to the practical realm. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines
Gwen Shapira (Confluent)
Average rating: ****.
(4.00, 4 ratings)
Gwen Shapira shares design and architecture patterns that are used to modernize data engineering. You'll learn how modern engineering organizations use Apache Kafka, microservices, and event streams to efficiently build data pipelines that are scalable, reliable, and built to evolve. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 06
Paul Kent (SAS)
Average rating: *****
(5.00, 1 rating)
Software is eating the world, and open source is eating the software. Most contemporary analytics shops use a lot of open source software in their analytics platform. So where does commercial software like SAS fit? Paul Kent explains how you can achieve the best of both worlds by combining your favorite open source software with the power of SAS analytics. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Intermediate
Gerard Maas (Lightbend)
Average rating: *****
(5.00, 1 rating)
Apache Spark has two streaming APIs: Spark Streaming and Structured Streaming. Gerard Maas offers a critical overview of their differences with regard to key aspects of a streaming application: API usability, dealing with time, dealing with state and machine learning capabilities, and more. You'll learn when to pick one over the other or combine both to implement resilient streaming pipelines. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 09 Level: Beginner
Secondary topics:  Data Platforms
Cory Minton (Dell EMC), Colm Moynihan (Cloudera)
Average rating: *****
(5.00, 1 rating)
Cory Minton and Colm Moynihan explain how to choose the right deployment model for on-premises infrastructure to reduce risk, reduce costs, and be more nimble. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 10/11 Level: Non-technical
Secondary topics:  Data preparation, governance and privacy, Machine Learning in the enterprise
JF Gagne (Element AI)
Average rating: ***..
(3.50, 4 ratings)
JF Gagne explains why the CIO is going to need a broader mandate in the company to better align their AI training and outcomes with business goals and compliance. This mandate should include an AI governance team that is well staffed and deeply established in the company, in order to catch biases that can develop from faulty goals or flawed data. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 12/13 Level: Intermediate
Secondary topics:  Machine Learning in the enterprise
Jennifer Prendki (Figure Eight)
Average rating: ****.
(4.38, 8 ratings)
Agile methodologies have been widely successful for software engineering teams but seem inappropriate for data science teams, because data science is part engineering, part research. Jennifer Prendki demonstrates how, with a minimum amount of tweaking, data science managers can adapt Agile techniques and establish best practices to make their teams more efficient. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 14 Level: Intermediate
Secondary topics:  Data preparation, governance and privacy, Ethics and Privacy
Mark Donsky (Okera), Steven Ross (Cloudera)
In May 2018, the General Data Protection Regulation (GDPR) went into effect for firms doing business in the EU, but many companies still aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Mark Donsky and Steven Ross outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: Expo Hall Level: Non-technical
Secondary topics:  Data Integration and Data Pipelines, Financial Services
Usama Fayyad (Open Insights & OODA Health, Inc.), Troels Oerting (WEF Global Cybersecurity Center)
Average rating: ***..
(3.00, 1 rating)
Usama Fayyad and Troels Oerting share outcomes and lessons learned from building and deploying a global data fusion, incident analysis/visualization, and effective cybersecurity defense based on big data and AI at a major EU bank, in collaboration with several financial services institutions. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1 E15
Ward Eldred (NVIDIA)
Average rating: *****
(5.00, 2 ratings)
Ward Eldred offers an overview of the types of analytical problems that can be solved using deep learning and shares a set of heuristics that can be used to evaluate the feasibility of analytical AI projects. Read more.

12:00pm

12:00pm–1:15pm Wednesday, 09/12/2018
Location: Expo Hall (Hall 3B)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.
12:00pm–1:15pm Wednesday, 09/12/2018
Location: 3D 10/11
Average rating: ***..
(3.20, 5 ratings)
If you’re looking to find like minds and make new professional connections, come to the women's networking lunch on Wednesday. Read more.
12:00pm–1:15pm Wednesday, 09/12/2018
Location: 3D 09
Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers. Read more.

1:15pm

1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 01/02
Skyler Thomas (MapR)
Average rating: *****
(5.00, 2 ratings)
In the past, there have been major challenges in quickly creating machine learning training environments and deploying trained models into production. Skyler Thomas details how Kubernetes helps data scientists and IT work in concert to speed model training and time-to-value. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 03
Srikanth Desikan (Oracle)
Average rating: *****
(5.00, 1 rating)
SparklineData is an in-memory distributed scale-out analytics platform built on Apache Spark to enable enterprises to query on data lakes directly with instant response times. Srikanth Desikan offers an overview of SparklineData and explains how it can enable new analytics use cases working on the most granular data directly on data lakes. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 04/05
Arakere Ramesh (Intel), Bharath Yadla (Aerospike)
Persistent memory accelerates analytics, database, and storage workloads across a variety of use cases, bringing new levels of speed and efficiency to the data center and to in-memory computing. Arakere Ramesh and Bharath Yadla offer an overview of the newly announced Intel Optane data center persistent memory and share the exciting potential of this technology in analytics solutions. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Recommendation Systems, Retail and e-commerce
Longqi Yang (Cornell Tech, Cornell University)
State-of-the-art recommendation algorithms are increasingly complex and no longer one size fits all. Current monolithic development practice poses significant challenges to rapid, iterative, and systematic experimentation. Longqi Yang explains how to use OpenRec to easily customize state-of-the-art solutions for diverse scenarios. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 17
Zhi Zhu (China Construction Bank), Luke Han (Kyligence)
When China Construction Bank wanted to migrate 23,000+ reports to mobile, it chose Apache Kylin as the high-performance and high-concurrency platform to refactor its data warehouse architecture to serve 400K+ users. Zhi Zhu and Luke Han detail the necessary architecture and best practices for refactoring a data warehouse for mobile analytics. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Media, Marketing, Advertising, Recommendation Systems, Text and Language processing and analysis
James Dreiss (Reuters)
Average rating: ***..
(3.67, 3 ratings)
James Dreiss discusses the challenges in building a content recommendation system for one of the largest news sites in the world, Reuters.com. The particularities of the system include developing a scrolling newsfeed and the use of document vectors for semantic representation of content. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Secondary topics:  Model lifecycle management
William Benton (Red Hat)
Average rating: *****
(5.00, 2 ratings)
Containers are a hot technology for application developers, but they also provide key benefits for data scientists. William Benton details the advantages of containers for data scientists and AI developers, focusing on high-level tools that will enable you to become more productive and collaborate more effectively. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 10 Level: Intermediate
Secondary topics:  Data Platforms
Ryan Blue (Netflix), Daniel Weeks (Netflix)
Average rating: *****
(5.00, 3 ratings)
In the last few years, Netflix's data warehouse has grown to more than 100 PB in S3. Ryan Blue and Daniel Weeks share lessons learned, the tools Netflix currently uses and those it has retired, and the improvements it is rolling out, including Iceberg, a new table format for S3. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Media, Marketing, Advertising, Temporal data and time-series analytics
Arun Kejariwal (Independent), Francois Orsini (MZ)
Average rating: ****.
(4.00, 1 rating)
The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Blockchain and decentralization, Data preparation, governance and privacy
Minh Chau Nguyen (ETRI), Heesun Won (ETRI)
Average rating: **...
(2.20, 5 ratings)
Minh Chau Nguyen and Heesun Won explain how to implement analytics services in data marketplace systems on a single Hadoop cluster across distributed data centers. The solution extends the overall architecture of the Hadoop ecosystem with the blockchain so that multiple tenants and authorized third parties can securely access data while still maintaining privacy, scalability, and reliability. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines
Yaroslav Tkachenko (Activision)
Average rating: ****.
(4.67, 3 ratings)
What's easier than building a data pipeline? You add a few Apache Kafka clusters and a way to ingest data, design a way to route your data streams, add a few stream processors and consumers, integrate with a data warehouse... wait, this looks like a lot of things. Join Yaroslav Tkachenko to learn best practices for building a data pipeline, drawn from his experience at Demonware/Activision. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 06
Interested in how Ebates is using a hybrid on-premises and cloud implementation to scale out its centralized business intelligence and data hub? Mark Stange-Tregear shares the history, business context, and technical plan around Ebates’s hybrid Hadoop-AWS cloud approach. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Intermediate
Fabian Hueske (Ververica)
Average rating: *****
(5.00, 1 rating)
Fabian Hueske discusses why SQL is a great approach to unify batch and stream processing. He gives an update on Apache Flink's SQL support and shares some interesting use cases from large-scale production deployments. Finally, Fabian presents Flink's new query service that enables users and applications to submit streaming and batch SQL queries and retrieve low-latency updated results. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 09 Level: Intermediate
Secondary topics:  Data preparation, governance and privacy
Andrew Brust (Blue Badge Insights | ZDNet)
Average rating: ****.
(4.50, 2 ratings)
Data governance has grown from a set of mostly data management-oriented technologies in the data warehouse era to encompass catalogs, glossaries, and more in the data lake era. Now new requirements are emerging, and new products are rising to meet the challenge. Andrew Brust tracks data governance's past and present and offers a glimpse of the future. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 10/11 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise, Retail and e-commerce
Erin Coffman (Airbnb)
Average rating: *****
(5.00, 7 ratings)
Airbnb has open-sourced many high-leverage data tools, including Airflow, Superset, and the Knowledge Repo, but adoption of these tools across the company was relatively low. Erin Coffman offers an overview of Data University, launched to make data more accessible and utilized in decision making at Airbnb. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 12/13 Level: Advanced
Secondary topics:  Data preparation, governance and privacy, Ethics and Privacy
Les McMonagle (BlueTalon)
Average rating: *****
(5.00, 2 ratings)
Privacy by design is a fundamentally important approach to achieving compliance with GDPR and other data privacy or data protection regulations. Les McMonagle outlines how organizations can save time and money while improving data security and regulatory compliance and dramatically reducing the risk of a data breach or expensive penalties for noncompliance. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 14 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise
Tony Baer (dbInsight), Florian Douetteau (DATAIKU)
Average rating: ***..
(3.40, 5 ratings)
Tony Baer and Florian Douetteau share the results of research cosponsored by Ovum and Dataiku that surveyed a specially selected sample of chief data officers and data scientists on how to map roles and processes to make success with AI in the business repeatable. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: Expo Hall Level: Beginner
Jason Wang (Cloudera), Suraj Acharya (Cloudera), Tony Wu (Cloudera)
The largest infrastructure paradigm change of the 21st century is the shift to the cloud. Companies now face the difficult decision of which cloud to go with. This decision is not just financial and in many cases rests on the underlying infrastructure. Jason Wang, Suraj Acharya, and Tony Wu compare the relative strengths and weaknesses of AWS and Azure. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1 E15
Darrin Johnson (NVIDIA)
While every enterprise is on a mission to infuse its business with deep learning, few know how to build the infrastructure to get them there. Darrin Johnson shares insights and best practices learned from NVIDIA's deep learning deployments around the globe that you can leverage to shorten deployment timeframes, improve developer productivity, and streamline operations. Read more.

2:05pm

2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 01/02
Anand Raman (Microsoft), Wee Hyong Tok (Microsoft)
Anand Raman and Wee Hyong Tok walk you through applying AI technologies in the cloud. You'll learn how to add prebuilt AI capabilities like object detection, face understanding, translation, and speech to applications; build cognitive search applications that understand deep content in images, text, and other data; use the Azure platform to accelerate machine learning; and more. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 03
Chris Wojdak (Symcor)
Average rating: ****.
(4.67, 3 ratings)
Chris Wojdak explains how Symcor has transformed its big data architecture using Informatica’s comprehensive machine learning-based solutions for data integration, data quality, data cataloging, and data governance. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 04/05
Ben Sharma (Zaloni), Selwyn Collaco (TMX)
Average rating: *****
(5.00, 2 ratings)
Selwyn Collaco and Ben Sharma share insights from their real-world experience and discuss best practices for architecture, technology, data management, and governance to enable centralized data services. They also explain how to leverage the Zaloni Data Platform (ZDP), an integrated self-service data platform, to operationalize the enterprise data lake. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Recommendation Systems, Temporal data and time-series analytics, Transportation and Logistics
Ankit Jain (Uber)
Average rating: ***..
(3.00, 3 ratings)
Personalization is a common theme in social networks and ecommerce businesses. Personalization at Uber involves an understanding of how each driver and rider is expected to behave on the platform. Ankit Jain explains how Uber employs deep learning using LSTMs and its huge database to understand and predict the behavior of each and every user on the platform. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1E 17
Intelligent enterprises—fueled by rapid advances in artificial intelligence (AI), machine learning (ML), and the internet of things (IoT)—promise significant business value. Richard Mooney explains how to achieve the game-changing outcomes of an intelligent enterprise, delivering value across business functions with the synergy of machine and human intelligence. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Media, Marketing, Advertising, Recommendation Systems
Ahsan Ashraf (Pinterest)
Online recommender systems often rely heavily on user engagement features. This can cause a bias toward exploitation over exploration, overoptimizing on users' interests. Content diversification is important for user satisfaction, but measuring and evaluating impact is challenging. Ahsan Ashraf outlines techniques used at Pinterest that drove ~2–3% impression gains and a ~1% time-spent gain. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Secondary topics:  Data Platforms, Model lifecycle management, Retail and e-commerce
Atul Kale (Airbnb), Xiaohan Zeng (Airbnb)
Average rating: *****
(5.00, 3 ratings)
Atul Kale and Xiaohan Zeng offer an overview of Bighead, Airbnb's user-friendly and scalable end-to-end machine learning framework that powers Airbnb's data-driven products. Built on Python, Spark, and Kubernetes, Bighead integrates popular libraries like TensorFlow, XGBoost, and PyTorch and is designed to be used in modular pieces. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 10 Level: Beginner
Sophie Watson (Red Hat)
Average rating: ***..
(3.50, 6 ratings)
Recommender systems enhance user experience and business revenue every day. Sophie Watson demonstrates how to develop a robust recommendation engine using a microservice architecture. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Retail and e-commerce, Temporal data and time-series analytics
Roger Barga (Amazon Web Services), Sudipto Guha (Amazon Web Services), Kapil Chhabra (Amazon Web Services)
Average rating: *****
(5.00, 3 ratings)
Roger Barga, Sudipto Guha, and Kapil Chhabra explain how unsupervised learning with the robust random cut forest (RRCF) algorithm enables insights into streaming data and share new applications to impute missing values, forecast future values, detect hotspots, and perform classification tasks. They also demonstrate how to implement unsupervised learning over massive data streams. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 21/22 Level: Non-technical
Secondary topics:  Blockchain and decentralization, Financial Services
Jim Scott (NVIDIA)
Average rating: **...
(2.67, 3 ratings)
Jim Scott details relevant use cases for blockchain-based solutions across a variety of industries, focusing on a suggested architecture to achieve high-transaction-rate private blockchains and decentralized applications backed by a blockchain. Along the way, Jim compares public and private blockchain architectures. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines
Danny Chen (Uber Technologies), Omkar Joshi (Uber), Eric Sayle (Uber Technologies)
Average rating: ***..
(3.80, 5 ratings)
Danny Chen, Omkar Joshi, and Eric Sayle offer an overview of Marmaray, a generic Hadoop ingestion and dispersal framework recently released to production at Uber. You'll learn how Marmaray can meet a team's data needs by ensuring that data can be reliably ingested into Hive or dispersed into online data stores and take a deep dive into the architecture to see how it all works. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1E 06
Randy Lea (Arcadia Data)
Average rating: *****
(5.00, 1 rating)
The use of data lakes continues to grow, and the right business intelligence (BI) and analytics tools on data lakes are critical to data lake success. Randy Lea explains why existing BI tools work well for data warehouses but not data lakes and why every organization should have two BI standards: one for data warehouses and one for data lakes. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Beginner
Karthik Ramasamy (Streamlio), Andrew Jorgensen (Google)
Average rating: ****.
(4.00, 1 rating)
Streaming systems like Apache Heron are being used for an increasingly broad array of applications. Karthik Ramasamy and Andrew Jorgensen offer an overview of Fabric Answers, which provides real-time insights to mobile developers to improve their product experience at Google Fabric using Apache Heron. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1E 09 Level: Advanced
Anant Chintamaneni (BlueData), Nanda Vijaydev (BlueData)
Average rating: *****
(5.00, 1 rating)
Kubernetes (K8s)—the open source container orchestration system for modern big data workloads—is increasingly popular. While the promised land is a unified platform for cloud-native stateless and stateful data services, stateful, multiservice big data cluster orchestration brings unique challenges. Anant Chintamaneni and Nanda Vijaydev outline the considerations for big data services for K8s. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1E 10/11 Level: Beginner
Lawrence Cowan (Cicero Group)
Average rating: ***..
(3.00, 3 ratings)
Firms are struggling to leverage their data. Lawrence Cowan outlines a methodology for assessing four critical areas that firms must consider when looking to make the analytical leap: data strategy, data culture, data analysis and implementation, and data management and architecture. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1E 12/13 Level: Beginner
Secondary topics:  Ethics and Privacy
Harry Glaser (Periscope Data)
Average rating: *****
(5.00, 2 ratings)
What is the moral responsibility of a data team today? As AI and machine learning technologies become part of our everyday life and as data becomes accessible to everyone, CDOs and data teams are taking on a very important moral role as the conscience of the corporation. Harry Glaser highlights the risks companies will face if they don't empower data teams to lead the way for ethical data use. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1E 14 Level: Intermediate
Secondary topics:  Machine Learning in the enterprise, Model lifecycle management
David Talby (Pacific AI)
Average rating: ****.
(4.40, 5 ratings)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: Expo Hall
Secondary topics:  Model lifecycle management
Mani Parkhe (Databricks), Andrew Chen (Databricks)
Successfully building and deploying a machine learning model is difficult to do once. Enabling other data scientists to reproduce your pipeline, compare the results of different versions, track what's running where, and redeploy and rollback updated models is much harder. Mani Parkhe and Andrew Chen offer an overview of MLflow—a new open source project from Databricks that simplifies this process. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1 E15
Michael Balint (NVIDIA)
Michael Balint explains how NVIDIA employs its own distribution of Kubernetes, in conjunction with DGX hardware, to make the most efficient use of GPU resources and scale its efforts across a cluster, allowing multiple users to run experiments and push their finished work to production. Read more.

2:55pm

2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 01/02
Sara Alavi (Bell Canada)
Bell Canada, Canada's largest communications company, leads the industry in providing world-class broadband communications services to consumers and business customers. Join Sara Alavi to learn how the network big data and AI team within Bell is using modern data environments and applying a startup mindset to transform traditional networks into insight-driven intelligent networks. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 03
Mathew Lodge (Anaconda)
Average rating: *****
(5.00, 1 rating)
The days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. Welcome to the future. Containers and Kubernetes make great language-agnostic distributed computing clusters: it's just as easy to deploy Python as it is Java. Mathew Lodge shows you how. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 04/05
Michael Mahoney (Kinetica)
Michael Mahoney demonstrates how to leverage the power of GPUs to converge streaming data analysis, location analysis, and streamlined machine learning with a single engine. Along the way, Michael shares real-world case studies on how Kinetica is used to solve complex data challenges. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Alex Heye (Cray), Ding Ding (Intel)
Precipitation nowcasting is used to predict the future rainfall intensity over a relatively short timeframe. The forecasting resolution and time accuracy required are much higher than for other traditional forecasting tasks. Alexander Heye and Ding Ding explain how to build a precipitation nowcasting system with recurrent neural networks using BigDL on Apache Spark. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 17
Chris Stirrat (Eagle Investment Systems)
Average rating: ***..
(3.00, 1 rating)
Eagle Investment Systems, a leading provider of financial services technology, is building a new Hadoop and cloud-based data management solution. Chris Stirrat explains how Eagle went from incubation to an enterprise-scale solution in just 10 months, using a Hadoop-based big data stack and multitenant architecture, transforming software creation, delivery, quality, technology, and culture. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Ethics and Privacy, Media, Marketing, Advertising, Recommendation Systems
Bonnie Barrilleaux (LinkedIn)
Average rating: ****.
(4.50, 4 ratings)
As LinkedIn encouraged members to join conversations, it found itself in danger of creating a "rich get richer" economy in which a few creators got an increasing share of all feedback. Bonnie Barrilleaux explains why you must regularly reevaluate metrics to avoid perverse incentives—situations where efforts to increase the metric cause unintended negative side effects. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Secondary topics:  Ethics and Privacy
Chang Liu (Georgian Partners)
Average rating: *****
(5.00, 1 rating)
Chang Liu offers an overview of a common problem faced by many software companies, the cold-start problem, and explains how Georgian Partners has been successful at solving this problem by transferring knowledge from existing data through differentially private data aggregation. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 10 Level: Intermediate
Greg Rahn (Cloudera)
Average rating: *****
(5.00, 1 rating)
Cloud object stores are becoming the bedrock of cloud data warehouses for modern data-driven enterprises, and it's become a necessity for data teams to have the ability to directly query data stored in S3 or ADLS. Greg Rahn and Mostafa Mokhtar discuss optimal end-to-end workflows and technical considerations for using Apache Impala over object stores for your cloud data warehouse. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Beginner
Jeroen Janssens (Data Science Workshops)
Average rating: *....
(1.50, 2 ratings)
"Anyone who does not have the command line at their beck and call is really missing something," tweeted Tim O'Reilly when Jeroen Janssens's Data Science at the Command Line was recently made available online for free. Join Jeroen to learn what you're missing out on if you're not applying the command line and many of its power tools to typical data science problems. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Data Platforms, Retail and e-commerce
Varant Zanoyan (Airbnb)
Average rating: ****.
(4.33, 6 ratings)
Zipline is Airbnb’s soon to be open-sourced data management platform specifically designed for ML use cases. It has taken the task of feature generation from months to days and offers features to support end-to-end data management for machine learning. Varant Zanoyan covers Zipline's architecture and dives into how it solves ML-specific problems. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines
Average rating: **...
(2.67, 3 ratings)
Mauricio Aristizabal shares lessons learned from migrating Impact's traditional ETL platform to a real-time platform on Hadoop (leveraging the full Cloudera EDH stack). Mauricio also discusses the company's data lake in HBase, Spark Streaming jobs (with Spark SQL), using Kudu for "fast data" BI queries, and using Kafka's data bus for loose coupling between components. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 06
Basil Faruqui (BMC Software)
Average rating: **...
(2.00, 1 rating)
Basil Faruqui demonstrates how to simplify the automation and orchestration of an IoT-driven data pipeline in a cloud environment where machine learning algorithms predict failures. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Intermediate
Bill Chambers (Databricks)
Average rating: ***..
(3.00, 1 rating)
Streaming big data is a rapidly growing field but currently involves a lot of operational complexity and expertise. Bill Chambers shares a decision-making framework for determining the best tools and technologies for successfully deploying and maintaining streaming data pipelines to solve business problems and offers an overview of Apache Spark’s Structured Streaming processing engine. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 09 Level: Beginner
Paul Curtis (Weaveworks)
Average rating: *****
(5.00, 2 ratings)
Once the data has been captured, how can the cloud, containers, and a data fabric combine to build the infrastructure that provides business insights? Paul Curtis explores three customer deployments that leverage the best of private clouds and containers to provide a flexible big data environment. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 10/11 Level: Non-technical
Friederike Schuur (Cloudera), Rita Ko (USA for UNHCR)
Average rating: *****
(5.00, 1 rating)
Friederike Schuur and Rita Ko explain how the Hive (an internal group at USA for UNHCR) and Cloudera Fast Forward Labs transformed USA for UNHCR, enabling the agency to use data science and machine learning (DS/ML) to address the refugee crisis. Along the way, they cover the development and implementation of a DS/ML strategy, identify use cases and success metrics, and showcase the value of DS/ML. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 12/13 Level: Non-technical
Secondary topics:  Data preparation, governance and privacy, Ethics and Privacy
Andrew Burt (bnh.ai)
Average rating: *****
(5.00, 2 ratings)
Machine learning is becoming prevalent across industries, creating new types of risk. Managing this risk is quickly becoming the central challenge of major organizations, one that strains data science teams, legal personnel, and the C-suite alike. Andrew Burt shares lessons from past regulations focused on similar technology along with a proposal for new ways to manage risk in ML. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 14 Level: Intermediate
Secondary topics:  Machine Learning in the enterprise, Media, Marketing, Advertising
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
Average rating: ****.
(4.00, 3 ratings)
Creating a successful big data practice in your organization presents new challenges in managing projects and teams. Ted Malaska and Jonathan Seidman share guidance and best practices to help technical leaders deliver successful projects from planning to implementation. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: Expo Hall Level: Intermediate
Michael Freedman (TimescaleDB)
Michael Freedman explains how to leverage Postgres for high-volume time series workloads using TimescaleDB, an open source time series database built as a Postgres plug-in. Michael covers the general architectural design principles and new time series data management features, including adaptive time partitioning and near-real-time continuous aggregations. Read more.

3:35pm

3:35pm–4:35pm Wednesday, 09/12/2018
Location: 3B | Expo Hall
Afternoon Break sponsored by Intel (1h)

4:35pm

4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 01/02
Average rating: *****
(5.00, 3 ratings)
TD Bank’s data analytics team has undertaken a multiyear journey to modernize its data infrastructure for current and future needs. Joseph DosSantos explains how the team built a governed data lake foundation, enabling business users to leverage its big data environment to extract analytical insights while minimizing risks. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 03
As the data authority for hybrid cloud for big data analytics and AI, NetApp understands the value of the access, management, and control of data. Karthikeyan Nagalingam discusses the NetApp Data Fabric, which provides a unified data management environment that spans edge devices, data centers, and multiple hyperscale clouds using ONTAP software, all-flash systems, ONTAP Select, and cloud volumes. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 04/05
Anand Raman (Impetus Technologies)
Average rating: *....
(1.00, 1 rating)
Is a single source of truth across the enterprise possible, or is it just an expensive myth? Anand Raman explains why you need a holistic decision framework that addresses multiple facets from platform to processes. Join in to explore EDW modernization strategies, self-service analytics, and interactive insights on big data and discover a process to get to a unified data model. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Retail and e-commerce
Patty Ryan (Microsoft), CY Yam (Microsoft), Elena Terenzi (Microsoft)
Average rating: *****
(5.00, 1 rating)
Large online fashion retailers must efficiently maintain catalogues of millions of items. Due to human error, it's not unusual that some items have duplicate entries. Since manually trawling such a large catalogue is next to impossible, how can you find these entries? Patty Ryan, CY Yam, and Elena Terenzi explain how they applied deep learning for image segmentation and background removal. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 17
Paul Scott-Murphy (WANdisco)
Average rating: ****.
(4.50, 2 ratings)
Every organization is considering its storage options, with an eye toward the cloud. Paul Scott-Murphy explores what makes different large-scale storage systems and services unique, their clear (and unexpected) differences, the options you have to use them, and the surprises you can expect along the way. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Beginner
Secondary topics:  Financial Services, Text and Language processing and analysis
Masha Westerlund (Investopedia)
Average rating: *****
(5.00, 2 ratings)
Businesses rely on user data to power their sites, products, and sales. Can we give back by sharing those insights with users? Masha Westerlund explains how Investopedia harnessed reader data to build an index that tracks market anxiety and moves with the VIX, a proprietary measure of market volatility. You'll see how thinking outside the box helps turn data into tools for users, not stakeholders. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 08 Level: Intermediate
Sumit Gulwani (Microsoft)
Programming by input-output examples (PBE) is a new frontier in AI, set to revolutionize the programming experience for the masses. It can enable end users—99% of whom are nonprogrammers—to create small scripts and make data scientists 10–100x more productive for many data wrangling tasks. Sumit Gulwani leads a deep dive into this new programming paradigm and explores the science behind it. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 10 Level: Intermediate
Jacques Nadeau (Dremio)
Average rating: *****
(5.00, 1 rating)
Jacques Nadeau leads a deep dive into a new Apache-licensed lightweight distributed in-memory cache that allows multiple applications to consume Arrow directly using the Arrow RPC and IPC protocols. You'll explore the system design and deployment architecture—including the cache life cycle, update patterns, cache cohesion, and appropriate use cases—learn how it all works, and see it in action. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 12/14
Sarah Catanzaro (Amplify Partners), Rama Sekhar (Norwest Venture Partners), Zavain Dar (Lux Capital), Jonathan Lehr (Work-Bench), Crystal Huang (NEA)
In this panel discussion, venture capital investors explain how startups can accelerate enterprise adoption of machine learning and explore the new tech trends that will give rise to the next transformation in the big data landscape. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 21/22 Level: Beginner
Secondary topics:  Data Platforms
Osman Sarood (Mist Systems)
Average rating: **...
(2.00, 1 rating)
Mist consumes several terabytes of telemetry data daily from its globally deployed wireless access points, a significant portion of which is consumed by ML algorithms. Last year, Mist saw 10x infrastructure growth. Osman Sarood explains how Mist runs 75% of its production infrastructure, reliably, on AWS EC2 spot instances, which has brought its annual AWS cost from $3 million to $1 million. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines, Data preparation, governance and privacy
Neelesh Salian (Stitch Fix)
Average rating: *....
(1.33, 3 ratings)
Neelesh Srinivas Salian explains how Stitch Fix built a service to better understand the movement and evolution of data within the company's data warehouse, from the initial ingestion from outside sources through all of its ETLs. Neelesh covers why and how Stitch Fix built the service and details some use cases. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 06
Bruno Faria (Amazon Web Services)
Bruno Faria explains how to identify the components and workflows in your current environment and shares best practices to migrate these workloads to AWS. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Intermediate
Brian Wu (AppNexus)
Average rating: *****
(5.00, 1 rating)
Automating the success of digital ad campaigns is complicated and comes with the risk of wasting the advertiser's budget or a trader's margin and time. Brian Wu describes the evolution of Inventory Discovery, a streaming control system of eligibility, prioritization, and real-time evaluation that helps digital advertisers hit their performance goals with AppNexus. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 09 Level: Intermediate
Secondary topics:  Model lifecycle management
Dave Shuman (Cloudera), Bryan Dean (Red Hat)
The focus on the IoT is turning increasingly to the edge, and the way to make the edge more intelligent is by building machine learning models in the cloud and pushing them back out to the edge. Dave Shuman and Bryan Dean explain how Cloudera and Red Hat executed this architecture at one of Europe's leading manufacturers, along with a demo highlighting this architecture. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 10/11 Level: Beginner
Adil Aijaz (Split Software)
Average rating: *****
(5.00, 1 rating)
Many products, whether data driven or not, chase “the one metric that matters.” It may be engagement, revenue, or conversion, but the common theme is the pursuit of improvement in one metric. Product development teams should instead focus on the design of metrics that measure our goals. Adil Aijaz shares an approach to designing metrics and discusses best practices and common pitfalls. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 12/13 Level: Non-technical
Secondary topics:  Ethics and Privacy, Machine Learning in the enterprise
Average rating: *****
(5.00, 1 rating)
Too often, the discussion of AI and ML includes an expectation—if not a requirement—for infallibility. But as we know, this expectation is not realistic. So what’s a company to do? While risk can’t be eliminated, it can be rationalized. Kimberly Nevala demonstrates how an unflinching risk assessment enables AI/ML adoption and deployment. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 14 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise, Media, Marketing, Advertising
Cassie Kozyrkov (Google)
Average rating: ****.
(4.30, 10 ratings)
Many organizations aren’t aware that they have a blind spot with respect to their lack of data effectiveness, and hiring experts doesn’t seem to help. Cassie Kozyrkov examines what it takes to build a truly data-driven organizational culture and highlights a vital yet often neglected job function: the data science manager. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: Expo Hall Level: Intermediate
Secondary topics:  Blockchain and decentralization, Data Platforms
Dan Harple (Context Labs)
Dan Harple explains how distributed systems are being influenced by and are influencing operational, financial, and social impact requirements of a wide range of enterprises and how trust in these distributed systems is being challenged, elevated, and resolved by engineers and architects today. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 15
Alen Capalik (FASTDATA.io), Jim McHugh (NVIDIA), SriSatish Ambati (H2O.ai), Tim Delisle (Datalogue)
Explore case studies from Datalogue, FASTDATA.io, and H2O.ai that demonstrate how GPU-accelerated analytics, machine learning, and ETL help companies overcome slow queries and tedious data preparation processes, dynamically correlate data, and enjoy automatic feature engineering. Read more.

5:25pm

5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 01/02
Ivan Jibaja (Pure Storage)
Average rating: *****
(5.00, 1 rating)
Pure Storage runs over 70,000 tests per day. Using Spark’s flexible computing platform, the company can write a single application for both streaming and batch jobs so the company's team of triage engineers can understand the state of the continuous integration pipeline. Ivan Jibaja discusses the use case for big data analytics technologies, the architecture of the solution, and lessons learned. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 03
Dan Adams (Pitney Bowes)
The role of data and the demand to get it right, coupled with competitive pressures to move faster, have dramatically increased. Companies now recognize data as an asset and need to manage it that way. Join Dan Adams for the insights you need to ensure that your data addresses current and future needs and that your organization is set up for success. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 04/05
Mark Huang (Bell Canada)
Like all telecommunication giants, Bell Canada relies on huge volumes of data to make accurate business decisions and deliver better services. Mark Huang discusses why Bell Canada chose Kyvos’s OLAP on big data technology to achieve multidimensional analytics and how it helped the company meet its growing business reporting demands. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Financial Services
Joshua Patterson (NVIDIA), Onur Yilmaz (NVIDIA)
GPUs have allowed financial firms to accelerate their computationally demanding workloads. Today, the bottleneck has moved completely to ETL. The GPU Open Analytics Initiative (GoAi) is helping accelerate ETL while keeping the entire workflow on GPUs. Joshua Patterson and Onur Yilmaz discuss several GPU-accelerated data science tools and libraries. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 17
Sam Chance (Cambridge Semantics), Partha Bhattachargee (Cambridge Semantics)
Ben Szekely shares a vision for digital innovation: The data fabric connects enterprise data for unprecedented access in an overlay fashion that does not disrupt current investments. Interconnected and reliable data drives business outcomes by automating scalable AI and ML efforts. Graph technology is the way forward to realize this future. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Financial Services
Zachary Hanif (Capital One)
Average rating: ****.
(4.67, 3 ratings)
An understanding of graph-based analytical techniques can be extremely powerful when applied to modern practical problems, and modern frameworks and analytical techniques are making graph analysis methods viable for increasingly large, complex tasks. Zachary Hanif examines three prominent graph analytic methods, including graph convolutional networks, and applies them to concrete use cases. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Secondary topics:  Text and Language processing and analysis
Andreea Kremm (Netex Group), Mohammed Ibraaz Syed (UCLA)
Average rating: ****.
(4.00, 2 ratings)
Narrative economics studies the impact of popular narratives and stories on economic fluctuations in the context of human interests and emotions. Andreea Kremm and Mohammed Ibraaz Syed describe the use of emotion analysis, entity relationship extraction, and topic modeling in modeling narratives from written human communication. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 10 Level: Intermediate
Julien Le Dem (WeWork)
Average rating: *****
(5.00, 1 rating)
Big data infrastructure has evolved from flat files in a distributed filesystem to an efficient ecosystem to a fully deconstructed and open source database with reusable components. Julien Le Dem discusses the key open source components of the big data ecosystem and explains how they relate to each other and how they make the ecosystem more of a database and less of a filesystem. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Non-technical
Bethann Noble (Cloudera), Daniel Huss (State Street), Abhishek Kodi (State Street)
Average rating: ****.
(4.00, 1 rating)
Bethann Noble, Abhishek Kodi, and Daniel Huss share their experience and best practices for designing and executing on a roadmap for open data science and AI for business. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Model lifecycle management
Jay Kreps (Confluent)
Average rating: ****.
(4.00, 2 ratings)
Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their product, processes, or customer experience. Jay Kreps explores some of the difficulties of building production machine learning systems and explains how Apache Kafka and stream processing can help. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 23/24 Level: Beginner
Secondary topics:  Data Integration and Data Pipelines, Financial Services
Do your analysts always trust the insights generated by your data platform? Ensuring insights are always reliable is critical for use cases in the financial sector. Sandeep Uttamchandani outlines a circuit breaker pattern developed for data pipelines, similar to the common design pattern used in service architectures, that detects and corrects problems and ensures always reliable insights. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 06
Randy Lea (Arcadia Data)
The use of data lakes continues to grow, and the right business intelligence (BI) and analytics tools on data lakes are critical to data lake success. Randy Lea explains why existing BI tools work well for data warehouses but not data lakes and why every organization should have two BI standards: one for data warehouses and one for data lakes. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Beginner
Secondary topics:  Data Integration and Data Pipelines
Nishith Agarwal (Uber), Balaji Varadarajan (Uber), Vinoth Chandar (Apache Hudi)
Uber has a real need to provide faster, fresher data to its data consumers and products, which are running hundreds of thousands of analytical queries every day. Nishith Agarwal, Balaji Varadarajan, and Vinoth Chandar share the design, architecture, and use cases of the second generation of Hudi, an analytical storage engine designed to serve such needs and beyond. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 09 Level: Intermediate
Owen O'Malley (Cloudera), Ryan Blue (Netflix)
Average rating: ****.
(4.33, 3 ratings)
Owen O'Malley and Ryan Blue offer an overview of Iceberg, a new open source project that defines a new table layout with properties specifically designed for cloud object stores, such as S3. It provides a common set of capabilities, such as partition pruning, schema evolution, and atomic additions, removals, or replacements of files, regardless of whether the data is stored in Avro, ORC, or Parquet. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 10/11 Level: Intermediate
Secondary topics:  Model lifecycle management
Diego Oppenheimer (Algorithmia)
Average rating: ****.
(4.50, 2 ratings)
After big investments in collecting and cleaning data and building machine learning (ML) models, enterprises face big challenges in deploying models to production and managing a growing portfolio of ML models. Diego Oppenheimer covers the strategic and technical hurdles each company must overcome and the best practices developed while deploying over 4,000 ML models for 70,000 engineers. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 12/13 Level: Non-technical
Secondary topics:  Media, Marketing, Advertising
John Thuma (Arcadia Data)
Average rating: *****
(5.00, 1 rating)
Forget about the fake news; data and analytics in politics is what drives elections. John Thuma shares ethical dilemmas he faced while proposing analytical solutions to the RNC and DNC. Not only did he help causes he disagreed with, but he also armed politicians with real-time data to manipulate voters. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 14 Level: Intermediate
Secondary topics:  Data preparation, governance and privacy
Sanjeev Mohan (Gartner)
Average rating: *****
(5.00, 1 rating)
If the last few years were spent proving the value of data lakes, the emphasis now is to monetize the big data architecture investments. The rallying cry is to onboard new workloads efficiently. But how do you do so if you don’t know what data is in the lake, the level of its quality, or the trustworthiness of models? Sanjeev Mohan explains why data governance is the linchpin to success. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: Expo Hall
Secondary topics:  Machine Learning in the enterprise, Text and Language processing and analysis
Mike Tung (Diffbot)
Mike Tung offers an overview of available open source and commercial knowledge graphs and explains how consumer and business applications are already taking advantage of them to provide intelligent experiences and enhanced business efficiency. Mike then discusses what's coming in the future. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 15
Renee Yao (NVIDIA)
Average rating: *****
(5.00, 1 rating)
Renee Yao explains how generative adversarial networks (GANs) are successfully used to improve data generation and explores specific real-world examples where customers have deployed GANs to solve challenges in the healthcare, space, transportation, and retail industries. Read more.

6:05pm

6:05pm–7:05pm Wednesday, 09/12/2018
Location: Expo Hall
Make your way from booth to booth while you check out all the exhibitors in the Expo Hall on Wednesday after sessions end. Read more.

7:30pm

7:30pm–10:30pm Wednesday, 09/12/2018
Location: TAO Downtown
Average rating: *....
(1.00, 1 rating)
Don't miss an exciting evening filled with cocktails, food, and entertainment at Data After Dark at Strata in New York. Read more.

Thursday, 09/13/2018

8:00am

8:00am–8:45am Thursday, 09/13/2018
Location: 3E Foyer
Morning Coffee (45m)
8:00am–8:30am Thursday, 09/13/2018
Location: Crystal Palace
Gather before keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees. Read more.

8:50am

8:50am–9:00am Thursday, 09/13/2018
Location: 3E
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.50, 2 ratings)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

9:00am

9:00am–9:15am Thursday, 09/13/2018
Location: 3E
Amber Case (MIT Media Lab)
Average rating: ****.
(4.65, 20 ratings)
Amber Case outlines several methods that product designers and managers can use to improve everyday interactions through an understanding and application of sound design. Read more.

9:15am

9:15am–9:20am Thursday, 09/13/2018
Location: 3E
Average rating: **...
(2.87, 15 ratings)
IBM Analytics’s Dinesh Nirmal solves school lunch and the struggle to keep ahead of regulations. With AI tech like deep learning and NLG, supplying meals to California’s kids leaps from enriching metadata for compliance to actionable insights for the business. Read more.

9:20am

9:20am–9:30am Thursday, 09/13/2018
Location: 3E
Hilary Mason (Cloudera Fast Forward Labs)
Average rating: ****.
(4.00, 11 ratings)
Machine learning and artificial intelligence are exciting technologies, but real value comes from marrying those capabilities with the right business problems. Hilary Mason explores the current state of these technologies, investigates what's coming next in applied machine learning, and explains how to identify and execute on the right business opportunities at the right time. Read more.

9:30am

9:30am–9:35am Thursday, 09/13/2018
Location: 3E
Average rating: ***..
(3.67, 9 ratings)
Data is the fuel for analytics and AI workloads, but the challenges in using it are constant. Ziya Ma discusses how recent innovations from Intel in high-capacity persistent memory and open source software are accelerating production-scale deployments, delivering breakthrough optimizations and faster insights to a wide range of opportunities in the digital enterprise. Read more.

9:35am

9:35am–9:55am Thursday, 09/13/2018
Location: 3E
Julia Angwin (ProPublica)
Average rating: ****.
(4.95, 21 ratings)
Algorithms are increasingly arbiters of forgiveness. Julia Angwin discusses what she has learned about forgiveness in her series of articles on algorithmic accountability and the lessons we all need to learn for the coming AI future. Read more.

9:55am

9:55am–10:00am Thursday, 09/13/2018
Location: 3E
Chad W. Jennings (Google)
Average rating: ***..
(3.45, 11 ratings)
Cities all over the world are using data and analytics to optimize infrastructure, but city planners are often held back by outdated data gathering methods and legacy analysis tools. Chad Jennings details how Geotab, a leader in IoT fleet logistics, brought BigQuery's unique machine learning and geospatial capabilities to its existing datasets to deliver a more capable solution to city planners. Read more.

10:05am

10:05am–10:20am Thursday, 09/13/2018
Location: 3E
Secondary topics:  Ethics and Privacy
Amanda Pustilnik (University of Maryland School of Law | Center for Law, Brain & Behavior, Mass. General Hospital)
Average rating: ****.
(4.50, 12 ratings)
Have you ever dreamed you could read minds? Do telekinesis? Maybe fly a magic carpet by thought alone? Until now, these powers have existed only in the realm of imagination or, more recently, video, AR, and VR games. Join Amanda Pustilnik to learn how brain-based human-machine interfaces are beginning to offer these powers in near-commercially-viable forms. Read more.

10:20am

10:20am–10:25am Thursday, 09/13/2018
Location: 3E
Ben Sharma (Zaloni)
Average rating: ***..
(3.00, 12 ratings)
Once, a company could live 60–70 years on the S&P 500. Now it averages 15 years. If companies were people, this would be an epidemic on par with the Black Plague. But the same things that dragged humanity out of that dark age can drag companies out of this one. Read more.

10:25am

10:25am–10:45am Thursday, 09/13/2018
Location: 3E
Secondary topics:  Ethics and Privacy
Jacob Ward (CNN | Al Jazeera | PBS)
Average rating: ****.
(4.73, 15 ratings)
For most of us, our own mind is a black box—an all-powerful and utterly mysterious device that runs our lives for us, using rules and shortcuts of which we aren’t even aware. Jacob Ward reveals the relationship between the unconscious habits of our minds and the way that AI is poised to amplify them, alter them, maybe even reprogram them. Read more.

10:50am

10:50am–11:20am Thursday, 09/13/2018
Location: 3B | Expo Hall
Morning break sponsored by IBM (30m)

11:20am

11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 01/02
Jennifer Shin (8 Path Solutions | NYU Stern | IBM)
Common wisdom dictates that we should never make assumptions, but assumptions are essential in the creation of statistical models. Jennifer Shin explores how assumptions fit into the creation of a statistical model, the pitfalls of applying a model to data without taking the underlying assumptions into account, and how to identify datasets where the model and its assumptions are applicable. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Beginner
Secondary topics:  Deep Learning
Lars Hulstaert (Microsoft)
Average rating: *****
(5.00, 1 rating)
Transfer learning allows data scientists to leverage insights from large labeled datasets. The general idea of transfer learning is to use knowledge learned from tasks for which a lot of labeled data is available in settings where little labeled data is available. Lars Hulstaert explains what transfer learning is and how it can boost your NLP or CV pipelines. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Media, Marketing, Advertising, Text and Language processing and analysis
Andrew Montalenti (Parse.ly)
Average rating: *****
(5.00, 1 rating)
What can we learn from a one-billion-person live poll of the internet? Andrew Montalenti explains how Parse.ly has gathered a unique dataset of news reading sessions of billions of devices, peaking at over two million sessions per minute on thousands of high-traffic news and information websites, and how the company uses this data to unearth the secrets behind online content. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 08 Level: Intermediate
Secondary topics:  Temporal data and time-series analytics
Cris Lowery (Baringa Partners), Marc Warner (ASI)
Average rating: ****.
(4.00, 1 rating)
In EU households, heating and hot water alone account for 80% of energy usage. Cristobal Lowery and Marc Warner explain how future home energy management systems could improve their energy efficiency by predicting resident needs through utilities data, with a particular focus on the key data features, the need for data compression, and the data quality challenges. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 10 Level: Intermediate
Secondary topics:  Data Platforms, Deep Learning
Jonathan Hung (LinkedIn), Keqiu Hu (LinkedIn), Zhe Zhang (LinkedIn)
Jonathan Hung, Keqiu Hu, and Zhe Zhang offer an overview of TensorFlow on YARN (TonY), a framework to natively run TensorFlow on Hadoop. TonY enables running TensorFlow distributed training as a new type of Hadoop application. Its native Hadoop connector, together with other features, aims to run TensorFlow jobs as reliably and flexibly as other first-class citizens on Hadoop. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Beginner
Jeffrey Heer (Trifacta | University of Washington)
Average rating: ****.
(4.75, 4 ratings)
Jeffrey Heer offers an overview of Vega and Vega-Lite—high-level declarative languages for interactive visualization that support exploratory data analysis, communication, and the development of new visualization tools. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 21/22 Level: Intermediate
Holden Karau (Independent), Rachel Warren (Salesforce Einstein), Anya Bida (Salesforce)
Average rating: ****.
(4.00, 2 ratings)
Apache Spark is an amazing distributed system, but part of the bargain we've made with the infrastructure daemons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. Holden Karau, Rachel Warren, and Anya Bida explore auto-tuning jobs using systems like Apache Beam, Mahout, and internal Spark ML jobs as workloads. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 23/24 Level: Advanced
Ted Dunning (MapR, now part of HPE)
Average rating: ****.
(4.00, 4 ratings)
Stateful containers are a well-known anti-pattern, but the standard solution—managing state in a separate storage tier—is costly and complex. Recent developments have changed things dramatically for the better. In particular, you can now manage a high-performance software-defined-storage tier entirely in Kubernetes. Ted Dunning describes what's new and how it makes big data easier on Kubernetes. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 06
Arun Murugan (GE Digital), Jeff Miller (GE)
Average rating: **...
(2.00, 2 ratings)
Arun Murugan and Jeff Miller detail how complex relationships are discovered and modeled to simplify analytics while keeping an Agile architecture for data acquisition. You’ll see how GE uses machine learning (powered by Io-Tahoe) in data discovery and profiling to support the data engineering behind a standard data model essential to enterprise use cases. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 07/08 Level: Beginner
Secondary topics:  Temporal data and time-series analytics, Transportation and Logistics
Thomas Weise (Lyft), Mark Grover (Lyft)
Average rating: **...
(2.50, 2 ratings)
Thomas Weise and Mark Grover explain how Lyft uses its streaming platform to detect and respond to anomalous events, using data science tools for machine learning and a process that allows for fast and predictable deployment. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 09 Level: Advanced
Secondary topics:  Data Integration and Data Pipelines, Data preparation, governance and privacy, Media, Marketing, Advertising
Barbara Eckman (Comcast)
Average rating: ****.
(4.33, 6 ratings)
Comcast’s streaming data platform comprises ingest, transformation, and storage services in the public cloud, with Apache Atlas for data discovery and lineage. Barbara Eckman explains how Comcast recently integrated on-prem data sources, including traditional data warehouses and RDBMSs, which required its data governance strategy to include relational and JSON schemas in addition to Apache Avro. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 10/11 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise, Retail and e-commerce, Transportation and Logistics
Average rating: ****.
(4.75, 4 ratings)
Data scientists are hard to hire. But too often, companies struggle to find the right talent only to make avoidable mistakes that cause their best data scientists to leave. From org structure and leadership to tooling, infrastructure, and more, Michelangelo D'Agostino shares concrete (and inexpensive) tips for keeping your data scientists engaged, productive, and adding business value. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 12/13 Level: Beginner
Secondary topics:  Ethics and Privacy
Nuria Ruiz (Wikimedia)
The Wikipedia community feels strongly that you shouldn’t have to provide personal information to participate in the free knowledge movement. Nuria Ruiz discusses the challenges that this strong privacy stance poses for the Wikimedia Foundation, including how it affects data collection, and details some creative workarounds that allow WMF to calculate metrics in a privacy-conscious way. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 14 Level: Intermediate
Secondary topics:  Machine Learning in the enterprise, Retail and e-commerce
Mikio Braun (Zalando)
Average rating: **...
(2.75, 4 ratings)
In order to become "AI ready," an organization not only has to provide the right technical infrastructure for data collection and processing but also must learn new skills. Mikio Braun highlights three pieces companies often miss when trying to become AI ready: making the connection between business problems and AI technology, implementing AI-driven development, and running AI-based projects. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: Expo Hall Level: Intermediate
Secondary topics:  Data Platforms
Michelle Ufford (Netflix)
Average rating: ****.
(4.40, 5 ratings)
Michelle Ufford shares some of the cool things Netflix is doing with data and the big bets the company is making on data infrastructure, covering workflow orchestration, machine learning, interactive notebooks, centralized alerting, event-based processing, platform intelligence, and more. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 03/04/05
Bob Bradley (Geotab), Chad W. Jennings (Google)
Average rating: ****.
(4.50, 4 ratings)
If your company isn’t good at analytics, it’s not ready for AI. Bob Bradley and Chad W. Jennings explain how the right data strategy can set you up for success in machine learning and artificial intelligence—the new ground for gaining competitive edge and creating business value. You'll then see an in-depth demonstration of Google technology from smart cities innovator Geotab. Read more.

12:00pm

12:00pm–1:10pm Thursday, 09/13/2018
Location: Expo Hall (Hall 3B)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.
12:00pm–1:10pm Thursday, 09/13/2018
Location: 3D 09
Average rating: *****
(5.00, 1 rating)
Join Strata Business Summit speakers and attendees for a networking lunch on Thursday. Read more.

1:10pm

1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 01/02
Shivnath Babu (Unravel Data Systems | Duke University), Madhusudan Tumma (TIAA)
Average rating: ****.
(4.00, 1 rating)
Operationalizing big data apps in a quick, reliable, and cost-effective manner remains a daunting task. Shivnath Babu and Madhusudan Tumma outline common problems and their causes and share best practices to find and fix these problems quickly and prevent such problems from happening in the first place. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Data Platforms, Deep Learning
Moty Fania (Intel), Sergei Kom (Intel)
Average rating: *****
(5.00, 1 rating)
Moty Fania and Sergei Kom share their experience and lessons learned implementing an AI inference platform to enable internal visual inspection use cases. The platform is based on open source technologies and was designed for real-time, streaming, and online actuation. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Health and Medicine, Text and Language processing and analysis
David Talby (Pacific AI), Alberto Andreotti (John Snow Labs), Stacy Ashworth (SelectData), Tawny Nichols (SelectData)
Average rating: ***..
(3.00, 4 ratings)
David Talby, Alberto Andreotti, Stacy Ashworth, and Tawny Nichols outline a question-answering system for accurately extracting facts from free-text patient records and share best practices for training domain-specific deep learning NLP models. The solution is based on Spark NLP, an extension of Spark ML that provides state-of-the-art performance and accuracy for natural language understanding. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 08 Level: Non-technical
Secondary topics:  Data preparation, governance and privacy
Ihab Ilyas (University of Waterloo)
Average rating: *****
(5.00, 2 ratings)
Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas explains why leveraging data semantics and domain-specific knowledge is key in delivering the optimizations necessary for truly scalable ML curation solutions. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 10 Level: Intermediate
Secondary topics:  Data Platforms, Deep Learning, Model lifecycle management
Wangda Tan (Cloudera)
Average rating: ****.
(4.50, 2 ratings)
In order to train deep learning and machine learning models, you must leverage applications such as TensorFlow, MXNet, Caffe, and XGBoost. Wangda Tan discusses new features in Apache Hadoop 3.x to better support deep learning workloads and demonstrates how to run these applications on YARN. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Beginner
Secondary topics:  Ethics and Privacy, Financial Services, Media, Marketing, Advertising
Bob Levy (Virtual Cove, Inc.)
Average rating: ***..
(3.00, 1 rating)
Augmented reality opens a completely new lens on your data through which you see and accomplish amazing things. Bob Levy explains how to use simple Python scripts to leverage completely new plot types. You'll explore use cases revealing new insight into financial markets data as well as new ways of interacting with data that build trust in otherwise “black box” machine learning solutions. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Data Platforms, Transportation and Logistics
Milene Darnis (Uber)
Average rating: ****.
(4.22, 9 ratings)
Every new launch at Uber is vetted via robust A/B testing. Given the pace at which Uber operates, the metrics needed to assess the impact of experiments constantly evolve. Milene Darnis explains how the team built a scalable and self-serve platform that lets users plug in any metric to analyze. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 23/24 Level: Intermediate
Kaushik Deka (Novantas), Ted Gibson (Novantas)
Average rating: ****.
(4.50, 2 ratings)
Kaushik Deka and Ted Gibson share a large-scale optimization architecture in Spark for a consumer product portfolio optimization use case in retail banking. The architecture combines a simulator that distributes computation of complex real-world scenarios and a constraint optimizer that uses business rules as constraints to meet growth targets. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1E 06
Dave Huh (Hitachi Vantara), Kevin Haas (Hitachi Vantara)
Data in most organizations today is massive, messy, and often found in silos. With so many sources to analyze, data engineers need to construct robust data pipelines using automation and minimize duplicate processes, as computation is costly for big data. David Huh shares strategies to construct data pipelines for machine learning, including one to reduce time to insight from weeks to hours. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1E 07/08 Level: Intermediate
Jun Rao (Confluent)
Average rating: ****.
(4.00, 1 rating)
The controller is the brain of Apache Kafka and is responsible for maintaining the consistency of the replicas. Jun Rao outlines the main data flow in the controller, then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1E 09 Level: Non-technical
Secondary topics:  Transportation and Logistics
Shawn Terry (Komatsu Mining Corp)
Average rating: ****.
(4.50, 2 ratings)
Global heavy equipment manufacturer Komatsu is using IoT data to continuously monitor some of the largest mining equipment to ultimately improve mine performance and efficiencies. Shawn Terry details the company's data journey and explains how it is using advanced analytics and predictive modeling to drive insights on terabytes of IoT data from connected mining equipment. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1E 10/11
Bruno Faria (Amazon Web Services)
Average rating: ****.
(4.00, 1 rating)
Bruno Faria explains how to identify the components and workflows in your current environment and shares best practices to migrate these workloads to AWS. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1E 12/13 Level: Intermediate
Secondary topics:  Data preparation, governance and privacy, Ethics and Privacy
Average rating: ***..
(3.50, 2 ratings)
GDPR is more than another regulation to be handled by your back office. Enacting the GDPR's Data Subject Access Rights (DSAR) requires practical actions. Jean-Michel Franco outlines the practical steps to deploy governed data services. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1E 14 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise, Transportation and Logistics
Brandy Freitas (Pitney Bowes)
Average rating: ****.
(4.50, 6 ratings)
Data science is an approachable field given the right framing. Often, though, practitioners and executives are describing opportunities using completely different languages. Join Brandy Freitas to develop context and vocabulary around data science topics to help build a culture of data within your organization. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: Expo Hall Level: Beginner
Umur Cubukcu (Citus Data)
PostgreSQL is often regarded as the world’s most advanced open source database—and it’s on fire. Umur Cubukcu moves beyond the typical list of features in the next release to explore why so many new projects “just use Postgres” as their system of record (or system of engagement) at scale. Along the way, you’ll learn how PostgreSQL’s extension APIs are fueling innovations in relational databases. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 03/04/05
Ian Swanson (Oracle)
Ian Swanson explores why and how data scientists and line-of-business leaders must treat AI as a team sport and explains what tools are needed to deploy models and applications that truly inform decision making. Read more.

2:00pm

2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 01/02
Patrick Nussbaumer (Alteryx)
There is a lot of buzz around data science and machine learning in the world today. Unfortunately, to truly innovate with data and advanced capabilities, organizations need to expand their focus beyond just a few specialists. Patrick Nussbaumer details how focusing on people can help improve analytic value and drive innovation. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Media, Marketing, Advertising
Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)
Can the talent industry make the job search/match more relevant and personalized for a candidate by leveraging deep learning techniques? Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to leverage distributed deep learning framework BigDL on Apache Spark to predict a candidate’s probability of applying to specific jobs based on their résumé. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Transportation and Logistics
Ted Malaska (Capital One), Mark Grover (Lyft)
Many details go into building a big data system for speed, from determining an acceptable latency for data access and where to store the data to solving multiregion problems, or even knowing just what data you have and where stream processing fits in. Mark Grover and Ted Malaska share challenges, best practices, and lessons learned doing big data processing and analytics at scale and at speed. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 08 Level: Intermediate
Secondary topics:  Data preparation, governance and privacy, Financial Services
Archana Anandakrishnan (American Express)
Average rating: ***..
(3.20, 5 ratings)
Building accurate machine learning models hinges on the quality of the data. Errors and anomalies get in the way of data scientists doing their best work. Archana Anandakrishnan explains how American Express created an automated, scalable system for measurement and management of data quality. The methods are modular and adaptable to any domain where accurate decisions from ML models are critical. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 10 Level: Intermediate
Secondary topics:  Model lifecycle management
Michelle Casbon (Google)
Average rating: *****
(5.00, 2 ratings)
Michelle Casbon demonstrates how to build a machine learning application with Kubeflow. Kubeflow makes it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere and supports the full lifecycle of an ML product, including iteration via Jupyter notebooks. Join Michelle to find out what Kubeflow currently supports and the long-term vision for the project. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Non-technical
Brent Dykes (Domo)
Average rating: ****.
(4.78, 9 ratings)
Companies collect all kinds of data and use advanced tools and techniques to find insights, but they often fail in the last mile: communicating insights effectively to drive change. Brent Dykes discusses the power that stories wield over statistics and explores the art and science of data storytelling—an essential skill in today’s data economy. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Data Platforms, Health and Medicine
Occhio Orsini (Aetna)
Occhio Orsini offers an overview of Aetna's Data Fabric platform. Join in to learn the needs and desires that led to the creation of the advanced analytics platform, explore the platform's architecture, technology, and capabilities, and understand the key technologies and capabilities that made it possible to build a hybrid solution across on-premises and cloud-hosted data centers. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 23/24 Level: Beginner
Secondary topics:  Data Platforms, Financial Services
Financial service clients demand increased data-driven personalization, faster insight-based decisions, and multichannel real-time access. Tim Walpole details how organizations can deliver real-time, vendor-agnostic, personalized chat services and explores issues around security, privacy, legal sign-off, data compliance, and how the internet of things can be used as a delivery platform. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1E 06
Deborah Reynolds (Pfizer), Kurt Muehmel (Dataiku)
Average rating: ****.
(4.00, 2 ratings)
By creating a collaborative and interactive analytic environment, a forward-thinking company may harness the best capabilities of its business analysts and data scientists to answer the company’s most pressing business questions. Deborah Reynolds and Kurt Muehmel explain how large enterprises can successfully put data at the core of everyday business decisions. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1E 07/08 Level: Beginner
Karthik Ramasamy (Streamlio), Matteo Merli (Streamlio)
Average rating: ****.
(4.50, 2 ratings)
Apache Pulsar is being used for an increasingly broad array of data ingestion tasks. When operating at scale, it's very important to ensure that the system can make use of all the available resources. Karthik Ramasamy and Matteo Merli share insights into the design decisions and the implementation techniques that allow Pulsar to achieve high performance with strong durability guarantees. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1E 09 Level: Beginner
Secondary topics:  Data Platforms, Retail and e-commerce, Transportation and Logistics
Tao Huang (JD.com), Mang Zhang (JD.com), Bing Bai (JD.com)
Average rating: ***..
(3.00, 1 rating)
Tao Huang, Mang Zhang, and Bing Bai explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1E 10/11 Level: Beginner
Josh Laurito (Squarespace)
Joshua Laurito explores systems Squarespace built for acquiring and enforcing consistency on obtained data and for inferring conclusions from a company’s marketing and product initiatives. Joshua discusses the intricacies of gathering and evaluating marketing and user data, from raising awareness to driving purchases, and shares results of previous analyses. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1E 12/13 Level: Non-technical
Secondary topics:  Health and Medicine, Text and Language processing and analysis
Chiny Driscoll (MetiStream), Jawad Khan (Rush University Medical Center)
Average rating: ****.
(4.00, 5 ratings)
Chiny Driscoll and Jawad Khan offer an overview of a solution by Cloudera and MetiStream that lets healthcare providers automate the extraction, processing, and analysis of clinical notes within an electronic health record in batch or real time, improving care, identifying errors, and recognizing efficiencies in billing and diagnoses. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1E 14 Level: Non-technical
Dean Wampler (Anyscale)
Streaming data systems, so-called "fast data," promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler shares what you need to know to exploit fast data successfully. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: Expo Hall Level: Intermediate
Secondary topics:  Model lifecycle management
Chris Fregly (Amazon Web Services)
Average rating: ***..
(3.50, 2 ratings)
Chris Fregly details a full-featured, open source end-to-end TensorFlow model training and deployment system, using the latest advancements with Kubernetes, TensorFlow, and GPUs. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 03/04/05
Kyle Davis (Redis Labs)
Average rating: *****
(5.00, 1 rating)
Kyle Davis explains how Redis can be used for ingesting high-velocity data from large-scale platforms and IoT data collections as well as for storing and querying data using probabilistic data structures that trade some precision for both higher speed and lower storage requirements. Along the way, Kyle shares examples and a demo of the solution. Read more.

2:30pm

2:30pm–3:30pm Thursday, 09/13/2018
Location: 3B | Expo Hall
Afternoon break sponsored by Google Cloud (1h)

3:30pm

3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 01/02 Level: Intermediate
Ajay Kulkarni (TimescaleDB)
Average rating: ****.
(4.00, 2 ratings)
Ajay Kulkarni explores the underlying changes that are characterizing the next wave of computing and shares several ways in which individual businesses and overall industries will be transformed. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Advanced
Secondary topics:  Deep Learning
Ash Munshi (Pepperdata)
Ash Munshi outlines a technique for labeling applications using runtime measurements of CPU, memory, and network I/O along with a deep neural network. This labeling groups the applications into buckets that have understandable characteristics, which can then be used to reason about the cluster and its performance. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Beginner
Secondary topics:  Temporal data and time-series analytics
Jared Lander (Lander Analytics)
Average rating: *****
(5.00, 3 ratings)
Temporal data is being produced in ever-greater quantity, but fortunately our time series capabilities are keeping pace. Jared Lander explores techniques for modeling time series, from traditional methods such as ARMA to more modern tools such as Prophet and machine learning models like XGBoost and neural nets. Along the way, Jared shares theory and code for training these models. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 08 Level: Non-technical
Secondary topics:  Financial Services
Emily Riederer (Capital One)
Emily Riederer explains how best practices from data science, open source, and open science can solve common business pain points. Using a case example from Capital One, Emily illustrates how designing empathetic analytical tools and fostering a vibrant InnerSource community are keys to developing reproducible and extensible business analysis. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 10 Level: Intermediate
Oleksii Kachaiev (Attendify)
Average rating: ***..
(3.50, 2 ratings)
When we talk about microservices, we usually focus on the communication layer. In practice, data is the much harder and often overlooked problem. Splitting applications into independent units leads to increased complexity, such as structural and semantic changes, knowledge sharing, and data discovery. Join Oleksii Kachaiev to explore emerging technologies created to tackle these challenges. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Beginner
Anna Nicanorova (Annalect)
Average rating: ***..
(3.00, 3 ratings)
Data visualization is supposed to be our map to information. However, contemporary charting techniques have a few shortcomings, including context reduction, hard numeric grasp, and perceptual dehumanization. Anna Nicanorova explains how augmented reality can solve these issues by presenting an intuitive and interactive environment for data exploration. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 21/22 Level: Intermediate
Ramesh Krishnan (Lockheed Martin), Steven Morgan (Lockheed Martin)
Average rating: ****.
(4.00, 1 rating)
Lockheed Martin is a data-driven company with a massive variety and volume of data. To extract the most value from its information assets, the company is constantly exploring ways to enable effective self-service scenarios. Ramesh Krishnan and Steve Morgan discuss Lockheed Martin's journey into modern analytics and explore its analytics platform focused on leveraging AWS GovCloud. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 23/24 Level: Beginner
Jonathan Ellis (DataStax)
Average rating: ****.
(4.50, 2 ratings)
Is open source Apache Cassandra still relevant in an era of hosted cloud databases? Jonathan Ellis discusses Cassandra’s strengths and weaknesses relative to Amazon DynamoDB, Microsoft CosmosDB, and Google Cloud Spanner. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 06
Antonio Fragoso (Globant)
Average rating: *....
(1.00, 1 rating)
Antonio Fragoso explores the key aspects of implementing a natural language processing project within your organization and reveals the necessary steps for making it a success. Antonio focuses on how to leverage an iterative process that can pave the way toward building a successful product. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Temporal data and time-series analytics
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
The StreamDM library provides the largest collection of data stream mining algorithms for Spark. Heitor Murilo Gomes and Albert Bifet explain how to use StreamDM and Structured Streaming to develop, apply, and evaluate learning models specifically for nonstationary streams (i.e., those with concept drifts). Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 09 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines, Data Platforms, Financial Services
Kevin Lu (PayPal), Maulin Vasavada (PayPal), Na Yang (PayPal)
Average rating: ****.
(4.00, 3 ratings)
PayPal is one of the biggest Kafka users in the industry; it manages and maintains over 40 production Kafka clusters in three geodistributed data centers and supports 400 billion Kafka messages a day. Kevin Lu, Maulin Vasavada, and Na Yang explore the management and monitoring PayPal applies to Kafka, from client-perceived statistics to configuration management, failover, and data loss auditing. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 10/11 Level: Non-technical
Secondary topics:  Data Platforms, Media, Marketing, Advertising, Retail and e-commerce
Francesco Mucio (Francescomuc.io)
Average rating: ***..
(3.50, 2 ratings)
Francesco Mucio tells the story of how Zalando went from an old-school BI company to an AI-driven company built on a solid data platform. Along the way, he shares what Zalando learned in the process and the challenges that still lie ahead. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 12/13 Level: Intermediate
Secondary topics:  Data preparation, governance and privacy, Ethics and Privacy
LaVonne Reimer, JD (Lumenous)
GDPR asks us to rethink personal data systems—viewing UI/UX, consent management, and value-add data services through the eyes of subjects of the data. LaVonne Reimer explains why the opportunity in the $150B credit and risk industry is to deploy data governance technologies that balance the interests of individuals to control their own data with requirements for trusted data. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 14 Level: Non-technical
Paco Nathan (derwen.ai)
Average rating: ***..
(3.00, 1 rating)
Deep learning works well when you have large labeled datasets, but not every team has those assets. Paco Nathan offers an overview of active learning, an ML variant that incorporates human-in-the-loop computing. Active learning focuses input from human experts, leveraging intelligence already in the system, and provides systematic ways to explore and exploit uncertainty in your data. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 03/04/05 Level: Intermediate
Secondary topics:  Financial Services, Temporal data and time-series analytics
Revant Nayar (FMI Technologies LLC)
Average rating: *....
(1.50, 2 ratings)
Machine learning has so far underperformed in time series prediction (slowness and overfitting), and classical methods are ineffective at capturing nonlinearity. Revant Nayar shares an alternative approach that is faster and more transparent and does not overfit. It can also pick up regime changes in the time series and systematically captures all the nonlinearity of a given dataset. Read more.

4:20pm

4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 01/02
Jennifer Shin (8 Path Solutions | NYU Stern | IBM)
Common wisdom dictates that we should never make assumptions, but assumptions are essential in the creation of statistical models. Jennifer Shin explores how assumptions fit into the creation of a statistical model, the pitfalls of applying a model to data without taking the underlying assumptions into account, and how to identify datasets where the model and its assumptions are applicable. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Beginner
Secondary topics:  Deep Learning
Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)
Average rating: *****
(5.00, 3 ratings)
In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, there are over 360 million people in this world who are deaf or hard of hearing. Swetha Machanavajhala and Xiaoyong Zhu explain how to make the auditory world inclusive and meet the great demand in other sectors by applying deep learning on audio in Azure. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise
Bill Franks (International Institute For Analytics)
Drawing on a recent study of the analytics maturity level of large enterprises by the International Institute for Analytics, Bill Franks discusses how maturity varies by industry, shares key steps organizations can take to move up the maturity scale, and explains how the research correlates analytics maturity with a wide range of success metrics, including financial and reputational measures. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 08 Level: Intermediate
Secondary topics:  Financial Services, Model lifecycle management
Harish Doddi (Datatron), Jerry Xu (Datatron Technologies)
Large financial institutions have many data science teams (e.g., those for fraud, credit risk, and marketing), each often using a diverse set of tools to build predictive models. There are many challenges involved in productionizing these predictive AI models. Harish Doddi and Jerry Xu share challenges and lessons learned deploying AI models to production in large financial institutions. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 10 Level: Intermediate
Amandeep Khurana shares critical data management practices for easy and unified data access that meets security and regulatory compliance, helping you avoid the pitfalls that could lead to complex expensive architectures. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise
Brian O'Neill (Designing for Analytics)
Average rating: *****
(5.00, 5 ratings)
Gartner says 85%+ of big data projects will fail, despite the fact that your company may have invested millions in engineering implementation. Why are customers and employees not engaging with these products and services? Brian O'Neill explains why a "people first, technology second" mission (a design strategy, in other words) enables the best UX and business outcomes possible. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Recommendation Systems
Nir Yungster (JW Player), Kamil Sindi (JW Player)
JW Player—the world’s largest network-independent video platform, representing 5% of global internet video—provides on-demand recommendations as a service to thousands of media publishers. Nir Yungster and Kamil Sindi explain how the company is systematically improving model performance while navigating the many engineering challenges and unique needs of the diverse publishers it serves. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 23/24 Level: Beginner
Secondary topics:  Data Integration and Data Pipelines
Kenji Hayashida (Recruit Lifestyle co., ltd.), Toru Sasaki (NTT DATA Corporation)
Average rating: ****.
(4.50, 2 ratings)
Recruit Group and NTT DATA Corporation have developed a platform based on a data hub, utilizing Apache Kafka. This platform can handle around 1 TB/day of application logs generated by a number of services in Recruit Group. Kenji Hayashida and Toru Sasaki share best practices for and lessons learned about topics such as schema evolution and network architecture. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1E 07/08 Level: Beginner
Timothy Spann (Cloudera)
Average rating: ****.
(4.00, 2 ratings)
Timothy Spann leads a hands-on deep dive into using Apache MiNiFi with Apache MXNet and other deep learning libraries on edge devices. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1E 09 Level: Intermediate
Manoj Kumar (LinkedIn), Pralabh Kumar (LinkedIn), Arpan Agrawal (LinkedIn)
Average rating: *****
(5.00, 1 rating)
Have you ever tuned a Spark or MR job? If the answer is yes, you already know how difficult it is to tune more than a hundred parameters to optimize the resources used. Manoj Kumar, Pralabh Kumar, and Arpan Agrawal offer an overview of TuneIn, an auto-tuning tool developed to minimize the resource usage of jobs. Experiments have shown up to a 50% reduction in resource usage. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1E 10/11 Level: Beginner
Secondary topics:  Transportation and Logistics
Yasuyuki Kataoka (NTT Innovation Institute, Inc.)
Average rating: ***..
(3.00, 4 ratings)
One of the challenges of sports data analytics is how to deliver machine intelligence beyond a mere real-time monitoring tool. Yasuyuki Kataoka highlights various real-time machine learning models in both IndyCar and Tour de France, sharing real-time data processing architectures, machine learning models, and demonstrations that deliver meaningful insights for players and fans. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1E 12/13 Level: Beginner
Secondary topics:  Machine Learning in the enterprise
Francesca Lazzeri (Microsoft), Jaya Susan Mathew (Microsoft)
Average rating: **...
(2.67, 6 ratings)
With the growing buzz around data science, many professionals want to learn how to become a data scientist—the role Harvard Business Review called the "sexiest job of the 21st century." Francesca Lazzeri and Jaya Mathew explain what it takes to become a data scientist and how artificial intelligence solutions have started to reinvent businesses. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1E 14
Mathew Lodge (Anaconda)
Average rating: *****
(5.00, 1 rating)
The days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. Welcome to the future. Containers and Kubernetes make great language-agnostic distributed computing clusters: it's just as easy to deploy Python as it is Java. Mathew Lodge shows you how. Read more.