Sep 23–26, 2019

Monday, 09/23/2019

9:00am

9:00am–5:00pm Monday, September 23, 2019
Michael Li (The Data Incubator), Gonzalo Diaz (The Data Incubator)
Michael Li and Gonzalo Diaz provide a nontechnical overview of AI and data science. Learn common techniques, how to apply them in your organization, and common pitfalls to avoid. You’ll pick up the language and develop a framework to be able to effectively engage with technical experts and use their input and analysis for your business’s strategic priorities and decision making.
9:00am–5:00pm Monday, September 23, 2019
Bargava Subramanian (Binaize Labs), Amit Kapoor (narrativeVIZ)
Recommendation systems play a significant role: for users, they open a new world of options; for companies, they drive engagement and satisfaction. Amit Kapoor and Bargava Subramanian walk you through the different paradigms of recommendation systems and introduce you to deep learning-based approaches. You'll gain the practical hands-on knowledge to build, select, deploy, and maintain a recommendation system.
9:00am–5:00pm Monday, September 23, 2019
Training
Sponsored
Jeff Davis (Google Cloud)
Jeff Davis provides a hands-on introduction to designing and building machine learning models on structured data on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you'll learn machine learning (ML) concepts and how to implement them using both BigQuery Machine Learning and TensorFlow with Keras.
9:00am–5:00pm Monday, September 23, 2019
Michael Cullan (The Data Incubator)
Michael Cullan walks you through developing a machine learning pipeline from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment, and then extend these models into two applications built on real-world datasets. All work will be done in Python.
9:00am–5:00pm Monday, September 23, 2019
Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Nikki Rouda (Amazon Web Services), Jesse Gebhardt (Amazon Web Services), Rajeev Chakrabarti (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join the AWS team to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.
9:00am–5:00pm Monday, September 23, 2019
Ian Cook (Cloudera)
Advancing your career in data science requires learning new languages and frameworks, but you face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by outlining the abstractions common to these systems. You'll work through hands-on exercises to overcome obstacles to getting started with new tools.
9:00am–5:00pm Monday, September 23, 2019
Jesse Anderson (Big Data Institute)
Jesse Anderson offers you an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it, as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL.
9:00am–5:00pm Monday, September 23, 2019
Dylan Bargteil (The Data Incubator)
The TensorFlow library uses computational graphs with automatic parallelization across resources, an architecture that is ideal for implementing neural networks. Dylan Bargteil explores TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API with several hands-on applications.

10:30am

10:30am–11:00am Monday, September 23, 2019
Morning break (30m)

12:30pm

12:30pm–1:30pm Monday, September 23, 2019
Lunch (1h)

3:00pm

3:00pm–3:30pm Monday, September 23, 2019
Afternoon break (30m)

7:00pm

7:00pm–9:00pm Monday, September 23, 2019
Event
Get to know your fellow attendees over dinner. We've made reservations for you at some of the most sought-after restaurants in town. This is a great chance to make new connections and sample some of the great cuisine New York has to offer.

Tuesday, 09/24/2019

9:00am

9:00am–5:00pm Tuesday, September 24, 2019
Training
Sponsored
Matt Kirk (YourChiefScientist.com), Miguel Maldonado (IBM)
Note: This free workshop, courtesy of IBM, is open to the first 50 registrants. You'll take a fascinating deep dive into the power and applications of machine learning in the enterprise.
9:00am–5:00pm Tuesday, September 24, 2019
David Boyle (Audience Strategies), Richard Evans (Statistics Canada), Rosaria Silipo (KNIME), Leah Xu (Spotify), Arup Nanda (Capital One), Victoriya Kalmanovich (Navy), Tusharadri Mukherjee (Lenovo), Moise Convolbo (Rakuten), Martin Mendez-Costabel (Bayer Crop Science), Gloria Macia (Roche AG), Gwen Campbell (Revibe Technologies), Muhammed Idris (Capria VC | TeraCrunch)
From banking to biotech, retail to government, every business sector is changing in the face of abundant data. Get better at defining business problems and applying data solutions at Strata.
9:00am–5:00pm Tuesday, September 24, 2019
Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Brian Lynch (TD Bank Group), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), Karan Jaswal (Cinchy), Moto Tohda (Tokyo Century (USA)), Viridiana Lourdes (Ayasdi), Peter Swartz (Altana Trade), Mikheil Nadareishvili (TBC Bank)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
9:00am–12:30pm Tuesday, September 24, 2019
Secondary topics: Culture and Organization
Rossella Blatt Vital (Wonderlic), Ross Piper (Wonderlic), Daniel Schmerling (Wonderlic)
Creating and leading a successful ML strategy is an elegant orchestration of many components: mastering key ML concepts, operationalizing the ML workflow, prioritizing the highest-value projects, building a high-performing team, nurturing strategic partnerships, aligning with the company’s mission, and more. Rossella Blatt Vital details insights and lessons learned on how to create and lead a flourishing ML practice.
9:00am–12:30pm Tuesday, September 24, 2019
Sourav Dey (Manifold), Jakov Kucan (Manifold)
Sourav Dey and Jakov Kucan walk you through the six steps of the Lean AI process and explain how it helps your ML engineers work as an integrated part of your development and production teams. You'll get a hands-on example using real-world data, so you can get up and running with Docker and Orbyter and see firsthand how streamlined they can make your workflow.
9:00am–12:30pm Tuesday, September 24, 2019
Alice Zhao (Metis)
Data scientists are known for crunching numbers, but what should you do when you run into text data? Alice Zhao walks you through the steps to turn text data into a format that a machine can understand, explores some of the most popular text analytics techniques, and showcases several natural language processing (NLP) libraries in Python, including NLTK, TextBlob, spaCy, and gensim.
9:00am–12:30pm Tuesday, September 24, 2019
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today.
9:00am–12:30pm Tuesday, September 24, 2019
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley)
Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems and examine the inception and growth of the serverless paradigm. You'll take a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar Functions, and get a bird’s-eye view of the application domains where you can leverage Pulsar Functions.
9:00am–12:30pm Tuesday, September 24, 2019
Viktor Gamov (Confluent)
Building stream processing applications is certainly one of the hot topics in the IT community. But if you've ever thought you needed to be a programmer to do stream processing and build stream processing data pipelines, think again. Viktor Gamov explores KSQL, the stream processing query engine built on top of Apache Kafka.
9:00am–12:30pm Tuesday, September 24, 2019
Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera), Andre Araujo (Cloudera), Hemanth Yamijala (Cloudera)
There are too many edge devices and agents, and you need to control and manage them. Purnima Reddy Kuchikulla, Timothy Spann, Abdelkrim Hadjidj, and Andre Araujo walk you through the difficulty of collecting real-time data and the trouble with updating a specific set of agents with edge applications. Get your hands dirty with Cloudera Edge Management (CEM), which addresses these challenges with ease.
9:00am–12:30pm Tuesday, September 24, 2019
Secondary topics: Deep Learning
Bruno Goncalves (Data For Science, Inc)
You'll go hands-on to learn the theoretical foundations and principal ideas underlying deep learning and neural networks. Bruno Gonçalves structures the code of the implementations to closely resemble the way Keras is structured, so that by the end of the course, you'll be prepared to dive deeper into the deep learning applications of your choice.
9:00am–12:30pm Tuesday, September 24, 2019
James Morantus (Cloudera), Tony Huinker (Cloudera), Naren Koneru (Cloudera), Ramachandran Venkatesh (Cloudera), Gunther Hagleitner (Cloudera), Olli Draese (Cloudera)
Organizations now run diverse, multidisciplinary, big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long running in nature. There are many challenges with moving these workloads to the cloud. In this talk we start off with a technical deep...
9:00am–12:30pm Tuesday, September 24, 2019
Secondary topics: Privacy and Security
Mark Donsky (Okera), Lars George (Okera), Michael Ernest (Dataiku), Ifigeneia Derekli (Cloudera)
New regulations drive compliance, governance, and security challenges for big data. Infosec and security groups must ensure a secured and governed environment across workloads that span on-premises, private cloud, multicloud, and hybrid cloud. Mark Donsky, Lars George, Michael Ernest, and Ifigeneia Derekli outline hands-on best practices for meeting these challenges with special attention to CCPA.
9:00am–12:30pm Tuesday, September 24, 2019
Jules Damji (Databricks)
ML development brings many new complexities beyond those of the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information. Jules Damji walks you through MLflow, an open source project that simplifies the entire ML lifecycle, to solve this problem.

10:30am

10:30am–11:00am Tuesday, September 24, 2019
Morning break sponsored by Microsoft (30m)

12:30pm

12:30pm–1:30pm Tuesday, September 24, 2019
Lunch (1h)

1:30pm

1:30pm–5:00pm Tuesday, September 24, 2019
Secondary topics: Culture and Organization
Alexander Izydorczyk (Coatue Management), Benjamin Singleton (JetBlue), Joshua Poduska (Domino Data Lab)
The honeymoon era of data science is ending and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders must deliver measurable impact on an increasing share of an enterprise’s KPIs. The speakers explore how leading organizations take a holistic approach to people, process, and technology to build a sustainable advantage.
1:30pm–5:00pm Tuesday, September 24, 2019
Garrett Hoffman (StockTwits)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include Word2Vec, recurrent neural networks (RNNs) and variants (long short-term memory [LSTM] and gated recurrent unit [GRU]), and convolutional neural networks.
1:30pm–5:00pm Tuesday, September 24, 2019
David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture)
David Talby, Alex Thomas, Saif Addin Ellafi, and Claudiu Branzan walk you through state-of-the-art natural language processing (NLP) using the highly performant, highly scalable open source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
1:30pm–5:00pm Tuesday, September 24, 2019
Secondary topics: Privacy and Security
Carolyn Duby (Cloudera), Madhan Neethiraj (Cloudera), Michael Gregory (Cloudera), Sangeeta Doraiswamy (Cloudera)
Bring your laptop, roll up your sleeves, and get ready to crunch some cybersecurity events with Apache Metron, an open source big data cybersecurity platform. Carolyn Duby walks you through how Metron finds actionable events in real time.
1:30pm–5:00pm Tuesday, September 24, 2019
Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)
Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management requires a different mindset in AWS when compared to traditional RDBMS design. Gowrishankar Balasubramanian and Rajeev Srinivasan explore considerations in choosing the right database for your use case and access pattern while migrating or building a new application on the cloud.
1:30pm–5:00pm Tuesday, September 24, 2019
Secondary topics: Culture and Organization
Ted Malaska (Capital One), Jonathan Seidman (Cloudera), Matthew Schumpert (Cloudera, Inc.), Raman Rajasekhar (Cloudera Inc), Krishna Maheshwari (Cloudera)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Ted Malaska and Jonathan Seidman detail guidelines and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.
1:30pm–5:00pm Tuesday, September 24, 2019
Sophie Watson (Red Hat), William Benton (Red Hat)
Go hands-on with Sophie Watson and William Benton to examine data structures that let you answer interesting queries about massive datasets in fixed amounts of space and constant time. This seems like magic, but they'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications.
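The abstract does not name the data structures it covers. One well-known example of the idea is a count-min sketch. This is an illustrative assumption, not necessarily a structure the tutorial uses. A count-min sketch answers approximate frequency queries over a stream using a fixed table of `depth` rows by `width` counters. Each update or query touches exactly `depth` counters, however many items have been seen. The minimal Python sketch below is hypothetical: the class name `CountMinSketch`, the default `width`/`depth` values, and the salted BLAKE2b hashing scheme are all illustrative choices.

```python
import hashlib


class CountMinSketch:
    """Hypothetical sketch: approximate item counts in fixed memory.

    Memory is depth rows of width counters, independent of stream length.
    """

    def __init__(self, width: int = 1000, depth: int = 5) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item: str):
        # Derive one hash per row by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=row.to_bytes(16, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Each update touches exactly `depth` counters: constant time per item.
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # With nonnegative counts, collisions can only inflate counters, so the
        # minimum across rows never underestimates (but may overestimate).
        return min(self.table[row][col] for row, col in self._cells(item))


cms = CountMinSketch()
for word in ["kafka", "spark", "kafka", "presto", "kafka"]:
    cms.add(word)
print(cms.estimate("kafka"))  # at least 3
```

The table stays at `width * depth` counters regardless of input size. A wider table reduces overestimation from hash collisions, and more rows make a large overestimate less likely.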
1:30pm–5:00pm Tuesday, September 24, 2019
Mark Madsen (Teradata), Todd Walter (Archimedata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
1:30pm–5:00pm Tuesday, September 24, 2019
Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera), Attila Kanto (Cloudera), Tony Wu (Cloudera)
Kafka is omnipresent and the backbone of streaming analytics applications and data lakes. The challenge is understanding what's going on overall in the Kafka cluster, including performance, issues, and message flows. Purnima Reddy Kuchikulla and Dan Chaffelson walk you through a hands-on experience to visualize the entire Kafka environment end-to-end and simplify Kafka operations via Streams Messaging Manager (SMM).
1:30pm–5:00pm Tuesday, September 24, 2019
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
Boris Lublinsky and Dean Wampler examine ML use in streaming data pipelines, how to do periodic model retraining, and low-latency scoring in live streams. Learn about Kafka as the data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, metadata tracking, and more.
1:30pm–5:00pm Tuesday, September 24, 2019
Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)
Karthik Sonti, Emily Webber, and Varun Rao Bhamidimarri introduce you to the Amazon SageMaker machine learning platform and provide a high-level discussion of recommender systems. You'll dig into different machine learning approaches for recommender systems, including common methods such as matrix factorization as well as newer embedding approaches.

3:00pm

3:00pm–3:30pm Tuesday, September 24, 2019
Afternoon break sponsored by Dataiku (30m)

5:00pm

5:00pm–6:30pm Tuesday, September 24, 2019
Event
Enjoy delicious snacks and beverages with fellow Strata attendees, speakers, and sponsors at the Opening Reception, happening immediately after tutorials on Tuesday.

Wednesday, 09/25/2019

8:00am

8:00am–8:30am Wednesday, September 25, 2019
Event
Gather before keynotes on Wednesday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees.

8:30am

8:30am–8:45am Wednesday, September 25, 2019
Early morning coffee (8:00am - 8:45am) (15m)

8:45am

8:45am–8:50am Wednesday, September 25, 2019
Keynote
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes.

8:50am

8:50am–9:05am Wednesday, September 25, 2019
Keynote
Mick Hollison (Cloudera), Hillery Hunter (IBM)
Learn how IBM and Cloudera are fueling innovation in IoT, streaming, data warehousing, and machine learning, and making their customers’ digital transformation journeys easier, faster, and safer.

9:05am

9:05am–9:15am Wednesday, September 25, 2019
Keynote
Ben Lorica (O'Reilly)
Ben Lorica dives into emerging technologies for building data infrastructures and machine learning platforms.

9:15am

9:15am–9:30am Wednesday, September 25, 2019
Keynote
Sara Menker (Gro Intelligence), Nemo Semret (Gro Intelligence)
Sara Menker, CEO, Gro Intelligence

9:30am

9:30am–9:40am Wednesday, September 25, 2019
Keynote
James Malone (Google)
Open source has always been a core pillar of Google Cloud’s data and analytics strategy. James Malone examines how, as the community continues to set industry standards, the company continues to integrate those standards into its services so organizations around the world can unlock the value of data faster.

9:40am

9:40am–10:00am Wednesday, September 25, 2019
Keynote
Robert Thomas (IBM), Tim O'Reilly (O'Reilly Media)
AI has the potential to add $16 trillion to the global economy by 2030, but adoption has been slow. While we understand the power of AI, many of us aren’t sure how to fully unleash its potential. Join Robert Thomas and Tim O'Reilly to learn why the reality is that AI isn't magic. It’s hard work.

10:00am

10:00am–10:05am Wednesday, September 25, 2019
Keynote
Jeremy Rader (Intel)
Data analytics is the long-standing but constantly evolving science that companies leverage for insight, innovation, and competitive advantage. Jeremy Rader explores Intel’s end-to-end data pipeline software strategy designed and optimized for a modern and flexible data-centric infrastructure that allows for the easy deployment of unified advanced analytics and AI solutions at scale.

10:05am

10:05am–10:20am Wednesday, September 25, 2019
Keynote
Swatee Singh (American Express)
The financial services industry is increasingly using disruptive technology (including AI and machine learning, edge computing, blockchain, mobile and mixed reality, virtual assistants, and quantum computing, to name a few) to enhance the customer experience and personalize their interactions with customers. Swatee Singh outlines how the same is true at American Express.

10:20am

10:20am–10:25am Wednesday, September 25, 2019
Keynote
Nikita Shamgunov (MemSQL)
Data is now the world’s most valuable resource, with winners and losers decided every day by how well we collect, analyze, and act on data. However, most companies struggle to unlock the full value of their data, using outdated, outmoded data infrastructure. Nikita Shamgunov examines how businesses use data, the new demands on data infrastructure, and what you should expect from your tools.

10:25am

10:25am–10:30am Wednesday, September 25, 2019
Keynote
Siva Sivakumar (Cisco)
Siva Sivakumar explains the Cisco Data Intelligence Platform (CDIP), a cloud-scale architecture that brings big data, AI/compute farm, and storage tiers together to work as a single entity, while each tier can still scale independently, to address the IT issues in the modern data center.

10:30am

10:30am–10:45am Wednesday, September 25, 2019
Keynote
Patrick Lucey (Stats Perform)
Imagine watching sports and being able to immediately find all plays that are similar to what just happened. Better still, imagine being able to draw a play with the Xs and Os on an interface like a coach draws on a chalkboard, instantaneously finding all the similar plays, and conducting analytics on those plays. Join Patrick Lucey to see how this is possible.

10:50am

10:50am–11:20am Wednesday, September 25, 2019
Morning break sponsored by Intel (30m)

11:20am

11:20am–12:00pm Wednesday, September 25, 2019
Session
Sponsored
James Malone (Google)
James Malone takes a deep dive into how customers across the world partner with Google Cloud to reimagine big data processing and data lakes while generating incredible business value.
11:20am–12:00pm Wednesday, September 25, 2019
Session
Sponsored
Jeremy Rader (Intel)
This session reveals firsthand insights from an Intel analytics practitioner, shares Intel IT’s own data maturity journey, and provides actionable best known methods (BKMs) for enterprises in the midst of transforming into intelligent, data-first businesses.
11:20am–12:00pm Wednesday, September 25, 2019
Session
Sponsored
Praveen Chitrada (Akamai Technologies)
Praveen Chitrada walks you through how Akamai uses MemSQL, Docker, Airflow, Prometheus, and other technologies as an enabler to streamline and accelerate data ingestion and calculation to generate usage metrics for billing, reporting, and analytics at massive scale.
11:20am–12:00pm Wednesday, September 25, 2019
Navinder Pal Singh Brar (Walmart Labs)
Each week 275 million people shop at Walmart, generating interaction and transaction data. Navinder Pal Singh Brar explains how the customer backbone team enables extraction, transformation, and storage of customer data to be served to other teams. At 5 billion events per day, the Kafka Streams cluster processes events from various channels and maintains a uniform identity of a customer.
11:20am–12:00pm Wednesday, September 25, 2019
Session
Sponsored
Han Yang (Cisco), Karthik Kulkarni (Cisco)
Artificial intelligence and machine learning are well beyond the laboratory exploratory stage of deployment. In fact, the speed of AI and ML deployment has a huge impact on an organization’s financial income. Han Yang and Karthik Kulkarni explore how the Cisco Data Intelligence Platform can help bridge the gap between AI and ML and big data.
11:20am–12:00pm Wednesday, September 25, 2019
Ted Dunning (MapR)
Feature engineering is generally the section that gets left out of machine learning books, but it's also the most critical part in practice. Ted Dunning explores techniques, a few well known, but some rarely spoken of outside the institutional knowledge of top teams, including how to handle categorical inputs, natural language, transactions, and more in the context of machine learning.
11:20am–12:00pm Wednesday, September 25, 2019
Evgeny Vinogradov (Yandex.Money)
With a microservice architecture, a data warehouse is the first place where all the data meets. It's supplied by many different data sources and used for many purposes, from near-online transactional processing (OLTP) to model fitting and real-time classifying. Evgeny Vinogradov details his experience in managing and scaling data for support of 20+ product teams.
11:20am–12:00pm Wednesday, September 25, 2019
Moty Fania (Intel)
Moty Fania details Intel IT’s experience of implementing a sales AI platform. This platform is based on a streaming microservices architecture with a message bus backbone. It was designed for real-time data extraction and reasoning, handles the processing of millions of website pages, and is capable of sifting through millions of tweets per day.
11:20am–12:00pm Wednesday, September 25, 2019
Steven Touw (Immuta)
Anti-patterns are behaviors that take bad problems and lead to even worse solutions. In the world of data security and privacy, they’re everywhere. Over the past four years, data security and privacy anti-patterns have emerged across hundreds of customers and industry verticals, and a clear trend has appeared. Steven Touw details five anti-patterns and, more importantly, the solutions for them.
11:20am–12:00pm Wednesday, September 25, 2019
Secondary topics: Culture and Organization
Brian Dalessandro (Capital One)
While data science value is well recognized within tech, experience across industries shows that the ability to realize and measure business impact is not universal. A core issue is that data science programs face unique risks many leaders aren’t trained to hedge against. Brian Dalessandro addresses these risks and advocates for new ways to think about and manage data science programs.
11:20am–12:00pm Wednesday, September 25, 2019
Janet Haven (Data & Society)
Join Data & Society Research Institute Executive Director Janet Haven for a deep dive into research, case studies and emerging governance approaches to creating the rules of ethical AI.
11:20am–12:00pm Wednesday, September 25, 2019
Secondary topics: Ethics
Harsha Nori (Microsoft), Samuel Jenkins (Microsoft), Rich Caruana (Microsoft)
Understanding decisions made by machine learning systems is critical for sensitive uses, ensuring fairness, and debugging production models. Interpretability presents options for trying to understand model decisions. Harsha Nori, Samuel Jenkins, and Rich Caruana explore the tools Microsoft is releasing to help you train powerful, interpretable models and interpret existing black box systems.
11:20am–12:00pm Wednesday, September 25, 2019
Meir Toledano (Anodot)
ARIMA has been used for time series modeling for decades. In practice, most time series collected from human activities exhibit seasonal patterns, but estimating seasonal ARIMA (SARIMA) models efficiently remained a challenge for decades. Meir Toledano explains how Anodot was able to apply the technique for forecasting and anomaly detection for millions of time series every day.
11:20am–12:00pm Wednesday, September 25, 2019
Nan Zhu (Uber), Felix Cheung (Uber)
XGBoost has been widely deployed in companies across the industry. Nan Zhu and Felix Cheung dive into the internals of distributed training in XGBoost and demonstrate how XGBoost solves business problems at Uber at a scale of thousands of workers and tens of terabytes of training data.
11:20am–12:00pm Wednesday, September 25, 2019
Paige Roberts (Vertica), Deepak Majeti (Vertica)
GoodData needed to autorecover from node failures and scale rapidly when workloads spiked on their MPP database in the cloud. Kubernetes could solve it, but it's for stateless microservices, not a stateful MPP database that needs hundreds of containers. Paige Roberts and Deepak Majeti detail the hurdles GoodData needed to overcome in order to merge the power of the database with Kubernetes.
11:20am–12:00pm Wednesday, September 25, 2019
David Talby (Pacific AI)
Machine learning and data science systems often fail in production in unexpected ways. David Talby outlines real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.
11:20am–12:00pm Wednesday, September 25, 2019
Session
Sponsored
Madhu Kochar (IBM)
An economic revolution is underway, driven by advancements in AI and multicloud technologies. Businesses are crafting strategic plans to modernize their data architecture for this emerging reality, and at the top of their wish list is the ability to virtualize all their data regardless of where it lives. Madhu Kochar explores the data advancements on the horizon.

12:00pm

12:00pm–1:15pm Wednesday, September 25, 2019
Lunch sponsored by Google Cloud (1h 15m)
12:00pm–1:15pm Wednesday, September 25, 2019
Event
Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers.
12:00pm–1:15pm Wednesday, September 25, 2019
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
12:00pm–1:15pm Wednesday, September 25, 2019
Event
If you’d like to make new professional connections and hear ideas for supporting diversity in the tech community, come to the diversity and inclusion networking lunch on Wednesday.

12:30pm

12:30pm–1:10pm Wednesday, September 25, 2019
Session
Blake DuBois (Google)
Taking advantage of cloud infrastructure and analytic services is a must for any digital enterprise. Join Google Cloud as they discuss 10 things you should know about running and migrating on-prem Hadoop deployments to GCP.

1:15pm

1:15pm–1:55pm Wednesday, September 25, 2019
Session
Sponsored
Diana Shaw (SAS)
Companies today are working to adopt data-driven mind-sets, strategies, and cultures. Yet the ugly truth is many still struggle to make analytics actionable. Diana Shaw outlines a simple, powerful, and automated solution to operationalize all types of analytics at scale. You'll learn how to put analytics into action while providing model governance and data scalability to drive real results.
1:15pm–1:55pm Wednesday, September 25, 2019
Session
Sponsored
John DesJardins (Hazelcast)
John DesJardins explores the challenges of integrating real-time stream processing and machine learning into banking and capital markets applications.
1:15pm–1:55pm Wednesday, September 25, 2019
See-Kit Lam (Malwarebytes), Darren Chinen (Malwarebytes)
Developing, deploying and managing AI and anomaly detection models is tough business. See-Kit Lam details how Malwarebytes has leveraged containerization, scheduling, and orchestration to build a behavioral detection platform and a pipeline to bring models from concept to production.
1:15pm–1:55pm Wednesday, September 25, 2019
Michael Noll (Confluent)
Would you cross the street with traffic information that's a minute old? Certainly not. Modern businesses have the same needs. Michael Noll explores why and how you can use Kafka and its growing ecosystem to build elastic event-driven architectures. Specifically, you'll look at Kafka as the storage layer, at Kafka Connect for data integration, and at Kafka Streams and KSQL as the compute layer. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Session
Sponsored
Peter Wang (Anaconda)
Peter Wang explores why data science shouldn’t be seen as merely another technical job within the business and why open source is such a critical aspect of innovation in the field of data science. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Secondary topics:  Deep Learning
Shioulin Sam (Cloudera Fast Forward Labs)
Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. But this could be avoided if machines could learn from a few labeled examples. Shioulin Sam explores and demonstrates an algorithmic solution that relies on collaboration between human and machine to label smartly, and she outlines product possibilities. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Swasti Kakker (LinkedIn), Manu Ram Pandit (LinkedIn), Vidya Ravivarma (LinkedIn)
Join Swasti Kakker, Manu Ram Pandit, and Vidya Ravivarma to explore what's offered by a flexible and scalable hosted data science platform at LinkedIn. It provides features for developing seamlessly in multiple languages, enforcing developer best practices and governance policies, executing and visualizing solutions, managing knowledge efficiently, and collaborating, all to improve developer productivity. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Wim Stoop (Cloudera), Srikanth Venkat (Cloudera)
Establishing enterprise-wide security and governance remains a challenge for most organizations. Integrations and exchanges across the landscape are costly to manage and maintain and typically work in one direction only. Wim Stoop and Srikanth Venkat explore how ODPi's Egeria standard and framework remove these challenges and how Cloudera and its partners use it to deliver value. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
The Apache Parquet community is working on a column encryption mechanism that protects sensitive data and enables access control for table columns. Many companies are involved, and the mechanism specification has recently been signed off on by the community management committee. Gidon Gershinsky explores the basics of Parquet encryption technology, its usage model, and a number of use cases. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Felipe Hoffa (Google), Bob Bradley (Geotab)
Geotab is a world-leading asset-tracking company with millions of vehicles under service every day. Felipe Hoffa and Bob Bradley examine the challenges and solutions involved in creating an ML- and geographic information system (GIS)-enabled petabyte-scale data warehouse on Google Cloud. They also dive into the process of publishing open data, how you can access it, and how cities are using it. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Secondary topics:  Ethics, Privacy and Security
Andrew Burt (Immuta), Brenda Leong (Future of Privacy Forum), David Florsek (IDEMIA NSS), Alex Beutel (Google Brain), Chris Wheeler (Mastercard)
Machine learning techniques are being deployed across almost every industry and sector. But this adoption comes with real, and oftentimes underestimated, privacy and security risks. Andrew Burt and Brenda Leong convene a panel of experts including David Florsek, Chris Wheeler, and Alex Beutel to detail real-life examples of when ML goes wrong and the lessons they learned. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Saif Addin Ellafi (John Snow Labs), Scott Hoch (BlackBox Engineering)
Recruiting patients for clinical trials is a major challenge in drug development. Saif Addin Ellafi and Scott Hoch explain how Deep 6 uses Spark NLP to scale its training and inference pipelines to millions of patients while achieving state-of-the-art accuracy. They dive into the technical challenges, the architecture of the full solution, and the lessons the company learned. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Every NLP-based document-processing solution depends on converting scanned documents and images to machine-readable text with OCR, a step limited by the quality of the scanned images. Nagendra Shishodia, Chaithanya Manda, and Solmaz Torabi explore how generative adversarial networks (GANs) can bring significant efficiencies to any document-processing solution by enhancing the resolution of scanned images and denoising them. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
James Tang (Walmart Labs), Yiyi Zeng (Walmart Labs), Linhong Kang (Walmart Labs)
James Tang, Yiyi Zeng, and Linhong Kang outline how Walmart provides a secure and seamless shopping experience through machine learning and large-scale data analysis on a centralized platform. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Gil Vernik (IBM)
Most analytic flows can benefit from serverless, from simple cases to complex data preparation for AI frameworks like TensorFlow. To address the challenge of how to easily integrate serverless without major disruptions to your system, Gil Vernik explores the “push to the cloud” experience, which dramatically simplifies serverless for big data processing frameworks. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
As a steward for your enterprise’s data and digital transformation initiatives, you’re tasked with making the right choice. But before you can make those decisions, it’s important to understand what not to do when planning for your organization’s big data initiatives. Michael Stonebraker shares his top 10 big data blunders. Read more.
1:15pm–1:55pm Wednesday, September 25, 2019
Session
Sponsored
Ben Lackey (Oracle)
Learn about running AI/ML solutions like H2O.ai and Kinetica on Oracle Cloud. The session includes a live demo of Terraform, Oracle Cloud Infrastructure, GPUs, and Oracle Marketplace. Ben Lackey also discusses other leading data and AI products, including Cloudera, DataStax, and Confluent. Read more.

2:05pm

2:05pm–2:45pm Wednesday, September 25, 2019
Session
Sponsored
Oftentimes there's a fracture between the highly governed data of enterprise IT systems and the comprehensive but often ungoverned world of large-scale data lakes and streams of data from blogs, system logs, sensors, IoT devices, and more. Kevin Poskitt and Andreas Wesselmann walk you through how AI needs to connect to all of this data, as well as image, video, audio, and text data sources. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Session
Sponsored
Amar Arsikere (infoworks.io)
The breakneck pace of business change and its insatiable appetite for data and analytics to drive digital transformation make agile use of data an imperative. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Session
Sponsored
Aaron Swanson (Talend)
Winning the hearts and minds of millennials and Gen Z is not an easy task. ALDO has devised a data-driven strategy to create the best consumer experience. Today ALDO relies on Talend and AWS. Aaron Swanson explains the choices made for its data architecture and the hurdles the teams had to overcome to turn the vision into reality. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
TBC
2:05pm–2:45pm Wednesday, September 25, 2019
Session
Sponsored
Olga Lagunova (Pitney Bowes), John Derrico (Mastercard)
Mastercard and Pitney Bowes have overcome many challenges on their journey to accelerate innovation, achieve efficiencies, and improve the overall customer experience. Olga Lagunova and John Derrico share lessons learned as the data strategy evolved and highlight pitfalls and solutions from data science projects across several industries, from finance to cross-border shipping logistics. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Mikio Braun (Zalando)
With ML becoming more mainstream, the side effects of machine learning and AI on our lives are becoming more visible. You have to take extra measures to make machine learning models fair and unbiased, and awareness of the need to preserve privacy in ML models is growing rapidly. Mikio Braun explores techniques and concepts around fairness, privacy, and security in machine learning models. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Atul Gupte (Uber)
Uber is changing the way people think about transportation. As an integral part of the logistical fabric in 65+ countries around the world, it uses ML and advanced data science to power every aspect of the Uber experience—from dispatch to customer support. Atul Gupte and Nikhil Joshi explore how Uber enables teams to transform insights into intelligence and facilitate critical workflows. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Shirshanka Das (LinkedIn), Mars Lan (LinkedIn)
Imagine scaling metadata to an organization with 10,000 employees and 1M+ data assets, at an AI-enabled company that ships code to the site three times a day. Shirshanka Das and Mars Lan dive into LinkedIn’s metadata journey from a two-person back-office team to a central hub powering data discovery, AI productivity, and automatic data privacy. They reveal their metadata strategies and the battle scars they earned along the way. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
Data lakes have become a key ingredient in the data architecture of most companies. In the cloud, object storage systems such as S3 and ADLS make it easier than ever to operate a data lake. Tomer Shiran and Jacques Nadeau explain how you can build best-in-class data lakes in the cloud, leveraging open source technologies and the cloud's elasticity to run and optimize workloads simultaneously. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Peter Bailis (Sisu | Stanford University)
Despite a meteoric rise in data volumes within modern enterprises, enabling nontechnical users to put this data to work in diagnostic and predictive tasks remains a fundamental challenge. Peter Bailis details the lessons learned in building new systems to help users leverage the data at their disposal, drawing on production experience from Facebook, Microsoft, and the Stanford DAWN project. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Secondary topics:  Ethics, Privacy and Security
Andrew Burt (Immuta), Brenda Leong (Future of Privacy Forum), Boris Segalis (Cooley), Susan Israel (Loeb & Loeb, LLP)
From the EU to California and China, more of the world is regulating how data can be used. Andrew Burt and Brenda Leong convene leading experts on law and data science for a deep dive into ways to regulate the use of AI and advanced analytics. Come learn why these laws are being proposed, how they’ll impact data, and what the future has in store. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Panos Alexopoulos (Textkernel)
In an era where discussions among data scientists are monopolized by the latest trends in machine learning, the role of semantics in data science is often underplayed. Panos Alexopoulos presents real-world cases where making fine, seemingly pedantic distinctions in the meaning of data science tasks and the related data has significantly improved their effectiveness and value. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Keshav Peswani (Expedia Group), Ashish Aggarwal (Expedia Group)
Observability is key in modern architectures for quickly detecting and repairing problems in microservices. Modern observability platforms have evolved beyond simple application logs and include distributed tracing systems like Zipkin and Haystack. Keshav Peswani and Ashish Aggarwal explore how combining them with real-time, intelligent alerting mechanisms helps in the automated detection of problems. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Secondary topics:  Culture and Organization
Ann Spencer (Domino), Amy Heineike (Primer), Paco Nathan (derwen.ai), Chris Wiggins (NYT | Columbia)
If, as a data scientist, you've wondered why it takes so long to deploy your model into production or, as an engineer, thought data scientists have no idea what they want, you're not alone. Join a lively discussion with industry veterans Ann Spencer, Paco Nathan, Amy Heineike, and Chris Wiggins to find best practices or insights on increasing collaboration when developing and deploying models. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Tomer Levi (Fundbox)
Data workflows are a fundamental part of any data engineering team's work. Nonetheless, designing an easy-to-use, scalable, and flexible data workflow platform is a complex undertaking. Tomer Levi walks you through how the data engineering team at Fundbox uses AWS serverless technologies to address this problem and how it enables data scientists, BI devs, and engineers to move faster. Read more.
2:05pm–2:45pm Wednesday, September 25, 2019
Arup Nanda (Capital One)
Every organization wants to use data more effectively and as a weapon, but few succeed. Arup Nanda explores how Priceline started on this journey and how it was successful using different techniques and tools. Join in to learn how to streamline data assets, make them easier for end users to work with, define KPIs, create value from data, and secure sponsorship to build a data organization. Read more.

2:55pm

2:55pm–3:35pm Wednesday, September 25, 2019
Session
Sponsored
Dong Li (Kyligence), Hongbin Ma (Kyligence)
Your analytics are biased. Efforts to extract meaning by manually scrubbing, indexing, and parsing big data are limited by time, cost, and human assumptions. Dong Li and Hongbin Ma offer an overview of augmented analytics, which takes OLAP into the future with AI, ensuring objective and unique insights that cover all relevant scenarios found in petabytes of multidimensional and variable data. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Session
Sponsored
Radhika Ravirala (Amazon Web Services)
Radhika Ravirala explains how to migrate your workloads to Amazon EMR. Join in to learn the key motivations and benefits from a move to the cloud, along with the architectural changes required and best practices you can use right away. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Session
Sponsored
Jungwook Seo (SK Holdings)
Jungwook Seo walks you through AccuInsight+, a cloud data analytics platform offering eight data analytics services on CloudZ (one of the biggest cloud service providers in Korea), which SK Holdings announced in January 2019. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Weisheng Xie (Orange Financial), Jia Zhai (streamnative)
As a fintech company of China Telecom with half a billion registered users and 41 million monthly active users, Orange Financial has found risk control decision deployment critical to its success. Weisheng Xie and Jia Zhai explore how the company leverages Apache Pulsar to boost the efficiency of its risk control decision development for combating financial fraud across more than 50 million transactions a day. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Session
Sponsored
Digital location data is a crucial part of data science. The "where" matters as much to an analysis as the "what" and the "why." Shannon Kalisky and Alberto Nieto explore tools that help you apply a range of geospatial techniques in your data science workflows to get deeper insights. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Secondary topics:  Financial Services
Jari Koister (FICO)
Machine learning and constraint-based optimization are both used to solve critical business problems. They come from distinct research communities and have traditionally been treated separately. But Jari Koister examines how they're similar, how they're different, and how they can be used to solve complex problems with amazing results. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Jesse Anderson (Big Data Institute)
Jesse Anderson covers the most common reasons data engineering teams fail and how to correct them, including ways to get your management to understand that data engineering is complex and time-consuming. It is not data warehousing under a new name. Management needs to understand that you can’t compare a data engineering team to, for example, the web development team. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Kaan Onuk (Uber), Luyao Li (Uber), Atul Gupte (Uber)
Uber takes being data-driven to the next level. For Uber, a robust system for discovering and managing various entities, from datasets to services to pipelines, along with their relevant metadata, isn't just nice to have; it's absolutely integral to making data useful. Kaan Onuk, Luyao Li, and Atul Gupte explore the current state of metadata management, end-to-end data flow solutions at Uber, and what’s coming next. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Secondary topics:  Privacy and Security
Marcus Fowler (Darktrace)
Cybersecurity must find what it doesn’t know to look for. AI technologies have led to the emergence of self-learning, self-defending networks that achieve this by detecting and autonomously responding to in-progress attacks in real time. Marcus Fowler examines how these cyber-immune systems enable the security team to focus on high-value tasks, counter even machine-speed threats, and work in all environments. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Tim McKenzie (Pitney Bowes)
Tim McKenzie examines why planning a 5G network rollout and associated services requires a good understanding of location-based data. Accurately addressing and linking consumers to properties or points of interest enables data enrichment with attributes, demographics, and social data. Companies use location to organize and analyze network and customer data to understand where to target new services. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Secondary topics:  Privacy and Security
Mark Hinely (KirkpatrickPrice)
The fear that comes along with new compliance requirements is overwhelming. Organizations don’t know where to start, what to fix, or what an auditor expects to see. Mark Hinely gives you an auditor's perspective on the newest security and privacy regulations, how your business can prepare for compliance, and what the audit looks like to an auditor. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Gerard de Melo (Rutgers University)
Gerard de Melo takes a deep dive into the kinds of sentiment and emotion consumers associate with a text. With new data-driven approaches, organizations can better pay attention to what's being said about them in different markets. And you can consider fonts and palettes best suited to convey specific emotions, so organizations can make informed choices when presenting information to consumers. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Tony Xing (Microsoft), Congrui Huang (Microsoft), Qiyang Li (Microsoft), Wenyi Yang (Microsoft)
Anomaly detection may sound old-fashioned, yet it's super important in many industry applications. Tony Xing, Congrui Huang, Qiyang Li, and Wenyi Yang detail a novel anomaly-detection algorithm based on spectral residual (SR) and convolutional neural networks (CNNs) and how this method was applied in the monitoring system supporting Microsoft AIOps and business incident prevention. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Fei Wang (CarGurus)
Fei Wang takes a deep dive into a case study for the CarGurus TV Attribution Model. You'll understand how you can leverage the creation of a causal inference model to calculate cost per acquisition (CPA) of TV spend and measure effectiveness when compared to CPA of digital performance marketing spend. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Shradha Ambekar (Intuit), Sunil Goplani (Intuit), Sandeep Uttamchandani (Intuit)
A business insight shows a sudden spike. It can take hours, or days, to debug data pipelines to find the root cause. Shradha Ambekar, Sunil Goplani, and Sandeep Uttamchandani outline how Intuit built a self-service tool that automatically discovers data pipeline lineage and tracks every change, helping debug the issues in minutes—establishing trust in data while improving developer productivity. Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Secondary topics:  Ethics
Farrah Bostic (The Difference Engine)
We're living in a culture obsessed with predictions. In politics and business, we collect data in service of the obsession. But our need for certainty and control leads some organizations to be duped by unproven technology or pseudoscience—often with unforeseen societal consequences. Farrah Bostic looks at historical—and sometimes funny—examples of sacrificing understanding for "data." Read more.
2:55pm–3:35pm Wednesday, September 25, 2019
Session
Sponsored
Anant Chintamaneni (HPE (BlueData)), Matt Maccaux (HPE (BlueData))
Anant Chintamaneni and Matt Maccaux explore whether combining containers with large-scale distributed data analytics and machine learning applications is like mixing oil and water or like combining peanut butter and chocolate. Read more.

3:35pm

3:35pm–4:35pm Wednesday, September 25, 2019
Afternoon break sponsored by MemSQL (1h)

4:35pm

4:35pm–5:15pm Wednesday, September 25, 2019
Session
Sponsored
Amit Assudani (Impetus)
Data lakes and analytical processing in the cloud are a reality. This presents new challenges for DevOps with respect to governance, continuous integration and deployment, and more. Amit Assudani shares Impetus's views on how to maintain sanity in your development organization while implementing the many dimensions of an efficient cloud-based data platform and application development environment. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Session
Sponsored
Chuck Yarbrough (Hitachi Vantara)
According to Gartner, over 80% of data lake projects were deemed inefficient. Data lakes come and go. Swamps happen. Data agility is fleeting. Chuck Yarbrough walks you through how data ops practices and a modern data architecture bring greater visibility and allow faster data access with proper governance. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)
Trill has been open-sourced, making the streaming engine behind services like the Bing Ads platform available for all to use and extend. James Terwilliger, Badrish Chandramouli, and Jonathan Goldstein dive into the history of and insights from streaming data at Microsoft. They demonstrate how its API can power complex application logic and the performance that gives the engine its name. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Session
Sponsored
Barbara Petrocelli (Cambridge Semantics), Peter Ball (Consultant)
Join industry consultant Peter Ball of Liminal Innovation and Barbara Petrocelli, VP of field operations at Cambridge Semantics, to learn how enterprise data fabrics are reshaping the modern data management landscape. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Criteo’s infrastructure provides the capacity and connectivity to host Criteo’s platform and applications. The evolution of this infrastructure is driven by the ability to forecast Criteo’s traffic demand. Hamlet Jesse Medina Ruiz explains how Criteo uses Bayesian dynamic time series models to accurately forecast its traffic load and optimize hardware resources across data centers. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Prakhar Jain (Qubole), Sourabh Goyal (Qubole)
Autoscaling of resources aims to achieve low latency for a big data application while reducing resource costs. Upscaling a cluster in the cloud is fairly easy compared to downscaling nodes, so the total cost of ownership (TCO) goes up. Prakhar Jain and Sourabh Goyal examine a new design for efficient downscaling, which helps achieve better resource utilization and lower TCO. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Max Neunhöffer (ArangoDB), Joerg Schad (ArangoDB)
Machine learning platforms are becoming more complex, with different components each producing their own metadata and their own way of storing metadata. Max Neunhöffer and Joerg Schad propose a first draft of a common metadata API and demonstrate a first implementation of this API in Kubeflow using ArangoDB, a native multimodel database. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Jeff Zemerick (Mountain Fog)
Hospitals small and large are adopting cloud technologies, and many are in hybrid environments. These distributed environments pose challenges, none more critical than safeguarding protected health information (PHI). Jeff Zemerick explores how open source technologies can be used to identify and remove PHI from streaming text in an enterprise healthcare environment. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Vlad Eidelman (FiscalNote)
While regulations affect your life every day, and millions of public comments are submitted to regulatory agencies in response to their proposals, analyzing the comments has traditionally been reserved for legal experts. Vlad Eidelman outlines how natural language processing (NLP) and machine learning can be used to automate the process by analyzing over 10 million publicly released comments. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Elasticsearch (ES) allows extremely quick search and drilldowns on large amounts of semistructured data. Elasticsearch, however, does not have relational join capabilities. Giovanni Tummarello examines a plug-in for ES that adds cluster distributed joins and demonstrates how it enables an exciting array of use cases dealing with interconnected or "Knowledge Graph" enterprise data. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
John Berryman (Eventbrite)
Eventbrite is exploring a new machine learning approach that allows it to harvest data from customer search logs and automatically tag events based upon their content. John Berryman dives into the results and how they have allowed the company to provide users with a better inventory-browsing experience. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Anirudh Koul (Microsoft), Meher Kasam (Square)
Over the last few years, convolutional neural networks (CNNs) have risen in popularity, especially in the area of computer vision. Anirudh Koul and Meher Kasam take you through how you can get deep neural nets to run efficiently on mobile devices. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Robert Pesch (inovex), Robin Senge (inovex)
Data-driven software is revolutionizing the world and enabling intelligent services we interact with daily. Robert Pesch and Robin Senge outline the development process, statistical modeling, data-driven decision making, and components needed for productionizing a fully automated and highly scalable demand forecasting system for an online grocery shop belonging to a billion-dollar retail group in Europe. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Wangda Tan (Cloudera), Wei-Chiu Chuang (Cloudera)
Wangda Tan and Wei-Chiu Chuang outline the current status of the Apache Hadoop community and dive into the present and future of Hadoop 3.x. You'll get a peek at new features like erasure coding, GPU support, NameNode federation, Docker, long-running services support, powerful container placement constraints, DataNode disk balancing, and more. And they walk you through upgrade guidance from 2.x to 3.x. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Andrew Brust (Blue Badge Insights | ZDNet)
Andrew Brust provides a primer on data catalogs and a review of the major vendors and platforms in the market. He examines the use of data catalogs with classic and newer data repositories, including data warehouses, data lakes, cloud object storage, and even software and applications. You'll learn about AI's role in the data catalog world and get an analysis of data catalog futures. Read more.
4:35pm–5:15pm Wednesday, September 25, 2019
Session
Sponsored
Daniel D'Orazio (Matillion)
According to Forrester, insight-driven companies are on pace to make $1.8 trillion annually by 2021. Daniel D'Orazio wants to know how fast your team can collect, process, and analyze data to solve present—and future—business challenges. You'll gain actionable tips and lessons learned from cloud data warehouse modernizations at companies like DocuSign that you can take back to your business. Read more.

5:25pm

5:25pm–6:05pm Wednesday, September 25, 2019
Secondary topics:  Transportation and Logistics
Brandy Freitas (Pitney Bowes)
Brandy Freitas examines the interplay between graph analytics and machine learning, improved feature engineering with graph native algorithms, and how to harness the power of graph structure for machine learning through node embedding. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Neelesh Salian (Stitch Fix)
Every data team has to build an ecosystem that sustains the data, the users, and the use of the data itself. This data ecosystem comes with its own challenges during the building phase, maintenance, and enhancement. Neelesh Salian dives into the importance of data lineage for an organization. You'll explore how to go about building such a system. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Session
Sponsored
David Leichner (SQream)
What started as an asset for data scientists and BI professionals has become a poorly performing problem. David Leichner explores the Hadoop ecosystem and relational databases from an analytics perspective—reviewing the current landscape, what Hadoop was designed for, and how a Hadoop-based infrastructure can be improved to support a new era of exponentially growing data. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Bas Geerdink (Aizonic)
Streaming analytics (or fast data processing) is the field of making predictions based on real-time data. Bas Geerdink presents a fast data architecture that covers many use cases that follow a "pipes and filters" pattern. This architecture can be used to create enterprise-grade solutions with a diversity of technology options. The stack is Kafka, Ignite, and Spark Structured Streaming (KISSS). Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Venkata Gunnu (Comcast), Harish Doddi (Datatron)
Machine learning infrastructure is key to the success of AI at scale in enterprises, with many challenges when you want to bring machine learning models to a production environment, given the legacy of the enterprise environment. Venkata Gunnu and Harish Doddi explore some key insights, what worked, what didn't work, and best practices that helped the data engineering and data science teams. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Secondary topics:  Retail and e-commerce
Subhasish Misra (Walmart Labs)
Causal questions are ubiquitous, and randomized tests are considered the gold standard. However, such tests are not always feasible, leaving only observational data from which to draw causal insights. Techniques such as matching offer a way to solve this. Subhasish Misra explores these techniques and shares practical tips for inferring causal effects. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Chenzhao Guo (Intel), Carson Wang (Intel)
Shuffle in Spark requires the shuffle data to be persisted on local disks. However, the assumption of collocated storage does not always hold in today’s data centers. Chenzhao Guo and Carson Wang outline the implementation of a new Spark shuffle manager, which writes shuffle data to a remote cluster with different storage backends, making life easier for customers. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Naghman Waheed (Bayer Crop Science), John Cooper (Bayer)
As the complexity of data systems has grown at Bayer, so has the difficulty of locating and understanding which datasets are available for consumption. Naghman Waheed and John Cooper outline a custom metadata management tool recently deployed at Bayer. The system is cloud-enabled and uses multiple open source components, including machine learning and natural language processing to aid searches. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Matt Carothers (Cox Communications), Jignesh Patel (Cox Communications), Harry Tang (Cox Communications)
Organizations often work with sensitive information such as Social Security and credit card numbers. Although this data is stored in encrypted form, most analytical operations require decrypting it for computation. This creates unwanted exposure to theft or unauthorized reads. Matt Carothers, Jignesh Patel, and Harry Tang explain how homomorphic encryption prevents fraud. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Thiago Ribeiro (Griaule)
Brazil deployed a national biometric system to register all Brazilian voters using multiple biometric modalities and to ensure that no person enrolls twice. This session highlights how a large-scale biometric system works and the main architecture decisions one has to take into consideration. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Brindaalakshmi K (Independent Consultant)
There's no standard for the collection of gender data. Brindaalakshmi K examines the implications of this gap in the context of a developing country like India, the exclusion of individuals beyond the binary genders of male and female, and how this exclusion permeates beyond the public sector into private sector services. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Emily Webber (Amazon Web Services)
Mansplaining. Know it? Hate it? Want to make it go away? Sireesha Muppala, Shelbee Eigenbrode, and Emily Webber tackle the problem of men talking over or down to women and its impact on career progression for women. They also demonstrate an Alexa skill that uses deep learning techniques on incoming audio feeds, examine ownership of the problem for women and men, and suggest helpful strategies. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
The common perception of deep learning is that it results in a fully self-contained model. However, in most cases, these models have data preprocessing requirements similar to those of more "traditional" machine learning. Despite this, there are few standard solutions for deploying end-to-end deep learning. Nick Pentreath explores how the ONNX format and ecosystem address this challenge. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Secondary topics:  Media and Advertising
Aaron Owen (Major League Baseball), Matthew Horton (Major League Baseball), Josh Hamilton (Major League Baseball)
Using SAS, Python, and AWS SageMaker, Major League Baseball's (MLB's) data science team outlines how it predicts ticket purchasers’ likelihood to purchase again, evaluates prospective season schedules, estimates customer lifetime value, optimizes promotion schedules, quantifies the strength of fan avidity, and monitors the health of monthly subscriptions to its game-streaming service. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Krishna Maheshwari (Cloudera)
Krishna Maheshwari provides an overview of the major features and enhancements in the HBase 2.0 release, upcoming releases, and the future of HBase. You'll be able to ask her questions at the end. Apache HBase 2.0 comes packed with new functionality: off-heap read paths, a multitier bucket cache, a new finite state machine-based assignment manager, and more. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Alasdair Allan (Babilim Light Industries)
The arrival of a new generation of smart embedded hardware may cause the demise of large-scale data harvesting. In its place, smart devices will let us process data at the edge and extract insights without storing data that could infringe on privacy or violate the GDPR. Join Alasdair Allan to learn why the current age, in which privacy is no longer "a social norm," may not long survive the coming of the IoT. Read more.
5:25pm–6:05pm Wednesday, September 25, 2019
Session
Sponsored
Ben Sharma (Zaloni), Santanu Sengupta (Nuveen)
Ben Sharma and Santanu Sengupta walk you through how to quickly integrate and accelerate environmental, social, and governance (ESG) data and third-party data into your environment to provide governed, trusted, and traceable data to portfolio managers and analysts in a self-service manner. Read more.

6:05pm

6:05pm–7:05pm Wednesday, September 25, 2019
Event
Make your way from booth to booth while you check out all the exhibitors in the Expo Hall on Wednesday after sessions end. Read more.

7:30pm

7:30pm–10:30pm Wednesday, September 25, 2019
Event
Don't miss an exciting evening filled with cocktails, food, and entertainment at Data After Dark at Strata in New York. Read more.

Thursday, 09/26/2019

8:00am

8:00am–8:30am Thursday, September 26, 2019
Event
Gather before keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees. Read more.

8:30am

8:30am–8:45am Thursday, September 26, 2019
Early morning coffee (8:00am - 8:45am) (15m)

8:45am

8:45am–8:55am Thursday, September 26, 2019
Keynote
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

8:55am

8:55am–9:15am Thursday, September 26, 2019
Keynote
Cassie Kozyrkov (Google)
Machine learning and artificial intelligence are no longer science fiction, so now you have to address what it takes to harness their potential effectively, responsibly, and reliably. Based on lessons learned at Google, Cassie Kozyrkov offers actionable advice to help you find opportunities to take advantage of machine learning, navigate the AI era, and stay safe as you innovate. Read more.

9:15am

9:15am–9:25am Thursday, September 26, 2019
Keynote
Daniel Hernandez takes a deep dive into how, with a unified, prescriptive information architecture, organizations can successfully unlock the value of their data for an AI and multicloud world. Read more.

9:25am

9:25am–9:35am Thursday, September 26, 2019
Keynote
Arun Murthy (Cloudera)
In this keynote, we’ll introduce you to the new 100% open source Cloudera Data Platform (CDP), the world’s first enterprise data cloud. CDP is hybrid and multi-cloud, delivering the speed, agility, and scale you need to secure and govern your data anywhere from the edge to AI. Read more.

9:35am

9:35am–9:40am Thursday, September 26, 2019
Keynote
Barbara Eckman (Comcast)
Barbara Eckman shares lessons learned from early big data mistakes and the progress her team at Comcast is making toward a postrevolutionary big data vision. Read more.

9:40am

9:40am–9:45am Thursday, September 26, 2019
Keynote
Edward Jezierski (Microsoft)
Microsoft has an ecosystem spanning research, gaming, and the cloud that's advancing reinforcement learning (RL) and putting it into everyday use. Join Edward Jezierski to see where RL is used practically across Microsoft and imagine the opportunities that exist for your business today. Read more.

9:45am

9:45am–9:55am Thursday, September 26, 2019
Keynote
The Strata Data Awards recognize the most innovative startups, leaders, and data science projects from Strata sponsors and exhibitors around the world. Join us during keynotes for the announcement of the winners. Read more.

9:55am

9:55am–10:15am Thursday, September 26, 2019
Keynote
Jonathan Foster (Microsoft)
Language shapes our thinking, our relationships, our sense of self. Conversation connects us in powerful, intimate, and often unconscious ways. Jonathan Foster explains why, as we design for natural language interactions and more humanlike digital experiences, language—as design material, conversation, and design canvas—reveals ethical challenges we couldn't encounter with GUI-powered experiences. Read more.

10:15am

10:15am–10:20am Thursday, September 26, 2019
Keynote
Jed Dougherty (Dataiku)
Jed Dougherty presents the trailer of the upcoming _Data Science Pioneers_ documentary about the passionate data scientists driving us toward technological revolution. Cut through the hype with _Data Science Pioneers_ and see what it really means to be a data scientist. Read more.

10:20am

10:20am–10:40am Thursday, September 26, 2019
Keynote
Alan Smith (Financial Times)
Based on a critical evaluation of the iconic yield curve chart, Alan Smith argues that combining visualization (data to pixels) with sonification (data to pitch) could not only improve aesthetic multimedia experiences but also take the presentation of data into the rapidly expanding universe of screenless devices and products. Read more.

10:40am

10:40am–10:45am Thursday, September 26, 2019
Keynote
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll offer closing remarks. Read more.

10:50am

10:50am–11:20am Thursday, September 26, 2019
Morning break sponsored by Cisco (30m)

11:20am

11:20am–12:00pm Thursday, September 26, 2019
Session
Sponsored
AI isn't magic. It’s still hard work. Daniel Hernandez explains why having the technology alone isn't enough; it requires a thoughtful and well-architected approach. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Session
Sponsored
Edward Jezierski (Microsoft), Jackie Nichols (Microsoft)
Edward Jezierski and Jackie Nichols demonstrate how Cognitive Services Personalizer works with your content and data, how it autonomously learns to make optimal decisions, how you can add it to your app with two lines of code, and what’s under the hood. Then they share the results Personalizer achieved on the Xbox One home page as well as best practices for applying it in your applications today. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Session
Sponsored
Charles Boicey (Clearsense)
Healthcare’s reliance on comprehensible data is critical to the mission of providing optimal and affordable care. Charles Boicey takes a deep dive into how the application of technology, such as machine learning, is paramount to a modernization of healthcare that provides its professionals with fully integrated and complete medical records. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)
You're a SaaS company operating on a cloud infrastructure prior to the machine learning (ML) era and you need to successfully extend your existing infrastructure to leverage the power of ML. Jing Huang and Jessica Mong detail a case study with critical lessons from SurveyMonkey’s journey of expanding its ML capabilities with its rich data repo and hybrid cloud infrastructure. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Session
Sponsored
Ajay Anand (Kyvos Insights)
Learn how you can overcome the challenges of traditional OLAP solutions and scale BI to deliver quick insights to business users across your enterprise. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Secondary topics:  Ethics
Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
Alejandro Saucedo demystifies AI explainability through a hands-on case study, where the objective is to automate a loan-approval process by building and evaluating a deep learning model. He introduces motivations through the practical risks that arise with undesired bias and black box models and shows you how to tackle these challenges using tools from the latest research and domain knowledge. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)
Stavros Kontopoulos and Debasish Ghosh explore online machine learning algorithm choices for streaming applications, especially those with resource-constrained use cases like IoT and personalization. They dive into Hoeffding Adaptive Trees, classic sketch data structures, and drift detection algorithms from implementation to production deployment, describing the pros and cons of each of them. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Michael Freedman (TimescaleDB | Princeton University)
Leveraging polyglot solutions for your time series data can lead to issues including engineering complexity, operational challenges, and even referential integrity concerns. Michael Freedman explains why, by re-engineering PostgreSQL to serve as a general data platform, your high-volume time series workloads will be better streamlined, resulting in more actionable data and greater ease of use. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Rick Houlihan (Amazon Web Services)
Data has always been and will always be relational. NoSQL databases are gaining in popularity, but that doesn't change the fact that the data is still relational; it just changes how we have to model it. Rick Houlihan dives deep into how real entity relationship models can be efficiently modeled in a denormalized manner using schema examples from real application services. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Jonathan Foster (Microsoft)
Language shapes our thinking, our relationships, our sense of self. Conversation connects us in powerful, intimate, and often unconscious ways. Jonathan Foster explains why, as we design for natural language interactions and more humanlike digital experiences, language—as design material, conversation, and design canvas—reveals ethical challenges we couldn't encounter with GUI-powered experiences. Read more.
11:20am–12:00pm Thursday, September 26, 2019
John Allen (Deutsche Bank)
As an early adopter of data science, machine learning, and AI, Deutsche Bank's analytics function is trailblazing new ways to drive revenues, lower costs, and reduce risk across all areas of the group. John Allen shares how his team combines commercial offerings with open source technologies to revolutionize legacy processes and transform the way the bank uses technology to drive innovation. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Brian Keng (Rubikloud)
Automating decisions requires a system to consider more than just a data-driven prediction. Real-world decisions require additional constraints and fuzzy objectives to ensure they're robust and consistent with business goals. Brian Keng takes a deep dive into how to leverage modern machine learning methods and traditional mathematical optimization techniques for decision automation. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Shital Shah (Microsoft Research)
Taming massive deep learning models, data, and training times requires a new way of thinking. Shital Shah explores new tools and methods to better understand AI. Explaining the decisions made by AI not only helps us accelerate its development but also makes it safer and more trustworthy. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Anjali Samani (CircleUp)
The application of smoothing and imputation strategies is common practice in predictive modeling and time series analysis. With a technique-agnostic approach, Anjali Samani provides qualitative and quantitative frameworks that address questions related to smoothing and imputation of missing values to improve data density. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Petar Zecevic (SV Group)
The Large Synoptic Survey Telescope (LSST) is one of the most important future surveys. Its unique design allows it to cover large regions of the sky and obtain images of the faintest objects. After 10 years of operation, it will produce about 80 PB of data in images and catalog data. Petar Zecevic explains AXS, a system built for fast processing and cross-matching of survey catalog data. Read more.
11:20am–12:00pm Thursday, September 26, 2019
Secondary topics:  Culture and Organization
Gayle Bieler (RTI International)
Gayle Bieler explains how she built a thriving center for data science within a large, well-respected nonprofit research institute and shares some of its most impactful projects and best adventures to date, which have solved important national problems, improved local communities, and transformed research. Read more.

12:00pm

12:00pm–1:15pm Thursday, September 26, 2019
Break (1h 15m)
12:00pm–1:15pm Thursday, September 26, 2019
Event
Join Strata Business Summit speakers and attendees for a networking lunch on Thursday. Read more.
12:00pm–1:15pm Thursday, September 26, 2019
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

12:30pm

12:30pm–1:10pm Thursday, September 26, 2019
Session
AI will be the most disruptive class of technologies over the next decade, fueled by near-endless amounts of data and unprecedented advances in deep learning. Brittany Bogle walks you through how to address some of the major AI challenges, like trust, talent, and data. Read more.

1:15pm

1:15pm–1:55pm Thursday, September 26, 2019
Session
Sponsored
Jed Dougherty (Dataiku)
Jed Dougherty takes a deep dive into an often overlooked aspect of the data science lifecycle: model deployment. Once they’ve constructed a data science model that does a good job accurately predicting their test set, many data scientists think the job is over. But really, it’s just begun. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Session
Sponsored
Paul Scott-Murphy (WANdisco)
Paul Scott-Murphy dives into the options that exist for cloud migration and their advantages and disadvantages, what cloud vendors do and don't offer to support large-scale migration, the business risks associated with large-scale cloud migration, and how to migrate analytics data at scale for immediate use in Spark without disrupting on-premises operations. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Session
Sponsored
Paul Wolmering (Actian Corporation)
Paul Wolmering explores the key characteristics for building an Agile data warehouse and defines a reference architecture for hybrid data. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Alon Gavra (AppsFlyer)
Kafka is frequently just a piece of the production stack that no one wants to touch, because it just works. Alon Gavra outlines how Kafka sits at the core of AppsFlyer's infrastructure, which processes billions of events daily. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Session
Sponsored
Dan DeMers (Cinchy)
After 40 years of apps, enterprise companies now realize that building or buying an application for every use case has become a major threat to their ability to leverage and protect their core data assets. Dan DeMers provides a live demo of Cinchy, the world’s first data collaboration platform. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Sandra Carrico (GLYNT)
Sandra Carrico explains mixed formal learning and outlines one machine learning example that previously used large numbers of examples and now learns with either zero or a handful of training examples. The talk maps apparently idiosyncratic techniques to mixed formal learning, a general AI architecture that you can use in your projects. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Jim Scott (NVIDIA)
Data scientists create and test hundreds or thousands more models than in the past. Models require support from both real-time and static data sources. As data is enriched and parameters are tuned and explored, there's a need to version everything, including the data. Jim Scott examines these specific problems and approaches to fixing them. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Omkar Joshi (Uber), Bo Yang (Uber)
Omkar Joshi and Bo Yang offer an overview of how Uber’s ingestion (Marmaray) and observability team improved the performance of Apache Spark applications running on thousands of cluster machines and across hundreds of thousands of applications, and how the team methodically tackled these issues. They also cover how they used Uber’s open-sourced jvm-profiler to debug issues at scale. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Shant Hovsepian (Arcadia Data)
With cloud object storage (e.g., S3, ADLS), one expects business intelligence (BI) applications to benefit from the scale of data and real-time analytics. However, traditional BI in the cloud surfaces nonobvious challenges. Shant Hovsepian examines service-oriented cloud design (storage, compute, catalog, security, SQL) and how native cloud BI provides analytic depth, low cost, and performance. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Secondary topics:  Culture and Organization
Usama Fayyad (Open Insights & OODA Health, Inc.), Hamit Hamutcu (Analytics Center)
If you've ever been confused about what it takes to be a data scientist or curious about how companies recruit, train, and manage analytics resources, Usama Fayyad and Hamit Hamutcu are here to explore insights from the most comprehensive research effort to date on the data analytics profession and propose a framework for the standardization of roles and methods for assessing skills. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
James Kotecki (Infinia ML)
Miscommunication between business leaders and technical experts can doom even the best data science project. Don’t let it drive you insane! In this session, we’ll dissect many flavors of communication failure, from goal misalignment to technical misunderstanding. Then, we’ll explore practical ways to bridge these gaps. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Victor Dibia (Cloudera Fast Forward Labs)
Recent advances in machine learning frameworks for the browser, such as TensorFlow.js, provide the opportunity to craft truly novel experiences within frontend applications. Victor Dibia explores the state of the art for machine learning in the browser using TensorFlow.js and outlines its use in the design of Handtrack.js, a library for prototyping real-time hand detection in the browser. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Sameer Agarwal (Facebook), Ankit Agarwal (Facebook Inc.)
Apache Spark is the largest compute engine at Facebook by CPU. Sameer Agarwal dives into the story of how Facebook optimized, tuned, and scaled Apache Spark to run on clusters of tens of thousands of machines, processing hundreds of petabytes of data, and being used by thousands of data scientists, engineers, and product analysts every day. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Alfred Whitehead (Klick), Clare Jeon (Klick)
Time series forecasts depend on sensors or measurements made in the real, messy world. The sensors flake out, get turned off, disconnect, and otherwise conspire to cause missing signals: signals that may tell you what tomorrow's temperature will be or what your blood glucose levels are before bed. Alfred Whitehead and Clare Jeon explore methods for handling data gaps and when to use each. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Sushant Rao (Cloudera)
Jason Wang and Sushant Rao offer an overview of cloud architecture, then go into detail on core cloud paradigms like compute (virtual machines), cloud storage, authentication and authorization, and encryption and security. They conclude by bringing these concepts together through customer stories to demonstrate how real-world companies have leveraged the cloud for their big data platforms. Read more.
1:15pm–1:55pm Thursday, September 26, 2019
Secondary topics:  Culture and Organization, Ethics
Keegan Hines (Capital One)
Keegan Hines explores the philosophy around explaining a model, given that the colloquial definition of "explanation" is partially recursive. The talk covers the lens that banking regulation places on this philosophical basis and expands into techniques used for these well-governed aspects. Read more.

2:05pm

2:05pm–2:45pm Thursday, September 26, 2019
Session
Sponsored
Jim Cushman (Collibra), Piyush Jain (Progressive)
Transforming data into a trusted business asset that informs decision making requires giving teams access to a powerful platform that makes it easy to harness data across the enterprise. Jim Cushman and Piyush Jain detail how Progressive uses Collibra to transform the way data is managed and used across the organization, driving real business value. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Stephan Ewen (Ververica)
Stephan Ewen details how stream processing is becoming a "grand unifying paradigm" for data processing and the newest developments in Apache Flink to support this trend: new cross-batch-streaming machine learning algorithms, state-of-the-art batch performance, and new building blocks for data-driven applications and application consistency. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Session
Sponsored
Matt Derda (Trifacta), Yogesh Prasad (IQVIA)
Clinical trial data analysis can be a complex process. The data is typically hand-coded and inconsistently formatted, yet it must be delivered in an FDA-approved format. Matt Derda and Yogesh Prasad explain how IQVIA built its Clean Patient Tracker and how it enabled agility and flexibility for end users of the platform, from data acquisition to reporting and analytics. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Davor Bonaci (Kaskada), Anand Madhavan (Narvar)
Narvar provides a next-generation posttransaction experience for over 500 retailers. Karthik Ramasamy and Anand Madhavan take you on the journey of how Narvar moved away from using a slew of technologies for its platform and consolidated its use cases using Apache Pulsar. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Mumin Ransom (Comcast), Nick Pinckernell (Comcast)
Mumin Ransom gives an overview of the data management and privacy challenges around automating ML model (re)deployments and stream-based inferencing at scale. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Diego Oppenheimer (Algorithmia)
Machine learning (ML) will fundamentally change the way we build and maintain applications. Diego Oppenheimer dives into how you can adapt your infrastructure, operations, staffing, and training to meet the challenges of the new software development life cycle (SDLC) without throwing away everything that already works. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Building a reliable big data platform is extremely challenging when it has to store and serve hundreds of petabytes of data in real time. Reza Shiftehfar reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
With cheap and scalable storage services such as S3 and ADLS, it's never been easier to dump data into a cloud data lake. But you still need to secure that data and be sure it doesn't leak. Tomer Shiran and Jacques Nadeau explore capabilities for securing a cloud data lake, including authentication, access control, encryption (in motion and at rest), and auditing, as well as network protections. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Alex Yoon (T-Mobile)
T-Mobile successfully improved the quality of voice calling by analyzing crowdsourced big data from mobile devices. Alex Yoon walks you through how engineers from multiple backgrounds collaborated to achieve a 10% improvement in voice quality and why the analysis of big data was the key to bringing better voice call service quality to millions of end users. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Akshay Rai (LinkedIn)
Failures or issues in a product or service can negatively affect the business. Detecting issues in advance and recovering from them is crucial to keeping the business alive. Join Akshay Rai to learn more about LinkedIn's next-generation open source monitoring platform, an integrated solution for real-time alerting and collaborative analysis. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
Heitor Murilo Gomes and Albert Bifet introduce you to a machine learning pipeline for streaming data using the streamDM framework. You'll also learn how to use streamDM for supervised and unsupervised learning tasks, see examples of online preprocessing methods, and discover how to expand the framework by adding new learning algorithms or preprocessing methods. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Secondary topics:  Deep Learning, Streaming and IoT
Ryan Foltz (Exabeam)
Unmanaged and foreign devices in the corporate networks pose a security risk, and the first step toward reducing this risk is the ability to identify them. Ryan Foltz walks you through a comprehensive device management machine learning model based on deep learning that performs anomaly detection based on only device names to flag devices that do not follow naming structures. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Anais Dotis (InfluxData)
Machine learning (ML) gets a lot of hype, but its classical predecessors are still immensely powerful, especially in the time series space, and classical algorithms outperform machine learning methods in time series forecasting. Anais Dotis dives into how she used the Holt-Winters forecasting algorithm to predict water levels in a creek. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Nikki Rouda (Amazon Web Services), Janisha Anand (Amazon Web Services)
Nikki Rouda and Janisha Anand demonstrate how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. You'll also learn how to link customer records across different databases, match external product lists against your own catalog, and solve tough challenges to prepare and cleanse data for analysis. Read more.
2:05pm–2:45pm Thursday, September 26, 2019
Paco Nathan (derwen.ai)
Paco Nathan outlines the history and landscape for vendors, open source projects, and research efforts related to AutoML. Starting from the perspective of an AI expert practitioner who speaks business fluently, Paco unpacks the ground truth of AutoML—translating from the hype into business concerns and practices in a vendor-neutral way. Read more.

2:45pm

2:45pm–3:45pm Thursday, September 26, 2019
Afternoon break sponsored by Io-Tahoe (1h)

3:45pm

3:45pm–4:25pm Thursday, September 26, 2019
Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
Jonghyok Lee and Chon Yong Lee discuss T-CORE, SK Telecom’s monitoring and service analytics platform, which collects system and application data from several thousand servers and applications and provides a 3D visualization of the real-time status of the whole network. Join in to hear lessons learned during development. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Secondary topics:  Financial Services
David Mack (Octavian)
Graphs are a powerful way to represent knowledge. Organizations, in fields such as biosciences and finance, are starting to amass large knowledge graphs, but they lack the machine learning tools to extract insights from them. David Mack offers an overview of what insights are possible and surveys the most popular approaches. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Randall DeFauw (Amazon Web Services)
As more automation becomes available to data science, the balance between automation and quality needs to be maintained. Applying DevOps practices to machine learning workloads brings models to market faster while maintaining their quality and integrity. Sireesha Muppala, Shelbee Eigenbrode, and Randall DeFauw explore applying DevOps practices to ML workloads. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Vitaliy Baklikov (DBS Bank), Dipti Borkar (Alluxio)
Vitaliy Baklikov and Dipti Borkar explore how DBS Bank built a modern big data analytics stack leveraging an object store even for data-intensive workloads like ATM forecasting and how it uses Alluxio to orchestrate data locality and data access for Spark workloads. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Owen O'Malley (Cloudera)
Fine-grained, column-level data protection in data lake environments has become mandatory for demonstrating compliance with multiple local and international regulations across many industries today. Owen O'Malley dives into how column encryption in ORC files enables both fine-grained protection and auditing of who accessed private data. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Madhu Gopinathan (MakeMyTrip), Sanjay Mohan (MakeMyTrip)
At MakeMyTrip, customers used voice or email to contact agents for postsale support. To make agents more efficient and improve the customer experience, MakeMyTrip developed a chatbot, Myra, using some of the latest advances in deep learning. Madhu Gopinathan and Sanjay Mohan explain Myra's high-level architecture and the business impact it has had. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Audrey Lobo-Pulo (Phoensight), Annette Hester (National Energy Board, Canada)
As new digital platforms emerge and governments look at new ways to engage with citizens, there's an increasing awareness of the role these platforms play in shaping public participation and democracy. Audrey Lobo-Pulo, Annette Hester, and Ryan Hum examine the design attributes of civic engagement technologies and their ensuing impacts, and present a case study from NEB Canada. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Secondary topics:  Deep Learning
Sajan Govindan (Intel)
Sajan Govindan outlines CERN’s research on deep learning in high-energy physics experiments as an alternative to customized rule-based methods, using the example of topology classification to improve real-time event selection at the Large Hadron Collider. CERN runs deep learning pipelines on Apache Spark using the open source BigDL and Analytics Zoo software on Intel Xeon-based clusters. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Chad Scherrer (Metis)
Chad Scherrer explores the basic ideas in Soss, a new probabilistic programming library for Julia. Soss allows a high-level representation of the kinds of models often written in PyMC3 or Stan, and offers a way to programmatically specify and apply model transformations like approximations or reparameterizations. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Scott Castle (Sisense)
Scott Castle, general manager at Sisense and former VP of product at Periscope Data, shares lessons learned from scaling Periscope Data to support very large volumes of data and queries from its data teams. Read more.
3:45pm–4:25pm Thursday, September 26, 2019
Jonathan Tudor (GE Aviation), Ross Schalmo (GE Aviation)
Jonathan Tudor and Ross Schalmo explore how GE Aviation made it a mission to implement self-service data. To ensure success beyond the initial implementation of tools, the data engineering and analytics teams created initiatives to foster engagement, ranging from an ongoing partnership with each part of the business, to gamifying data tagging in a data catalog, to forming a published dataset council. Read more.

4:35pm

4:35pm–5:15pm Thursday, September 26, 2019
Supun Kamburugamuve (Indiana University)
Big data computing and high-performance computing (HPC) evolved over the years as separate paradigms. With the explosion of data and the demand for machine learning algorithms, these two paradigms increasingly embrace each other for data management and algorithms. Supun Kamburugamuve explores the possibilities and tools available for getting the best of HPC and big data. Read more.
4:35pm–5:15pm Thursday, September 26, 2019
Ruixin Xu (Microsoft), Long Tian (Microsoft), Yu Zhou (Microsoft)
Ruixin Xu, Long Tian, and Yu Zhou explore an experiment that used Spark and Jupyter notebooks to replace existing IDE-based tools for internal DevOps. The Spark-based solution significantly improved diagnosis performance, especially for complex jobs with large profiles, and Jupyter notebooks enabled fast iteration and easy knowledge sharing. Read more.
4:35pm–5:15pm Thursday, September 26, 2019
David Boyle (Audience Strategies)
Companies that harness creativity and data in tandem have growth rates twice as high as companies that don’t. David Boyle shares lessons from his successes and failures in trying to do just that in presidential politics, with pop stars, and with power brands in the world of luxury goods. Join in to find out how analysts can work differently to build these partnerships and unlock this growth. Read more.
4:35pm–5:15pm Thursday, September 26, 2019
Secondary topics:  Privacy and Security
Mark Donsky (Okera)
California is following the EU's GDPR with the California Consumer Privacy Act (CCPA) in 2020. Noncompliance carries penalties, but many companies aren't prepared for this strict regulation. Mark Donsky explores the capabilities your data environment needs in order to simplify CCPA and GDPR compliance, as well as compliance with other regulations. Read more.
4:35pm–5:15pm Thursday, September 26, 2019
Secondary topics:  Deep Learning
Naoto Umemori (NTT DATA), Masaru Dobashi (NTT DATA)
Giant hogweed is a highly toxic plant. Naoto Umemori and Masaru Dobashi aim to automate the detection of the plant using drones and machine learning-based image recognition and detection. You'll see how they designed the architecture and took advantage of big data and machine and deep learning technologies (e.g., Hadoop, Spark, and TensorFlow), and hear the lessons they learned. Read more.
4:35pm–5:15pm Thursday, September 26, 2019
Jeroen Janssens (Data Science Workshops)
Jeroen Janssens dives into stochastic outlier selection (SOS), an unsupervised algorithm for detecting anomalies in large, high-dimensional data. SOS has been implemented in Python, R, and, most recently, Spark. He illustrates the idea and intuition behind SOS, demonstrates the implementation of SOS on top of Spark, and applies SOS to a real-world use case. Read more.
4:35pm–5:15pm Thursday, September 26, 2019
Jordan Volz (Dataiku)
Spark on Kubernetes is a winning combination for data science that stitches together a flexible platform harnessing the best of both worlds. Jordan Volz gives a brief overview of Spark and Kubernetes, the Spark on Kubernetes project, why it’s an ideal fit for data scientists who may have been dissatisfied with other iterations of Spark in the past, and some applications. Read more.
4:35pm–5:15pm Thursday, September 26, 2019
Dean Wampler (Lightbend)
Dean Wampler dives into how (and why) to integrate ML into production streaming data pipelines and to serve results quickly; how to bridge data science and production environments with different tools, techniques, and requirements; how to build reliable and scalable long-running services; and how to update ML models without downtime. Read more.

    Contact us

    confreg@oreilly.com

    For conference registration information and customer service

    partners@oreilly.com

    For more information on community discounts and trade opportunities with O’Reilly conferences

    strataconf@oreilly.com

    For information on exhibiting or sponsoring a conference

    pr@oreilly.com

    For media/analyst press inquiries