Sep 23–26, 2019

Schedule

Monday, 09/23/2019

9:00am

9:00am–5:00pm Monday, 09/23/2019
Michael Li (The Data Incubator), Gonzalo Diaz (The Data Incubator)
Michael Li and Ana Hocevar provide a nontechnical overview of AI and data science. Learn common techniques, how to apply them in your organization, and common pitfalls to avoid. You’ll pick up the language and develop a framework to be able to effectively engage with technical experts and use their input and analysis for your business’s strategic priorities and decision making. Read more.
9:00am–5:00pm Monday, 09/23/2019
Bargava Subramanian (Binaize Labs), Amit Kapoor (narrativeVIZ)
Recommendation systems play a significant role: for users, they open up a new world of options; for companies, they drive engagement and satisfaction. Amit Kapoor and Bargava Subramanian walk you through the different paradigms of recommendation systems and introduce you to deep learning-based approaches. You'll gain the practical hands-on knowledge to build, select, deploy, and maintain a recommendation system. Read more.
9:00am–5:00pm Monday, 09/23/2019
Event
Sponsored
Jeff Davis (Google Cloud)
This course provides a hands-on introduction to designing and building machine learning models on structured data on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you will learn machine learning (ML) concepts and how to implement them using both BigQuery Machine Learning and TensorFlow/Keras. Read more.
9:00am–5:00pm Monday, 09/23/2019
Michael Cullan (The Data Incubator)
Michael Cullan walks you through developing a machine learning pipeline from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python. Read more.
9:00am–5:00pm Monday, 09/23/2019
Jorge Lopez (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join Jorge Lopez to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more. Read more.
9:00am–5:00pm Monday, 09/23/2019
Ian Cook (Cloudera)
Advancing your career in data science requires learning new languages and frameworks, but you face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by outlining the abstractions common to these systems. You'll work through hands-on exercises to overcome obstacles to getting started using new tools. Read more.
9:00am–5:00pm Monday, 09/23/2019
Jesse Anderson (Big Data Institute)
Jesse Anderson offers you an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it, as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL. Read more.
9:00am–5:00pm Monday, 09/23/2019
Dylan Bargteil (The Data Incubator)
The TensorFlow library provides for the use of computational graphs with automatic parallelization across resources. This architecture is ideal for implementing neural networks. Dylan Bargteil explores TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API with several hands-on applications. Read more.

10:30am

10:30am–11:00am Monday, 09/23/2019
Morning break (30m)

12:30pm

12:30pm–1:30pm Monday, 09/23/2019
Lunch (1h)

3:00pm

3:00pm–3:30pm Monday, 09/23/2019
Afternoon break (30m)

7:00pm

7:00pm–9:00pm Monday, 09/23/2019
Event
Get to know your fellow attendees over dinner. We've made reservations for you at some of the most sought-after restaurants in town. This is a great chance to make new connections and sample some of the great cuisine New York has to offer. Read more.

Tuesday, 09/24/2019

9:00am

9:00am–5:00pm Tuesday, 09/24/2019
Training
Sponsored
Matt Kirk (YourChiefScientist.com)
Note: This free workshop, courtesy of IBM, is open to the first 50 registrants. You'll take a fascinating deep dive into the power and applications of machine learning in the enterprise. Read more.
9:00am–5:00pm Tuesday, 09/24/2019
David Boyle (Harrods), Richard Evans (Statistics Canada), Rosaria Silipo (KNIME), Leah Xu (Spotify), Arup Nanda (Capital One), Victoriya Kalmanovich (Navy), Tusharadri Mukherjee (Lenovo), Martin Mendez-Costabel (Bayer Crop Science), Gloria Macia (Roche AG), Gwen Campbell (Revibe Technologies), Moise Convolbo (Rakuten), Muhammed Idris (Capria VC | TeraCrunch)
From banking to biotech, retail to government, every business sector is changing in the face of abundant data. Get better at defining business problems and applying data solutions at Strata. Read more.
9:00am–5:00pm Tuesday, 09/24/2019
Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Nitzan Mekel-Bobrov (Capital One), Brian Lynch (TD Bank Group), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), Karan Jaswal (Cinchy), Moto Tohda (Tokyo Century (USA)), Mikheil Nadareishvili (TBC Bank), Jennifer Kloke (Ayasdi), Peter Swartz (Altana Trade)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Secondary topics:  Culture and Organization
Rossella Blatt Vital (Wonderlic)
Creating and leading a successful ML strategy is an elegant orchestration of many components: master key ML concepts, operationalize ML workflow, prioritize highest-value projects, build a high-performing team, nurture strategic partnerships, align with the company’s mission, etc. Rossella Blatt Vital details insights and lessons learned in how to create and lead a flourishing ML practice. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Sourav Dey (Manifold), Jakov Kucan (Manifold)
Sourav Dey and Jakov Kucan walk you through the six steps of the Lean AI process and explain how it helps your ML engineers work as an integrated part of your development and production teams. You'll get a hands-on example using real-world data, so you can get up and running with Docker and Orbyter and see firsthand how streamlined they can make your workflow. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Jules Damji (Databricks)
ML development brings many new complexities beyond the software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information. Jules Damji walks you through MLflow, an open source project that simplifies the entire ML lifecycle, to solve this problem. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Alice Zhao (Metis)
Data scientists are known for crunching numbers, but you need to decide what to do when you run into text data. Alice Zhao walks you through the steps to turn text data into a format that a machine can understand, explores some of the most popular text analytics techniques, and showcases several natural language processing (NLP) libraries in Python, including NLTK, TextBlob, spaCy, and gensim. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley)
Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems and examine the inception and growth of the serverless paradigm. You'll take a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and get a bird’s-eye view of the application domains where you can leverage Pulsar functions. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Viktor Gamov (Confluent)
Building stream processing applications is certainly one of the hot topics in the IT community. But if you've ever thought you needed to be a programmer to do stream processing and build stream processing data pipelines, think again. Viktor Gamov explores KSQL, the stream processing query engine built on top of Apache Kafka. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera)
There are too many edge devices and agents, and you need to control and manage them. Purnima Reddy Kuchikulla, Timothy Spann, and Abdelkrim Hadjidj walk you through handling the difficulty in collecting real-time data and the trouble with updating a specific set of agents with edge applications. Get your hands dirty with Cloudera Edge Management (CEM), which addresses these challenges with ease. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Secondary topics:  Deep Learning
Bruno Goncalves (Data For Science, Inc)
You'll go hands-on to learn the theoretical foundations and principal ideas underlying deep learning and neural networks. Bruno Goncalves provides implementations whose code structure closely resembles the way Keras is structured, so that by the end of the course, you'll be prepared to dive deeper into the deep learning applications of your choice. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
James Morantus (Cloudera)
Moving to the cloud poses challenges from rearchitecting to data context consistency across workloads that span multiple clusters. Jason Wang, Tony Wu, and Vinithra Varadharajan explore cloud architecture and its challenges, as well as using Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX. Read more.
9:00am–12:30pm Tuesday, 09/24/2019
Secondary topics:  Privacy and Security
Mark Donsky (Okera)
New regulations such as CCPA and GDPR drive new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads that span on-premises, private cloud, multicloud, and hybrid cloud. Mark Donsky shares hands-on best practices for meeting these challenges with special attention to CCPA. Read more.

10:30am

10:30am–11:00am Tuesday, 09/24/2019
Morning break sponsored by Microsoft (30m)

12:30pm

12:30pm–1:30pm Tuesday, 09/24/2019
Lunch (1h)

1:30pm

1:30pm–5:00pm Tuesday, 09/24/2019
Secondary topics:  Culture and Organization
Mac Steele (Domino), Nick Elprin (Domino)
The honeymoon era of data science is ending and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders must deliver measurable impact on an increasing share of an enterprise’s KPIs. Mac Steele and Nick Elprin explore how leading organizations take a holistic approach to people, process, and technology to build a sustainable advantage. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
Garrett Hoffman (StockTwits)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include Word2Vec, recurrent neural networks (RNNs) and variants (long short-term memory [LSTM] and gated recurrent unit [GRU]), and convolutional neural networks. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)
Karthik Sonti, Emily Webber, and Varun Rao Bhamidimarri introduce you to the Amazon SageMaker machine learning platform and provide a high-level discussion of recommender systems. You'll dig into different machine learning approaches for recommender systems, including common methods such as matrix factorization as well as newer embedding approaches. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture)
David Talby, Alex Thomas, Saif Addin Ellafi, and Claudiu Branzan walk you through state-of-the-art natural language processing (NLP) using the highly performant, highly scalable open source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
Sophie Watson (Red Hat), William Benton (Red Hat)
Go hands-on with Sophie Watson and William Benton to examine data structures that let you answer interesting queries about massive datasets in fixed amounts of space and constant time. This seems like magic, but they'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)
Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management requires a different mindset in AWS when compared to traditional RDBMS design. Gowrishankar Balasubramanian and Rajeev Srinivasan explore considerations in choosing the right database for your use case and access pattern while migrating or building a new application on the cloud. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
Secondary topics:  Culture and Organization
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Ted Malaska and Jonathan Seidman detail guidelines and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
Secondary topics:  Privacy and Security
Carolyn Duby (Cloudera)
Bring your laptop, roll up your sleeves, and get ready to crunch some cybersecurity events with Apache Metron, an open source big data cybersecurity platform. Carolyn Duby walks you through how Metron finds actionable events in real time. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
Mark Madsen (Teradata), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera)
Kafka is omnipresent and the backbone of streaming analytics applications and data lakes. The challenge is understanding what's going on overall in the Kafka cluster, including performance, issues, and message flows. Purnima Reddy Kuchikulla and Dan Chaffelson walk you through a hands-on experience to visualize the entire Kafka environment end-to-end and simplify Kafka operations via SMM. Read more.
1:30pm–5:00pm Tuesday, 09/24/2019
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
Boris Lublinsky and Dean Wampler examine ML use in streaming data pipelines, how to do periodic model retraining, and low-latency scoring in live streams. Learn about Kafka as the data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, metadata tracking, and more. Read more.

3:00pm

3:00pm–3:30pm Tuesday, 09/24/2019
Afternoon break sponsored by Dataiku (30m)

5:00pm

5:00pm–6:30pm Tuesday, 09/24/2019
Event
Enjoy delicious snacks and beverages with fellow Strata attendees, speakers, and sponsors at the Opening Reception, happening immediately after tutorials on Tuesday. Read more.

Wednesday, 09/25/2019

8:15am

8:15am–8:45am Wednesday, 09/25/2019
Event
Gather before keynotes on Wednesday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees. Read more.

8:45am

8:45am–9:00am Wednesday, 09/25/2019
Keynote
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes. Read more.

9:00am

9:00am–9:15am Wednesday, 09/25/2019
Keynote
Details to come. Read more.

9:25am

9:25am–9:35am Wednesday, 09/25/2019
Keynote
James Malone (Google)
Open source has always been a core pillar of Google Cloud’s data and analytics strategy. James Malone examines how, as the community continues to set industry standards, the company continues to integrate those standards into its services so organizations around the world can unlock the value of data faster. Read more.

9:35am

9:35am–9:50am Wednesday, 09/25/2019
Keynote
Sara Menker (Gro Intelligence)
Sara Menker, CEO, Gro Intelligence Read more.

9:50am

9:50am–10:00am Wednesday, 09/25/2019
Keynote
Ben Lorica (O'Reilly Media)
Ben Lorica, Chief Data Scientist, O'Reilly Read more.

10:00am

10:00am–10:05am Wednesday, 09/25/2019
Keynote
Data analytics is the long-standing but constantly evolving science that companies leverage for insight, innovation, and competitive advantage. Ziya Ma explores Intel’s end-to-end data pipeline software strategy designed and optimized for a modern and flexible data-centric infrastructure that allows for the easy deployment of unified advanced analytics and AI solutions at scale. Read more.

10:05am

10:05am–10:20am Wednesday, 09/25/2019
Keynote
Swatee Singh (American Express)
The financial services industry is increasingly using disruptive technology—including AI and machine learning, edge computing, blockchain, mobile and mixed reality, virtual assistants, and quantum computing to name a few—to enhance the customer experience and personalize their interactions with customers. Swatee Singh outlines how the same is true at American Express. Read more.

10:20am

10:20am–10:25am Wednesday, 09/25/2019
Keynote
Nikita Shamgunov (MemSQL)
Data is now the world’s most valuable resource, with winners and losers decided every day by how well we collect, analyze, and act on data. However, most companies struggle to unlock the full value of their data, using outdated, outmoded data infrastructure. Nikita Shamgunov examines how businesses use data, the new demands on data infrastructure, and what you should expect from your tools. Read more.

10:25am

10:25am–10:30am Wednesday, 09/25/2019
Keynote
Siva Sivakumar (Cisco)
Siva Sivakumar explains the Cisco Data Intelligence Platform (CDIP), which is a cloud-scale architecture that brings together big data, AI and compute farm, and storage tiers to work together as a single entity, while also being able to scale independently to address the IT issues in the modern data center. Read more.

10:30am

10:30am–10:45am Wednesday, 09/25/2019
Keynote
Patrick Lucey (Stats Perform)
Imagine watching sports and being able to immediately find all plays that are similar to what just happened. Better still, imagine being able to draw a play with the Xs and Os on an interface like a coach draws on a chalkboard and instantaneously finding all the similar plays and conducting analytics on those plays. Join Patrick Lucey to see how this is possible. Read more.

10:50am

10:50am–11:20am Wednesday, 09/25/2019
Morning break sponsored by Intel (30m)

11:20am

11:20am–12:00pm Wednesday, 09/25/2019
Session
Sponsored
James Malone (Google)
James Malone takes a deep dive into how customers across the world partner with Google Cloud to reimagine big data processing and data lakes while generating incredible business value. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Session
Sponsored
TBC
11:20am–12:00pm Wednesday, 09/25/2019
Session
Sponsored
Praveen Chitrada (Akamai Technologies)
Praveen Chitrada walks you through how Akamai uses MemSQL, Docker, Airflow, Prometheus, and other technologies as an enabler to streamline and accelerate data ingestion and calculation to generate usage metrics for billing, reporting, and analytics at massive scale. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Navinder Pal Singh Brar (Walmart Labs)
Each week 275 million people shop at Walmart, generating interaction and transaction data. Navinder Pal Singh Brar explains how the customer backbone team enables extraction, transformation, and storage of customer data to be served to other teams. At 5 billion events per day, the Kafka Streams cluster processes events from various channels and maintains a uniform identity of a customer. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Session
Sponsored
Chiang Yang (Cisco), Karthik Kulkarni (Cisco)
Artificial intelligence and machine learning are well beyond the laboratory exploratory stage of deployment. In fact, the speed of AI and ML deployment has a huge impact on an organization’s financial income. Chiang Yang and Karthik Kulkarni explore how the Cisco Data Intelligence Platform can help bridge the gap between AI and ML and big data. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Ted Dunning (MapR)
Feature engineering is generally the section that gets left out of machine learning books, but it's also the most critical part in practice. Ted Dunning explores techniques, a few well known, but some rarely spoken of outside the institutional knowledge of top teams, including how to handle categorical inputs, natural language, transactions, and more in the context of machine learning. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Julien Le Dem (WeWork)
Big data is crucial to organizations—and it's big not only by volume but also by the multitude of data sources and teams using them. Central data teams doing all the work is outdated as the entire organization becomes an ecosystem and central teams become enablers. Julien Le Dem outlines the principles of a data platform that enables the entire organization to build data-centric products. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Moty Fania (Intel)
Moty Fania details Intel’s IT experience of implementing a sales AI platform. This platform is based on a streaming, microservices architecture with a message bus backbone. It was designed for real-time data extraction and reasoning, handles the processing of millions of website pages, and is capable of sifting through millions of tweets per day. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Steven Touw (Immuta)
Anti-patterns are behaviors that take bad problems and lead to even worse solutions. In the world of data security and privacy, they’re everywhere. Over the past four years, data security and privacy anti-patterns have emerged across hundreds of customers and industry verticals—there's been an obvious trend. Steven Touw details five anti-patterns and, more importantly, the solutions for them. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Secondary topics:  Culture and Organization
Brian Dalessandro (SparkBeyond)
While data science value is well recognized within tech, experience across industries shows that the ability to realize and measure business impact is not universal. A core issue is that data science programs face unique risks many leaders aren’t trained to hedge against. Brian Dalessandro addresses these risks and advocates for new ways to think about and manage data science programs. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Session
Janet Haven (Data & Society)
Details to come. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Secondary topics:  Ethics
Harsha Nori (Microsoft), Sameul Jenkins (Microsoft), Rich Caruana (Microsoft)
Understanding decisions made by machine learning systems is critical for sensitive uses, ensuring fairness, and debugging production models. Interpretability presents options for trying to understand model decisions. Harsha Nori, Sameul Jenkins, and Rich Caruana explore the tools Microsoft is releasing to help you train powerful, interpretable models and interpret existing black box systems. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Ying Yau (Walmart Labs)
Time series forecasting techniques can be applied in a wide range of scientific disciplines, business scenarios, and policy settings. Jeffrey Yau details the application of deep learning techniques to time series forecasting and compares them to time series statistical models when forecasting time series with trends, multiple seasonality, regime switch, and exogenous series. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Secondary topics:  Culture and Organization
Ann Spencer (Domino), Paco Nathan (Derwen), Amy Heineike (Primer), Pete Warden (TensorFlow)
If, as a data scientist, you've wondered why it takes so long to deploy your model into production or, as an engineer, thought data scientists have no idea what they want, you're not alone. Join a lively discussion panel with industry veterans Ann Spencer, Paco Nathan, Amy Heineike, and Pete Warden to find best practices or insights on increasing collaboration when developing and deploying models. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Paige Roberts (Vertica), Deepak Majeti (Vertica)
GoodData needed to autorecover from node failures and scale rapidly when workloads spiked on their MPP database in the cloud. Kubernetes could solve it, but it's for stateless microservices, not a stateful MPP database that needs hundreds of containers. Paige Roberts and Deepak Majeti detail the hurdles GoodData needed to overcome in order to merge the power of the database with Kubernetes. Read more.
11:20am–12:00pm Wednesday, 09/25/2019
Session
David Talby (Pacific AI)
Machine learning and data science systems often fail in production in unexpected ways. David Talby outlines real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries. Read more.

12:00pm

12:00pm–1:15pm Wednesday, 09/25/2019
Lunch sponsored by Google Cloud (1h 15m)
12:00pm–1:15pm Wednesday, 09/25/2019
Event
Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers. Read more.
12:00pm–1:15pm Wednesday, 09/25/2019
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.
12:00pm–1:15pm Wednesday, 09/25/2019
Event
If you’d like to make new professional connections and hear ideas for supporting diversity in the tech community, come to the diversity and inclusion networking lunch on Wednesday. Read more.

1:15pm

1:15pm–1:55pm Wednesday, 09/25/2019
Session
Sponsored
Diana Shaw (SAS)
Companies today are working to adopt data-driven mind-sets, strategies, and cultures. Yet the ugly truth is many still struggle to make analytics actionable. Diana Shaw outlines a simple, powerful, and automated solution to operationalize all types of analytics at scale. You'll learn how to put analytics into action while providing model governance and data scalability to drive real results. Read more.
1:15pm–1:55pm Wednesday, 09/25/2019
Darren Chinen (Malwarebytes)
Developing, deploying and managing AI and anomaly detection models is tough business. Darren Chinen details how Malwarebytes has leveraged containerization, scheduling, and orchestration to build a behavioral detection platform and a pipeline to bring models from concept to production. Read more.
1:15pm–1:55pm Wednesday, 09/25/2019
Michael Noll (Confluent)
Would you cross the street with traffic information that's a minute old? Certainly not. Modern businesses have the same needs. Michael Noll explores why and how you can use Kafka and its growing ecosystem to build elastic event-driven architectures. Specifically, you'll look at Kafka as the storage layer, at Kafka Connect for data integration, and at Kafka Streams and KSQL as the compute layer. Read more.
1:15pm–1:55pm Wednesday, 09/25/2019
Session
Sponsored
Peter Wang (Anaconda)
Peter Wang explores why data science shouldn’t be seen as merely another technical job within the business and why open source is such a critical aspect of innovation in the field of data science. Read more.
1:15pm–1:55pm Wednesday, 09/25/2019
Secondary topics:  Deep Learning
Shioulin Sam (Cloudera Fast Forward Labs)
Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. But this could be avoided if machines could learn with a few labeled examples. Shioulin Sam explores and demonstrates an algorithmic solution that relies on collaboration between human and machine to label smartly, and she outlines product possibilities. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
Swasti Kakker (LinkedIn), Manu Ram Pandit (LinkedIn), Vidya Ravivarma (LinkedIn)
Join Swasti Kakker, Manu Ram Pandit, and Vidya Ravivarma to explore what's offered by a flexible and scalable hosted data science platform at LinkedIn. It provides features to develop seamlessly in multiple languages, enforce developer best practices and governance policies, execute and visualize solutions, manage knowledge efficiently, and collaborate to improve developer productivity. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
Wim Stoop (Cloudera), Srikanth Venkat (Cloudera)
Establishing enterprise-wide security and governance remains a challenge for most organizations. Integrations and exchanges across the landscape are costly to manage and maintain, and typically work in one direction only. Wim Stoop and Srikanth Venkat explore how ODPi's Egeria standard and framework removes the challenges and is leveraged by Cloudera and partners alike to deliver value. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
The Apache Parquet community is working on a column encryption mechanism that protects sensitive data and enables access control for table columns. Many companies are involved, and the mechanism specification has recently been signed off on by the community management committee. Gidon Gershinsky explores the basics of Parquet encryption technology, its usage model, and a number of use cases. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
Felipe Hoffa (Google), Bob Bradley (Geotab)
Geotab is a world-leading asset-tracking company with millions of vehicles under service every day. Felipe Hoffa and Bob Bradley examine the challenges and solutions to create an ML- and geographic information system (GIS)-enabled petabyte-scale data warehouse leveraging Google Cloud. And they dive into the process to publish open data, how you can access it, and how cities are using it. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
Secondary topics:  Ethics, Privacy and Security
Andrew Burt (Immuta), Brenda Leong (Future of Privacy Forum)
Machine learning techniques are being deployed across almost every industry and sector. But this adoption comes with real, and oftentimes underestimated, privacy and security risks. Andrew Burt and Brenda Leong convene a panel of experts to detail real-life examples of when ML goes wrong, and the lessons they learned. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
Saif Addin Ellafi (John Snow Labs), Scott Hoch (BlackBox Engineering)
Recruiting patients for clinical trials is a major challenge in drug development. Saif Addin Ellafi and Scott Hoch explain how Deep 6 uses Spark NLP to scale its training and inference pipelines to millions of patients while achieving state-of-the-art accuracy. They dive into the technical challenges, the architecture of the full solution, and the lessons the company learned. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
Every NLP-based document-processing solution depends on converting scanned documents and images to machine-readable text using an OCR solution, limited by the quality of scanned images. Nagendra Shishodia, Chaithanya Manda, and Solmaz Torabi explore how GANs can bring significant efficiencies in any document-processing solution by enhancing resolution and denoising scanned images. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
James Tang (Walmart Labs), Yiyi Zeng (Walmart Labs), Linhong Kang (Walmart Labs)
James Tang, Yiyi Zeng, and Linhong Kang outline how Walmart provides a secure and seamless shopping experience through machine learning and large-scale data analysis on a centralized platform. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
Gil Vernik (IBM)
Most analytic flows can benefit from serverless, starting with simple cases and moving to complex data preparations for AI frameworks like TensorFlow. To address the challenge of how to easily integrate serverless without major disruptions to your system, Gil Vernik explores the “push to the cloud” experience, which dramatically simplifies serverless for big data processing frameworks. Read more.
Add to your personal schedule
1:15pm–1:55pm Wednesday, 09/25/2019
As a steward for your enterprise’s data and digital transformation initiatives, you’re tasked with making the right choice. But before you can make those decisions, it’s important to understand what not to do when planning for your organization’s big data initiatives. Michael Stonebraker shares his top 10 big data blunders. Read more.

2:05pm

Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Session
Sponsored
Oftentimes there's a fracture between the highly governed data of enterprise IT systems and the comprehensive but often ungoverned world of large-scale data lakes and streams of data from blogs, syslogs, sensors, IoT devices, and more. Kevin Poskitt walks you through how AI needs to connect to all of this data, as well as image, video, audio, and text data sources. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Session
Sponsored
Zaher Hazim (ALDO Group)
Winning the hearts and minds of millennials and Gen Z is not an easy task. ALDO has devised a data-driven strategy to create the best consumer experience. Today ALDO relies on Talend and AWS. Zaher Hazim explains the choices made for its data architecture and the hurdles the teams had to solve to turn the vision into reality. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Stephan Ewen (Ververica), Aljoscha Krettek (Ververica)
Stephan Ewen and Aljoscha Krettek detail how stream processing is becoming a "grand unifying paradigm" for data processing and the newest developments in Apache Flink to support this trend: new cross-batch-streaming machine learning algorithms, state-of-the-art batch performance, and new building blocks for data-driven applications and application consistency. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Session
Sponsored
Olga Lagunova (Pitney Bowes), John Derrico (Mastercard)
Mastercard and Pitney Bowes have overcome many challenges in their journey to accelerate innovation, achieve efficiencies and improve the overall customer experience. This presentation will feature key learnings through the evolution of their data strategy and highlight pitfalls and solutions from data science projects across several industries—from finance to cross-border shipping logistics. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Mikio Braun (Zalando)
With ML becoming more mainstream, the side effects of machine learning and AI on our lives become more visible. You have to take extra measures to make machine learning models fair and unbiased. And awareness of the need to preserve privacy in ML models is rapidly growing. Mikio Braun explores techniques and concepts around fairness, privacy, and security when it comes to machine learning models. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Atul Gupte (Uber), Nikhil Joshi (Uber)
Uber is changing the way people think about transportation. As an integral part of the logistical fabric in 65+ countries around the world, it uses ML and advanced data science to power every aspect of the Uber experience—from dispatch to customer support. Atul Gupte and Nikhil Joshi explore how Uber enables teams to transform insights into intelligence and facilitate critical workflows. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Shirshanka Das (LinkedIn), Mars Lan (LinkedIn)
Imagine scaling metadata to an organization of 10,000 employees, 1M+ data assets, and an AI-enabled company that ships code to the site three times a day. Shirshanka Das and Mars Lan dive into LinkedIn’s metadata journey from a two-person back-office team to a central hub powering data discovery, AI productivity, and automatic data privacy. They reveal metadata strategies and the battle scars. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
With cheap and scalable storage services such as S3 and ADLS, it's never been easier to dump data into a cloud data lake. But you still need to secure that data and be sure it doesn't leak. Tomer Shiran and Jacques Nadeau explore capabilities for securing a cloud data lake, including authentication, access control, encryption (in motion and at rest), and auditing, as well as network protections. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Secondary topics:  Ethics, Privacy and Security
Andrew Burt (Immuta), Brenda Leong (Future of Privacy Forum)
From the EU to California and China, more of the world is regulating how data can be used. Andrew Burt and Brenda Leong convene leading experts on law and data science for a deep dive into ways to regulate the use of AI and advanced analytics. Come learn why these laws are being proposed, how they’ll impact data, and what the future has in store. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Panos Alexopoulos (Textkernel)
In an era where discussions among data scientists are monopolized by the latest trends in machine learning, the role of semantics in data science is often underplayed. Panos Alexopoulos presents real-world cases where making fine, seemingly pedantic, distinctions in the meaning of data science tasks and the related data has significantly improved their effectiveness and value. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Keshav Peswani (Expedia Group), Ashish Aggarwal (Expedia Group)
Observability is the key in modern architecture to quickly detect and repair problems in microservices. Modern observability platforms have evolved beyond simple application logs and include distributed tracing systems like Zipkin and Haystack. Keshav Peswani and Ashish Aggarwal explore how combining them with real-time, intelligent alerting mechanisms helps in the automated detection of problems. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Nan Zhu (Uber), Felix Cheung (Uber)
XGBoost has been widely deployed in companies across the industry. Nan Zhu and Felix Cheung dive into the internals of distributed training in XGBoost and demonstrate how XGBoost solves business problems at Uber, scaling to thousands of workers and tens of TB of training data. Read more.
Add to your personal schedule
2:05pm–2:45pm Wednesday, 09/25/2019
Tomer Levi (Fundbox)
Use of data workflows is a fundamental functionality of any data engineering team. Nonetheless, designing an easy-to-use, scalable, and flexible data workflow platform is a complex undertaking. Tomer Levi walks you through how the data engineering team at Fundbox uses AWS serverless technologies to address this problem and how it enables data scientists, BI devs, and engineers to move faster. Read more.
2:05pm–2:45pm Wednesday, 09/25/2019
TBC

2:55pm

Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Session
Sponsored
Dong Li (Kyligence)
Your analytics are biased. Efforts to extract meaning by manually scrubbing, indexing, and parsing big data are limited by time, cost, and human assumptions. This session demonstrates augmented analytics. It takes OLAP into the future with artificial intelligence, ensuring objective and unique insights that cover all relevant scenarios found in petabytes of multidimensional and variable data. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Session
Sponsored
Jungwook Seo (SK Holdings)
Jungwook Seo walks you through AccuInsight+, a cloud data analytics platform with eight data analytics services on CloudZ (one of the biggest cloud service providers in Korea), which SK Holdings announced in January 2019. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Weisheng Xie (Orange Finance), Sijie Guo (Apache Software Foundation)
For Orange Finance, a fintech company of China Telecom with half a billion registered users and 41 million monthly active users, risk control decision deployment has been critical to its success. Weisheng Xie and Sijie Guo explore how the company leverages Apache Pulsar to boost the efficiency of its risk control decision development for combating financial fraud across more than 50 million transactions a day. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Session
Sponsored
Alberto Nieto (Esri)
Digital location data is a crucial part of data science. The "where" matters as much to an analysis as the "what" and the "why." Alberto Nieto explores tools that help you apply a range of geospatial techniques in your data science workflows to get deeper insights. He walks you through the concepts of spatial data science with demos where he uses these tools to solve real-world problems. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Secondary topics:  Financial Services
Jari Koister (FICO)
Machine learning and constraint-based optimization are both used to solve critical business problems. They come from distinct research communities and have traditionally been treated separately. But Jari Koister examines how they're similar, how they're different, and how they can be used to solve complex problems with amazing results. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Kai Liu (BING) (Microsoft), Jack Zhang (Microsoft), Jing Zhao (Microsoft)
Facilitating large-scale deep learning projects in parallel requires effort and innovation. Bing now runs a deployment of thousands of servers to address this challenge. Kai Liu, Jack Zhang, and Jing Zhao detail how Bing provides training services, offline data processing, vector hosting, and inferencing service offline to help data scientists through all steps in the project lifecycle. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Kaan Onuk (Uber), Luyao Li (Uber), Atul Gupte (Uber)
At Uber’s scale and pace of growth, a robust system for discovering and managing various entities, from datasets to services to pipelines, and their relevant metadata is not just nice to have: it is absolutely integral to making data useful at Uber. In this talk, we will explore the current state of metadata management and end-to-end data flow solutions at Uber and what’s coming next. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Secondary topics:  Privacy and Security
Marcus Fowler (Darktrace)
Cyber security must find what it doesn’t know to look for. AI technologies have led to the emergence of self-learning, self-defending networks that achieve this – detecting and autonomously responding to in-progress attacks in real time. These cyber immune systems enable the security team to focus on high-value tasks, can counter even machine-speed threats, and work in all environments. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Tim McKenzie (Pitney Bowes)
Tim McKenzie examines why planning 5G network rollout and associated services requires a good understanding of location-based data. Accurate addressing and linking consumers to property or points of interest allows data enrichment with attributes, demographics and social data. Companies use location to organize and analyze network and customer data to understand where to target new services. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Secondary topics:  Privacy and Security
Mark Hinely (KirkpatrickPrice)
The fear that comes along with new compliance requirements is overwhelming. Organizations don’t know where to start, what to fix, or what an auditor expects to see. Mark Hinely gives you an auditor's perspective on the newest security and privacy regulations, how your business can prepare for compliance, and what the audit looks like to an auditor. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Gerard de Melo (Rutgers University)
Gerard de Melo takes a deep dive into the kinds of sentiment and emotion consumers associate with a text. With new data-driven approaches, organizations can better pay attention to what's being said about them in different markets. And you can consider fonts and palettes best suited to convey specific emotions, so organizations can make informed choices when presenting information to consumers. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Tony Xing (Microsoft), Bixiong Xu (Microsoft), Congrui Huang (Microsoft), Qiyang Li (Microsoft)
Anomaly detection may sound old-fashioned, yet it's super important in many industry applications. Tony Xing, Bixiong Xu, Congrui Huang, and Qiyang Li detail a novel anomaly-detection algorithm based on spectral residual (SR) and convolutional neural network (CNN) and how this method was applied in the monitoring system supporting Microsoft AIOps and business incident prevention. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Fei Wang (CarGurus)
Fei Wang takes a deep dive into a case study for the CarGurus TV Attribution Model. You'll understand how you can leverage the creation of a causal inference model to calculate cost per acquisition (CPA) of TV spend and measure effectiveness when compared to CPA of digital performance marketing spend. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Shradha Ambekar (Intuit), Sunil Goplani (Intuit), Sandeep Uttamchandani (Intuit)
A business insight shows a sudden spike. It can take hours, or days, to debug data pipelines to find the root cause. Shradha Ambekar, Sunil Goplani, and Sandeep Uttamchandani outline how Intuit built a self-service tool that automatically discovers data pipeline lineage and tracks every change, helping debug the issues in minutes—establishing trust in data while improving developer productivity. Read more.
Add to your personal schedule
2:55pm–3:35pm Wednesday, 09/25/2019
Secondary topics:  Ethics
Farrah Bostic (The Difference Engine)
We're living in a culture obsessed with predictions. In politics and business, we collect data in service of the obsession. But our need for certainty and control leads some organizations to be duped by unproven technology or pseudoscience—often with unforeseen societal consequences. Farrah Bostic looks at historical—and sometimes funny—examples of sacrificing understanding for "data." Read more.

3:35pm

3:35pm–4:35pm Wednesday, 09/25/2019
Afternoon break sponsored by MemSQL (1h)

4:35pm

Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Session
Sponsored
Chuck Yarbrough (Hitachi Vantara)
Cleaning the Swamp: How DataOps practices and a modern data architecture bring greater visibility and allow faster data access with proper governance. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)
Trill has been open-sourced, making the streaming engine behind services like the multi-billion-dollar Bing Ads platform available for all to use and extend. We give a brief history of streaming data at Microsoft and lessons learned. We then demonstrate how its API can power complex application logic, and the performance that gives the engine its name: a trillion events per day per node. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Criteo’s infrastructure provides capacity and connectivity to host Criteo’s platform and applications. The evolution of our infrastructure is driven by the ability to forecast Criteo’s traffic demand. In this talk, we explain how Criteo uses Bayesian Dynamic time series models to accurately forecast its traffic load and optimize hardware resources across data centers. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Prakhar Jain (Qubole), Sourabh Goyal (Qubole)
Autoscaling of resources aims to achieve low latency for a big data application while reducing resource costs at the same time. Upscaling a cluster in the cloud is fairly easy compared to downscaling nodes, so overall total cost of ownership (TCO) goes up. We will talk about a new design for efficient downscaling, which further helps achieve better resource utilization and thus lower TCO. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Max Neunhöffer (ArangoDB), Joerg Schad (Suki)
Machine learning platforms are becoming more complex, with different components each producing their own metadata. Currently, most components provide their own way of storing metadata. In this talk, we propose a first draft of a common Metadata API and demo a first implementation of this API in Kubeflow using ArangoDB, which is a native multi-model database. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Jeff Zemerick (Mountain Fog)
This talk describes how open source technologies can be used to identify and remove PHI from streaming text in an enterprise healthcare environment. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Vlad Eidelman (FiscalNote)
While regulations affect your life every day, and millions of public comments are submitted to regulatory agencies in response to their proposals, analyzing the comments has traditionally been reserved for legal experts. In this talk, we show how natural language processing and machine learning can be used to automate the process by analyzing over 10 million publicly released comments. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Elasticsearch allows extremely quick search and drilldowns on large amounts of semistructured data. Elasticsearch, however, does not have relational join capabilities. In this presentation I'll introduce a plugin for ES that adds cluster distributed joins and demonstrate how it enables an exciting array of use cases dealing with interconnected or "Knowledge Graph" enterprise data. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
John Berryman (Eventbrite)
Eventbrite is exploring a new machine learning approach that allows us to harvest data from customer search logs and automatically tag events based upon their content. The results have allowed us to provide users with a better inventory browsing experience. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Siddha Ganju (NVIDIA), Meher Kasam (Square)
Optimizing deep neural nets to run efficiently on mobile devices. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Robert Pesch (inovex GmbH), Robin Senge (inovex GmbH)
In this talk, we outline the development process, the statistical modeling, the data-driven decision making, and the components needed for productionizing a fully automated and highly scalable demand forecasting system for an online grocery shop for a billion-dollar retail group in Europe. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Wangda Tan (Cloudera), Arpit Agarwal (Hortonworks Inc.)
In this talk, we’ll start with the current status of the Apache Hadoop community, then move on to the exciting present and future of Hadoop 3.x. We will cover new features like erasure coding, GPU support, namenode federation, Docker, long-running services support, powerful container placement constraints, data node disk balancing, etc. We will also offer upgrade guidance from 2.x to 3.x. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Andrew Brust (ZDNet | Blue Badge Insights)
A primer on data catalogs and review of the major vendors and platforms in the market. Includes discussion on the use of data catalogs with classic and newer data repositories, including data warehouses, data lakes, cloud object storage and even software/applications. Coverage of AI's role in the data catalog world and analysis of data catalog futures will be provided. Read more.
Add to your personal schedule
4:35pm–5:15pm Wednesday, 09/25/2019
Session
Sponsored
Daniel D'Orazio (Matillion)
According to Forrester, insight-driven companies are on pace to make $1.8 trillion annually by 2021. How fast can your team collect, process, and analyze data to help solve present — and future — business challenges? This session shares actionable tips and lessons learned from cloud data warehouse modernizations at companies like DocuSign and others that you can take back to your business. Read more.

5:25pm

Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Session
Sponsored
David Leichner (SQream)
What started as an asset for data scientists and BI professionals has become a poorly-performing problem. This session will explore the Hadoop ecosystem and relational databases from an analytics perspective - reviewing the current landscape, what Hadoop was designed for, and how a Hadoop-based infrastructure can be improved to support a new era of exponentially growing data. Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Bas Geerdink (ING)
Streaming Analytics (or Fast Data processing) is the field of making predictions on real-time data. In this talk, I'll present a fast data architecture that follows a 'pipes and filters' pattern and covers many use cases. This architecture can be used to create enterprise-grade solutions with a diversity of technology options. The stack is Kafka, Ignite, and Spark Structured Streaming (KISSS). Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Secondary topics:  Retail and e-commerce
Subhasish Misra (Walmart)
Causal questions are ubiquitous. Randomized tests are considered to be the gold standard for these. However, such tests are not always feasible, leaving only observational data from which to draw causal insights. Techniques such as matching offer a solution in those cases. This talk covers these aspects and shares practical tips for inferring causal effects. Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Chenzhao Guo (Intel Asia-Pacific Research & Development Ltd.), Carson Wang (Intel)
Shuffle in Spark requires the shuffle data to be persisted on local disks. However, the assumptions of collocated storage do not always hold in today’s data centers. We implemented a new Spark shuffle manager, which writes shuffle data to a remote cluster with different storage backends. This makes life easier for those customers who want to leverage the latest storage hardware, and HPC customers Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Naghman Waheed (Bayer Crop Science), John Cooper (Bayer)
As the complexity of data systems has grown at Bayer, so has the difficulty of locating and understanding which datasets are available for consumption. To address this challenge, a custom metadata management tool was recently deployed as a new capability at Bayer. The system is cloud enabled and uses multiple open source components including machine learning and natural language processing to aid search. Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Matt Carothers (Cox Communications), Jignesh Patel (Cox Communications), Harry Tang (Cox Communication Inc)
Organizations often work with sensitive information such as social security numbers and credit card information. Although this data is stored in encrypted form, most analytical operations, ranging from data analysis to advanced machine learning algorithms, require data decryption for computation. This creates unwanted exposure to theft or unauthorized reads. Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Thiago Ribeiro (Griaule)
Brazil deployed a national biometric system to register all Brazilian voters using multiple biometric modalities and to ensure that a person does not enroll twice. This session highlights how a large-scale biometric system works and the main architecture decisions that one has to take into consideration. Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Brindaalakshmi K (Independent Consultant)
There is no standard for the collection of gender data. This session takes a look at the implications of this gap in the context of a developing country like India, the exclusion of individuals beyond the binary genders of male and female, and how this exclusion permeates beyond the public sector into private sector services. Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Emily Webber (Amazon Web Services)
Mansplaining. Know it? Hate it? Want to make it go away? In this session we tackle the chronic problem of men talking over or down to women and its negative impact on career progression for women. We will also demonstrate an Alexa skill that uses deep learning techniques on incoming audio feeds. We discuss ownership of the problem for both women and men, and suggest helpful strategies. Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
The common perception of deep learning is that it results in a fully self-contained model. However, in most cases these models have similar requirements for data pre-processing as more "traditional" machine learning. Despite this, there are few standard solutions for deploying end-to-end deep learning. In this talk, I show how the ONNX format and ecosystem is addressing this challenge. Read more.
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Secondary topics:  Media and Advertising
Aaron Owen (Major League Baseball), Matthew Horton (Major League Baseball), Josh Hamilton (MLB)
Utilizing SAS, Python, and AWS Sagemaker, MLB’s data science team discusses how it predicts ticket purchasers’ likelihoods to purchase again, evaluates prospective season schedules, estimates customer lifetime value, optimizes promotion schedules, quantifies the strength of fan avidity, and monitors the health of monthly subscriptions to its game-streaming service. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
TBC
Add to your personal schedule
5:25pm–6:05pm Wednesday, 09/25/2019
Alasdair Allan (Babilim Light Industries)
The arrival of a new generation of smart embedded hardware may cause the demise of large-scale data harvesting. In its place, smart devices will allow us to process data at the edge, extracting insights from the data without storing potentially privacy- and GDPR-infringing data. The current age where privacy is no longer "a social norm" may not long survive the coming of the Internet of Things. Read more.

6:05pm

Add to your personal schedule
6:05pm–7:05pm Wednesday, 09/25/2019
Event
Make your way from booth to booth while you check out all the exhibitors in the Expo Hall on Wednesday after sessions end. Read more.

7:30pm

Add to your personal schedule
7:30pm–10:30pm Wednesday, 09/25/2019
Event
Don't miss an exciting evening filled with cocktails, food, and entertainment at Data After Dark at Strata in New York. Read more.

Thursday, 09/26/2019

8:15am

Add to your personal schedule
8:15am–8:45am Thursday, 09/26/2019
Event
Gather before keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees. Read more.

8:45am

Add to your personal schedule
8:45am–8:55am Thursday, 09/26/2019
Keynote
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

8:55am

8:55am–9:10am Thursday, 09/26/2019
Keynote
Cassie Kozyrkov (Google)
Machine learning and artificial intelligence are no longer science fiction, but what does it take to harness their potential effectively, responsibly, and reliably? Based on lessons learned at Google, this talk will offer actionable advice to help you find opportunities to take advantage of machine learning, navigate the AI era, and stay safe as you innovate. Read more.

9:20am

9:20am–9:30am Thursday, 09/26/2019
Keynote
Details to come. Read more.

9:35am

9:35am–9:55am Thursday, 09/26/2019
Keynote
Robert Thomas (IBM), Tim O'Reilly (O'Reilly Media)
AI has the potential to add $16 trillion to the global economy by 2030, but adoption has been slow. While we understand the power of AI, many of us aren’t sure how to fully unleash its potential. The reality is: AI is not magic. It’s hard work. Read more.

9:55am

9:55am–10:00am Thursday, 09/26/2019
Keynote
Edward Jezierski (Microsoft)
At Microsoft, we have an ecosystem spanning research, gaming and cloud that is advancing RL and putting it into everyday use. Join Edward Jezierski to see where RL is used practically across Microsoft and imagine the opportunities that exist for your business today. Read more.

10:00am

10:00am–10:05am Thursday, 09/26/2019
Keynote
The Strata Data Awards recognize the most innovative startups, leaders, and data science projects from Strata sponsors and exhibitors around the world. Join us during keynotes for the announcement of the winners. Read more.

10:10am

10:10am–10:25am Thursday, 09/26/2019
Keynote
Jonathan Foster (Microsoft)
Language shapes our thinking, our relationships, our sense of self. Conversation connects us in powerful, intimate, and often unconscious ways. Jonathan Foster explains why, as we design for natural language interactions and more humanlike digital experiences, language—as design material, conversation, and design canvas—reveals ethical challenges we couldn't encounter with GUI-powered experiences. Read more.

10:25am

10:25am–10:30am Thursday, 09/26/2019
Keynote
Jed Dougherty (Dataiku)
One of the more common and fairly widely accepted definitions is that AI means going beyond simple statistics to mimic human skills in perception, learning, interaction, and decision making. But even this definition leaves some room for interpretation. Jed Dougherty breaks down the different parts of that definition and how they might manifest themselves in data science projects. Read more.

10:30am

10:30am–10:50am Thursday, 09/26/2019
Keynote
Alan Smith (Financial Times)
Based on a critical evaluation of the iconic yield curve chart, this talk argues that combining visualisation (data to pixels) with sonification (data to pitch) offers the potential not only to improve aesthetic multimedia experiences but also to take the presentation of data into the rapidly expanding universe of screenless devices and products. Read more.

10:50am

10:50am–11:20am Thursday, 09/26/2019
Morning break sponsored by Cisco (30m)

11:20am

11:20am–12:00pm Thursday, 09/26/2019
Session
Sponsored
Edward Jezierski (Microsoft), Jackie Nichols (Microsoft)
In this session we’ll show you how Personalizer works with your content and data, how it autonomously learns to make optimal decisions, how you can add it to your app with two lines of code, and how to understand what’s under the hood. We’ll share the results Personalizer achieved on the Xbox One home page and best practices for applying it in your applications today. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Session
Sponsored
Charles Boicey (Clearsense)
Healthcare’s reliance on comprehendible data is critical to the mission of providing optimal and affordable care. Learn how the application of technology, such as machine learning, is paramount to the modernisation of healthcare that provides its professionals with fully integrated and complete medical records. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Michael Freedman (TimescaleDB)
Leveraging polyglot solutions for your time-series data can lead to a variety of issues including engineering complexity, operational challenges, and even referential integrity concerns. By re-engineering Postgres to serve as a general data platform, your high-volume time-series workloads will be better streamlined, resulting in more actionable data and greater ease of use. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Secondary topics:  Ethics
Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
Undesired bias in machine learning has become a worrying topic due to numerous high-profile incidents. In this talk we demystify machine learning bias through a hands-on example. We'll be tasked to automate the loan approval process for a company, and introduce key tools and techniques from the latest research that allow us to assess and mitigate undesired bias in our machine learning models. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)
In this talk, we discuss online machine learning algorithm choices for streaming applications. We motivate the discussion with resource-constrained use cases like IoT and personalization. We cover Hoeffding Adaptive Trees, classic sketch data structures, and drift detection algorithms, all the way from implementation to production deployment, describing the pros and cons of using each of them. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)
You are a SaaS company that operates on a cloud infra prior to the ML era. How do you successfully extend your existing infrastructure to leverage the power of ML? In this case study, you will learn critical lessons from SurveyMonkey’s journey of expanding its ML capabilities with its rich data repo and hybrid cloud infrastructure. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Rick Houlihan (Amazon Web Services)
Data has always been relational, and it always will be. NoSQL databases are gaining in popularity, but that does not change the fact that the data they manage is still relational; it just changes how we have to model the data. This session dives deep into how real Entity Relationship Models can be efficiently modeled in a denormalized manner using schema examples from real application services. Read more.
11:20am–12:00pm Thursday, 09/26/2019
TBC
11:20am–12:00pm Thursday, 09/26/2019
John Allen (Deutsche Bank)
As an early adopter of data science, machine learning, and AI, Deutsche Bank's analytics function is trailblazing new ways to drive revenues, lower costs, and reduce risk across all areas of the group. John Allen shares how his team combines commercial offerings with open source technologies to revolutionize legacy processes and transform the way the bank uses technology to drive innovation. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Brian Keng (Rubikloud Technologies Inc)
Automating decisions requires a system to consider more than just a data-driven prediction. Real-world decisions require additional constraints and fuzzy objectives to ensure that they are robust and consistent with business goals. This talk will describe how to leverage modern machine learning methods and traditional mathematical optimization techniques for decision automation. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Shital Shah (Microsoft Research)
How do we visualize what exactly deep learning is doing? Taming the massive models, data, and training times requires a new way of thinking about them. In this talk we will explore new tools and methods to understand AI better. Explaining the decisions made by AI not only helps us accelerate its development but also makes it safe and more trustworthy. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Anjali Samani (CircleUp)
The application of smoothing and imputation strategies is common practice in predictive modelling and time series analysis. With a technique-agnostic approach, this session will provide qualitative and quantitative frameworks that address questions related to smoothing and imputation of missing values to improve data density. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Petar Zecevic (SV Group d.o.o.)
Large Scale Survey Telescope, or LSST, is one of the most important future surveys. Its unique design will allow it to cover large regions of the sky and obtain images of the faintest objects. In 10 years of its operation it will produce about 80 PB of data, both in images and catalog data. I will present AXS, a system we built for fast processing and cross-matching of survey catalog data. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Secondary topics:  Culture and Organization
Gayle Bieler (RTI International)
This presentation is about building a thriving Center for Data Science within a large and well-respected non-profit research institute. I'll discuss my transformation from an entrepreneurial statistician to data science leader, as well as some of our most impactful projects and best adventures to date--solving important national problems, improving our local communities, and transforming research. Read more.

12:00pm

12:00pm–1:15pm Thursday, 09/26/2019
Break (1h 15m)
12:00pm–1:15pm Thursday, 09/26/2019
Event
Join Strata Business Summit speakers and attendees for a networking lunch on Thursday. Read more.
12:00pm–1:15pm Thursday, 09/26/2019
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

1:15pm

1:15pm–1:55pm Thursday, 09/26/2019
Session
Sponsored
Paul Scott-Murphy (WANdisco)
What you’ll learn: the options that exist for cloud migration, with their advantages and disadvantages; what cloud vendors do and don't offer to support large-scale migration; the business risks associated with large-scale cloud migration; and how to migrate analytics data at scale for immediate use in Spark without disrupting on-premises operations. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Alon Gavra (AppsFlyer)
Kafka is often just a piece of the production stack that no one wants to touch, because it just works. At AppsFlyer, Kafka sits at the core of our infrastructure that processes billions of events daily. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Session
Sponsored
Dan DeMers (Cinchy)
After 40 years of apps, enterprise companies are now realizing that building or buying an application for every use case has become a major threat to their ability to leverage and protect their core data assets. Join Cinchy CEO Dan DeMers for this live demo of Cinchy, the world’s first Data Collaboration Platform. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Sandra Carrico (Glynt.ai)
This talk motivates mixed formal learning, explains it and outlines one machine learning example that previously used large numbers of examples and now learns with either zero or a handful of training examples. It maps apparently idiosyncratic techniques to Mixed Formal Learning, a general AI architecture that you can use in your projects. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Jim Scott (NVIDIA)
Data scientists are creating and testing hundreds or thousands more models than in the past. Models require support from both real-time and static data sources. As data becomes enriched, and parameters tuned and explored, there is a need for versioning everything, including the data. We will discuss the very specific problems and approaches to fix them. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Omkar Joshi (Uber Technologies), Bo Yang (uber inc)
Omkar Joshi and Bo Yang offer an overview of how Uber’s ingestion (Marmaray) and observability team improved the performance of Apache Spark applications running on thousands of cluster machines across 100,000+ applications and how they methodically tackled these issues. They also cover how they used Uber’s open source jvm-profiler for debugging issues at scale. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Shant Hovsepian (Arcadia Data)
With cloud object storage (e.g. S3, ADLS) one expects business intelligence (BI) applications to benefit from the scale of data and real-time analytics. However, traditional BI in the cloud surfaces non-obvious challenges. This talk will review service-oriented cloud design (storage, compute, catalog, security, SQL) and show how native cloud BI provides analytic depth, low cost, and performance. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Secondary topics:  Culture and Organization
Usama Fayyad (Open Insights & OODA Health, Inc.), Hamit Hamutcu (Analytics Center)
Ever confused about what it takes to be a data scientist? Or curious about how companies recruit, train, and manage analytics resources? This presentation covers insights from the most comprehensive research effort to date on the data analytics profession and proposes a framework for standardizing roles in the industry and methods for assessing skills. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
TBC
1:15pm–1:55pm Thursday, 09/26/2019
Victor Dibia (Cloudera Fast Forward Labs)
Recent advances in machine learning frameworks for the browser, such as TensorFlow.js, provide an opportunity to craft truly novel experiences within front-end applications. This talk explores the state of the art for machine learning in the browser using TensorFlow.js and covers its use in the design of Handtrack.js, a library for prototyping real-time hand detection in the browser. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Sameer Agarwal (Facebook Inc.)
Apache Spark is the largest compute engine at Facebook by CPU. This talk will cover the story of how we optimized, tuned and scaled Apache Spark at Facebook to run on clusters of tens of thousands of machines, processing hundreds of petabytes of data, and used by thousands of data scientists, engineers and product analysts every day. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Alfred Whitehead (Klick), Clare Jeon (KLICK INC)
What will tomorrow’s temperature be? My blood glucose levels tonight before bed? Time series forecasts depend on sensors or measurements made out in the real, messy world. Those sensors flake out, get turned off, disconnect, and otherwise conspire to cause missing data in our signals. We will show a number of methods for handling data gaps and give advice on which to consider and when. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Jason Wang (Cloudera), Sushant Rao (Cloudera)
We’ll give you an actionable understanding of cloud architecture and the different approaches customers took in their journey to the cloud. We start with the different ways we’ve seen customers be successful in the cloud. Then we dive deep into the decisions they made and how those decisions drove their cloud architecture. Along the way we review problems they overcame, lessons learned, and core cloud paradigms. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Secondary topics:  Culture and Organization, Ethics
Michael Kubiske (Capital One)
This talk will explore some of the philosophy around the concept of explaining a model, given that the colloquial definition is partially recursive. It will cover the lens banking regulation places on this philosophical basis and expand into techniques used for these well-governed aspects. Read more.

2:05pm

2:05pm–2:45pm Thursday, 09/26/2019
Session
Sponsored
Jim Cushman (Collibra), Piyus Jain (Progressive)
Transforming data into a trusted business asset that informs decision-making requires giving teams access to a powerful platform that makes it easy to harness data across the enterprise. In this session, you'll hear how Progressive uses Collibra to transform the way data is managed and used across the organization, driving real business value. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Session
Sponsored
Matt Derda (Trifacta)
Clinical trial data analysis can be a complex process. The data is typically hand coded, formatted differently, and required to be delivered in an FDA-approved format. During this session, IQVIA will share its experiences building a Clean Patient Tracker and how it enabled agility and flexibility for end-users of the platform, from data acquisition to reporting and analytics. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Karthik Ramasamy (Streamlio), Anand Madhavan (Narvar)
Narvar provides a next-generation post-transaction experience for 500+ retailers. This talk explores how Narvar moved away from using a slew of technologies for its platform and consolidated its use cases using Apache Pulsar. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Andrew Leamon (Comcast), Wadkar Sameer (Comcast NBCUniversal)
An overview of the data management and privacy challenges around automating ML model (re)deployments and stream-based inferencing at scale. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Diego Oppenheimer (Algorithmia)
Machine Learning (ML) will fundamentally change the way we build and maintain applications. How can we adapt our infrastructure, operations, staffing, and training to meet the challenges of the new Software Development Life Cycle (SDLC) without throwing away everything that already works? Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Reza Shiftehfar (Uber Technologies)
Building a reliable big data platform is extremely challenging when it has to store and serve hundreds of petabytes of data in real time. This talk reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting the security needs. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
Data lakes have become a key ingredient in the data architecture of most companies. In the cloud, object storage systems such as S3 and ADLS make it easier than ever to operate a data lake. In this talk we describe how companies can build best-in-class data lakes in the cloud, leveraging open source technologies and the cloud's elasticity to run and optimize various workloads simultaneously. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Alex Yoon (T-Mobile)
T-Mobile successfully improved the quality of voice calling by analyzing crowdsourced big data from mobile devices. In this session, you will learn how engineers from multiple backgrounds collaborated to achieve a 10% improvement in voice quality and why the analysis of big data was the key to bringing better voice call service quality to millions of end users. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Akshay Rai (LinkedIn)
Failures or issues in a product or service can negatively affect the business. Detecting issues in advance and recovering from them is crucial to keep the business alive. Join us to learn more about LinkedIn's next-generation open-source monitoring platform, an integrated solution for real-time alerting and collaborative analysis. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
In this talk, we show how to develop a machine learning pipeline for streaming data using the StreamDM framework (https://github.com/huawei-noah/streamDM). We also introduce how to use StreamDM for supervised and unsupervised learning tasks, show examples of online preprocessing methods, and explain how to extend the framework by adding new learning algorithms or preprocessing methods. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Secondary topics:  Deep Learning, Streaming and IoT
Ryan Foltz (Exabeam)
Unmanaged and foreign devices in corporate networks pose a security risk. The first step toward reducing risk from these devices is the ability to identify them. To support a comprehensive device management program, we propose a machine learning model based on deep learning that performs anomaly detection using only device names to flag devices that do not follow device naming structures. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Anais Dotis (InfluxData)
Did you know that classical algorithms outperform machine learning methods in time series forecasting? I’ll show you how I used the Holt-Winters forecasting algorithm to predict water levels in a creek. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Nikki Rouda (Amazon Web Services), Roy Hasson (Amazon Web Services)
Learn how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. Link customer records across different databases (e.g., with different name spellings or addresses). Match external product lists against your own catalog, such as lists of hazardous goods. Solve tough challenges to prepare and cleanse data for analysis. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
TBC

2:45pm

2:45pm–3:45pm Thursday, 09/26/2019
Afternoon break sponsored by Io-Tahoe (1h)

3:45pm

3:45pm–4:25pm Thursday, 09/26/2019
Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
Jonghyok Lee and Chon Yong Lee share the architecture of, and lessons learned from developing, T-CORE, SK Telecom’s monitoring and service analytics platform. T-CORE collects system and application data from several thousand servers and applications, gives operators a 3D-visualized real-time status of the whole network and its services, and provides an analytics platform for data scientists, engineers, and developers. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Secondary topics:  Financial Services
David Mack (Octavian)
Graphs are a powerful way to represent knowledge. Organizations (in fields such as bio-sciences and finance) are starting to amass large knowledge graphs, but lack the machine-learning tools to extract the insights they need from them. In this presentation, I’ll give an overview of what insights are possible and survey the most popular approaches. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Randall DeFauw (Amazon Web Services)
As an increasing level of automation is becoming available to data science, there is a balance between automation and quality that needs to be maintained. Applying DevOps practices to machine learning workloads not only brings models to the market faster but also maintains the quality and integrity of those models. This presentation will focus on applying DevOps practices to ML workloads. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Vitaliy Baklikov (Development Bank of Singapore), Dipti Borkar (Alluxio)
In this presentation, Vitaliy Baklikov from DBS Bank and Dipti Borkar from Alluxio will share how DBS Bank has built a modern big data analytics stack leveraging an object store even for data-intensive workloads like ATM forecasting and how it uses Alluxio to orchestrate data locality and data access for Spark workloads. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Owen O'Malley (Cloudera)
Fine-grained data protection at a column level in data lake environments has become a mandatory requirement to demonstrate compliance with multiple local and international regulations across many industries today. This talk describes how column encryption in ORC files enables both fine-grained protection and audits of who accessed the private data. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Madhu Gopinathan (MakeMyTrip), Sanjay Mohan (MakeMyTrip)
At MakeMyTrip, India’s leading online travel platform, customers were using voice or email to contact agents for post-sale support. To improve agent efficiency and customer experience, MakeMyTrip developed a chatbot, Myra, using some of the latest advances in deep learning. In this talk, we will discuss the high-level architecture and the business impact created by Myra. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Audrey Lobo-Pulo (Phoensight), Annette Hester (National Energy Board, Canada), Ryan Hum (National Energy Board, Canada)
As new digital platforms emerge and governments look at new ways to engage with citizens, there is an increasing awareness of the role these platforms play in shaping public participation and democracy. This talk examines the design attributes of civic engagement technologies and their ensuing impacts. A framework for better achieving desired outcomes is demonstrated with an NEB Canada case study. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Secondary topics:  Deep Learning
Sajan Govindan (Intel), Luca Canali (CERN)
We will show CERN’s research on applying deep learning in high energy physics experiments as an alternative to customized rule-based methods, with an example of topology classification to improve real-time event selection at the Large Hadron Collider experiments. CERN implemented deep learning pipelines on Apache Spark using BigDL and Analytics Zoo open source software on Intel Xeon-based clusters. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Chad Scherrer (Metis)
This talk will explore the basic ideas in Soss, a new probabilistic programming library for Julia. Soss allows a high-level representation of the kinds of models often written in PyMC3 or Stan, and offers a way to programmatically specify and apply model transformations like approximations or reparameterizations. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Tom O'Neill (Sisense)
CCO Tom O’Neill will discuss lessons learned from scaling up Periscope Data to support incredibly large volumes of data and queries from its 2,000+ teams as part of Sisense. He’ll highlight the process of migrating from Heroku to Kubernetes and discovering new ways to leverage its power, plus other developments that have allowed users to delve deeper into new data science and ML analysis. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Jonathan Tudor (GE Aviation), Ross Schalmo (GE Aviation)
GE Aviation has made it a mission to implement Self-Service Data. To ensure success beyond initial implementation of tools, the Data Engineering and Analytics teams at GE Aviation created initiatives designed to foster engagement from an ongoing partnership with each part of the business to the gamification of tagging data in a data catalog to forming a Published Dataset Council. Read more.

4:35pm

4:35pm–5:15pm Thursday, 09/26/2019
Neelesh Salian (Stitch Fix)
It is important to understand why Data Lineage is needed for an organization. Once the purpose is defined, we can talk about how to go about building such a system. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Secondary topics:  Transportation and Logistics
Brandy Freitas (Pitney Bowes)
In this session, Brandy Freitas from Pitney Bowes will cover the interplay between graph analytics and machine learning, improved feature engineering with graph native algorithms, and harnessing the power of graph structure for machine learning through node embedding. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Supun Kamburugamuve (Indiana University)
Big data computing and high-performance computing (HPC) have evolved over the years as separate paradigms. With the explosion of data and the demand for machine learning algorithms, these two paradigms are increasingly embracing each other for data management and algorithms. Supun Kamburugamuve explores the possibilities and tools available for getting the best of HPC and big data. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Ruixin Xu (Microsoft), Long Tian (Microsoft), Yu Zhou (Microsoft)
Microsoft's big data team ran an experiment using Spark and Jupyter notebooks as a replacement for existing IDE-based diagnostic tools for internal DevOps. The results indicate that the Spark-based solution significantly improved diagnosis performance, especially for complex jobs with large profiles, and that Jupyter notebooks also bring the benefits of fast iteration and easy knowledge sharing. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Secondary topics:  Privacy and Security
Mark Donsky (Okera)
California is following the EU's GDPR with the California Consumer Privacy Act (CCPA) in 2020. There are penalties for non-compliance, but many companies aren't prepared for this strict regulation. This session will explore the capabilities your data environment needs in order to simplify CCPA and GDPR compliance, as well as other regulations. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Secondary topics:  Deep Learning
Naoto Umemori (NTT DATA Corporation), Masaru Dobashi (NTT Data Corp.)
Giant Hogweed is a highly toxic plant. Our project aims to automate the process of detecting Giant Hogweed by exploiting technologies like drones and image recognition/detection using machine learning. We show you how we designed the architecture, how we took advantage of both big data and machine/deep learning technologies (e.g. Hadoop, Spark and TensorFlow), and the lessons we learned. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Jeroen Janssens (Data Science Workshops B.V.)
In this talk, we present Stochastic Outlier Selection (SOS), an unsupervised algorithm for detecting anomalies in large, high-dimensional data. SOS has been implemented in Python, R, and most recently, Spark. First, we illustrate the idea and intuition behind SOS. Subsequently, we demonstrate our implementation of SOS on top of Spark. Finally, we apply SOS to a real-world use case. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Evgeny Vinogradov (Yandex.Money)
With a microservice architecture, the data warehouse (DWH) is the first place where all the data comes together. It is supplied by many different data sources and is used for many purposes, from near-OLTP to model fitting and real-time classification. This talk covers our experience in managing and scaling the data engineering team and infrastructure to support 20+ product teams. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Dean Wampler (Lightbend)
Join me for a discussion of the following problems and their solutions: 1. How (and why) to integrate ML into production streaming data pipelines, to serve results quickly? 2. How to bridge data science and production environments, with different tools, techniques, and requirements? 3. How to build reliable and scalable, long-running services? 4. How to update ML models without downtime? Read more.

Contact us

confreg@oreilly.com

For conference registration information and customer service

partners@oreilly.com

For more information on community discounts and trade opportunities with O’Reilly conferences

strataconf@oreilly.com

For information on exhibiting or sponsoring a conference
