Sep 23–26, 2019

Schedule

Monday, 09/23/2019

9:00am

9:00am–5:00pm Monday, 09/23/2019
Michael Li (The Data Incubator), Ana Hocevar (The Data Incubator)
Michael Li and Ana Hocevar offer a nontechnical overview of AI and data science. Learn common techniques, how to apply them in your organization, and common pitfalls to avoid. You’ll pick up the language and develop a framework to be able to effectively engage with technical experts and utilize their input and analysis for your business’s strategic priorities and decision making.
9:00am–5:00pm Monday, 09/23/2019
Bargava Subramanian (Binaize Labs), Amit Kapoor (narrativeVIZ Consulting)
In this two-day workshop, you will learn the different paradigms of recommendation systems and be introduced to deep-learning-based approaches. By the end of the workshop, you will have enough practical hands-on knowledge to build, select, deploy, and maintain a recommendation system for your problem.
9:00am–5:00pm Monday, 09/23/2019
Michael Cullan (The Data Incubator)
Michael Cullan walks you through developing a machine learning pipeline, from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python.
9:00am–5:00pm Monday, 09/23/2019
Jorge Lopez (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join in to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.
9:00am–5:00pm Monday, 09/23/2019
Ian Cook (Cloudera)
Advancing your career in data science requires learning new languages and frameworks, but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools.
9:00am–5:00pm Monday, 09/23/2019
Jesse Anderson (Big Data Institute)
Jesse Anderson offers an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL.
9:00am–5:00pm Monday, 09/23/2019
Dylan Bargteil (The Data Incubator)
The TensorFlow library provides for the use of computational graphs, with automatic parallelization across resources. This architecture is ideal for implementing neural networks. Dylan Bargteil offers an overview of TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API with several hands-on applications.

10:30am

10:30am–11:00am Monday, 09/23/2019
Morning break (30m)

12:30pm

12:30pm–1:30pm Monday, 09/23/2019
Lunch (1h)

3:00pm

3:00pm–3:30pm Monday, 09/23/2019
Afternoon break (30m)

Tuesday, 09/24/2019

9:00am

9:00am–5:00pm Tuesday, 09/24/2019
Richard Evans (Statistics Canada), Rosaria Silipo (KNIME), Leah Xu (Spotify), Arup Nanda (Priceline), Victoriya Kalmanovich (Navy), Shreya Sharma (Expedia Inc.), Naghman Waheed (Bayer Crop Science), Martin Mendez-Costabel (Bayer Crop Science), Gloria Macia (Roche AG), Gwen Campbell (Revibe Technologies, Inc)
From banking to biotech, retail to government, every business sector is changing in the face of abundant data. Get better at defining business problems and applying data solutions at Strata.
9:00am–5:00pm Tuesday, 09/24/2019
Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Nitzan Mekel-Bobrov (Capital One), Dan Barker (RSA Security), Chelsea Douglas (Plotly), Rochelle March (Trucost), Catherine Gu (Stanford University), Elva Fernandez (American Express), Moto Tohda (Tokyo Century (USA) Inc.), Mikheil Nadareishvili (TBC Bank), Jennifer Kloke (Ayasdi)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
9:00am–12:30pm Tuesday, 09/24/2019
Rossella Blatt Vital (Wonderlic)
Creating and leading a successful ML strategy is an elegant orchestration of many components: mastering the key ML concepts, operationalizing the ML workflow, prioritizing the highest-value projects, building a high-performing team, nurturing strategic partnerships, aligning with the company’s mission, and more. This tutorial shares insights and lessons learned on how to create and lead a flourishing ML practice.
9:00am–12:30pm Tuesday, 09/24/2019
Sourav Dey (Manifold), Alex Ng (Manifold)
Many teams are still run as if data science is about experimentation, but those days are over. Now it must offer turnkey solutions to take models into production. We'll explain how to streamline an ML project and help your engineers work as an integrated part of production teams, using a Lean AI process and the Orbyter package for Docker-first data science.
9:00am–12:30pm Tuesday, 09/24/2019
Jules Damji (Databricks)
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work.
9:00am–12:30pm Tuesday, 09/24/2019
Alice Zhao (Metis)
As data scientists, we are known to crunch numbers, but what happens when we run into text data? In this tutorial, I will walk through the steps to turn text data into a format that a machine can understand, share some of the most popular text analytics techniques, and showcase several natural language processing (NLP) libraries in Python, including NLTK, TextBlob, spaCy, and gensim.
9:00am–12:30pm Tuesday, 09/24/2019
Bruno Goncalves (Data For Science, Inc)
Students will learn, in a hands-on way, the theoretical foundations and principal ideas underlying deep learning and neural networks. The code structure of the implementations provided is meant to closely resemble the way Keras is structured so that by the end of the course, students will be prepared to dive deeper into the deep learning applications of their choice.
9:00am–12:30pm Tuesday, 09/24/2019
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
In this tutorial, we walk the audience through the landscape of streaming systems and give an overview of the inception and growth of the serverless paradigm. Next, we present a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and paint a bird’s-eye view of the application domains where Pulsar functions can be leveraged.
9:00am–12:30pm Tuesday, 09/24/2019
Ricardo Ferreira (Confluent)
Building stream processing applications is certainly one of the hot topics in the IT community. Though a lot has been said on the subject, one might say that building stream processing applications is the new teenage sex. This tutorial aims to change this by introducing KSQL, the stream processing query engine built on top of Apache Kafka.
9:00am–12:30pm Tuesday, 09/24/2019
Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera)
There are too many edge devices and agents. How do you control and manage them? How do you handle the difficulty of collecting real-time data and, most importantly, the trouble of updating a specific set of agents with edge applications? Get your hands dirty with Cloudera Edge Management, which addresses these challenges with ease.
9:00am–12:30pm Tuesday, 09/24/2019
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today.
9:00am–12:30pm Tuesday, 09/24/2019
Jason Wang (Cloudera), Tony Wu (Cloudera), Vinithra Varadharajan (Cloudera)
Moving to the cloud poses challenges, from re-architecting to be cloud-native to keeping data context consistent across workloads that span multiple clusters on-prem and in the cloud. First, we’ll cover cloud architecture and its challenges in depth; second, you’ll use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
9:00am–12:30pm Tuesday, 09/24/2019
Mark Donsky (Okera)
New regulations such as CCPA and GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads that span on-prem, private cloud, multi-cloud, and hybrid cloud. We will share hands-on best practices for meeting these challenges, with special attention to CCPA.

10:30am

10:30am–11:00am Tuesday, 09/24/2019
Morning break (30m)

12:30pm

12:30pm–1:30pm Tuesday, 09/24/2019
Lunch (1h)

1:30pm

1:30pm–5:00pm Tuesday, 09/24/2019
Mac Steele (Domino Data Lab), Nick Elprin (Domino Data Lab)
The honeymoon era of data science is ending, and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders must deliver measurable impact on an increasing share of an enterprise’s KPIs. Attendees will learn how leading organizations take a holistic approach to people, process, and technology to build a sustainable competitive advantage.
1:30pm–5:00pm Tuesday, 09/24/2019
Garrett Hoffman (StockTwits)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include word2vec, recurrent neural networks and variants (LSTM, GRU), and convolutional neural networks.
1:30pm–5:00pm Tuesday, 09/24/2019
Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)
In this workshop, we’ll introduce the Amazon SageMaker machine learning platform, followed by a high-level discussion of recommender systems. Next, we’ll dig into different machine learning approaches for recommender systems.
1:30pm–5:00pm Tuesday, 09/24/2019
David Talby (Pacific AI), Alex Thomas (Indeed), Saif Addin Ellafi (John Snow Labs)
This is a hands-on tutorial on state-of-the-art NLP using the highly performant, highly scalable open-source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
1:30pm–5:00pm Tuesday, 09/24/2019
Sophie Watson (Red Hat), William Benton (Red Hat)
In this hands-on workshop, we’ll introduce several data structures that let you answer interesting queries about massive data sets in fixed amounts of space and constant time. This seems like magic, but we'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications.
1:30pm–5:00pm Tuesday, 09/24/2019
Mark Madsen (Teradata), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
1:30pm–5:00pm Tuesday, 09/24/2019
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
This hands-on tutorial examines production use of ML in streaming data pipelines: how to do periodic model retraining and low-latency scoring in live streams. We'll discuss Kafka as the data backplane, pros and cons of microservices vs. systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.
1:30pm–5:00pm Tuesday, 09/24/2019
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. In this presentation, we’ll provide guidance and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.
1:30pm–5:00pm Tuesday, 09/24/2019
Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management require a different mindset in AWS compared to traditional RDBMS design. In this session, you will learn important considerations in choosing the right database based on your use cases and access patterns while migrating an application or building a new application on the cloud.
1:30pm–5:00pm Tuesday, 09/24/2019
Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera)
Kafka is omnipresent and is the backbone of not only streaming analytics applications but data lakes as well. The challenge is understanding what is going on overall in the Kafka cluster, including performance, issues, and message flows. This session gives you hands-on experience visualizing your entire Kafka environment end to end and simplifying Kafka operations via SMM.
1:30pm–5:00pm Tuesday, 09/24/2019
Carolyn Duby (Hortonworks)
Bring your laptop, roll up your sleeves, and get ready to crunch some cyber security events with Apache Metron, an open source big data cyber security platform. Learn how Metron finds actionable events in real time.

3:00pm

3:00pm–3:30pm Tuesday, 09/24/2019
Afternoon break (30m)

5:00pm

5:00pm–6:30pm Tuesday, 09/24/2019
Event
Enjoy delicious snacks and beverages with fellow Strata attendees, speakers, and sponsors at the Opening Reception, happening immediately after tutorials on Tuesday.

Wednesday, 09/25/2019

8:15am

8:15am–8:45am Wednesday, 09/25/2019
Event
Gather before keynotes on Wednesday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees.

8:45am

8:45am–10:45am Wednesday, 09/25/2019
Keynote
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes.

10:50am

10:50am–11:20am Wednesday, 09/25/2019
Morning break (30m)

11:20am

11:20am–12:00pm Wednesday, 09/25/2019
Navinder Pal Singh Brar (Walmart Labs)
Each week, 275 million people shop at Walmart, generating multiple terabytes of interaction and transaction data. On the Customer Backbone team, we enable extracting, transforming, and storing customer data to be served to teams such as Ads and Personalisation. At 5 billion events per day, our Kafka Streams cluster processes events from various channels and maintains a uniform identity for each customer.
11:20am–12:00pm Wednesday, 09/25/2019
Ted Dunning (MapR)
Feature engineering is generally the section that gets left out of machine learning books, but it is also the most critical part in practice. I will provide a variety of techniques, a few well known, but some rarely spoken of outside the tribal lore of top teams, including how to handle categorical inputs, natural language, transactions and more all in the context of modern machine learning.
11:20am–12:00pm Wednesday, 09/25/2019
Julien Le Dem (WeWork)
Big data is crucial to organizations: big not only in the volume of data but also in the multitude of data sources and teams using them. Having central data teams do all the work is outdated, as the entire organization becomes an ecosystem and central teams become enablers. We will discuss the principles of a data platform that enables the entire organization to build data-centric products.
11:20am–12:00pm Wednesday, 09/25/2019
Moty Fania (Intel)
In this session, Moty Fania shares Intel IT’s experience of implementing a Sales AI platform. The platform is based on a streaming, microservices architecture with a message bus backbone. It was designed for real-time data extraction and reasoning. The platform processes millions of website pages and is capable of sifting through millions of tweets per day.
11:20am–12:00pm Wednesday, 09/25/2019
Steven Touw (Immuta)
Anti-patterns are behaviors that take bad problems and lead to even worse solutions. In the world of data security and privacy, they’re everywhere. Over the past four years, we’ve seen data security and privacy anti-patterns consistently emerge across hundreds of customers and industry verticals; there has been an obvious trend. We’ll cover five anti-patterns and, more importantly, the solutions for them.
11:20am–12:00pm Wednesday, 09/25/2019
Brian Dalessandro (SparkBeyond)
While the value of data science is well recognized within tech, our experience with leaders across industries shows that the ability to realize and measure business impact is not universal. A core issue is that data science programs face unique risks that many leaders aren’t trained to hedge against. This talk addresses these risks and advocates for new ways to think about and manage data science programs.
11:20am–12:00pm Wednesday, 09/25/2019
TBC
11:20am–12:00pm Wednesday, 09/25/2019
Harsha Nori (Microsoft), Samuel Jenkins (Microsoft), Rich Caruana (Microsoft)
Understanding decisions made by machine learning systems is critical for sensitive uses, ensuring fairness, and debugging production models. Interpretability is a maturing field of research that presents many options for trying to understand model decisions. Microsoft is releasing new tools to help you train powerful, interpretable models and interpret decisions of existing blackbox systems.
11:20am–12:00pm Wednesday, 09/25/2019
Ying Yau (AllianceBernstein)
Time series forecasting techniques can be applied in a wide range of scientific disciplines, business scenarios, and policy settings. This session discusses the application of deep learning techniques to time series forecasting and compares them to time series statistical models when forecasting time series with trends, multiple seasonality, regime switch, and exogenous series.
11:20am–12:00pm Wednesday, 09/25/2019
Ann Spencer (Domino Data Lab), Paco Nathan (Derwen), Amy Heineike (Primer), Pete Warden (TensorFlow)
Are you a data scientist who has wondered, "Why does it take so long to deploy my model into production?" Are you an engineer who has ever thought, "Data scientists have no idea what they want"? You are not alone. Join us for a lively panel discussion with industry veterans about best practices and insights on how to increase collaboration when developing and deploying models.
11:20am–12:00pm Wednesday, 09/25/2019
Paige Roberts (Vertica), Deepak Majeti (Vertica)
Analytics experts GoodData needed to auto-recover from node failures and scale rapidly when workloads spike on their MPP database in the cloud. Kubernetes could solve that, but Kubernetes is for stateless microservices, not a stateful MPP database that needs hundreds of containers. To merge the power of an MPP database with the flexibility of Kubernetes, a lot of hurdles had to be overcome.
11:20am–12:00pm Wednesday, 09/25/2019
Session
David Talby (Pacific AI)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.

12:00pm

12:00pm–1:15pm Wednesday, 09/25/2019
Lunch (1h 15m)
12:00pm–1:15pm Wednesday, 09/25/2019
Event
Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers.
12:00pm–1:15pm Wednesday, 09/25/2019
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
12:00pm–1:15pm Wednesday, 09/25/2019
Event
If you’d like to make new professional connections and hear ideas for supporting diversity in the tech community, come to the diversity and inclusion networking lunch on Wednesday.

1:15pm

1:15pm–1:55pm Wednesday, 09/25/2019
Michael Noll (Confluent)
Would you cross the street with traffic information that is a minute old? Certainly not! Modern businesses have the same needs. In this talk we cover why and how you can use Kafka and its growing ecosystem to build elastic event-driven architectures. Specifically, we look at Kafka as the storage layer, at Kafka Connect for data integration, and at Kafka Streams and KSQL as the compute layer.
1:15pm–1:55pm Wednesday, 09/25/2019
Shioulin Sam (Cloudera Fast Forward Labs)
Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. What if machines could learn with few labeled examples? This talk explores and demonstrates an algorithmic solution that relies on collaboration between humans and machines to label smartly, and discusses product possibilities.
1:15pm–1:55pm Wednesday, 09/25/2019
Swasti Kakker (LinkedIn), Manu Ram Pandit (LinkedIn), Vidya Ravivarma (LinkedIn)
Come hear about the infrastructure and features offered by the flexible, scalable hosted data science platform at LinkedIn. The platform lets you develop seamlessly in multiple languages, enforce developer best practices and governance policies, execute and visualize solutions, and manage knowledge and collaborate efficiently, all of which improve developer productivity.
1:15pm–1:55pm Wednesday, 09/25/2019
Wim Stoop (Cloudera)
Establishing enterprise-wide security and governance remains a challenge for most organisations. Integrations and exchanges across their landscape are costly to manage and maintain, and typically work in one direction only. In this session, we'll discuss how ODPi's Egeria standard and framework removes these challenges and is leveraged by Cloudera and partners alike to deliver value for customers.
1:15pm–1:55pm Wednesday, 09/25/2019
The Apache Parquet community is working on a column encryption mechanism that protects sensitive data and enables access control for table columns. Many companies are involved, and the mechanism specification was recently signed off by the community management committee. I will present the basics of Parquet encryption technology, its usage model, and a number of use cases.
1:15pm–1:55pm Wednesday, 09/25/2019
Felipe Hoffa (Google), Bob Bradley (Geotab)
Geotab is one of the world's leading asset tracking companies, with millions of vehicles under service every day. In the first part of this talk, we review their challenges and solutions in creating an ML- and GIS-enabled petabyte-scale data warehouse leveraging Google Cloud. Then we review their process for publishing open data, how to access it, and how cities are using it.
1:15pm–1:55pm Wednesday, 09/25/2019
Andrew Burt (Immuta), Brenda Leong (Future of Privacy Forum)
Machine learning techniques are being deployed across almost every industry and sector. But this adoption comes with real, and oftentimes underestimated, privacy and security risks. In this session, Immuta and the Future of Privacy Forum will convene leading industry representatives and experts to talk about real life examples of when ML goes wrong, and the lessons they learned.
1:15pm–1:55pm Wednesday, 09/25/2019
Saif Addin Ellafi (John Snow Labs), Scott Hoch (Deep6.ai)
Recruiting patients for clinical trials is a major challenge in drug development. This talk explains how Deep6 utilizes Spark NLP to scale its training and inference pipelines to millions of patients while achieving state-of-the-art accuracy. It covers the technical challenges, the architecture of the full solution, and lessons learned.
1:15pm–1:55pm Wednesday, 09/25/2019
Nagendra Shishodia (EXL), Chaithanya Manda (EXL Service), Solmaz Torabi (EXL Service)
Every NLP-based document processing solution depends on converting scanned documents or images to machine-readable text using an OCR solution. However, the accuracy of OCR solutions is limited by the quality of the scanned images. We show that generative adversarial networks can bring significant efficiencies to any document processing solution by enhancing the resolution of scanned images and de-noising them.
1:15pm–1:55pm Wednesday, 09/25/2019
James Tang (WalmartLabs), Yiyi Zeng (WalmartLabs), Linhong Kang (WalmartLabs)
How the No. 1 retailer provides a secure and seamless shopping experience through machine learning and large-scale data analysis on a centralized platform.
1:15pm–1:55pm Wednesday, 09/25/2019
Gil Vernik (IBM)
Most analytic flows can benefit from serverless, from simple cases to complex data preparation for AI frameworks like TensorFlow. To address the challenge of integrating serverless easily, without major disruptions to your system, we present a “push to the cloud” experience. This ability dramatically simplifies using serverless with different big data processing frameworks.
1:15pm–1:55pm Wednesday, 09/25/2019
Michael Stonebraker (Tamr, Inc.)
As a steward for your enterprise’s data and digital transformation initiatives, you’re tasked with making the right choices. But before you can make those decisions, it’s important to understand what not to do when planning your organization’s big data initiatives. Dr. Michael Stonebraker, adjunct professor at MIT and cofounder and CTO of Tamr, discusses his top 10 big data blunders.

2:05pm

2:05pm–2:45pm Wednesday, 09/25/2019
Stephan Ewen (Ververica), Aljoscha Krettek (data Artisans)
This talk discusses how stream processing is becoming a "grand unifying paradigm" for data processing and covers the newest developments in Apache Flink that support this trend: new cross-batch-streaming machine learning algorithms, state-of-the-art batch performance, and new building blocks for data-driven applications and application consistency.
2:05pm–2:45pm Wednesday, 09/25/2019
Mikio Braun (Zalando SE)
With ML becoming more and more mainstream, the side effects of machine learning and AI on our lives are becoming more and more visible. One has to take extra measures to make machine learning models fair and unbiased. In addition, awareness of the need to preserve privacy in ML models is growing rapidly.
2:05pm–2:45pm Wednesday, 09/25/2019
Atul Gupte (Uber Technologies Inc.), Nikhil Joshi (Uber)
At Uber, we’re changing the way people think about transportation. As an integral part of the logistical fabric in 65+ countries around the world, we’re using ML and advanced data science to power every aspect of the Uber experience, from dispatch to customer support. In this talk, we’ll explore how we enable teams at Uber to transform insights into intelligence and facilitate critical workflows.
2:05pm–2:45pm Wednesday, 09/25/2019
Shirshanka Das (LinkedIn), Mars Lan (LinkedIn)
How do you scale metadata to an AI-enabled company with 10,000 employees and 1M+ data assets that ships code to the site three times a day? We describe the journey of LinkedIn’s metadata from a two-person back-office team to a central hub powering data discovery, AI productivity, and automatic data privacy. Different metadata strategies and our battle scars will be revealed!
2:05pm–2:45pm Wednesday, 09/25/2019
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
With cheap and infinitely scalable storage services such as S3 and ADLS, it has never been easier to dump data into a cloud data lake. But how do you secure that data and make sure it doesn't leak? In this talk we explore numerous capabilities for securing a cloud data lake, including authentication, access control, encryption (in motion and at rest) and auditing, as well as network protections.
2:05pm–2:45pm Wednesday, 09/25/2019
TBC
2:05pm–2:45pm Wednesday, 09/25/2019
Andrew Burt (Immuta), Brenda Leong (Future of Privacy Forum)
From the EU to California and China, more and more of the world is regulating how data can be used. In this session, Immuta and the Future of Privacy Forum will convene leading experts on law and data science for a deep dive into ways to regulate the use of AI and advanced analytics. Come learn why these laws are being proposed, how they’ll impact data, and what the future has in store.
2:05pm–2:45pm Wednesday, 09/25/2019
Panos Alexopoulos (Textkernel BV)
In an era where discussions among data scientists are monopolized by the latest trends in machine learning, the role of semantics in data science is often underplayed. In this talk, I present real-world cases where making fine, seemingly pedantic distinctions in the meaning of data science tasks and their related data has significantly improved their effectiveness and value.
2:05pm–2:45pm Wednesday, 09/25/2019
Magesh Chandramouli (Expedia Group), Keshav Peswani (Expedia Group), Shreya Sharma (Expedia Inc.)
Observability is key in modern architectures for quickly detecting and repairing problems in microservices. Modern observability platforms have evolved beyond simple application logs and now include distributed tracing systems such as Zipkin and Haystack. Combining them with real-time, intelligent alerting mechanisms that produce accurate alerts helps automate the detection of these problems. Read more.
2:05pm–2:45pm Wednesday, 09/25/2019
Nan Zhu (Uber), Felix Cheung (Uber)
XGBoost has been widely deployed in companies across the industry. This talk begins by introducing the internals of distributed training in XGBoost and then demonstrates how XGBoost solves business problems at Uber at a scale of thousands of workers and tens of terabytes of training data. Read more.
2:05pm–2:45pm Wednesday, 09/25/2019
Tomer Levi (Fundbox)
Running data workflows is a fundamental function of any data engineering team. Nonetheless, designing an easy-to-use, scalable, and flexible data workflow platform is a complex undertaking. In this talk, attendees will learn how the data engineering team at Fundbox uses AWS serverless technologies to address this problem, and how this enables data scientists, BI devs, and engineers to move faster. Read more.
2:05pm–2:45pm Wednesday, 09/25/2019
TBC

2:55pm

2:55pm–3:35pm Wednesday, 09/25/2019
Weisheng Xie (China Telecom BestPay Co., Ltd), Sijie Guo (ASF)
For BestPay, a fintech company of China Telecom with half a billion registered users and 41 million monthly active users, risk control decision deployment has been critical to the success of the business. In this talk we share how we leverage Apache Pulsar to boost the efficiency of our risk control decision development for combating financial fraud over 50 million transactions a day. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Jari Koister (FICO)
Machine Learning and Constraint-based Optimization are both used to solve critical business problems. They come from distinct research communities and have traditionally been treated separately. This talk describes how they are similar, how they differ and how they can be used to solve complex problems with amazing results. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Kai Liu (Microsoft (BING))
Running deep learning projects in parallel at large scale requires effort and innovation. Bing now runs a deployment of thousands of servers to address this challenge. We provide training services, offline data processing, vector hosting, and offline inferencing services to help data scientists through every step of the project life cycle. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Kaan Onuk (Uber), Luyao Li (Uber), Atul Gupte (Uber)
At Uber’s scale and pace of growth, a robust system for discovering and managing various entities, from datasets to services to pipelines, and their relevant metadata is not just nice to have: it is absolutely integral to making data useful at Uber. In this talk, we will explore the current state of metadata management and end-to-end data flow solutions at Uber and what’s coming next. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Justin Fier (Darktrace)
Cyber security must find what it doesn’t know to look for. AI technologies have led to the emergence of self-learning, self-defending networks that achieve this – detecting and autonomously responding to in-progress attacks in real time. These cyber immune systems enable the security team to focus on high-value tasks, can counter even machine-speed threats, and work in all environments. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Tim McKenzie (Pitney Bowes)
Planning a 5G network rollout and its associated services requires a good understanding of location-based data. Accurate addressing and linking consumers to property parcels or points of interest allows data enrichment with property attributes, demographics, and social data. Companies use location to organize and analyze network and customer data in order to understand where to target new services. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Mark Hinely (KirkpatrickPrice)
The fear that comes along with new compliance requirements is overwhelming. Organizations don’t know where to start, what to fix, or what an auditor expects to see. In this session, learn what an auditor’s perspective is on the newest security and privacy regulations, how your business can prepare for compliance, and what the audit looks like from their perspective. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Gerard de Melo (Rutgers University)
What kinds of sentiment and emotions do consumers associate with a text? With new data-driven approaches, organizations can better pay attention to what is being said about them in different markets. We can also consider the fonts and color palettes best-suited to convey specific emotions, so that organizations can make informed choices when presenting information to consumers. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Tony Xing (Microsoft), Bixiong Xu (Microsoft), Congrui Huang (Microsoft), Qun Ying (Microsoft)
Anomaly detection may sound old-fashioned, yet it is critically important in many industry applications. How about doing it in a computer vision way? Come to our talk to learn about a novel anomaly detection algorithm based on Spectral Residual (SR) and a Convolutional Neural Network (CNN), and how this method was applied in the monitoring system supporting Microsoft AIOps and business incident prevention. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Fei Wang (CarGurus), Michael Brautbar (CarGurus)
This session will present the case study for the CarGurus TV Attribution Model. Attendees will learn how the creation of a causal inference model can be leveraged to calculate cost per acquisition (CPA) of TV spend and measure effectiveness when compared to CPA of Digital Performance Marketing spend. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
Shradha Ambekar (Intuit), Sunil Goplani (Intuit), Sandeep Uttamchandani (Intuit)
Imagine a business insight showing a sudden spike. Debugging data pipelines is nontrivial, and finding the root cause can take hours or even days! We’ll share how Intuit built a self-serve tool that automatically discovers data pipeline lineage and tracks every change that impacts a pipeline. This helps debug pipeline issues in minutes, establishing trust in data while improving developer productivity. Read more.
2:55pm–3:35pm Wednesday, 09/25/2019
TBC

3:35pm

3:35pm–4:35pm Wednesday, 09/25/2019
Afternoon break (1h)

4:35pm

4:35pm–5:15pm Wednesday, 09/25/2019
James Terwilliger (Microsoft), Daniel Musgrave (Microsoft)
Trill has been open-sourced, making the streaming engine behind services like the multi-billion-dollar Bing Ads platform available for all to use and extend. We give a brief history of streaming data at Microsoft and lessons learned. We then demonstrate how its API can power complex application logic, and the performance that gives the engine its name: a trillion events per day per node. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Criteo’s infrastructure provides capacity and connectivity to host Criteo’s platform and applications. The evolution of our infrastructure is driven by the ability to forecast Criteo’s traffic demand. In this talk, we explain how Criteo uses Bayesian Dynamic time series models to accurately forecast its traffic load and optimize hardware resources across data centers. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Prakhar Jain (Qubole), Sourabh Goyal (Qubole)
Autoscaling of resources aims to achieve low latency for a big data application while reducing resource costs at the same time. Upscaling a cluster in the cloud is fairly easy compared to downscaling nodes, so the overall total cost of ownership (TCO) goes up. We will talk about a new design for efficient downscaling, which helps achieve better resource utilization and thus a lower TCO. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Max Neunhöffer (ArangoDB), Joerg Schad (Suki)
Machine Learning Platforms being built are becoming more complex with different components each producing their own metadata. Currently, most components provide their own way of storing metadata. In this talk, we propose a first draft of a common Metadata API and demo a first implementation of this API in Kubeflow using ArangoDB, which is a native multi-model database. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Jeff Zemerick (Mountain Fog)
This talk describes how open source technologies can be used to identify and remove PHI from streaming text in an enterprise healthcare environment. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Vlad Eidelman (FiscalNote)
While regulations affect your life every day, and millions of public comments are submitted to regulatory agencies in response to their proposals, analyzing the comments has traditionally been reserved for legal experts. In this talk, we show how natural language processing and machine learning can be used to automate the process by analyzing over 10 million publicly released comments. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Elasticsearch allows extremely quick search and drilldowns on large amounts of semistructured data. Elasticsearch, however, does not have relational join capabilities. In this presentation I'll introduce a plugin for ES that adds cluster distributed joins and demonstrate how it enables an exciting array of use cases dealing with interconnected or "Knowledge Graph" enterprise data. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
John Berryman (Eventbrite)
Eventbrite is exploring a new machine learning approach that allows us to harvest data from customer search logs and automatically tag events based upon their content. The results have allowed us to provide users with a better inventory browsing experience. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Siddha Ganju (Nvidia), Meher Kasam (Square)
Optimizing deep neural nets to run efficiently on mobile devices. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Robert Pesch (inovex GmbH), Robin Senge (inovex GmbH)
In this talk, we outline the development process, the statistical modeling, the data-driven decision making, and the components needed for productionizing a fully automated and highly scalable demand forecasting system for an online grocery shop for a billion-dollar retail group in Europe. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Wangda Tan (Cloudera), Jitendra Pandey (Hortonworks)
In this talk, we’ll start with the current status of the Apache Hadoop community and then move on to the exciting present and future of Hadoop 3.x. We will cover new features like erasure coding, GPU support, namenode federation, Docker, long-running services support, powerful container placement constraints, data node disk balancing, etc. We will also cover upgrade guidance from 2.x to 3.x. Read more.
4:35pm–5:15pm Wednesday, 09/25/2019
Andrew Brust (ZDNet | Blue Badge Insights)
A primer on data catalogs and review of the major vendors and platforms in the market. Includes discussion on the use of data catalogs with classic and newer data repositories, including data warehouses, data lakes, cloud object storage and even software/applications. Coverage of AI's role in the data catalog world and analysis of data catalog futures will be provided. Read more.

5:25pm

5:25pm–6:05pm Wednesday, 09/25/2019
Bas Geerdink (ING)
Streaming analytics (or fast data processing) is the field of making predictions on real-time data. In this talk, I'll present a fast data architecture that follows a 'pipes and filters' pattern and covers many use cases. This architecture can be used to create enterprise-grade solutions with a diversity of technology options. The stack is Kafka, Impala, and Spark Structured Streaming (KISSS). Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
Subhasish Misra (Walmart)
Causal questions are ubiquitous. Randomized tests are considered the gold standard for answering them. However, such tests are not always feasible, and then one has only observational data from which to draw causal insights. Techniques such as matching offer a solution in those cases. This talk will cover these topics and share practical tips for inferring causal effects. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
Chenzhao Guo (Intel Asia-Pacific Research & Development Ltd.), Carson Wang (Intel)
Shuffle in Spark requires the shuffle data to be persisted on local disks. However, the assumption of collocated storage does not always hold in today’s data centers. We implemented a new Spark shuffle manager, which writes shuffle data to a remote cluster with different storage backends. This makes life easier for customers who want to leverage the latest storage hardware and for HPC customers. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
Naghman Waheed (Bayer Crop Science), John Cooper (Bayer)
As the complexity of data systems has grown at Bayer, so has the difficulty of locating and understanding what data sets are available for consumption. To address this challenge, a custom metadata management tool was recently deployed as a new capability at Bayer. The system is cloud-enabled and uses multiple open source components, including machine learning and natural language processing, to aid search. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
Matt Carothers (Cox Communications), Jignesh Patel (Cox Communications)
Organizations often work with sensitive information such as Social Security numbers and credit card information. Although this data is stored in encrypted form, most analytical operations, ranging from data analysis to advanced machine learning algorithms, require data decryption for computation. This creates unwanted exposure to theft or unauthorized reads. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
Thiago Ribeiro (Griaule)
Brazil deployed a national biometric system to register all Brazilian voters using multiple biometric modalities and to ensure that a person does not enroll twice. This session highlights how a large-scale biometric system works and the main architectural decisions that one has to take into consideration. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
Brindaalakshmi K (Independent Consultant)
There is no standard for the collection of gender data. This session looks at the implications of this gap in the context of a developing country like India, the exclusion of individuals beyond the binary genders of male and female, and how this exclusion permeates beyond the public sector into private sector services. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
Sireesha Muppala (AWS), Shelbee Eigenbrode (AWS), Emily Webber (Amazon Web Services)
Mansplaining. Know it? Hate it? Want to make it go away? In this session we tackle the chronic problem of men talking over or down to women and its negative impact on career progression for women. We will also demonstrate an Alexa skill that uses deep learning techniques on incoming audio feeds. We discuss ownership of the problem for both women and men, and suggest helpful strategies. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
The common perception of deep learning is that it results in a fully self-contained model. However, in most cases these models have similar requirements for data preprocessing as more "traditional" machine learning. Despite this, there are few standard solutions for deploying end-to-end deep learning. In this talk, I show how the ONNX format and ecosystem are addressing this challenge. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019
Aaron Owen (Major League Baseball), Matt Horton (Major League Baseball), Josh Hamilton (MLB)
Utilizing SAS, Python, and AWS Sagemaker, MLB’s data science team discusses how it predicts ticket purchasers’ likelihoods to purchase again, evaluates prospective season schedules, estimates customer lifetime value, optimizes promotion schedules, quantifies the strength of fan avidity, and monitors the health of monthly subscriptions to its game-streaming service. Read more.
5:25pm–6:05pm Wednesday, 09/25/2019 TBC
5:25pm–6:05pm Wednesday, 09/25/2019
Alasdair Allan (Babilim Light Industries)
The arrival of a new generation of smart embedded hardware may cause the demise of large-scale data harvesting. In its place, smart devices will allow us to process data at the edge and extract insights from the data without storing potentially privacy- and GDPR-infringing data. The current age where privacy is no longer "a social norm" may not long survive the coming of the Internet of Things. Read more.

6:05pm

6:05pm–7:05pm Wednesday, 09/25/2019
Event
Make your way from booth to booth while you check out all the exhibitors in the Expo Hall on Wednesday after sessions end. Read more.

Thursday, 09/26/2019

8:15am

8:15am–8:45am Thursday, 09/26/2019
Event
Gather before keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees. Read more.

8:45am

8:45am–10:45am Thursday, 09/26/2019
Keynote
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

10:50am

10:50am–11:20am Thursday, 09/26/2019
Morning break (30m)

11:20am

11:20am–12:00pm Thursday, 09/26/2019
Michael Freedman (TimescaleDB)
Leveraging polyglot solutions for your time-series data can lead to a variety of issues including engineering complexity, operational challenges, and even referential integrity concerns. By re-engineering Postgres to serve as a general data platform, your high-volume time-series workloads will be better streamlined, resulting in more actionable data and greater ease of use. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
Undesired bias in machine learning has become a worrying topic due to numerous high-profile incidents. In this talk we demystify machine learning bias through a hands-on example. We'll be tasked with automating the loan approval process for a company, and we'll introduce key tools and techniques from the latest research that allow us to assess and mitigate undesired bias in our machine learning models. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)
In this talk, we discuss online machine learning algorithm choices for streaming applications. We motivate the discussion with resource-constrained use cases like IoT and personalization. We cover Hoeffding Adaptive Trees, classic sketch data structures, and drift detection algorithms, all the way from implementation to production deployment, describing the pros and cons of using each of them. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)
You are a SaaS company operating on cloud infrastructure that predates the ML era. How do you successfully extend your existing infrastructure to leverage the power of ML? In this case study, you will learn critical lessons from SurveyMonkey’s journey of expanding its ML capabilities with its rich data repo and hybrid cloud infrastructure. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Data has always been relational, and it always will be. NoSQL databases are gaining in popularity, but that does not change the fact that the data they manage is still relational; it just changes how we have to model the data. This session dives deep into how real entity relationship models can be efficiently modeled in a denormalized manner, using schema examples from real application services. Read more.
11:20am–12:00pm Thursday, 09/26/2019
TBC
11:20am–12:00pm Thursday, 09/26/2019
TBC
11:20am–12:00pm Thursday, 09/26/2019
Brian Keng (Rubikloud Technologies Inc)
Automating decisions requires a system to consider more than just a data-driven prediction. Real-world decisions require additional constraints and fuzzy objectives to ensure that they are robust and consistent with business goals. This talk will describe how to leverage modern machine learning methods and traditional mathematical optimization techniques for decision automation. Read more.
11:20am–12:00pm Thursday, 09/26/2019 TBC
11:20am–12:00pm Thursday, 09/26/2019
Anjali Samani (CircleUp)
The application of smoothing and imputation strategies is common practice in predictive modelling and time series analysis. With a technique-agnostic approach, this session will provide qualitative and quantitative frameworks that address questions related to smoothing and imputation of missing values to improve data density. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Petar Zecevic (SV Group d.o.o.)
The Large Synoptic Survey Telescope (LSST) is one of the most important future surveys. Its unique design will allow it to cover large regions of the sky and obtain images of the faintest objects. Over its 10 years of operation, it will produce about 80 PB of data, both images and catalog data. I will present AXS, a system we built for fast processing and cross-matching of survey catalog data. Read more.
11:20am–12:00pm Thursday, 09/26/2019
Gayle Bieler (RTI International)
This presentation is about building a thriving Center for Data Science within a large and well-respected non-profit research institute. I'll discuss my transformation from an entrepreneurial statistician to data science leader, as well as some of our most impactful projects and best adventures to date--solving important national problems, improving our local communities, and transforming research. Read more.

12:00pm

12:00pm–1:15pm Thursday, 09/26/2019
Break (1h 15m)
12:00pm–1:15pm Thursday, 09/26/2019
Event
Join Strata Business Summit speakers and attendees for a networking lunch on Thursday. Read more.
12:00pm–1:15pm Thursday, 09/26/2019
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

1:15pm

1:15pm–1:55pm Thursday, 09/26/2019
Kafka is often just a piece of the production stack that no one wants to touch, because it just works. At AppsFlyer, Kafka sits at the core of our infrastructure, which processes billions of events daily. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Sandra Carrico (Glynt.ai)
This talk motivates mixed formal learning, explains it and outlines one machine learning example that previously used large numbers of examples and now learns with either zero or a handful of training examples. It maps apparently idiosyncratic techniques to Mixed Formal Learning, a general AI architecture that you can use in your projects. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Jim Scott (MapR Technologies)
Data scientists are creating and testing hundreds or thousands more models than in the past. Models require support from both real-time and static data sources. As data becomes enriched, and parameters tuned and explored, there is a need for versioning everything, including the data. We will discuss the very specific problems and approaches to fix them. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Bo Yang (Uber)
Omkar Joshi and Bo Yang offer an overview of how Uber’s ingestion (Marmaray) and observability team improved the performance of Apache Spark applications running on thousands of cluster machines and across more than 100,000 applications, and how the team methodically tackled the issues it encountered. They also cover how they used Uber’s open source jvm-profiler to debug issues at scale. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Shant Hovsepian (Arcadia Data)
With cloud object storage (e.g., S3, ADLS), one expects business intelligence (BI) applications to benefit from the scale of data and real-time analytics. However, traditional BI in the cloud surfaces non-obvious challenges. This talk will review service-oriented cloud design (storage, compute, catalog, security, SQL) and show how native cloud BI provides analytic depth, low cost, and performance. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Usama Fayyad (Open Insights & OODA Health, Inc.), Hamit Hamutcu (Analytics Center)
Ever been confused about what it takes to be a data scientist? Or curious about how companies recruit, train, and manage analytics resources? This presentation covers insights from the most comprehensive research effort to date on the data analytics profession and proposes a framework for standardizing roles in the industry, along with methods for assessing skills. Read more.
1:15pm–1:55pm Thursday, 09/26/2019 TBC
1:15pm–1:55pm Thursday, 09/26/2019
Victor Dibia (Cloudera Fast Forward Labs)
Recent advances in machine learning frameworks for the browser, such as TensorFlow.js, provide an opportunity to craft truly novel experiences within front-end applications. This talk explores the state of the art for machine learning in the browser using TensorFlow.js and covers its use in the design of Handtrack.js, a library for prototyping real-time hand detection in the browser. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
TBC
1:15pm–1:55pm Thursday, 09/26/2019
Alfred Whitehead (Klick), Clare Jeon (KLICK INC)
What will tomorrow’s temperature be? My blood glucose levels tonight before bed? Time series forecasts depend on sensors or measurements made out in the real, messy world. Those sensors flake out, get turned off, disconnect, and otherwise conspire to cause missing data in our signals. We will show a number of methods for handling data gaps and give advice on which to consider and when. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
Jason Wang (Cloudera), Sushant Rao (Cloudera)
We’ll give you an actionable understanding of cloud architecture and the different approaches customers took in their journey to the cloud. We start with the different ways we’ve seen customers be successful in the cloud. Then we dive deep into the decisions they made and how those drove their cloud architecture. Along the way we review problems they overcame, lessons learned, and core cloud paradigms. Read more.
1:15pm–1:55pm Thursday, 09/26/2019
David Castillo (Capital One)
The head of Capital One's Center for Machine Learning will share best practices for building a Responsible AI program in the enterprise, from multidisciplinary internal working groups to research & development. Read more.

2:05pm

2:05pm–2:45pm Thursday, 09/26/2019
Karthik Ramasamy (Streamlio), Anand Madhavan (Narvar)
Narvar provides a next-generation post-transaction experience for more than 500 retailers. This talk explores how Narvar moved away from using a slew of technologies for its platform and consolidated its use cases on Apache Pulsar. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Andrew Leamon (Comcast), Wadkar Sameer (Comcast NBCUniversal)
An overview of the data management and privacy challenges around automating ML model (re)deployments and stream-based inferencing at scale. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Diego Oppenheimer (Algorithmia)
Machine Learning (ML) will fundamentally change the way we build and maintain applications. How can we adapt our infrastructure, operations, staffing, and training to meet the challenges of the new Software Development Life Cycle (SDLC) without throwing away everything that already works? Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Reza Shiftehfar (Uber Technologies)
Building a reliable big data platform is extremely challenging when it has to store and serve hundreds of petabytes of data in real time. This talk reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
Data lakes have become a key ingredient in the data architecture of most companies. In the cloud, object storage systems such as S3 and ADLS make it easier than ever to operate a data lake. In this talk we describe how companies can build best-in-class data lakes in the cloud, leveraging open source technologies and the cloud's elasticity to run and optimize various workloads simultaneously. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Alex Yoon (T-Mobile)
T-Mobile successfully improved the quality of voice calling by analyzing crowdsourced big data from mobile devices. In this session, you will learn how engineers from multiple backgrounds collaborated to achieve a 10% improvement in voice quality and why the analysis of big data was key to bringing better voice call quality to millions of end users. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Akshay Rai (LinkedIn)
Failures or issues in a product or service can negatively affect the business. Detecting issues in advance and recovering from them is crucial to keeping the business alive. Join us to learn more about LinkedIn's next-generation open source monitoring platform, an integrated solution for real-time alerting and collaborative analysis. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
In this talk, we show how to develop a machine learning pipeline for streaming data using the StreamDM framework (https://github.com/huawei-noah/streamDM). We also introduce how to use StreamDM for supervised and unsupervised learning tasks, show examples of online preprocessing methods, and explain how to extend the framework by adding new learning algorithms or preprocessing methods. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Derek Lin (Pivotal)
Unmanaged and foreign devices in corporate networks pose a security risk. The first step toward reducing the risk from these devices is the ability to identify them. As part of a comprehensive device management program, we propose a deep learning model that performs anomaly detection using only device names, flagging devices that do not follow the expected naming structures. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Anais Jackie Dotis (InfluxData)
Did you know that classical algorithms can outperform machine learning methods in time series forecasting? I’ll show you how I used the Holt-Winters forecasting algorithm to predict water levels in a creek. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
Nikki Rouda (Amazon Web Services), Roy Hasson (Amazon Web Services)
Learn how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. Link customer records across different databases (e.g., with different name spellings or addresses). Match external product lists, such as lists of hazardous goods, against your own catalog. Solve tough challenges to prepare and cleanse data for analysis. Read more.
2:05pm–2:45pm Thursday, 09/26/2019
TBC

2:45pm

2:45pm–3:45pm Thursday, 09/26/2019
Break (1h)

3:45pm

3:45pm–4:25pm Thursday, 09/26/2019
Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
This session covers the architecture of T-CORE, SK Telecom’s monitoring and service analytics platform, and the lessons learned from developing it. T-CORE collects system and application data from several thousand servers and applications, gives operators a 3D visualization of the real-time status of the whole network and its services, and provides an analytics platform for data scientists, engineers, and developers. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
David Mack (Octavian)
Graphs are a powerful way to represent knowledge. Organizations (in fields such as bio-sciences and finance) are starting to amass large knowledge graphs, but lack the machine-learning tools to extract the insights they need from them. In this presentation, I’ll give an overview of what insights are possible and survey the most popular approaches. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
As an increasing level of automation is becoming available to data science, there is a balance between automation and quality that needs to be maintained. Applying DevOps practices to machine learning workloads not only brings models to the market faster but also maintains the quality and integrity of those models. This presentation will focus on applying DevOps practices to ML workloads. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Vitaliy Baklikov (Development Bank of Singapore), Dipti Borkar (Alluxio)
In this presentation, Vitaliy Baklikov from DBS Bank and Dipti Borkar from Alluxio will share how DBS Bank has built a modern big data analytics stack leveraging an object store even for data-intensive workloads like ATM forecasting and how it uses Alluxio to orchestrate data locality and data access for Spark workloads. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Owen O'Malley (Cloudera)
Fine-grained data protection at the column level in data lake environments has become a mandatory requirement for demonstrating compliance with local and international regulations across many industries. This talk describes how column encryption in ORC files enables both fine-grained protection and audits of who accessed private data. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Madhu Gopinathan (MakeMyTrip), Sanjay Mohan (MakeMyTrip)
At MakeMyTrip, India’s leading online travel platform, customers were using voice or email to contact agents for post-sale support. To improve agent efficiency and the customer experience, MakeMyTrip developed a chatbot, Myra, using some of the latest advances in deep learning. In this talk, we will discuss the high-level architecture and the business impact created by Myra. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Audrey Lobo-Pulo (Phoensight), Annette Hester (National Energy Board, Canada), Ryan Hum (National Energy Board, Canada)
As new digital platforms emerge and governments look at new ways to engage with citizens, there is an increasing awareness of the role these platforms play in shaping public participation and democracy. This talk examines the design attributes of civic engagement technologies and their ensuing impacts. A framework for better achieving desired outcomes is demonstrated with an NEB Canada case study. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Sajan Govindan (Intel), Luca Canali (CERN)
We will show CERN’s research on applying deep learning in high energy physics experiments as an alternative to customized rule-based methods, with an example of topology classification to improve real-time event selection at the Large Hadron Collider experiments. CERN implemented deep learning pipelines on Apache Spark using the BigDL and Analytics Zoo open source software on Intel Xeon-based clusters. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Chad Scherrer (Metis)
This talk will explore the basic ideas in Soss, a new probabilistic programming library for Julia. Soss allows a high-level representation of the kinds of models often written in PyMC3 or Stan, and offers a way to programmatically specify and apply model transformations like approximations or reparameterizations. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Tom O'Neill (Periscope Data)
In this session, CTO Tom O’Neill will discuss lessons learned from scaling up Periscope Data to support incredibly large volumes of data and queries from its 1,000+ data teams. He’ll highlight the process of migrating from Heroku to Kubernetes and discovering new ways to leverage its power, plus other developments that have allowed users to delve deeper into new data science and ML analysis. Read more.
3:45pm–4:25pm Thursday, 09/26/2019
Jonathan Tudor (GE Aviation), Ross Schalmo (GE Aviation)
GE Aviation has made it a mission to implement self-service data. To ensure success beyond the initial implementation of tools, the Data Engineering and Analytics teams at GE Aviation created initiatives designed to foster engagement, ranging from an ongoing partnership with each part of the business to the gamification of tagging data in a data catalog and the formation of a Published Dataset Council. Read more.

4:35pm

4:35pm–5:15pm Thursday, 09/26/2019
Neelesh Salian (Stitch Fix)
Before building a data lineage system, an organization needs to understand why it needs data lineage. Once that purpose is defined, we can talk about how to go about building such a system. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Brandy Freitas (Pitney Bowes)
In this session, Brandy Freitas from Pitney Bowes will cover the interplay between graph analytics and machine learning, improved feature engineering with graph-native algorithms, and harnessing the power of graph structure for machine learning through node embedding. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Supun Kamburugamuve (Indiana University)
Big data computing and high-performance computing (HPC) have evolved over the years as separate paradigms. With the explosion of data and the demand for machine learning algorithms, these two paradigms are increasingly embracing each other for data management and algorithms. Supun Kamburugamuve explores the possibilities and tools available for getting the best of HPC and big data. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Ruixin Xu (Microsoft)
Microsoft's big data team ran an experiment using Spark and Jupyter notebooks as a replacement for the existing IDE-based diagnostic tools for internal DevOps. The results indicate that the Spark-based solution significantly improves diagnosis performance, especially for complex jobs with large profiles, and that Jupyter notebooks bring the added benefits of fast iteration and easy knowledge sharing. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Mark Donsky (Okera)
California is following the EU's GDPR with the California Consumer Privacy Act (CCPA) in 2020. There are penalties for non-compliance, but many companies aren't prepared for this strict regulation. This session will explore the capabilities your data environment needs in order to simplify compliance with CCPA, GDPR, and other regulations. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Naoto Umemori (NTT DATA Corporation), Masaru Dobashi (NTT Data Corp.)
Giant hogweed is a highly toxic plant. Our project aims to automate the detection of giant hogweed using drones and machine learning-based image recognition and detection. We show you how we designed the architecture, how we took advantage of both big data and machine/deep learning technologies (e.g., Hadoop, Spark, and TensorFlow), and what lessons we learned. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Evgeny Vinogradov (Yandex.Money)
In a microservice architecture, the data warehouse (DWH) is the first place where all the data comes together. It is fed by many different data sources and used for many purposes, from near-OLTP workloads to model fitting and real-time classification. This talk covers our experience managing and scaling the data engineering team and infrastructure to support 20+ product teams. Read more.
4:35pm–5:15pm Thursday, 09/26/2019
Dean Wampler (Lightbend)
Join me for a discussion of the following problems and their solutions: 1. How (and why) to integrate ML into production streaming data pipelines, to serve results quickly? 2. How to bridge data science and production environments, with different tools, techniques, and requirements? 3. How to build reliable and scalable, long-running services? 4. How to update ML models without downtime? Read more.
