Schedule: Data, Analytics, and AI Architecture sessions: Data science + business analytics training: Strata Data Conference

9:00am–5:00pm Monday, September 23 & Tuesday, September 24

Location: 1A 17

SOLD OUT: Building a serverless big data application on AWS

Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Nikki Rouda (Amazon Web Services), Jesse Gebhardt (Amazon Web Services), Rajeev Chakrabarti (Amazon Web Services)

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join the AWS team to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 09

Serverless streaming architectures and algorithms for the enterprise

Data Engineering and Architecture, Streaming and IoT

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (Yale University)

Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems and examine the inception and growth of the serverless paradigm. You'll take a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar Functions, and get a bird’s-eye view of the application domains where you can leverage Pulsar Functions.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1E 09

From relational databases to cloud databases: Using the right tool for the right job

Data Engineering and Architecture

Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)

Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management on AWS require a different mindset than traditional RDBMS design. Gowrishankar Balasubramanian and Rajeev Srinivasan explore considerations in choosing the right database for your use case and access pattern when migrating an application to the cloud or building a new one there.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1E 12/13

Architecting a data platform for enterprise use

Data Engineering and Architecture

Mark Madsen (Teradata), Todd Walter (Archimedata)

Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1A 15/16

Building a multitenant data processing and model inferencing platform with Kafka Streams

Data Engineering and Architecture

Navinder Pal Singh Brar (Walmart Labs)

Each week 275 million people shop at Walmart, generating interaction and transaction data. Navinder Pal Singh Brar explains how the customer backbone team enables extraction, transformation, and storage of customer data to be served to other teams. At 5 billion events per day, the Kafka Streams cluster processes events from various channels and maintains a uniform identity of a customer.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1E 07/08

Kubernetes for stateful MPP systems

Data Engineering and Architecture

Paige Roberts (Vertica), Deepak Majeti (Vertica)

GoodData needed to autorecover from node failures and scale rapidly when workloads spiked on its MPP database in the cloud. Kubernetes could solve both problems, but it's geared toward stateless microservices, not a stateful MPP database that needs hundreds of containers. Paige Roberts and Deepak Majeti detail the hurdles GoodData needed to overcome in order to merge the power of the database with Kubernetes.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1A 23/24

Building an AI platform: Key principles and lessons learned

Automation in data science and data, Data Engineering and Architecture

Moty Fania (Intel)

Moty Fania details Intel’s IT experience of implementing a sales AI platform. The platform is based on a streaming, microservices architecture with a message bus backbone. It was designed for real-time data extraction and reasoning, handles the processing of millions of website pages, and is capable of sifting through millions of tweets per day.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1A 21/22

A productive data science platform: Beyond a hosted-notebooks solution at LinkedIn

Data Engineering and Architecture

Swasti Kakker (LinkedIn), Manu Ram Pandit (LinkedIn), Vidya Ravivarma (LinkedIn)

Join Swasti Kakker, Manu Ram Pandit, and Vidya Ravivarma to explore what's offered by a flexible and scalable hosted data science platform at LinkedIn. It provides features to seamlessly develop in multiple languages, enforce developer best practices and governance policies, execute and visualize solutions, and support efficient knowledge management and collaboration, all to improve developer productivity.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1E 10/11

Executive Briefing: Top 10 big data blunders

Executive Briefing and best practices, Strata Business Summit

Michael Stonebraker (Tamr)

As a steward for your enterprise’s data and digital transformation initiatives, you’re tasked with making the right choices. But before you can make those choices, it’s important to understand what not to do when planning for your organization’s big data initiatives. Michael Stonebraker shares his top 10 big data blunders.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1A 08/10

Machine learning and large-scale data analysis on a centralized platform

Data Science, Machine Learning, & AI

James Tang (Walmart Labs), Yiyi Zeng (Walmart Labs), Linhong Kang (Walmart Labs)

James Tang, Yiyi Zeng, and Linhong Kang outline how Walmart provides a secure and seamless shopping experience through machine learning and large-scale data analysis on a centralized platform.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1A 15/16

Now you see me; now you compute: Building event-driven architectures with Apache Kafka

Data Engineering and Architecture

Michael Noll (Confluent)

Would you cross the street with traffic information that's a minute old? Certainly not. Modern businesses have the same needs. Michael Noll explores why and how you can use Kafka and its growing ecosystem to build elastic event-driven architectures. Specifically, you'll look at Kafka as the storage layer, at Kafka Connect for data integration, and at Kafka Streams and KSQL as the compute layer.

2:05pm–2:45pm Wednesday, September 25, 2019

Location: 1E 07/08

Orchestrating data workflows using a fully serverless architecture

Data Engineering and Architecture

Tomer Levi (Fundbox)

Running data workflows is a fundamental function of any data engineering team. Nonetheless, designing an easy-to-use, scalable, and flexible data workflow platform is a complex undertaking. Tomer Levi walks you through how the data engineering team at Fundbox uses AWS serverless technologies to address this problem and how doing so enables data scientists, BI devs, and engineers to move faster.

2:05pm–2:45pm Wednesday, September 25, 2019

Location: 1A 21/22

From raw data to informed intelligence: Democratizing data science and ML at Uber

Data Engineering and Architecture

Atul Gupte (Uber)

Uber is changing the way people think about transportation. As an integral part of the logistical fabric in 65+ countries around the world, it uses ML and advanced data science to power every aspect of the Uber experience—from dispatch to customer support. Atul Gupte and Nikhil Joshi explore how Uber enables teams to transform insights into intelligence and facilitate critical workflows.

2:55pm–3:35pm Wednesday, September 25, 2019

Location: 1A 15/16

How Orange Financial combats financial fraud over 50M transactions a day using Apache Pulsar

Data Engineering and Architecture

Weisheng Xie (Orange Financial), Jia Zhai (StreamNative)

Orange Financial is a fintech company of China Telecom with half a billion registered users and 41 million monthly active users, and risk control decision deployment has been critical to its success. Weisheng Xie and Jia Zhai explore how the company leverages Apache Pulsar to boost the efficiency of its risk control decision development for combating financial fraud across more than 50 million transactions a day.