Schedule: Cloud Platforms and SaaS sessions: Data science + business analytics training: Strata Data Conference

9:00am–5:00pm Monday, September 23 & Tuesday, September 24, 2019

Location: 1A 17

SOLD OUT: Building a serverless big data application on AWS

Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Nikki Rouda (Amazon Web Services), Jesse Gebhardt (Amazon Web Services), Rajeev Chakrabarti (Amazon Web Services)

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join the AWS team to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 14

Running multidisciplinary big data workloads in the cloud with CDP

Data Engineering and Architecture

James Morantus (Cloudera), Tony Huinker (Cloudera), Naren Koneru (Cloudera), Ramachandran Venkatesh (Cloudera), Gunther Hagleitner (Cloudera), Olli Draese (Cloudera)

Organizations now run diverse, multidisciplinary big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long-running. Moving these workloads to the cloud poses many challenges. In this talk, we start off with a technical deep...

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 09

Serverless streaming architectures and algorithms for the enterprise

Data Engineering and Architecture, Streaming and IoT

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (Yale University)

Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems and examine the inception and growth of the serverless paradigm. You'll take a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar Functions, and get a bird’s-eye view of the application domains where you can use Pulsar Functions.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1E 09

From relational databases to cloud databases: Using the right tool for the right job

Data Engineering and Architecture

Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)

Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management on AWS require a different mindset than traditional RDBMS design. Gowrishankar Balasubramanian and Rajeev Srinivasan explore what to consider when choosing the right database for your use case and access pattern, whether you are migrating an existing application or building a new one in the cloud.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1A 21

Building a recommender system with Amazon ML services

Data Science, Machine Learning, & AI

Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)

Karthik Sonti, Emily Webber, and Varun Rao Bhamidimarri introduce you to the Amazon SageMaker machine learning platform and provide a high-level discussion of recommender systems. You'll dig into different machine learning approaches for recommender systems, including common methods such as matrix factorization as well as newer embedding approaches.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1E 12/13

Architecting a data platform for enterprise use

Data Engineering and Architecture

Mark Madsen (Teradata), Todd Walter (Archimedata)

Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1E 07/08

Kubernetes for stateful MPP systems

Data Engineering and Architecture

Paige Roberts (Vertica), Deepak Majeti (Vertica)

GoodData needed its MPP database in the cloud to recover automatically from node failures and to scale rapidly when workloads spiked. Kubernetes could solve these problems, but it's designed for stateless microservices, not a stateful MPP database that needs hundreds of containers. Paige Roberts and Deepak Majeti detail the hurdles GoodData had to overcome to combine the power of the database with Kubernetes.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1E 07/08

Your easy move to serverless computing and radically simplified data processing

Data Engineering and Architecture

Gil Vernik (IBM)

Most analytic flows can benefit from serverless computing, from simple cases to complex data preparation for AI frameworks like TensorFlow. To address the challenge of integrating serverless easily without major disruption to your system, Gil Vernik explores the “push to the cloud” experience, which dramatically simplifies serverless computing for big data processing frameworks.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1E 12/13

Turning petabytes of data from millions of vehicles into open data with Geotab

Case studies, Strata Business Summit

Felipe Hoffa (Google), Bob Bradley (Geotab)

Geotab is a world-leading asset-tracking company with millions of vehicles under service every day. Felipe Hoffa and Bob Bradley examine the challenges of, and solutions for, creating an ML- and geographic information system (GIS)-enabled petabyte-scale data warehouse on Google Cloud. They also walk through the process of publishing the data as open data, how you can access it, and how cities are using it.

2:05pm–2:45pm Wednesday, September 25, 2019

Location: 1E 07/08

Orchestrating data workflows using a fully serverless architecture

Data Engineering and Architecture

Tomer Levi (Fundbox)

Data workflows are fundamental to any data engineering team. Nonetheless, designing an easy-to-use, scalable, and flexible data workflow platform is a complex undertaking. Tomer Levi walks you through how the data engineering team at Fundbox uses AWS serverless technologies to address this problem and how this approach enables data scientists, BI developers, and engineers to move faster.

2:05pm–2:45pm Wednesday, September 25, 2019

Location: 1E 09

Building a best-in-class data lake on AWS and Azure

Business Analytics and Visualization, Data Engineering and Architecture

Tomer Shiran (Dremio), Jacques Nadeau (Dremio)

Data lakes have become a key ingredient in the data architecture of most companies. In the cloud, object storage systems such as S3 and ADLS make it easier than ever to operate a data lake. Tomer Shiran and Jacques Nadeau explain how you can build best-in-class data lakes in the cloud, leveraging open source technologies and the cloud's elasticity to run and optimize workloads simultaneously.

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 1A 15/16

Trill: The crown jewel of Microsoft’s streaming pipeline explained

Data Engineering and Architecture, Streaming and IoT

James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)

Trill has been open-sourced, making the streaming engine behind services like the Bing Ads platform available for all to use and extend. James Terwilliger, Badrish Chandramouli, and Jonathan Goldstein dive into the history of and insights from streaming data at Microsoft. They demonstrate how its API can power complex application logic and the performance that gives the engine its name.

11:20am–12:00pm Thursday, September 26, 2019

Location: 1E 09

Where's my lookup table? Modeling relational data in a denormalized world

Data Engineering and Architecture

Rick Houlihan (Amazon Web Services)

Data has always been, and will always be, relational. NoSQL databases are gaining in popularity, but that doesn't change the fact that the data is still relational; it just changes how we have to model it. Rick Houlihan dives deep into how real entity-relationship models can be efficiently modeled in a denormalized manner, using schema examples from real application services.

11:20am–12:00pm Thursday, September 26, 2019

Location: 1A 15/16

Your cloud, your ML, but more and more scale? How SurveyMonkey did it

Data Engineering and Architecture

Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)

You're a SaaS company that has been operating on a cloud infrastructure since before the machine learning (ML) era, and you need to extend your existing infrastructure to leverage the power of ML. Jing Huang and Jessica Mong detail a case study with critical lessons from SurveyMonkey’s journey of expanding its ML capabilities with its rich data repository and hybrid cloud infrastructure.

1:15pm–1:55pm Thursday, September 26, 2019

Location: 1E 07/08

The hitchhiker’s guide to the cloud: Architecting for the cloud through customer stories

Data Engineering and Architecture

Sushant Rao (Cloudera)

Jason Wang and Sushant Rao offer an overview of cloud architecture, then go into detail on core cloud paradigms like compute (virtual machines), cloud storage, authentication and authorization, and encryption and security. They conclude by bringing these concepts together through customer stories to demonstrate how real-world companies have leveraged the cloud for their big data platforms.

2:05pm–2:45pm Thursday, September 26, 2019

Location: 1E 09

Securing your cloud data lake with a "defense in depth" approach

Data Engineering and Architecture

Tomer Shiran (Dremio), Jacques Nadeau (Dremio)

With cheap and scalable storage services such as S3 and ADLS, it's never been easier to dump data into a cloud data lake. But you still need to secure that data and be sure it doesn't leak. Tomer Shiran and Jacques Nadeau explore capabilities for securing a cloud data lake, including authentication, access control, encryption (in motion and at rest), and auditing, as well as network protections.

3:45pm–4:25pm Thursday, September 26, 2019

Location: 1A 21/22

ML ops: Applying DevOps practices to machine learning workloads

Automation in data science and data, Data Engineering and Architecture

Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Randall DeFauw (Amazon Web Services)

As more automation becomes available to data science, the balance between automation and quality must be maintained. Applying DevOps practices to machine learning workloads brings models to market faster while maintaining their quality and integrity. Sireesha Muppala, Shelbee Eigenbrode, and Randall DeFauw explore how to apply DevOps practices to ML workloads.

3:45pm–4:25pm Thursday, September 26, 2019

Location: 1A 23/24

Enabling big data and AI workloads on the object store at DBS Bank

Data Engineering and Architecture

Vitaliy Baklikov (DBS Bank), Dipti Borkar (Alluxio)

Vitaliy Baklikov and Dipti Borkar explore how DBS Bank built a modern big data analytics stack leveraging an object store even for data-intensive workloads like ATM forecasting and how it uses Alluxio to orchestrate data locality and data access for Spark workloads.