Schedule: Data Management and Storage sessions: Data science + business analytics training: Strata Data Conference

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 14

Running multidisciplinary big data workloads in the cloud with CDP

James Morantus (Cloudera), Tony Huinker (Cloudera), Naren Koneru (Cloudera), Ramachandran Venkatesh (Cloudera), Gunther Hagleitner (Cloudera), Olli Draese (Cloudera)

Organizations now run diverse, multidisciplinary big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long-running. There are many challenges with moving these workloads to the cloud. In this talk we start off with a technical deep...

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 08

Learning Presto: SQL on anything

Data Engineering and Architecture

Matt Fuller (Starburst)

Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1E 09

From relational databases to cloud databases: Using the right tool for the right job

Data Engineering and Architecture

Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)

Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management require a different mindset in AWS compared with traditional RDBMS design. Gowrishankar Balasubramanian and Rajeev Srinivasan explore considerations in choosing the right database for your use case and access pattern while migrating or building a new application on the cloud.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1E 09

Data security and privacy anti-patterns

Data Engineering and Architecture, Security and Privacy

Steven Touw (Immuta)

Anti-patterns are behaviors that take bad problems and lead to even worse solutions. In the world of data security and privacy, they’re everywhere. Over the past four years, data security and privacy anti-patterns have emerged across hundreds of customers and industry verticals, and the trend is obvious. Steven Touw details five anti-patterns and, more importantly, the solutions for them.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1E 07/08

Kubernetes for stateful MPP systems

Data Engineering and Architecture

Paige Roberts (Vertica), Deepak Majeti (Vertica)

GoodData needed to autorecover from node failures and scale rapidly when workloads spiked on their MPP database in the cloud. Kubernetes could solve both problems, but it's meant for stateless microservices, not a stateful MPP database that needs hundreds of containers. Paige Roberts and Deepak Majeti detail the hurdles GoodData needed to overcome in order to merge the power of the database with Kubernetes.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1E 10/11

Executive Briefing: Top 10 big data blunders

Executive Briefing and best practices, Strata Business Summit

Michael Stonebraker (Tamr)

As a steward for your enterprise’s data and digital transformation initiatives, you’re tasked with making the right choices. But before you can make those choices, it’s important to understand what not to do when planning for your organization’s big data initiatives. Michael Stonebraker shares his top 10 big data blunders.

2:05pm–2:45pm Wednesday, September 25, 2019

Location: 1E 09

Building a best-in-class data lake on AWS and Azure

Business Analytics and Visualization, Data Engineering and Architecture

Tomer Shiran (Dremio), Jacques Nadeau (Dremio)

Data lakes have become a key ingredient in the data architecture of most companies. In the cloud, object storage systems such as S3 and ADLS make it easier than ever to operate a data lake. Tomer Shiran and Jacques Nadeau explain how you can build best-in-class data lakes in the cloud, leveraging open source technologies and the cloud's elasticity to run and optimize workloads simultaneously.

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 1E 14

Supercharging Elasticsearch for extended Knowledge Graph use cases

Business Analytics and Visualization, Strata Business Summit

Giovanni Tummarello (Siren)

Elasticsearch (ES) allows extremely quick search and drilldowns on large amounts of semistructured data. Elasticsearch, however, does not have relational join capabilities. Giovanni Tummarello examines a plug-in for ES that adds cluster-distributed joins and demonstrates how it enables an exciting array of use cases dealing with interconnected or "Knowledge Graph" enterprise data.

11:20am–12:00pm Thursday, September 26, 2019

Location: 1E 09

Where's my lookup table? Modeling relational data in a denormalized world

Data Engineering and Architecture

Rick Houlihan (Amazon Web Services)

Data has always been and will always be relational. NoSQL databases are gaining in popularity, but that doesn't change the fact that the data is still relational; it just changes how we have to model the data. Rick Houlihan dives deep into how real entity relationship models can be efficiently modeled in a denormalized manner using schema examples from real application services.

11:20am–12:00pm Thursday, September 26, 2019

Location: 1A 23/24

Performant time series data management and analytics with PostgreSQL

Data Engineering and Architecture

Michael Freedman (TimescaleDB | Princeton University)

Leveraging polyglot solutions for your time series data can lead to issues including engineering complexity, operational challenges, and even referential integrity concerns. Michael Freedman explains why, by re-engineering PostgreSQL to serve as a general data platform, your high-volume time series workloads will be better streamlined, resulting in more actionable data and greater ease of use.

1:15pm–1:55pm Thursday, September 26, 2019

Location: 1A 15/16

Managing your Kafka in an explosive growth environment

Data Engineering and Architecture

Alon Gavra (AppsFlyer)

Kafka is frequently just a piece of the production stack that no one wants to touch, because it just works. Alon Gavra outlines how Kafka sits at the core of AppsFlyer's infrastructure, which processes billions of events daily.

2:05pm–2:45pm Thursday, September 26, 2019

Location: 1A 23/24

Creating an extensible 100+ PB real-time big data platform by unifying storage and serving

Data Engineering and Architecture

Reza Shiftehfar (Uber)

Building a reliable big data platform is extremely challenging when it has to store and serve hundreds of petabytes of data in real time. Reza Shiftehfar reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs.

3:45pm–4:25pm Thursday, September 26, 2019

Location: 1A 23/24

Enabling big data and AI workloads on the object store at DBS Bank

Data Engineering and Architecture

Vitaliy Baklikov (DBS Bank), Dipti Borkar (Alluxio)

Vitaliy Baklikov and Dipti Borkar explore how DBS Bank built a modern big data analytics stack that leverages an object store even for data-intensive workloads like ATM forecasting, and how it uses Alluxio to orchestrate data locality and data access for Spark workloads.