Sep 23–26, 2019

Schedule: Cloud Platforms and SaaS sessions

9:00am - 5:00pm Monday, September 23 & Tuesday, September 24
Location: 1A 17
Jorge Lopez (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join in to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1E 14
Jason Wang (Cloudera), Tony Wu (Cloudera), Vinithra Varadharajan (Cloudera)
Moving to the cloud poses challenges ranging from re-architecting to be cloud-native to keeping data context consistent across workloads that span multiple clusters on-prem and in the cloud. First, we’ll cover cloud architecture and its challenges in depth; second, you’ll use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1E 09
Arun Kejariwal (Facebook), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley)
In this tutorial, we walk the audience through the landscape of streaming systems and give an overview of the inception and growth of the serverless paradigm. Next, we present a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar Functions, and give a bird’s-eye view of the application domains where Pulsar Functions can be used.
1:30pm–5:00pm Tuesday, September 24, 2019
Location: 1E 12/13
Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)
Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management require a different mindset in AWS than in traditional RDBMS design. In this session, you will learn important considerations for choosing the right database based on your use cases and access patterns when migrating an application to the cloud or building a new one there.
1:30pm–5:00pm Tuesday, September 24, 2019
Location: 1A 21/22
Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)
In this workshop, we’ll introduce the Amazon SageMaker machine learning platform, followed by a high-level discussion of recommender systems. Next, we’ll dig into different machine learning approaches for recommender systems.
1:30pm–5:00pm Tuesday, September 24, 2019
Location: 1E 09
Mark Madsen (Teradata), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1E 07/08
Paige Roberts (Vertica), Deepak Majeti (Vertica)
Analytics expert GoodData needed to auto-recover from node failures and scale rapidly when workloads spike on its MPP database in the cloud. Kubernetes could solve that, but Kubernetes is for stateless microservices, not a stateful MPP database that needs hundreds of containers. Merging the power of an MPP database with the flexibility of Kubernetes meant overcoming a lot of hurdles.
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1E 07/08
Gil Vernik (IBM)
Most analytic flows can benefit from serverless, from simple cases to complex data preparation for AI frameworks like TensorFlow. To address the challenge of integrating serverless easily, without major disruptions to your system, we present a “push to the cloud” experience. This ability dramatically simplifies using serverless with different big data processing frameworks.
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1E 12/13
Felipe Hoffa (Google), Bob Bradley (Geotab)
Geotab is one of the world's leading asset tracking companies, with millions of vehicles under service every day. In the first part of this talk, we review their challenges and solutions in creating an ML- and GIS-enabled petabyte-scale data warehouse on Google Cloud. Then we review their process for publishing open data, how to access it, and how cities are using it.
2:05pm–2:45pm Wednesday, September 25, 2019
Location: 1E 07/08
Tomer Levi (Fundbox)
Data workflows are fundamental to any data engineering team. Nonetheless, designing an easy-to-use, scalable, and flexible data workflow platform is a complex undertaking. In this talk, attendees will learn how the data engineering team at Fundbox uses AWS serverless technologies to address this problem, and how that enables data scientists, BI developers, and engineers to move faster.
2:05pm–2:45pm Wednesday, September 25, 2019
Location: 1E 09
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
With cheap and infinitely scalable storage services such as S3 and ADLS, it has never been easier to dump data into a cloud data lake. But how do you secure that data and make sure it doesn't leak? In this talk we explore numerous capabilities for securing a cloud data lake, including authentication, access control, encryption (in motion and at rest) and auditing, as well as network protections.
4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1A 15/16
James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)
Trill has been open-sourced, making the streaming engine behind services like the multi-billion-dollar Bing Ads platform available for all to use and extend. We give a brief history of streaming data at Microsoft and lessons learned. We then demonstrate how its API can power complex application logic, and the performance that gives the engine its name: a trillion events per day per node.
11:20am–12:00pm Thursday, September 26, 2019
Location: 1E 09
Rick Houlihan (Amazon Web Services)
Data has always been relational, and it always will be. NoSQL databases are gaining in popularity, but that does not change the fact that the data they manage is still relational; it just changes how we have to model the data. This session dives deep into how real entity-relationship models can be efficiently modeled in a denormalized manner, using schema examples from real application services.
11:20am–12:00pm Thursday, September 26, 2019
Location: 1A 23/24
Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)
You are a SaaS company operating on cloud infrastructure that predates the ML era. How do you successfully extend your existing infrastructure to leverage the power of ML? In this case study, you will learn critical lessons from SurveyMonkey’s journey of expanding its ML capabilities with its rich data repository and hybrid cloud infrastructure.
1:15pm–1:55pm Thursday, September 26, 2019
Location: 1E 07/08
Jason Wang (Cloudera), Sushant Rao (Cloudera)
We’ll give you an actionable understanding of cloud architecture and the different approaches customers have taken in their journey to the cloud. We start with the different ways we’ve seen customers succeed in the cloud. Then we dive deep into the decisions they made and how those decisions drove their cloud architecture. Along the way, we review problems they overcame, lessons learned, and core cloud paradigms.
2:05pm–2:45pm Thursday, September 26, 2019
Location: 1E 09
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
Data lakes have become a key ingredient in the data architecture of most companies. In the cloud, object storage systems such as S3 and ADLS make it easier than ever to operate a data lake. In this talk we describe how companies can build best-in-class data lakes in the cloud, leveraging open source technologies and the cloud's elasticity to run and optimize various workloads simultaneously.
3:45pm–4:25pm Thursday, September 26, 2019
Location: 1A 21/22
Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Randall DeFauw (Amazon Web Services)
As more automation becomes available to data science, a balance between automation and quality needs to be maintained. Applying DevOps practices to machine learning workloads not only brings models to market faster but also maintains the quality and integrity of those models. This presentation focuses on applying DevOps practices to ML workloads.
3:45pm–4:25pm Thursday, September 26, 2019
Location: 1A 23/24
Vitaliy Baklikov (Development Bank of Singapore), Dipti Borkar (Alluxio)
In this presentation, Vitaliy Baklikov from DBS Bank and Dipti Borkar from Alluxio share how DBS Bank has built a modern big data analytics stack that leverages an object store even for data-intensive workloads like ATM forecasting, and how it uses Alluxio to orchestrate data locality and data access for Spark workloads.