Sep 23–26, 2019

Enabling big data and AI workloads on the object store at DBS Bank

Vitaliy Baklikov (Development Bank of Singapore), Dipti Borkar (Alluxio)
3:45pm–4:25pm Thursday, September 26, 2019
Location: 1A 23

Who is this presentation for?

Data engineers, data architects, storage architects

Level

Intermediate

Description

The big data stack has evolved considerably over the past few years, with an explosion of data frameworks that started with MapReduce and expanded to Apache Spark and Presto. The approach to managing and storing data has evolved as well, moving from primarily HDFS to newer, cheaper, and easier-to-use technologies such as object stores. But the design of most object stores prevents real-time big data and AI workloads from running directly on them.

In this session, we introduce a different architecture for analytic workloads, particularly those deployed in cloud environments. Alluxio, an open source virtual distributed file system, provides a unified data access layer for hybrid and multi-cloud deployments. Alluxio enables distributed compute engines like Spark and Presto, or machine learning frameworks like TensorFlow, to transparently access different persistent storage systems (including HDFS, S3, and Azure) while actively leveraging an in-memory cache to accelerate data access.

In this presentation, Vitaliy Baklikov from DBS Bank and Dipti Borkar from Alluxio share how DBS Bank has built a modern big data analytics stack that uses an object store as persistent storage, even for data-intensive workloads, and how it uses Alluxio to orchestrate data locality and data access for Spark workloads. In addition, deploying Alluxio for data access solves many of the challenges that separating compute and storage brings to cloud deployments.

Prerequisite knowledge

Basic knowledge of the data ecosystem

What you'll learn

Object stores provide an easier and cheaper storage alternative to Hadoop, but their current limitations prevent them from being used for real-time big data workloads. Alluxio, an open source project, can be used to enable new workloads on object stores.

Vitaliy Baklikov

Development Bank of Singapore

Vitaliy Baklikov is a data architect at Development Bank of Singapore.

Dipti Borkar

Alluxio

Dipti Borkar is the VP of Product & Marketing at Alluxio, with over 15 years of experience in data and database technology across relational and non-relational systems. Prior to Alluxio, Dipti was VP of Product Marketing at Kinetica and Couchbase. At Couchbase, she held several leadership positions, including Head of Global Technical Sales and Head of Product Management. Earlier in her career, Dipti managed development teams at IBM DB2, where she started as a database software engineer. Dipti holds an M.S. in Computer Science from UC San Diego and an MBA from the Haas School of Business at UC Berkeley.

