Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

High-performance enterprise data processing with Spark

Vickye Jain (ZS Associates), Raghav Sharma (ZS Associates)
1:45pm–2:25pm Wednesday, December 6, 2017

Who is this presentation for?

  • IT managers and directors, business analysts, and analytics managers and directors

Prerequisite knowledge

  • Basic knowledge of big data applications
  • A high-level understanding of Spark

What you'll learn

  • Explore a very high-performance data processing platform powered by Spark that balances the considerations of extreme performance, speed of development, and cost of maintenance

Description

Enterprises are getting increasingly comfortable with moving traditional workloads to Spark. However, despite its popularity, Spark remains an esoteric technology within enterprises, and many organizations for which technology is not a core competence are wary of building internally managed applications on Spark, in part owing to the lack of a steady talent pool and a fear of budget overruns. As a result, enterprises still struggle to balance support for advanced technology platforms against the realities of matrix organizations, complex funding channels, and business demands.

Vickye Jain and Raghav Sharma explain how they built a very high-performance data processing platform powered by Spark that balances the considerations of extreme performance, speed of development, and cost of maintenance. Vickye and Raghav had to negotiate conflicting objectives such as:

  • Using Spark while enabling a broader low-tech user base (SQL writers) to build on the platform;
  • Building for extreme performance while catering to the frequent need to restart from intermediate points;
  • Handling over 200 data sources and thousands of transformations while providing full traceability and visibility so that operators can debug and enhance them.

Vickye and Raghav also offer an overview of the architecture itself, which consists of several elastic clusters, external orchestrators providing full visibility into jobs, a combination of job servers and traditional Spark applications, and deep integration of technical experts with domain experts for rapid development.

Vickye Jain

ZS Associates

Vickye Jain is a technology manager at ZS Associates, where he jointly runs the big data expertise center. Vickye has extensive experience implementing large-scale big data platforms for Fortune 200 companies in the US. He and his team have implemented very large-scale ETL offloading use cases, data lakes, and high-performance data processing platforms that have had transformational business impact on commercial, R&D, and operations organizations within life sciences.

Raghav Sharma

ZS Associates

Raghav Sharma is a solution delivery manager at ZS Associates, where he specializes in big data platforms, cloud-based analytical solutions, and information architecture. He helps lead the delivery of technology consulting engagements in the big data space for life sciences industry clients.