Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Adding complex data to the Spark stack

Neeraja Rentachintala (MapR Technologies)
14:05–14:45 Friday, 3/06/2016
Spark & beyond
Location: Capital Suite 14
Level: Intermediate

Prerequisite knowledge

Attendees should have a basic understanding of SQL and Apache Spark.

Description

Apache Drill is an evolving SQL technology that lets users query and manipulate complex and semistructured data, such as JSON, in its native format without any upfront schema definitions. Apache Spark is a proven data-processing framework that lets users quickly build in-memory data pipelines for advanced analytics and machine learning using a wide variety of language APIs. The latest integration between Drill and Spark brings these two technologies together. Neeraja Rentachintala explores how Spark users can leverage Drill’s dynamic schema discovery capabilities to create Spark RDDs directly on complex, semistructured data, build data pipelines that use Drill’s ANSI SQL extensions to manipulate the complex data within Spark programs, mix in Spark’s transformations, and then persist the Spark RDDs back to disk for queries by BI/analytics tools. Neeraja discusses the use cases for the integration and offers a live demo of these technologies working together.
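To make the idea of schema discovery on semistructured data concrete, here is a minimal sketch in plain Python. It is not Drill's or Spark's API and does not reflect how either engine is implemented; the function names (`flatten`, `discover_schema`) and the sample records are hypothetical, chosen only to show how column names and types can be derived from JSON records at read time rather than declared up front.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names; lists are kept as values."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def discover_schema(records):
    """Map each column seen in any record to the sorted type names observed for it."""
    schema = {}
    for record in records:
        for name, value in flatten(record).items():
            schema.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in sorted(schema.items())}


# Hypothetical sample input: one JSON document per line, with differing fields.
RAW_LINES = [
    '{"id": 1, "user": {"name": "ana", "city": "London"}, "tags": ["a", "b"]}',
    '{"id": 2, "user": {"name": "bo"}, "score": 4.5}',
]

records = [json.loads(line) for line in RAW_LINES]
schema = discover_schema(records)

if __name__ == "__main__":
    for column, types in schema.items():
        print(column, types)
```

In this sketch the second record adds a `score` column and omits `user.city`, and the discovered schema is simply the union of what was observed. A real engine would handle far more cases (arrays of objects, type conflicts, very large inputs) than this illustration does.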


Neeraja Rentachintala

MapR Technologies

As director of product management at MapR Technologies, Neeraja Rentachintala is responsible for the product strategy, roadmap, and requirements of MapR SQL initiatives. Prior to MapR, Neeraja held numerous product management and engineering roles at Informatica, Microsoft SQL Server, Oracle, and Expedia.com, most recently as the principal product manager for Informatica Data Services/Data Virtualization. Neeraja holds a BS in electronics and communications from the National Institute of Technology in India and a product management certification from the University of Washington.