Apache Drill is an evolving SQL technology that allows users to instantly query and manipulate complex and semistructured data such as JSON in its native format, without requiring any upfront schema definitions. Apache Spark is a proven data-processing framework that allows users to quickly build in-memory data pipelines for advanced analytics and machine learning using a wide variety of language APIs. The latest integration between Drill and Spark brings the best of both technologies together. Neeraja Rentachintala explores how Spark users can leverage Drill’s dynamic schema discovery capabilities to create Spark RDDs directly on complex, semistructured data, build data pipelines that use Drill’s ANSI SQL extensions to manipulate that data within Spark programs, mix in Spark’s transformations, and then persist the Spark RDDs back to disk for queries by BI/analytics tools. Neeraja discusses the use cases for the integration and offers a live demo of these technologies working together.
As director of product management at MapR Technologies, Neeraja Rentachintala is responsible for the product strategy, roadmap, and requirements of MapR SQL initiatives. Prior to MapR, Neeraja held numerous product management and engineering roles at Informatica, Microsoft SQL Server, Oracle, and Expedia.com, most recently as the principal product manager for Informatica Data Services/Data Virtualization. Neeraja holds a BS in electronics and communications from the National Institute of Technology in India and a product management certification from the University of Washington.
©2016, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.