Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Analyzing dynamic JSON with Apache Drill

Tomer Shiran (Dremio)
14:55–15:35 Friday, 3/06/2016
Hadoop use cases
Location: Capital Suite 14
Level: Intermediate
Average rating: 4.00 (5 ratings)

Prerequisite knowledge

Attendees should have a basic understanding of SQL and JSON.

Description

Modern data is often messy and does not fit into the old schema-on-write or even the newer schema-on-read paradigms. Some data effectively has no schema at all. For example, in a MongoDB collection or a Mixpanel log file, different records may have different fields, and identically named fields in different records may have different types. This makes even basic analysis difficult, because a query cannot assume a fixed set of fields or a single type per field.

Apache Drill has been built with this sort of data in mind. Tomer Shiran explores how to analyze such data with Drill, covering Drill’s internal architecture and explaining how type introspection can be used to query JSON and JSON-structured data—such as data in MongoDB—without requiring a schema.
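As a rough illustration of the problem described above, the following Python sketch loads a few JSON records whose fields and types vary, reports which types each field takes, and then aggregates one field by checking each value's runtime type. It is not Drill's implementation or API. The sample records and the names `load_records`, `field_types`, and `total_amount` are hypothetical and chosen only for this example.

```python
import json
from collections import defaultdict

# Hypothetical sample data (not from the talk): one JSON record per line.
# Note that "user_id" and "amount" change type between records, and that
# "plan", "amount", and "tags" are each missing from some records.
RAW = """
{"user_id": 1, "event": "signup", "plan": "free"}
{"user_id": "u-2", "event": "purchase", "amount": 9.99}
{"user_id": 3, "event": "purchase", "amount": "12.50", "tags": ["promo"]}
"""


def load_records(text):
    """Parse newline-delimited JSON into a list of dicts."""
    return [json.loads(line) for line in text.strip().splitlines()]


def field_types(records):
    """Map each field name to the set of Python type names seen for it."""
    seen = defaultdict(set)
    for record in records:
        for name, value in record.items():
            seen[name].add(type(value).__name__)
    return dict(seen)


def total_amount(records):
    """Sum the 'amount' field, deciding per record how to read the value."""
    total = 0.0
    for record in records:
        value = record.get("amount")  # None when the field is absent
        if isinstance(value, bool):
            continue  # bool is a subclass of int; do not count it as a number
        if isinstance(value, (int, float)):
            total += value
        elif isinstance(value, str):
            try:
                total += float(value)
            except ValueError:
                pass  # skip strings that are not numeric
    return total


records = load_records(RAW)
types = field_types(records)
total = total_amount(records)

for name in sorted(types):
    print(name, sorted(types[name]))
print("total amount:", round(total, 2))
```

The point of the sketch is that the type decision is made per value at read time rather than declared once up front, which is the general idea behind querying data without a schema.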

Tomer Shiran

Dremio

Tomer Shiran is cofounder and CEO of Dremio. Previously, he was the vice president of product at MapR, where he was responsible for product strategy, roadmap, and new feature development. As a member of the executive team, he helped grow the company from 5 to over 300 employees and 700 enterprise customers. Before MapR, Tomer held numerous product management and engineering positions at Microsoft and IBM Research. He is the author of five US patents. Tomer holds an MS in electrical and computer engineering from Carnegie Mellon University and a BS in computer science from the Technion, the Israel Institute of Technology.