Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Apache Drill bootcamp

Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
1:30pm–5:00pm Tuesday, 09/29/2015
Spark & Beyond
Location: 3D 06/07
Level: Intermediate
Average rating: 3.88 (17 ratings)

Materials or downloads needed in advance

Attendees will need a laptop. Instructions will be provided on how to install Java and Drill.

Description

In this tutorial you’ll learn how to use Apache Drill, the open source, distributed, schema-free SQL engine. By the end of the tutorial, you’ll be able to explore and analyze your data in place with standard SQL queries or BI tools, whether the data sits in files on your laptop or in HDFS, HBase, MongoDB, or even a relational database.

Agenda:

  • Apache Drill overview
  • Hello World!
  • Data model and data types
  • Data sources: storage plugin architecture; using storage plugins; navigating the namespace; HDFS; Hive; HBase; MongoDB
  • Metadata in Drill: decentralized metadata; optional schemas; information catalog
  • Exploring, analyzing, and transforming data: exploration (SELECT * LIMIT 10 and Drill Explorer); analysis (SELECT); transformation (CREATE TABLE AS)
  • Using virtual datasets (Views): why virtual datasets?; creating virtual datasets (CLI and Drill Explorer); virtual dataset internals (.drill); how virtual datasets are exposed
  • APIs: ODBC; JDBC; REST; C; Java
  • Clients: CLI; BI (Excel, Tableau, etc.); Python (PyData, Pandas); R
  • Querying complex and/or schemaless data: handling schemaless data; traditional BI on complex data
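As a taste of the "Hello World!" and REST API items, here is a minimal sketch of submitting a query to Drill over HTTP from Python using only the standard library. It assumes a Drill instance (for example, one started in embedded mode) is listening on its default web port 8047 on the local machine, and that it answers POST requests to `/query.json` with a JSON body containing `queryType` and `query`. The helper names `build_drill_query` and `run_drill_query` are hypothetical, introduced only for illustration. The query reads `employee.json`, the sample file Drill ships on its classpath.

```python
import json
import urllib.request

# Assumption: Drill's web server is running locally on the default port 8047.
DRILL_URL = "http://localhost:8047/query.json"

# "Hello World": explore the sample employee.json file on Drill's classpath.
HELLO_SQL = "SELECT * FROM cp.`employee.json` LIMIT 10"


def build_drill_query(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, url: str = DRILL_URL) -> list:
    """Hypothetical helper: send the query and return the result rows.

    Requires a running Drill instance; the response JSON is assumed to
    carry the result set under a "rows" key.
    """
    with urllib.request.urlopen(build_drill_query(sql, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))["rows"]
```

With Drill running, `run_drill_query(HELLO_SQL)` should return the first ten records as a list of dictionaries. Building the request separately from sending it keeps the query construction easy to inspect without a live server.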

Note that this is a hands-on tutorial, so bring your laptop and you’ll be able to run all the examples (as well as some of your own queries) throughout the tutorial.

Tomer Shiran

Dremio

Tomer Shiran is cofounder and CEO of Dremio. Previously, Tomer was the vice president of product at MapR, where he was responsible for product strategy, roadmap, and new feature development. As a member of the executive team, he helped grow the company from 5 employees to over 300 employees and 700 enterprise customers. Prior to MapR, Tomer held numerous product management and engineering positions at Microsoft and IBM Research. He is the author of five US patents. Tomer holds an MS in electrical and computer engineering from Carnegie Mellon University and a BS in computer science from Technion, the Israel Institute of Technology.

Jacques Nadeau

Dremio

Jacques Nadeau is the cofounder and CTO of Dremio. Previously, he ran MapR’s distributed systems team; was CTO and cofounder of YapMap, an enterprise search startup; and held engineering leadership roles at Quigo, Offermatica, and aQuantive. Jacques is cocreator and PMC chair of Apache Arrow, a PMC member of Apache Calcite, a mentor for Apache Heron, and the founding PMC chair of the open source Apache Drill project.