October 30–31, 2016: Training
October 31–November 2, 2016: Tutorials & Conference
New York, NY

Drilling into network data with Apache Drill

Charles Givre (Deutsche Bank)
1:30pm–5:00pm Monday, 10/31/2016
Tools and processes
Location: Mercury Ballroom
Level: Intermediate

Prerequisite knowledge

  • Basic familiarity with SQL and networking technology

Materials or downloads needed in advance

  • A laptop with 30 GB of free space and 8 GB of memory (You will be provided with a VM with Apache Drill preinstalled and all datasets used in the class.)
  • Download Merlin ahead of the tutorial. Make sure you get version 2.5.
  • In order to run Merlin, you’ll also need to download VirtualBox.

What you'll learn

  • Learn how to use Drill to explore and analyze complex datasets, quickly merge disparate datasets across multiple systems, and analyze network and log data

Description

One of the problems that every enterprise faces is data silos. Breaking down data silos can be extremely costly in terms of both time and money. Apache Drill offers a new approach that can de-silo data at scale without the need to redesign architecture.

Drill is an open source, schema-free SQL engine that can query many kinds of data, including CSV files, JDBC/ODBC databases, JSON, NoSQL stores, HDFS, and more. Drill uses ANSI SQL and is also JDBC/ODBC compatible, so it can share results with BI tools such as Splunk or Tableau. Charles Givre demonstrates how to use Drill to query simple data, complex data, and data from databases and big data sources, and walks you through writing your own functions to extend Drill’s functionality.

Topics include:

  • Installing Drill on a single machine and on a cluster
  • SQL refresher
  • Querying basic data, such as CSV files
  • Querying complex data, such as JSON and log files
  • Querying remote systems and merging them with local data
  • Writing user-defined functions (UDFs)
  • Interacting with Drill using other tools
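Every topic above ultimately comes down to sending SQL to Drill. What follows is a minimal sketch, assuming a running Drill instance (such as the class VM) that exposes its web interface on the default port 8047. It submits a query through Drill's REST endpoint `/query.json` using only the Python standard library. The helper names `build_drill_request` and `run_drill_query`, the file paths, and the `src_ip`/`dst_ip` column meanings are hypothetical. `columns[n]` is how Drill exposes the fields of a CSV file that is read without a header row.

```python
import json
import urllib.request

# Assumption: Drill's web/REST interface is listening on localhost:8047 (the default port).
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Wrap a SQL statement in the JSON body that Drill's REST endpoint expects."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, url: str = DRILL_URL) -> list:
    """Send the query to Drill and return the result rows as a list of dicts."""
    with urllib.request.urlopen(build_drill_request(sql, url)) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    # Drill's JSON response carries the data under "rows"; default to empty if absent.
    return result.get("rows", [])


# Hypothetical paths and column meanings: substitute datasets that exist on your machine.
# A headerless CSV file: Drill exposes each line's fields as the array `columns`.
CSV_QUERY = (
    "SELECT columns[0] AS src_ip, columns[1] AS dst_ip "
    "FROM dfs.`/data/connections.csv` LIMIT 10"
)
# A JSON file: Drill infers field names from the records themselves, with no schema needed.
JSON_QUERY = "SELECT * FROM dfs.`/data/events.json` LIMIT 10"
```

With Drill running, calling `run_drill_query(CSV_QUERY)` should return up to ten rows as dictionaries keyed by the column aliases. The REST route is used here only because it needs no driver. BI tools such as Tableau would instead connect through Drill's JDBC/ODBC drivers, as the description notes.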

Charles Givre

Deutsche Bank

Charles Givre is an unapologetic data geek who is passionate about helping others learn about data science and become passionate about it themselves. For the last five years, Charles has worked as a data scientist at Booz Allen Hamilton for various government clients and has done some really neat data science work along the way, hopefully saving US taxpayers some money. Most of his work has been in developing meaningful metrics to assess how well the workforce is performing. For the last two years, Charles has been part of the management team for one of Booz Allen Hamilton’s largest analytic contracts, where he was tasked with increasing the amount of data science on the contract, both in terms of tasks and people.

Even more than the data science work, Charles loves learning about and teaching new technologies and techniques. He has been instrumental in bringing Python scripting to both his government clients and the analytic workforce and has developed a 40-hour Introduction to Analytic Scripting class for that purpose. Additionally, Charles has developed a 60-hour Fundamentals of Data Science class, which he has taught to Booz Allen staff, government civilians, and US military personnel around the world. Charles has a master’s degree from Brandeis University, two bachelor’s degrees from the University of Arizona, and various IT security certifications. In his nonexistent spare time, he plays trombone, spends time with his family, and works on restoring British sports cars.


Comments

10/31/2016 5:37am EDT

Thanks – I got your revised link, and I am downloading now.

Charles Givre
10/31/2016 4:52am EDT

I’m sorry everyone, I sent the wrong link to O’Reilly. Here is the correct link: http://bit.ly/merlin-vm. Please download version 2.5.

10/31/2016 4:38am EDT

The earlier email about the Merlin download did not include a link to the file. I am registered for the Apache Drill tutorial and do not yet have the VM file.

Charles Givre
10/25/2016 4:54pm EDT

Hi Michel Sahyoun,
I have a virtual machine that I’d ask you to use for the class. I’ll be sending and posting a link shortly. There are a lot of features that require configuration, so I’d really appreciate it if everyone used the VM.
Thanks,
— C

10/25/2016 3:20pm EDT

Do we need a specific hypervisor installed on our laptops for the tutorial?

Thanks,

Michel