Training: 8–9 November 2016
Tutorials & Conference: 9–11 November 2016
Amsterdam, NL

Drilling into network data with Apache Drill

Charles Givre (Deutsche Bank)
13:30–17:00 Wednesday, 9 November, 2016
Tech, tools, and processes
Location: E104/106
Level: Intermediate

Prerequisite knowledge

  • Basic familiarity with SQL and networking technology

Materials or downloads needed in advance

  • A laptop with 30 GB of free space and 8 GB of memory (You will be provided with a VM with Apache Drill preinstalled and all datasets used in the class.)

What you'll learn

  • Learn how to use Drill to explore and analyze complex datasets, merge disparate datasets across multiple systems quickly and easily, and analyze network and log data

Description

One of the problems that every enterprise faces is data silos. Breaking down data silos can be extremely costly in terms of both time and money. Apache Drill offers a new approach that can de-silo data at scale without the need to redesign architecture.

Drill is an open source, schema-free SQL engine that can query all kinds of data, including CSV files, JDBC/ODBC databases, JSON, NoSQL, HDFS, and more. Drill uses ANSI SQL and is JDBC/ODBC compatible as well, so it can pass results to BI tools such as Splunk or Tableau. Charles Givre demonstrates how to use Drill to query simple data, complex data, and data from databases and big data sources and walks you through writing your own functions to extend Drill’s functionality.
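
To give a flavor of what querying a plain file with SQL looks like, here is a minimal sketch (not taken from the session materials) that builds a Drill query over a headerless CSV file and packages it for Drill's REST interface. The file path, the column aliases src_ip and dst_ip, and the helper names are hypothetical; the URL assumes a Drillbit running locally on the default web port with the dfs storage plugin enabled.

```python
import json
import urllib.request

# Assumption: a local Drillbit listening on the default web port.
DRILL_URL = "http://localhost:8047/query.json"


def build_query(path, limit=10):
    """Return Drill SQL that reads the first two fields of a headerless CSV.

    Drill exposes each row of such a file as an array named `columns`;
    the aliases src_ip and dst_ip are illustrative only.
    """
    return (
        "SELECT columns[0] AS src_ip, columns[1] AS dst_ip "
        f"FROM dfs.`{path}` LIMIT {int(limit)}"
    )


def build_request(sql, url=DRILL_URL):
    """Wrap the SQL text in the JSON body that the REST endpoint expects."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query and return the result rows (needs a running Drillbit)."""
    with urllib.request.urlopen(build_request(sql, url)) as resp:
        return json.load(resp).get("rows", [])
```

Calling run_query(build_query("/data/netflow.csv")) would return the rows as a list of dictionaries, provided a Drillbit is running and the hypothetical file exists at that path.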

Topics include:

  • Installing Drill on a single machine and cluster
  • SQL refresher
  • Querying basic data, such as CSV files
  • Querying complex data, such as JSON and log files
  • Querying remote systems and merging them with local data
  • Writing user-defined functions (UDFs)
  • Interacting with Drill using other tools

Charles Givre

Deutsche Bank

Charles Givre is an unapologetic data geek who is passionate about helping others learn about data science and become passionate about it themselves. For the last five years, Charles has worked as a data scientist at Booz Allen Hamilton for various government clients and has done some really neat data science work along the way, hopefully saving US taxpayers some money. Most of his work has been in developing meaningful metrics to assess how well the workforce is performing. For the last two years, Charles has been part of the management team for one of Booz Allen Hamilton’s largest analytic contracts, where he was tasked with increasing the amount of data science on the contract, both in terms of tasks and people.

Even more than the data science work, Charles loves learning about and teaching new technologies and techniques. He has been instrumental in bringing Python scripting to both his government clients and the analytic workforce and has developed a 40-hour Introduction to Analytic Scripting class for that purpose. Additionally, Charles has developed a 60-hour Fundamentals of Data Science class, which he has taught to Booz Allen staff, government civilians, and US military personnel around the world. Charles has a master’s degree from Brandeis University, two bachelor’s degrees from the University of Arizona, and various IT security certifications. In his nonexistent spare time, he plays trombone, spends time with his family, and works on restoring British sports cars.