Apache Avro provides an expressive, efficient standard for representing large data sets. Avro data is programming-language neutral and MapReduce-friendly. The hope is that it can replace gzipped CSV-like formats as a dominant data format.
This tutorial will explain MapReduce and how to develop big data applications in Java and in high-level languages such as Pig and Hive SQL. Using examples, it will cover how to prototype, debug, monitor, test and optimize big data applications for Hadoop. Attendees will get hands-on instruction and will leave with practical examples and a solid understanding of how to analyze data on Hadoop clusters.
A discussion of Big Data approaches to analysis problems in marketing, forecasting, academia and enterprise computing. We focus on practices to enhance collaboration and employ rich statistical methods: a Magnetic, Agile and Deep (MAD) approach to analytics. While the approach is language-agnostic, we show that sophisticated statistics can be easily scaled in traditional environments like SQL.
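As a rough illustration of the claim that statistics can be pushed into SQL, the sketch below fits a simple least-squares line using only SQL aggregates. It is not taken from the talk: the table name `obs`, the function name `fit_line_in_sql`, and the use of an in-memory SQLite database (standing in for a large-scale SQL engine) are all assumptions made for the example.

```python
import sqlite3


def fit_line_in_sql(points):
    """Fit y = slope * x + intercept using SQL aggregates only.

    Hypothetical helper for illustration; `obs` is an assumed table name.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL, y REAL)")
    conn.executemany("INSERT INTO obs (x, y) VALUES (?, ?)", points)

    # The sufficient statistics for simple linear regression are plain sums,
    # so the database does the heavy lifting in a single pass over the table.
    n, sx, sy, sxx, sxy = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM obs"
    ).fetchone()
    conn.close()

    # Closed-form ordinary least squares for one predictor.
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    print(fit_line_in_sql([(0, 1), (1, 3), (2, 5), (3, 7)]))
```

Because the query reduces the data to five sums, the same statement can in principle be evaluated per partition and combined, which is what makes this style of computation a natural fit for a parallel SQL system.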