Hadoop has achieved unprecedented growth in deployments across both enterprises and cloud service providers as the standard for storing and managing petabytes of data of all types. However, until recent advances from Spark, Presto, Hive, and Impala, companies were unable to turn this growing deluge of data into actionable business intelligence. Reports show that over 53% of Hadoop users are doing interactive SQL on Hadoop, making it the second-most-popular processing model after MapReduce (6/14, Merv Adrian, Gartner). But why is analytical SQL making a major comeback? At a technical level, the new SQL engines don’t need to support transactions and were built on a distributed architecture, so they can access data at very high speeds. Additionally, these new SQL query engines allow for serious SQL, such as window functions and materialized views, that was previously limited to data warehouses.
Lloyd Tabb, CTO of Looker, Suresh Duddi, VP of engineering at Yahoo, Rex Gibson, manager of data engineering at Knewton, and Nick Amabile, principal and cofounder of FullStack Analytics, explain why analytical SQL is making a major comeback and how this shift is finally allowing them to see real business value from their Hadoop implementations. They discuss their respective data architectures (and how they are able to have hundreds of users accessing Hadoop for dashboarding and data exploration), the trials and tribulations of running analytics in-cluster, and examples of the real business value gained from putting their data in the hands of employees across their companies.
While the conceptual debate for or against SQL on Hadoop is interesting, the real value lies in how the advent of performant SQL-on-Hadoop technologies is changing the ROI companies are seeing from Hadoop. The main reason SQL on Hadoop matters today is the same reason it has always mattered: SQL is the common way to ask and answer questions of data. Making the data in Hadoop accessible to the huge number of people who know SQL (compared to the fraction of people who know how to write code) makes it far more useful.
Surya Mukherjee is a senior analyst for Ovum’s Information Management team responsible for the analysis of enterprises’ business intelligence technology investment priorities, market forecast models, and product and vendor evaluations. He is also responsible for the delivery of research-based consulting projects relating to the information management software markets. Based in London, Surya is a thought leader and has given keynotes at several global events.
From writing the first application server for the web to designing the first crowdsourced ecosystem, Lloyd Tabb has spent the last 25 years revolutionizing how the world uses the Internet and, by extension, data. As cofounder and CTO of Looker, Lloyd combines his passion for data exploration and discovery, his love of programming languages, and his commitment to developing and nurturing talent to change the face of the business intelligence market. Originally a database and languages architect at Borland International, Lloyd left to found Commerce Tools (acquired by Netscape in 1995). At Netscape, he became the principal engineer on Netscape Navigator Gold, led several releases of Communicator, and helped define the creation of Mozilla.org. Prior to founding Looker, Lloyd was the CTO of LiveOps, cofounder of Readyforce, and advisor to Luminate, recently acquired by Yahoo.
Nick Amabile is the cofounder and principal of FullStack Analytics, a data and analytics consulting firm based in Brooklyn, NY, that builds data pipelines and delivers business value from data using the latest technologies and analytical techniques. Nick has helped startups and Fortune 500 companies alike gain insight from their data. Most recently, he held leadership positions at Jet.com, during hypergrowth, and Etsy, during their IPO.
Rex Gibson is head of data warehousing at Knewton, the world’s leading adaptive learning platform. Knewton’s mission, to bring personalized education to the world, is built on its data. Rex wrote his first code on an Apple IIe in 1986. In 1996, he wrote his first SQL statement at Webster University while soldering circuits and transcribing Charlie Parker solos. Since then, Rex has made his mark developing tools that make businesses more efficient. Rex has built data warehouses for a wide variety of industries including finance, construction, arts/entertainment, human resources, government, retail, and edtech. He has defended the United Nations’ mission websites from hackers, built an email marketing platform for the Metropolitan Opera, and managed 24×7 systems teams. Rex is profoundly grateful to all of the talented people he has learned from along the way. His most recent teacher is two years old and loves dinosaurs and ukulele music.
Suresh Duddi (DP to his friends and coworkers) is a tech wizard with over 25 years of experience in Silicon Valley. He is currently a VP of product and engineering at Yahoo, where he leads a team that builds analytics for Yahoo.com while managing a petabyte of data. DP is also no stranger to the startup world; he cofounded Habitera, where he focused on creating solutions for health using behavioral economics, and worked for both LiveOps and Simply Hired. He’s worked in engineering leadership positions at pioneering companies including Netscape, where he invented the Internet. :-) When you talk with DP, you’ll recognize his deep passion for technology. In his free time, you’ll find him playing volleyball, discovering local South Indian restaurants to satisfy his cravings, and baking healthy cakes with his daughters.
©2016, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.