Spark and Shark: High-Speed Analytics over Hadoop and Hive Data

Data Science, Tools & Technology
Location: King's Suite - Sandringham
Level: Non-technical
Average rating: 4.67 (12 ratings)

As big data analytics evolves beyond simple batch jobs, there is a need for both lower-latency processing (interactive queries and stream processing) and more complex analytics (e.g. machine learning and graph algorithms). This talk will introduce Spark and Shark, open-source projects from UC Berkeley that address this need through an optimized runtime engine and in-memory computing capabilities. Spark is a cluster computing engine that lets users concisely express a wide range of applications through APIs in Scala, Java and Python, and supports streaming, batch and interactive analytics. Thanks to its support for in-memory storage and general operator graphs, it can run up to 100x faster than Hadoop for complex algorithms such as machine learning and graph processing. Shark extends Apache Hive to run over Spark. It runs over unmodified Hive warehouses and lets users query data up to 10x faster for on-disk tables and up to 100x faster for in-memory data. Both projects are compatible with the Hadoop ecosystem and have a growing open-source community, with over 15 companies contributing in the past year.

Patrick Wendell

Databricks

Patrick Wendell is a committer on Apache Spark and a co-founder of Databricks. Before Databricks, he was pursuing a Ph.D. in the UC Berkeley AMPLab, advised by Ion Stoica. His research focused on scalable, low-latency scheduling for data processing frameworks. He has previously contributed to several Hadoop projects, including Apache Flume and Apache Avro. He holds a B.S. in Computer Science from Princeton University and an M.S. in Computer Science from UC Berkeley.
