“Big data” problems are increasingly common. Storage and compute services are inexpensive and easy to get in the cloud. New data sources are exploding: sensor readings, video and still imagery, audio, telemetry from software systems and devices, web logs, and data from the biological and physical sciences. It is now possible to store all of them online cheaply.
Apache Hadoop is a powerful tool for analyzing these new, very diverse repositories. Hadoop can scan and analyze petabytes of data using a collection of commodity servers working in parallel.
Hadoop enforces a different programming paradigm from relational databases, large-scale data warehouses and special-purpose high-performance computing systems. Hadoop uses shared-nothing parallel execution to break large data processing tasks into small pieces that get distributed among many servers. Those tasks can include code written by the user that operates on complex data in its native format. Hadoop relies on a high-performance distributed file system, HDFS, for data storage and replication.
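As a rough illustration of the divide-and-combine pattern described above, here is a minimal Python sketch of a word count split into independent map tasks, a grouping step, and a reduce step. It is not Hadoop's API: the names `map_task`, `shuffle`, `reduce_task` and `word_count` are hypothetical, the input chunks are made up, and threads on one machine stand in for the separate servers of a real cluster.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(chunk):
    """Map step: runs on one chunk only and shares no state with other tasks."""
    return Counter(chunk.lower().split())


def shuffle(partials):
    """Group the per-chunk counts by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for partial in partials:
        for word, count in partial.items():
            grouped[word].append(count)
    return grouped


def reduce_task(counts):
    """Reduce step: combine all values collected for a single key."""
    return sum(counts)


def word_count(chunks, workers=4):
    # Assumption: threads simulate the independent servers of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, chunks))
    return {word: reduce_task(counts) for word, counts in shuffle(partials).items()}


if __name__ == "__main__":
    # Illustrative input; each string plays the role of one block of a large file.
    chunks = ["big data big clusters", "data in hdfs", "big jobs small tasks"]
    print(word_count(chunks))
```

Because each map task sees only its own chunk, the chunks could in principle be processed on different machines without coordination; only the grouping step needs to bring results for the same key together.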
In this talk, I’ll explain what Hadoop and related open source projects do, how they operate, and how they are used in real-world workloads to answer questions that simply can’t be posed using other systems.
Mike Olson is the CEO of Cloudera, which offers support and services for Hadoop. He was an early architect of the Postgres database system and has worked as an engineer, manager and executive at a number of database companies, including Britton Lee, Illustra, Informix and Oracle. He was CEO of Sleepycat Software, makers of Berkeley DB, through the company’s acquisition by Oracle.
Comments
Talked primarily about types of problems his consultancy clients were working with. Not about Hadoop itself.