Scaling Data Analysis with Apache Mahout

Location: Mission City B5
Average rating: 2.75 (12 ratings)

The amount of digital data available online has been exploding in recent years: users generate content on blogs and micro-blogs, and shopping sites publish product reviews and detailed descriptions. With such amounts of data at their fingertips, software developers are more than ever in need of a scalable, easy-to-use framework for extracting knowledge from the data. Apache Mahout offers scalable implementations of algorithms for data mining and machine learning.

Scalable here means “scalable community”, as in the project is built on a sustainable community. The number of possible use cases is scalable as well, because the library is available under a commercially friendly license. Of course, scalable also refers to the amount of data to process: Apache Mahout is easy to get started with but scales to increasing data volumes thanks to its use of Apache Hadoop.

After motivating the need for machine learning, the talk gives an overview of Apache Mahout, including a deep dive into one of its algorithms. It shows the tremendous improvements implemented in the recent past, including the addition of several algorithms and performance improvements. Last but not least, Apache Mahout graduated to a top-level project this year.

Isabel Drost-Fromm

Apache Software Foundation / Nokia Gate 5 GmbH

Isabel Drost is a member of the Apache Software Foundation. She is the organiser of the Apache Hadoop Get Together in Berlin and was co-organiser of the first European NoSQL meetup as well as the Berlin Buzzwords conference. She co-founded Apache Mahout and is an active Apache Mahout committer. Isabel is actively engaged with the communities of various Apache projects, e.g. Apache Lucene and Apache Hadoop. She is a regular speaker at renowned conferences on topics related to free software development, scalability, Apache Lucene, Apache Hadoop and Apache Mahout.

Håkan Jonsson
02/07/2011 4:35pm PST

I had high expectations for this talk, but it turned out to be about ML basics rather than Mahout. Please come back with a more advanced and Mahout-specific talk.

