Computing quantities such as medians or the number of unique elements exactly requires a lot of time, a lot of memory, or both. It is, however, possible to get very close to the exact answer with much less time and much less memory. Some of these algorithms are much simpler than you might expect. I will describe a selection of these algorithms, including some not-yet-published results.
I will also outline how these algorithms can be applied to practical problems like anomaly detection.
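The abstract does not say which algorithms the talk covers, so the following is only an illustrative sketch of the trade-off it describes, not the speaker's method. It estimates the number of unique elements with a k-minimum-values sketch: hash every item to a number in [0, 1), keep only the k smallest distinct hash values, and infer the count from how small the k-th one is. The function name `approx_distinct` and the parameter `k` are hypothetical, chosen for this example.

```python
import hashlib
import heapq


def approx_distinct(items, k=256):
    """Estimate the number of distinct items using at most k stored hashes.

    Illustrative k-minimum-values sketch; not taken from the talk.
    """
    heap = []        # negated hashes, so heap[0] is the largest kept hash
    members = set()  # hashes currently kept, used to skip duplicates
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        # Map the first 8 bytes of the digest to a float in [0, 1).
        h = int.from_bytes(digest[:8], "big") / 2**64
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen, so the count is exact.
        return len(heap)
    # If n hashes are spread uniformly over [0, 1), the k-th smallest
    # sits near k / n, which gives the estimate (k - 1) / kth_smallest.
    return int((k - 1) / -heap[0])
```

Memory use is bounded by `k` no matter how long the input is. The relative error shrinks roughly as one over the square root of `k`, so a larger `k` buys accuracy at the cost of memory.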
Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member for the Apache Mahout project. He has contributed to the Mahout clustering, classification, and matrix decomposition algorithms. He was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud detection systems for ID Analytics.