GraphBuilder – Scalable Graph Construction using Hadoop

The exponential growth in the study of graph-based data dependencies is fueling the need for large-scale machine learning frameworks and techniques. These computations are iterative and compute-centered. Recently, frameworks such as Apache Giraph, Apache Hama, and CMU’s GraphLab have emerged to perform these computations in a distributed manner at commercial scale. But feeding data to these frameworks is a huge challenge in itself. Since graph construction is a data-parallel problem, Hadoop is well suited to this task, but it lacks some elements that would make things easier for MapReduce programmers.

In this talk, Nilesh will introduce GraphBuilder, a graph construction library for Apache Hadoop. GraphBuilder makes the job easy by providing services for transforming unstructured data into graphs, graph cleaning, output-formatting, and partitioning graphs ahead of cluster ingress.
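As a rough illustration of why graph construction is data-parallel, the sketch below mimics the map, reduce, and partition stages in plain Python. It is not GraphBuilder's API: the function names (`map_document`, `reduce_edges`, `partition_edges`), the word co-occurrence rule for creating edges, and the hash-by-source partitioning are all assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import combinations
import zlib


def map_document(doc_id, text):
    """Map step (hypothetical): emit ((u, v), 1) for each pair of distinct tokens."""
    # Sorting the de-duplicated tokens gives each undirected edge one
    # canonical form (u < v) and rules out self-loops.
    tokens = sorted(set(text.lower().split()))
    for u, v in combinations(tokens, 2):
        yield (u, v), 1


def reduce_edges(pairs):
    """Reduce step (hypothetical): sum counts per edge, merging duplicate edges."""
    weights = defaultdict(int)
    for edge, count in pairs:
        weights[edge] += count
    return dict(weights)


def partition_edges(weights, num_partitions):
    """Assign each edge to a partition using a stable hash of its source vertex."""
    parts = [[] for _ in range(num_partitions)]
    for (u, v), w in sorted(weights.items()):
        idx = zlib.crc32(u.encode("utf-8")) % num_partitions
        parts[idx].append((u, v, w))
    return parts


# Toy input: each document is processed independently in the map step.
docs = {"d1": "graph hadoop", "d2": "hadoop graph giraph"}
pairs = [p for doc_id, text in docs.items() for p in map_document(doc_id, text)]
weights = reduce_edges(pairs)
parts = partition_edges(weights, num_partitions=2)
```

Because each document is mapped independently and edges are merged by key, the same structure can be spread across many workers. A real Hadoop job would replace the in-memory lists with mapper and reducer tasks.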

Nilesh will review emerging frameworks for graph-based machine learning and explain the benefits of GraphBuilder by sharing end-to-end case studies for complex machine learning applications, such as sentiment analysis and perceptual computing. Finally, he will explain how his work is evolving to accommodate more frameworks and complex ingress structures.

Nilesh Jain

Intel Corp

Nilesh Jain is a Senior Research Scientist with the Cluster Computing Architecture team in Intel Labs. His current research focuses on emerging frameworks for large-scale machine learning and big data analytics. His other research interests include systems architectures and technologies that improve the scaling, performance, and power consumption of distributed parallel computing. Before joining Intel Labs in 2007, Nilesh spent 11 years working on various telecom and I/O technologies within Intel product groups and at a premier telecom research organization (C-DOT) in India. Nilesh was an open source contributor to the Linux Standard Base (LSB).
