In this talk I will introduce you to a Docker container that provides an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and then updated with the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
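To give a feel for the first analysis on that list, here is a minimal single-machine sketch of PageRank in plain Python. It is not the GraphX implementation and does not touch Neo4j or Docker; the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the tiny edge list are all assumptions chosen for illustration. GraphX distributes this kind of computation across a cluster and may scale its scores differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges.

    Illustrative sketch only: ranks are normalized to sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    count = len(nodes)

    # Build the adjacency list of outgoing links for every node.
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    # Start from a uniform distribution.
    rank = {node: 1.0 / count for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}

        # Each node passes an equal share of its rank along every out-link.
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)

        rank = new_rank

    return rank


# Hypothetical toy graph: three nodes link to "c", and "c" links back to "a".
edges = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)

for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

In this toy graph the node with the most inbound links, `c`, ends up with the highest score, which is the intuition behind using PageRank to find influential nodes in a graph.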
Database technologies have evolved to store big data, but they remain largely inflexible. When a complex graph data model is stored in a relational database, large-scale analysis may require tedious transformations and shuffling of data.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. Open source tools like Apache Hadoop and Apache Spark provide opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
Kenny Bastani is a technology evangelist and open source software advocate in Silicon Valley. As an enterprise software consultant, he has applied a diverse set of skills to agile projects that require a full stack web developer. As a passionate advocate for the popular graph database Neo4j, Kenny has supported developers from globally recognized companies who have integrated the NoSQL database into their technology stack. As a blogger and open source contributor, Kenny engages a community of developers who are looking to take advantage of newer graph processing techniques to analyze data.
©2015, O'Reilly Media, Inc.