July 20–24, 2015
Portland, OR

Open source big graph analytics on Neo4j with Apache Spark

Kenny Bastani (Digital Insight)
2:30pm–3:10pm Wednesday, 07/22/2015
Data Portland 252
Average rating: 4.22 (9 ratings)

Prerequisite Knowledge

Attendees should understand what PageRank is and have basic familiarity with big data platforms like Hadoop, along with some limited knowledge of Apache Spark.

Description

In this talk I will introduce a Docker container that provides an easy way to do distributed graph processing with Apache Spark GraphX and a Neo4j graph database. You’ll learn how to analyze big data graphs that are exported from Neo4j and then updated with the results of a Spark GraphX analysis. The types of analysis I will cover are PageRank, connected components, triangle counting, and community detection.
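To make two of the analyses named above concrete, here is a minimal single-machine sketch in plain Python. It is not the GraphX API and it does not touch Spark, Docker, or Neo4j; the edge list stands in for a graph exported from the database, and the function names `pagerank` and `connected_components`, the sample `edges`, and the parameter defaults are all assumptions made for illustration.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list (illustrative only)."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every node passes an equal share of its rank along each outgoing edge.
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def connected_components(edges):
    """Label each node with the smallest node id in its component.

    Edge direction is ignored, so these are weakly connected components.
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    label = {}
    # Visiting start nodes in sorted order makes each label the component minimum.
    for start in sorted(neighbours):
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in label:
                    label[nxt] = start
                    stack.append(nxt)
    return label


# Hypothetical export: a three-node cycle plus a separate two-node chain.
edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
ranks = pagerank(edges)
components = connected_components(edges)
```

Both functions return a dictionary keyed by node id, which is the shape of result that could be written back to the graph as a node property once the analysis finishes. A distributed engine computes the same quantities by partitioning the edge list across workers rather than looping over it in one process.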

Database technologies have evolved to store big data, but many remain inflexible. When a complex graph data model is stored in a relational database, large-scale analysis can require tedious transformations and shuffling of data.

Fast and scalable analysis of big data has become a critical competitive advantage for companies. Open source tools like Apache Hadoop and Apache Spark provide opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.


Kenny Bastani

Digital Insight

Kenny Bastani is a technology evangelist and open source software advocate in Silicon Valley. As an enterprise software consultant, he has applied a diverse set of skills to projects that call for a full-stack web developer working in an agile style. As a passionate advocate for the popular graph database Neo4j, Kenny has supported developers from globally recognized companies who have adopted the NoSQL database in their technology stacks. As a blogger and open source contributor, Kenny engages a community of developers who want to take advantage of newer graph processing techniques to analyze data.