Neo4j With Scala: Awesome Experience with Spark


Let's continue our journey in this series. In the last blog we discussed migrating data from other databases to Neo4j. Now we will discuss how we can combine Neo4j with Spark.

Before we start, here is a recap of the series so far:

  1. Getting Started Neo4j with Scala : An Introduction
  2. Neo4j with Scala: Defining User Defined Procedures and APOC
  3. Neo4j with Scala: Migrate Data From Other Database to Neo4j

Now we begin our journey. Apache Spark is a generalized framework for distributed data processing that provides a functional API for manipulating data at large scale, along with in-memory data caching and reuse across computations. When we use Neo4j as a graph database and need to perform heavy data processing on it, Spark is the tool we reach for. We have to follow a few basic steps before we start playing with Spark.

  1. Download Apache Spark 2.0.0 (Download Here). Note that we must use Spark 2.0.0 here, because the connector we are going to use is built against Spark 2.0.0.
  2. Set the SPARK_HOME path in the .bashrc file (for Linux users).
  3. Now we can use the Neo4j-Spark-Connector.

Configuration with Neo4j:

  1. When Neo4j is running on the default host and port, we only have to configure our password in spark-defaults.conf:
     spark.neo4j.bolt.password=your_neo4j_password
  2. If Neo4j is not running on the default host and port, we have to configure the full connection in spark-defaults.conf:
      1. spark.neo4j.bolt.url=bolt://$host_name:$port_number
      2. spark.neo4j.bolt.user=user_name
      3. spark.neo4j.bolt.password=your_neo4j_password

  3. We can also provide the user and password as part of the URL. If you prefer not to edit spark-defaults.conf, the connection can be set programmatically as well; see the sketch after this list.
     spark.neo4j.bolt.url=bolt://neo4j:<password>@localhost
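
As an alternative to editing spark-defaults.conf, the same properties can be set on a SparkConf before the SparkContext is created (this applies to a standalone application; in spark-shell, sc already exists). This is a minimal sketch, and the application name, URL, user, and password values are placeholders for your own:

import org.apache.spark.{SparkConf, SparkContext}

// same keys as in spark-defaults.conf, set programmatically
val conf = new SparkConf()
  .setAppName("Neo4jSparkExample")   // placeholder application name
  .setMaster("local[*]")
  .set("spark.neo4j.bolt.url", "bolt://localhost:7687")
  .set("spark.neo4j.bolt.user", "neo4j")
  .set("spark.neo4j.bolt.password", "your_neo4j_password")

val sc = new SparkContext(conf)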

Integration of the Neo4j-Spark-Connector:

  1. Download the Neo4j-Spark-Connector (Download Here) and build it. After building, we can pass the jar path when we start spark-shell:
    $ $SPARK_HOME/bin/spark-shell --jars neo4j-spark-connector_2.11-full-2.0.0-M2.jar
  2. We can also provide the Neo4j-Spark-Connector as a package when we start spark-shell:
    $ $SPARK_HOME/bin/spark-shell --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2
  3. We can integrate the Neo4j-Spark-Connector directly in an SBT project:
    resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
    libraryDependencies += "neo4j-contrib" % "neo4j-spark-connector" % "2.0.0-M2"
  4. We can also integrate the Neo4j-Spark-Connector with a Maven POM (See Here).

Now we are ready to use Neo4j with Spark. We start the Spark shell from the command prompt:

$ $SPARK_HOME/bin/spark-shell --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2


With the shell running, we can start our new journey with Spark and Neo4j.

I assume that you have already created data in Neo4j for running the basic commands. If you have not created it yet, here are Cypher queries to create 1,000 records in your Neo4j database:

1. UNWIND range(1, 1000) AS row
   CREATE (:Person {id: row, name: 'name' + row, age: row % 100})

2. UNWIND range(1, 1000) AS row
   MATCH (n:Person {id: row})
   MATCH (m:Person {id: toInt(rand() * 1000)})
   CREATE (n)-[:KNOWS]->(m)

(We match on the id property created in the first query rather than on the internal node id, which is not guaranteed to line up with the range.)

RDD Operations:

We start with some RDD operations on the data stored in Neo4j.


import org.neo4j.spark._

val neo = Neo4j(sc)

// load the first ten names as an RDD[Row] and print them
val rowRDD = neo.cypher("MATCH (n:Person) RETURN n.name as name limit 10").loadRowRdd
rowRDD.map(t => "Name: " + t(0)).collect.foreach(println)



import org.neo4j.spark._

val neo = Neo4j(sc)

// calculate the mean of the age data

val meanAge = neo.cypher("MATCH (n:Person) RETURN n.age").loadRdd[Long].mean
//meanAge: Double = 49.5

// load node RDDs via the pattern (label, relationship types, label) and count the nodes

neo.pattern("Person", Seq("KNOWS"), "Person").partitions(12).batch(100).loadNodeRdds.count
//res30: Long = 1000
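
Since loadRdd[Long] gives us an ordinary numeric RDD, all of Spark's numeric RDD operations are available on it, not just mean. A small illustrative sketch (stats and histogram are standard Spark operations, not part of the connector; the exact numbers depend on your data):

import org.neo4j.spark._

val neo = Neo4j(sc)

val ages = neo.cypher("MATCH (n:Person) RETURN n.age").loadRdd[Long]

// summary statistics: count, mean, stdev, max, min
ages.stats()

// bucket the ages into ten equal-width bins
ages.histogram(10)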

DataFrame Operations:

Now we will perform some operations on DataFrames.


import org.neo4j.spark._

val neo = Neo4j(sc)

// load five people as a DataFrame and print the rows
val df = neo.cypher("MATCH (n:Person) RETURN n.name as name, n.age as age limit 5").loadDataFrame

df.collect.foreach(println)



import org.neo4j.spark._
import org.apache.spark.sql.functions.sum

val neo = Neo4j(sc)

// calculate the sum of the age data

val df = neo.cypher("MATCH (n:Person) RETURN n.age as age").loadDataFrame
df.agg(sum(df.col("age"))).collect
//res10: Array[org.apache.spark.sql.Row] = Array([49500])

// load relationships via the pattern and count the resulting rows

val relCount = neo.pattern("Person", Seq("KNOWS"), "Person").partitions(2).batch(100).loadDataFrame.count
//relCount: Long = 200
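
Because loadDataFrame returns an ordinary Spark DataFrame, we can also register it as a temporary view and query it with plain Spark SQL. A minimal sketch, using the SparkSession (spark) that spark-shell provides; the view name people is just an illustration:

import org.neo4j.spark._

val neo = Neo4j(sc)

val people = neo.cypher("MATCH (n:Person) RETURN n.name as name, n.age as age").loadDataFrame

// register the DataFrame as a temporary SQL view and query it
people.createOrReplaceTempView("people")
spark.sql("SELECT count(*) AS persons, avg(age) AS avg_age FROM people").show()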

GraphX Graph Operations:

Now we will perform some operations on a GraphX graph.


import org.neo4j.spark._

val neo = Neo4j(sc)

import org.apache.spark.graphx._
import org.apache.spark.graphx.lib._

// load the graph from a Cypher query that returns source, target, and relationship type

val graphQuery = "MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) as source, id(m) as target, type(r) as value"
val graph: Graph[Long, String] = neo.rels(graphQuery).partitions(10).batch(100).loadGraph

graph.vertices.count
//res0: Long = 1000

graph.edges.count
//res1: Long = 999

// load the graph via the pattern: (label, property), (relationship type, property), (label, property)

val graph = neo.pattern(("Person", "id"), ("KNOWS", "since"), ("Person", "id")).partitions(10).batch(100).loadGraph[Long, Long]

// count the vertices that have incoming edges

graph.inDegrees.count
//res0: Long = 505

// run five iterations of PageRank over the graph
val graph2 = PageRank.run(graph, 5)

// the three lowest-ranked vertices
graph2.vertices.sortBy(_._2).take(3)

//res1: Array[(org.apache.spark.graphx.VertexId, Double)] = Array((602,0.15), (630,0.15), (476,0.15))
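
PageRank is not the only algorithm we can run here; anything in org.apache.spark.graphx.lib works the same way on the loaded graph. As a further sketch, connected components on the same graph (the component count will depend on the random KNOWS edges created earlier):

// compute connected components; each vertex is labelled with the
// smallest vertex id in its component
val cc = graph.connectedComponents()

// number of distinct components in the graph
cc.vertices.map(_._2).distinct.count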

These are the basic examples of using Spark with Neo4j.

I hope this helps you start working with Spark and Neo4j.

Thanks.

Reference:

  1. Neo4j Spark Connector


