Neo4j With Scala: Awesome Experience with Spark


Let's continue our journey in this series. In the last blog, we discussed migrating data from other databases to Neo4j. Now we will discuss how we can combine Neo4j with Spark.

Before starting, here is a recap of the series so far:

  1. Getting Started Neo4j with Scala : An Introduction
  2. Neo4j with Scala: Defining User Defined Procedures and APOC
  3. Neo4j with Scala: Migrate Data From Other Database to Neo4j

Now we start our journey. Apache Spark is a generalized framework for distributed data processing that provides a functional API for manipulating data at large scale, along with in-memory data caching and reuse across computations. When we use Neo4j as a graph database and need to perform heavy data processing on that data, Spark is a natural fit. We have to follow some basic steps before we start playing with Spark:

  1. Download Apache Spark 2.0.0 (Download Here). Note that we must use Spark 2.0.0 here, because the connector we are going to use is built against Spark 2.0.0.
  2. Set the SPARK_HOME path in the .bashrc file (for Linux users); a sample entry is shown after this list.
  3. Now we can use the Neo4j-Spark-Connector, configured as described below.
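For step 2, the entry in ~/.bashrc looks roughly like this (the install path below is an assumption; use wherever you extracted Spark):

# example ~/.bashrc entry; the path is an assumption, adjust to your install location
export SPARK_HOME=/opt/spark-2.0.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin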

Configuration with Neo4j:

  1. When running Neo4j with the default host and port, we only have to configure our password in spark-defaults.conf:
     spark.neo4j.bolt.password=your_neo4j_password
  2. If we are not running Neo4j with the default host and port, we have to configure these in spark-defaults.conf:
      1. spark.neo4j.bolt.url=bolt://$host_name:$port_number
      2. spark.neo4j.bolt.user=user_name
      3. spark.neo4j.bolt.password=your_neo4j_password
  3. We can also provide the user and password as part of the URL:
     spark.neo4j.bolt.url=bolt://neo4j:<password>@localhost

Integration of the Neo4j-Spark-Connector:

  1. Download the Neo4j-Spark-Connector (Download Here) and build it. After building, we can provide the jar path when we start the spark-shell:
    $ $SPARK_HOME/bin/spark-shell --jars neo4j-spark-connector_2.11-full-2.0.0-M2.jar
  2. We can also provide the Neo4j-Spark-Connector as a package when we start the spark-shell:
    $ $SPARK_HOME/bin/spark-shell --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2
  3. We can integrate the Neo4j-Spark-Connector directly in an SBT project:
    resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
    libraryDependencies += "neo4j-contrib" % "neo4j-spark-connector" % "2.0.0-M2"
  4. We can integrate the Neo4j-Spark-Connector directly with a POM (See Here).

Now we are ready to use Neo4j with Spark. We start the spark-shell from the command prompt:

$ $SPARK_HOME/bin/spark-shell --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2


With the shell up, we are ready to start our new journey with Spark and Neo4j.

I assume that you have already created some data in Neo4j to run the basic commands against. If you have not, here is a Cypher query for creating 1K records in your Neo4j database (a minimal sketch; the Person label and its properties are placeholders of my choosing):
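// hypothetical sample data: 1000 Person nodes with an id, a name, and an age
UNWIND range(1, 1000) AS id
CREATE (:Person {id: id, name: "Person " + id, age: id % 100})

Running this once in the Neo4j browser gives us enough nodes to experiment with below.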

RDD Operations:

We start with some RDD operations on the data stored in Neo4j; a sketch is shown below.

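A minimal sketch, assuming the 1K Person nodes created above and the Neo4jTupleRDD and Neo4jRowRDD helpers that the 2.0.0-M2 connector exposes (the queries and the maxId parameter are illustrative):

import org.neo4j.spark._

// count all Person nodes by returning their internal ids as tuples
Neo4jTupleRDD(sc, "MATCH (n:Person) RETURN id(n)", Seq.empty).count
// expected: 1000, given the sample data above

// pass a query parameter and get the results back as rows
Neo4jRowRDD(sc, "MATCH (n:Person) WHERE n.id <= {maxId} RETURN n.id, n.name", Seq("maxId" -> 100)).count
// expected: 100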

DataFrame Operations:

Now we will perform some operations with DataFrames; a sketch is shown below.

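A minimal sketch, assuming the Neo4jDataFrame helper of the 2.0.0-M2 connector; the result schema is declared either with Spark SQL types or with string type hints (the queries are again illustrative):

import org.neo4j.spark._
import org.apache.spark.sql.types.LongType

// spark-shell 2.0 exposes the SparkSession as 'spark'
val sqlContext = spark.sqlContext

// build a DataFrame with an explicitly typed column
val idDf = Neo4jDataFrame.withDataType(sqlContext, "MATCH (n:Person) RETURN id(n) AS id", Seq.empty, "id" -> LongType)
idDf.count
// expected: 1000

// build a DataFrame with a string type hint and a query parameter
val nameDf = Neo4jDataFrame(sqlContext, "MATCH (n:Person) WHERE n.id <= {maxId} RETURN n.name AS name", Seq("maxId" -> 100), "name" -> "string")
nameDf.count
// expected: 100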

GraphX Graph Operations:

Now we will perform some operations on a GraphX graph; a sketch is shown below.
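A minimal sketch, assuming the Neo4jGraph helper of the 2.0.0-M2 connector and a KNOWS relationship between Person nodes (the KNOWS type is an assumption; our 1K-node sample above has no relationships, so substitute the label and relationship types your data actually contains):

import org.neo4j.spark._
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib._

// load a GraphX graph: (source label, relationship types, target label)
val graph = Neo4jGraph.loadGraph(sc, "Person", Seq("KNOWS"), "Person")
graph.vertices.count
graph.edges.count

// run five iterations of PageRank over the loaded graph
val ranked = PageRank.run(graph, 5)
ranked.vertices.take(5)

// write the computed rank back to Neo4j as a 'rank' property on each node
Neo4jGraph.saveGraph(sc, ranked, "rank")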

These were basic examples of using Spark with Neo4j.

I hope this helps you start working with Spark and Neo4j.

Thanks.

Reference:

  1. Neo4j Spark Connector


Written by 

Anurag is a Sr. Software Consultant at Knoldus Software LLP. In his 3 years of experience, he has become a developer with proven experience in architecting and developing web applications.
