Neo4j With Scala : Migrate Data From Other Database to Neo4j


Hello Folks;

Lets continue the Neo4j with Scala. We have earlier discuss about the use of Neo4j with Scala and Neo4j APOC with Scala. In this blog we are going to discuss about how we can migrate data from the other database like MYSQL, PostgreSQL, Oracle and Cassandra.

But before starting the journey, To those who have caught the train late 😉 , this is what has happened till now:

  1. Getting Started Neo4j with Scala : An Introduction
  2. Neo4j with Scala: Defining User Defined Procedures and APOC

So till now we have some basic understanding of Neo4j and APOC with Scala. Now we will start blog for the series.

We use many databases for storing the data. But when we have a large amount of data and tables that time it becomes so hard to make query and execute them on the database. We have to be extra cautious to perform the task and we get bored to see same screen without any fun 😉 .

We have solution for this. We can use Neo4j for this, where we find more fun to do work and its not difficult as we have discuss it in first blog. Before starting the discussion I want to clear one think that when we want to migrate data then we have to keep Neo4j APOC Kit (Download and Install) in the $Neo4j_Home/plugins. Now we start discussing about every database which I have mention before that how we can connect and transfer data from that database to Neo4j :

PostgreSQL :

When we use PostgreSQL, we have to download JDBC .jar file (Download) and keep it in the $Neo4j_Home/plugins and restart the Neo4j. After restarting the Neo4j server we are set for migrating the data from the PostgreSQL to Neo4j.

  • Now we load the driver with the APOC.

CALL apoc.load.driver('org.postgresql.Driver');
  • Now we create the call for fetching the data from the PostgreSQL where we have a table with name employee_details to Neo4j.
with 'jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres' as url
CALL apoc.load.jdbc(url,'employee_details') YIELD row
RETURN count(*);
  • If we don’t want to use these step than we can provide URL in the $Neo4j_Home/conf/neo4j.conf and restart the server :
apoc.jdbc.postgresql_url.url=jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres

We can now fetch data direct. We don’t need to load driver also.

CALL apoc.load.jdbc('postgresql_url','employee_details') YIELD row
RETURN count(*);
  • Create Nodes and Relation in the data.
/**
* Here we define schema and key. In first column we define those column_name
* which can be null and In the second we those column name which we want unique.
*/

CALL apoc.schema.assert( {Detail:['name','age','address','salary']},
{Detail:['id']});

/**
* Here we load data in the neo4j and create node with the help of schema which we define
* earlier.
*/

CALL apoc.load.jdbc('jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres','employee_details') yield row
CREATE (t:Detail {id:toString(row.id), name:row.name,
age:toString(row.age), address:row.address, salary:toString(row.salary)})
return t;

Screenshot from 2016-09-09 11:09:37.png

screenshot-from-2016-09-09-010005

MYSQL :

We want to migrate data from the MYSQL as before we have to download JDBC .jar file (Download) and keep it in the $Neo4j_Home/plugins and update $Neo4j_Home/conf/neo4j.conf as:

apoc.jdbc.mysql_url.url=jdbc:mysql://localhost:3306/test?user=user&password=pass

Restart the Neo4j server and we are set for migrating the data from the Cassandra to Neo4j.

  • We hit the MySQL and start fetching data and perform count operation.
CALL apoc.load.jdbc('mysql_url','employee_data') yield row
RETURN count(*);

Screenshot from 2016-09-10 12:36:34 (copy).png

Cassandra :

Now we migrate data from the Cassandra to Neo4j. Now we first import data into the cassandra if we don’t have data in the cassandra or we can use it for test also.

  • We have to run following command for setting up initial data in the cassandra :
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/playlist.cql
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/artists.csv
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/songs.csv
$CASSANDRA_HOME/bin/cassandra
$CASSANDRA_HOME/bin/cqlsh -f playlist.cql
  • We have set our cassandra database with the data. We have to download JDBC .jar file (Download) and keep it in the $Neo4j_Home/plugins. We can provide URL in the $Neo4j_Home/conf/neo4j.conf as :
apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist

Restart the Neo4j server and we are set for migrating the data from the Cassandra to Neo4j.

  • We hit the cassandra and start fetching data and perform count operation.
CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN count(*);

screenshot-from-2016-09-09-130005

  • Let’s create Index, Constraints and Relation the data.
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
  {Track:['title','length']},
  {Artist:['name'],Track:['id'],Genre:['name']});

Screenshot from 2016-09-09 13:12:25.png

  • Now we will load data and perform Merge and Create operation so that we can create the node and relationship between the node.
/**
* Here we load data in the neo4j and create node with the help of schema which we define
* earlier.
*/
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
MERGE (a:Artist {name:row.artist})
MERGE (g:Genre {name:row.genre})
CREATE (t:Track {id:toString(row.track_id), title:row.track,
length:row.track_length_in_seconds})
CREATE (a)-[:PERFORMED]->(t)
CREATE (t)-[:GENRE]->(g);

Screenshot from 2016-09-09 13:13:28.png

  • We can see Relation Graph and it will look something like this :
/**
* For Displaying Performed Relation
*/

MATCH p=()-[r:PERFORMED]->() RETURN p LIMIT 25;

Performed.png

/**
* For Displaying GENRE Relation
*/

MATCH p=()-[r:GENRE]->() RETURN p LIMIT 100;

Screenshot from 2016-09-09 14:11:41.png

Oracle :

We are in the last database to migrate data to Neo4j but as obvious not least. We can download JDBC .jar file (Download) and keep it in the $Neo4j_Home/plugins and restart the Neo4j. We can provide URL in the $Neo4j_Home/conf/neo4j.conf as :

apoc.jdbc.oracle_url.url=jdbc:oracle:thin:user/password@127.0.0.1:1521/XE
  • After restarting the Neo4j server we are set for migrating the data from the Oracle to Neo4j. We fetch the data from the Oracle where we have a table with name employee_details to Neo4j.Now we load the driver with the APOC.
CALL apoc.load.jdbc('oracle_url','employee_details') YIELD row
RETURN count(*);

screenshot-from-2016-09-10-123634

  • Let’s create Index, Constraints and Relation the data.
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
 {EMPINFO:['name', 'age','salary']},
 {EMPINFO:['id'],ADDRESS:['address']});

Screenshot from 2016-09-10 14:53:00.png

  • Now we will load data and perform Merge and Create operation so that we can create the node and relationship between the node.
/**
* Here we load data in the neo4j and create node with the help of schema which we define
* earlier.
*/
CALL apoc.load.jdbc('oracle_url','employee_details') yield row
MERGE (g:ADDRESS {name:row.ADDRESS})
CREATE (t:EMPINFO {id:toString(row.ID), name:row.NAME, age:toString(row.AGE), salary:toString(row.SALARY)})
CREATE (t)-[:LIVE]->(g);

screenshot-from-2016-09-10-145400

  • We can see Relation Graph and it will look something like this :
/**
* For Displaying Performed Relation
*/

MATCH p=()-[r:LIVE]->() RETURN p LIMIT 25;

screenshot-from-2016-09-10-145506

Now we can see that it is so easy to migrate data from other database to Neo4j.

After importing the data in Neo4j, we have to thing about the sync of data. We can use schedule process which can be timebase and automatically sync data between the databases. We can also used event based integration where we will defined the event at which we want to update the database.

Note : As we discuss I want to notify again if you do not update driver name into $Neo4j_Home/conf/neo4j.conf then you have to load driver in Neo4j otherwise you have to provide only driver name into the query.

I hope it will help for migrating data into the Neo4j. After fetching data we can write simple Scala code for persisting data in the Neo4j as we discuss in the first blog.

Reference:

  1. Neo4j Apoc Procedures

 


KNOLDUS-advt-sticker

Advertisements
This entry was posted in database, Scala and tagged , , , , , . Bookmark the permalink.

6 Responses to Neo4j With Scala : Migrate Data From Other Database to Neo4j

  1. gst says:

    Yes it is the need of some operator..

  2. Pingback: Graph Story Newsletter: Graphs Make Us Safe From Tainted Burritos – Graph Story Blog

  3. Pingback: Neo4j With Scala: Awesome experience with Spark | Knoldus

  4. Pingback: Neo4j With Scala: Neo4j vs ElasticSearch | Knoldus

  5. Pingback: Neo4j With Scala: Rest End Points with AKKA HTTP | Knoldus

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s