An alternate way to implement JOINs in MongoDB [Update]


I have updated this blog . As last description was creating some misunderstanding and lot of confusion .

In one of my project , there was a requirement to generate report in .csv format from MongoDB without using any reporting tool and any language .
I had to use only MongoDB query .
In that report , I had to use multiple collection . Since there was not a well structured DB schema .
We are using MongoDB , which is non -relational database . Despite of this fact , there were related collections .

As we know MongoDB is a no-SQL database and doesn’t support joins. But I found an alternate to implement JOINs in MongoDb using Map Reduce.

For Versions below Mongo 2.4:-

Let’s go with a small example :
We have two collections Employee and Department . We have to fetch details of all employees with their department .

1) Create Employee and Department collection .

db.createCollection("employee");
db.employee.insert(
   {
      _id: 1,
      name: { first: 'John', last: 'Backus' },
      city: 'New York',
      department:1
   }
);

db.employee.insert(
    {
      _id: 2,
      name: { first: 'Merry', last: 'Desuja' },
      city: 'London',
      department:2
   }
);
db.createCollection("department");
db.department.insert(
   {
      _id: 1,
      department: 'Manager'
   }
);

db.department.insert(
    {
      _id: 2,
      department: 'Accountant'
   }
);

2) Create a Map . This is a JavaScript function which emits key and value pair and processes for each input document .

var map = function () {
				var output= {firstname:this.name.first, lastname:this.name.last , department:db.department.findOne({_id:this.department}).department}
	    			emit(this._id, output);
				};

 

3) Create Reduce . This is a JavaScript function which accepts two arguments Key and Value .


var reduce = function(key, values) {
				var outs={ firstname:null , lastname:null , department:null}
 				values.forEach(function(v){
					if(outs.firstname ==null){
						outs.firstname = v.firstname
					}
					if(outs.lastname ==null){
						outs.lastname = v.lastname
					}
					if(outs.department ==null){
						outs.department = v.department
					}
					
				});
				return outs;
			};

 

4) MongoDB provides mapReduce commmand for Map Reduce operation .

db.employee.mapReduce(map,reduce,{out: 'emp_dept'});
 

5) ‘emp_dept’ is new resulted collection where result will be stored . The Result will be :

{ "_id" : 1, "value" : { "firstname" : "John", "lastname" : "Backus", "department" : "Manager" } }
{ "_id" : 2, "value" : { "firstname" : "Merry", "lastname" : "Desuja", "department" : "Accountant" } }

For Version Mongo 2.4:-

Go to http://blog.knoldus.com/2014/03/12/easiest-way-to-implement-joins-in-mongodb-2-4/

About these ads

About ayushmishra2005

Ayush is the Sr. Software Consultant @ Knoldus Software LLP. In his 5 years of experience he has become developer with proven experience in architecting and developing web applications. Ayush has a Masters in Computer Application from U.P. Technical University, Ayush is a strong-willed and self-motivated professional who takes deep care in adhering to quality norms within projects. He is capable of managing challenging projects with remarkable deadline sensitivity without compromising code quality. .
This entry was posted in Agile, Java, MongoDB, News, Scala, Web. Bookmark the permalink.

31 Responses to An alternate way to implement JOINs in MongoDB [Update]

  1. Andreas Jung says:

    You know that Map-Reduce and single-thread and can not be considered for production purposes. Single-threaded and blocking is a no-go. EIther use the aggregation framework or perform multiple queries. This is likely much faster than using MR here.

  2. Hey Andreas ,
    Thanks for your suggestion . Map Reduce is also part of MongoDB Aggregation and used to handle complex aggregation tasks . In our case it was batch processing where we need to do analytics
    at defined intervals . I did not find any way to implement JOINs using multiple queries .
    Any pointers would be helpful . Also let me know if I understood your point correctly .

  3. Thomas O'Rourke says:

    Hi Ayshmishra,
    The title of your article is misleading, i.e. “now possible”, since there is nothing new here. Also, you could have just as easily implemented the same function using some code in Ruby, Python, etc. I agree that the map/reduce is faster, but my point is that a lookup is required on each department for employees either way.

    Also, you are using Map to iterate over departments, but only finding one employee per department (findOne). If you really want to find all employees (plural) then you need to use find() instead of findOne(), which will complicate the map/reduce considerably — but it’s possible and a reasonable way to do it in my opinion, that is if you really need to join something. Aggregation uses Map/Reduce internally anyways.

    As far as Adreas’s comment about threading – that’s not correct. You *should* use Map/Reduce in production since it’s way faster than doing it in code, and in general Map/Reduce does not lock the entire database. See http://docs.mongodb.org/manual/applications/map-reduce/#map-reduce-concurrency

  4. Nuri says:

    When the “department” entity has no aggregate-able values (just name or some property) the reporting really just aggregates the “employee” entity. What I like to do for these is aggregate the “employee” only, and resolve names from department id in memory or secondary look up. A materialized join may not be necessary unless the product of the join contains aggregate-able items from both entities.

  5. Probably you are right . This is a simple example to show how it can be done in such scenario .
    It can be used in complex scenario .

  6. Steve says:

    Nice clear example, well laid out.

  7. jyotsna says:

    Hi,
    I hve query that in this particuaar exampe, there is a common key to join on i.e. departement in both the collections.
    But if i have the requirement to apply map reduce for the scenario where i have two input files one is multidimensional input data file with time stamps and other is meta data configuration and they dont have anything in common, then how to run map reduce over multiple files?
    > db.config.find();
    { “_id” : ObjectId(“51419da08366ded56c483ac5″), “_dim_id” : 2, “Type” : “categor
    ical”, “gran” : “4″, “value1″ : “B”, “value2″ : “G”, “value3″ : “R”, “value4″ :
    “Y” }
    { “_id” : ObjectId(“51419dc78366ded56c483ac6″), “_dim_id” : 1, “Type” : “Numeric
    “, “gran” : “2″, “value1″ : “0″, “value2″ : “50″ }
    { “_id” : ObjectId(“51419ddb8366ded56c483ac7″), “_dim_id” : 0, “Type” : “Numeric
    “, “gran” : “4″, “value1″ : “0″, “value2″ : “100″ }
    >
    > db.datafile.find();
    { “_id” : ObjectId(“51419f268366ded56c483ac8″), “_TS_id” : “6″, “data” : [ "46",
    "26", "Y" ] }
    { “_id” : ObjectId(“51419f4b8366ded56c483ac9″), “_TS_id” : 7, “data” : [ "90", "
    45", "B" ] }
    { “_id” : ObjectId(“51419f5c8366ded56c483aca”), “_TS_id” : 8, “data” : [ "23", "
    11", "R" ] }
    { “_id” : ObjectId(“51419f768366ded56c483acb”), “_TS_id” : 9, “data” : [ "22", "
    34", "G" ] }
    { “_id” : ObjectId(“51419f9b8366ded56c483acc”), “_TS_id” : 10, “data” : [ "78",
    "45", "B" ] }
    { “_id” : ObjectId(“51419faf8366ded56c483acd”), “_TS_id” : 11, “data” : [ "46",
    "26", "Y" ] }
    { “_id” : ObjectId(“51419fc28366ded56c483ace”), “_TS_id” : 12, “data” : [ "56",
    "33", "R" ] }
    >

    i hve to use map reduce to map data points coming wrt to the config file.

    Thnks in advance for ur guidance.

  8. LP says:

    Let’s suppose that there is a doc in db.employe with department = 3.
    Now suppose that there isn’t the corresponding _id =3 in db.department .
    Doing map-reduce, in this case, causes the rise of an error.
    How avoid the map reduce error?

  9. Luca Polverini says:

    Hi, I’ve got a problem with this kind of map reduce joining.
    Suppose that an employee is recorded as department 3.
    Suppose also that in the db.employee there isn’t department _id = 3.
    With SQL this kind of problem doesn’t affect the query.
    With map reduce this kind of problem give the rise of an error (9014).
    How to solve the problem?

  10. AlexP says:

    First of all, it will not work for sharded environment. Secondly, It is completely discouraged to use db calls from map/reduce functions. You can refer to requirements for map/reduce functions -http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#db.collection.mapReduce. Also see discussions about the subject in StackOverflow – http://stackoverflow.com/questions/9618711/accessing-another-collection-in-mongodbs-map-reduce

  11. foo says:

    > You know that Map-Reduce and single-thread and can not be considered for production purposes.

    This is incorrect; as of 2.4, JS operations are no longer limited to a single thread.

    But, you also can’t access db from MR any longer, so this doesn’t work.

  12. Pingback: JOINS en MongoDB | Un poco de Java

  13. Pingback: How do I perform the SQL Join equivalent in PyMongo? Or more specific call a Pymongo collection object in BSON code? | BlogoSfera

  14. Pingback: How do I perform the SQL Join equivalent in PyMongo? Or more specific call a Pymongo collection object in BSON code? - MongoDB Solutions - Developers Q & A

  15. numpu says:

    I try this example but it error please suggest me, thank in the advance.
    Command ‘mapreduce’ failed: exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2) (response: { “errmsg” : “exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2)”, “code” : 16722, “ok” : 0.0 })

  16. Just Browsing says:

    I see the same issue.

    > db.employee.mapReduce(map,reduce,{out: ‘emp_dept’});
    Tue Dec 3 23:14:26.706 map reduce failed:{
    “errmsg” : “exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2)”,
    “code” : 16722,
    “ok” : 0
    } at src/mongo/shell/collection.js:970

  17. Pingback: MongoDB, resolviendo problemas de agregación en NoSQL | Apium Technologies

  18. Since mongo 2.4, this functionality was removed. To use this functionality, you have to install Mongo 2.2.3.

  19. diwakar says:

    db.employee.mapReduce(map,reduce,{out: ‘emp_dept’});
    Mon Feb 24 16:46:51.560 map reduce failed:{
    “errmsg” : “exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2)”,
    “code” : 16722,
    “ok” : 0
    } at src/mongo/shell/collection.js:970

    This error is occuring how should i resolve this ?

  20. Pingback: Easiest Way to Implement JOINs in MongoDB 2.4 | Knoldus

  21. Orlando says:

    Hi,
    hmm I was thinking about and I think there is a more easy way:

    db.employee.find().forEach(
    function (newEmployee) {
    newEmployee.department = db.department.findOne( { _id: employee.department } );
    db.emp_dep.insert(newEmployee);
    }
    );

    Regards,

  22. orlandobcrra says:

    Hi,

    db.employee.find().forEach(
    function (newEmployee) {
    newEmployee.department = db.department.findOne( { _id: employee.department } );
    db.emp_dep.insert(newEmployee);
    }
    );

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s