An alternate way to implement JOINs in MongoDB [Update]

February 3, 2013March 12, 2014 Ayush MishraAgile, Java, MongoDB, News, Studio-Scala, Web35 Comments

Table of contents

Reading Time: 2 minutes

I have updated this blog . As last description was creating some misunderstanding and lot of confusion .

In one of my project , there was a requirement to generate report in .csv format from MongoDB without using any reporting tool and any language .
I had to use only MongoDB query .
In that report , I had to use multiple collection . Since there was not a well structured DB schema .
We are using MongoDB , which is non -relational database . Despite of this fact , there were related collections .

As we know MongoDB is a no-SQL database and doesn’t support joins. But I found an alternate to implement JOINs in MongoDb using Map Reduce.

For Versions below Mongo 2.4:-

Let’s go with a small example :
We have two collections Employee and Department . We have to fetch details of all employees with their department .

1) Create Employee and Department collection .

db.createCollection("employee");
db.employee.insert(
   {
      _id: 1,
      name: { first: 'John', last: 'Backus' },
      city: 'New York',
      department:1
   }
);

db.employee.insert(
    {
      _id: 2,
      name: { first: 'Merry', last: 'Desuja' },
      city: 'London',
      department:2
   }
);
db.createCollection("department");
db.department.insert(
   {
      _id: 1,
      department: 'Manager'
   }
);

db.department.insert(
    {
      _id: 2,
      department: 'Accountant'
   }
);

2) Create a Map . This is a JavaScript function which emits key and value pair and processes for each input document .

var map = function () {
				var output= {firstname:this.name.first, lastname:this.name.last , department:db.department.findOne({_id:this.department}).department}
	    			emit(this._id, output);
				};

3) Create Reduce . This is a JavaScript function which accepts two arguments Key and Value .


var reduce = function(key, values) {
				var outs={ firstname:null , lastname:null , department:null}
 				values.forEach(function(v){
					if(outs.firstname ==null){
						outs.firstname = v.firstname
					}
					if(outs.lastname ==null){
						outs.lastname = v.lastname
					}
					if(outs.department ==null){
						outs.department = v.department
					}
					
				});
				return outs;
			};

4) MongoDB provides mapReduce commmand for Map Reduce operation .

db.employee.mapReduce(map,reduce,{out: 'emp_dept'});

5) ’emp_dept’ is new resulted collection where result will be stored . The Result will be :

{ "_id" : 1, "value" : { "firstname" : "John", "lastname" : "Backus", "department" : "Manager" } }
{ "_id" : 2, "value" : { "firstname" : "Merry", "lastname" : "Desuja", "department" : "Accountant" } }

For Version Mongo 2.4:-

Go to http://blog.knoldus.com/2014/03/12/easiest-way-to-implement-joins-in-mongodb-2-4/

Written by Ayush Mishra

Ayush is the Sr. Lead Consultant @ Knoldus Software LLP. In his 10 years of experience he has become a developer with proven experience in architecting and developing web applications. Ayush has a Masters in Computer Application from U.P. Technical University, Ayush is a strong-willed and self-motivated professional who takes deep care in adhering to quality norms within projects. He is capable of managing challenging projects with remarkable deadline sensitivity without compromising code quality.

35 thoughts on “An alternate way to implement JOINs in MongoDB [Update]2 min read”

Andreas Jung says:

February 3, 2013 at 5:23 PM

You know that Map-Reduce and single-thread and can not be considered for production purposes. Single-threaded and blocking is a no-go. EIther use the aggregation framework or perform multiple queries. This is likely much faster than using MR here.
ayushmishra2005 says:

February 3, 2013 at 5:41 PM

Hey Andreas ,
Thanks for your suggestion . Map Reduce is also part of MongoDB Aggregation and used to handle complex aggregation tasks . In our case it was batch processing where we need to do analytics
at defined intervals . I did not find any way to implement JOINs using multiple queries .
Any pointers would be helpful . Also let me know if I understood your point correctly .
Thomas O'Rourke says:

February 5, 2013 at 7:42 AM

Hi Ayshmishra,
The title of your article is misleading, i.e. “now possible”, since there is nothing new here. Also, you could have just as easily implemented the same function using some code in Ruby, Python, etc. I agree that the map/reduce is faster, but my point is that a lookup is required on each department for employees either way.

Also, you are using Map to iterate over departments, but only finding one employee per department (findOne). If you really want to find all employees (plural) then you need to use find() instead of findOne(), which will complicate the map/reduce considerably — but it’s possible and a reasonable way to do it in my opinion, that is if you really need to join something. Aggregation uses Map/Reduce internally anyways.

As far as Adreas’s comment about threading – that’s not correct. You *should* use Map/Reduce in production since it’s way faster than doing it in code, and in general Map/Reduce does not lock the entire database. See http://docs.mongodb.org/manual/applications/map-reduce/#map-reduce-concurrency
Nuri says:

February 11, 2013 at 5:43 PM

When the “department” entity has no aggregate-able values (just name or some property) the reporting really just aggregates the “employee” entity. What I like to do for these is aggregate the “employee” only, and resolve names from department id in memory or secondary look up. A materialized join may not be necessary unless the product of the join contains aggregate-able items from both entities.
ayushmishra2005 says:

February 12, 2013 at 4:26 AM

Probably you are right . This is a simple example to show how it can be done in such scenario .
It can be used in complex scenario .
Steve says:

March 12, 2013 at 7:56 PM

Nice clear example, well laid out.
jyotsna says:

March 15, 2013 at 10:15 AM

Hi,
I hve query that in this particuaar exampe, there is a common key to join on i.e. departement in both the collections.
But if i have the requirement to apply map reduce for the scenario where i have two input files one is multidimensional input data file with time stamps and other is meta data configuration and they dont have anything in common, then how to run map reduce over multiple files?
> db.config.find();
{ “_id” : ObjectId(“51419da08366ded56c483ac5”), “_dim_id” : 2, “Type” : “categor
ical”, “gran” : “4”, “value1” : “B”, “value2” : “G”, “value3” : “R”, “value4” :
“Y” }
{ “_id” : ObjectId(“51419dc78366ded56c483ac6”), “_dim_id” : 1, “Type” : “Numeric
“, “gran” : “2”, “value1” : “0”, “value2” : “50” }
{ “_id” : ObjectId(“51419ddb8366ded56c483ac7”), “_dim_id” : 0, “Type” : “Numeric
“, “gran” : “4”, “value1” : “0”, “value2” : “100” }
>
> db.datafile.find();
{ “_id” : ObjectId(“51419f268366ded56c483ac8”), “_TS_id” : “6”, “data” : [ “46”,
“26”, “Y” ] }
{ “_id” : ObjectId(“51419f4b8366ded56c483ac9”), “_TS_id” : 7, “data” : [ “90”, ”
45″, “B” ] }
{ “_id” : ObjectId(“51419f5c8366ded56c483aca”), “_TS_id” : 8, “data” : [ “23”, ”
11″, “R” ] }
{ “_id” : ObjectId(“51419f768366ded56c483acb”), “_TS_id” : 9, “data” : [ “22”, ”
34″, “G” ] }
{ “_id” : ObjectId(“51419f9b8366ded56c483acc”), “_TS_id” : 10, “data” : [ “78”,
“45”, “B” ] }
{ “_id” : ObjectId(“51419faf8366ded56c483acd”), “_TS_id” : 11, “data” : [ “46”,
“26”, “Y” ] }
{ “_id” : ObjectId(“51419fc28366ded56c483ace”), “_TS_id” : 12, “data” : [ “56”,
“33”, “R” ] }
>

i hve to use map reduce to map data points coming wrt to the config file.

Thnks in advance for ur guidance.
LP says:

May 12, 2013 at 9:58 AM

Let’s suppose that there is a doc in db.employe with department = 3.
Now suppose that there isn’t the corresponding _id =3 in db.department .
Doing map-reduce, in this case, causes the rise of an error.
How avoid the map reduce error?
Luca Polverini says:

May 12, 2013 at 10:15 AM

Hi, I’ve got a problem with this kind of map reduce joining.
Suppose that an employee is recorded as department 3.
Suppose also that in the db.employee there isn’t department _id = 3.
With SQL this kind of problem doesn’t affect the query.
With map reduce this kind of problem give the rise of an error (9014).
How to solve the problem?
1. ayushmishra2005 says:
  
  May 15, 2013 at 5:20 PM
  
  You can use this :
  var obj = db.department.findOne({_id:3});
  if(obj && obj.department) { print(obj.department); }
AlexP says:

July 14, 2013 at 3:06 PM

First of all, it will not work for sharded environment. Secondly, It is completely discouraged to use db calls from map/reduce functions. You can refer to requirements for map/reduce functions -http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#db.collection.mapReduce. Also see discussions about the subject in StackOverflow – http://stackoverflow.com/questions/9618711/accessing-another-collection-in-mongodbs-map-reduce
foo says:

July 25, 2013 at 9:24 PM

> You know that Map-Reduce and single-thread and can not be considered for production purposes.

This is incorrect; as of 2.4, JS operations are no longer limited to a single thread.

But, you also can’t access db from MR any longer, so this doesn’t work.
1. ayushmishra2005 says:
  
  March 12, 2014 at 7:16 PM
  
  Now I also have solution for Mongo2.4. Please refer my another blog http://www.dzone.com/links/r/easiest_way_to_implement_joins_in_mongodb_24.html
Pingback: JOINS en MongoDB | Un poco de Java
Pingback: How do I perform the SQL Join equivalent in PyMongo? Or more specific call a Pymongo collection object in BSON code? | BlogoSfera
Pingback: How do I perform the SQL Join equivalent in PyMongo? Or more specific call a Pymongo collection object in BSON code? - MongoDB Solutions - Developers Q & A
numpu says:

November 19, 2013 at 10:16 PM

I try this example but it error please suggest me, thank in the advance.
Command ‘mapreduce’ failed: exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2) (response: { “errmsg” : “exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2)”, “code” : 16722, “ok” : 0.0 })
1. prakash says:
  
  December 11, 2013 at 8:31 PM
  
  Hi Did you get the answer of your question? I did not found any solution. thank you
  1. ayushmishra2005 says:
    
    March 12, 2014 at 7:17 PM
    
    Please refer my another blog http://www.dzone.com/links/r/easiest_way_to_implement_joins_in_mongodb_24.html to avoid this error.
Just Browsing says:

December 4, 2013 at 12:48 PM

I see the same issue.

> db.employee.mapReduce(map,reduce,{out: ’emp_dept’});
Tue Dec 3 23:14:26.706 map reduce failed:{
“errmsg” : “exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2)”,
“code” : 16722,
“ok” : 0
} at src/mongo/shell/collection.js:970
1. ayushmishra2005 says:
  
  January 5, 2014 at 10:23 PM
  
  Install Mongo 2.2.3 to avoid this exception.
2. ayushmishra2005 says:
  
  March 12, 2014 at 7:19 PM
  
  Please refer my another blog http://www.dzone.com/links/r/easiest_way_to_implement_joins_in_mongodb_24.html to avoid this error. Now I have solution for Mongo 2.4.
Pingback: MongoDB, resolviendo problemas de agregación en NoSQL | Apium Technologies
ayushmishra2005 says:

January 5, 2014 at 10:21 PM

Since mongo 2.4, this functionality was removed. To use this functionality, you have to install Mongo 2.2.3.
diwakar says:

February 24, 2014 at 4:55 PM

db.employee.mapReduce(map,reduce,{out: ’emp_dept’});
Mon Feb 24 16:46:51.560 map reduce failed:{
“errmsg” : “exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2)”,
“code” : 16722,
“ok” : 0
} at src/mongo/shell/collection.js:970

This error is occuring how should i resolve this ?
1. ayushmishra2005 says:
  
  February 24, 2014 at 6:24 PM
  
  Install Mongo 2.2.3 to avoid this exception. Since mongo 2.4, this functionality was removed. To use this functionality, you have to install Mongo 2.2.3.
  1. Rajeev says:
    
    March 10, 2014 at 7:20 PM
    
    Same error mate!!! Could you suggest how to implement it in 2.4, rather than downgrading the installation? Also, some worthwhile links on “alternative of joins in mongodb” kind of link will be hugely appreciated. Breaking my head on it for long now!! Thanks
2. ayushmishra2005 says:
  
  March 12, 2014 at 7:18 PM
  
  Please refer my another blog http://www.dzone.com/links/r/easiest_way_to_implement_joins_in_mongodb_24.html to avoid this error. Now I have solution for Mongo 2.4.
Pingback: Easiest Way to Implement JOINs in MongoDB 2.4 | Knoldus
Orlando says:

March 30, 2014 at 6:59 AM

Hi,
hmm I was thinking about and I think there is a more easy way:

db.employee.find().forEach(
function (newEmployee) {
newEmployee.department = db.department.findOne( { _id: employee.department } );
db.emp_dep.insert(newEmployee);
}
);

Regards,
orlandobcrra says:

March 30, 2014 at 7:00 AM

Hi,

db.employee.find().forEach(
function (newEmployee) {
newEmployee.department = db.department.findOne( { _id: employee.department } );
db.emp_dep.insert(newEmployee);
}
);
Pingback: Jointure de table avec MongoDB | Giwi
Anand Kumar Singh says:

April 17, 2015 at 11:01 PM

Reblogged this on Play!ng with Scala.
Malti Yadav says:

April 23, 2015 at 3:59 PM

Reblogged this on malti yadav.
sangamesh says:

August 25, 2015 at 10:47 AM

2015-08-25T10:34:53.739+0530 E QUERY Error: map reduce failed:{
“errmsg” : “exception: ReferenceError: db is not defined\n at _funcs1
(_funcs1:2:94) near ‘epartment:db.department.findOne({_id:this’ (line 2)”,
“code” : 16722,
“ok” : 0

this error is occurring. can anybody tell me how to resolve it?