I have updated this blog . As last description was creating some misunderstanding and lot of confusion .
In one of my project , there was a requirement to generate report in .csv format from MongoDB without using any reporting tool and any language .
I had to use only MongoDB query .
In that report , I had to use multiple collection . Since there was not a well structured DB schema .
We are using MongoDB , which is non -relational database . Despite of this fact , there were related collections .
As we know MongoDB is a no-SQL database and doesn’t support joins. But I found an alternate to implement JOINs in MongoDb using Map Reduce.
Let’s go with a small example :
We have two collections Employee and Department . We have to fetch details of all employees with their department .
1) Create Employee and Department collection .
db.createCollection("employee");
db.employee.insert(
{
_id: 1,
name: { first: 'John', last: 'Backus' },
city: 'New York',
department:1
}
);
db.employee.insert(
{
_id: 2,
name: { first: 'Merry', last: 'Desuja' },
city: 'London',
department:2
}
);
db.createCollection("department");
db.department.insert(
{
_id: 1,
department: 'Manager'
}
);
db.department.insert(
{
_id: 2,
department: 'Accountant'
}
);
2) Create a Map . This is a JavaScript function which emits key and value pair and processes for each input document .
var map = function () {
var output= {firstname:this.name.first, lastname:this.name.last , department:db.department.findOne({_id:this.department}).department}
emit(this._id, output);
};
3) Create Reduce . This is a JavaScript function which accepts two arguments Key and Value .
var reduce = function(key, values) {
var outs={ firstname:null , lastname:null , department:null}
values.forEach(function(v){
if(outs.firstname ==null){
outs.firstname = v.firstname
}
if(outs.lastname ==null){
outs.lastname = v.lastname
}
if(outs.department ==null){
outs.department = v.department
}
});
return outs;
};
4) MongoDB provides mapReduce commmand for Map Reduce operation .
db.employee.mapReduce(map,reduce,{out: 'emp_dept'});
5) ‘emp_dept’ is new resulted collection where result will be stored . The Result will be :
{ "_id" : 1, "value" : { "firstname" : "John", "lastname" : "Backus", "department" : "Manager" } }
{ "_id" : 2, "value" : { "firstname" : "Merry", "lastname" : "Desuja", "department" : "Accountant" } }






You know that Map-Reduce and single-thread and can not be considered for production purposes. Single-threaded and blocking is a no-go. EIther use the aggregation framework or perform multiple queries. This is likely much faster than using MR here.
Hey Andreas ,
Thanks for your suggestion . Map Reduce is also part of MongoDB Aggregation and used to handle complex aggregation tasks . In our case it was batch processing where we need to do analytics
at defined intervals . I did not find any way to implement JOINs using multiple queries .
Any pointers would be helpful . Also let me know if I understood your point correctly .
Hi Ayshmishra,
The title of your article is misleading, i.e. “now possible”, since there is nothing new here. Also, you could have just as easily implemented the same function using some code in Ruby, Python, etc. I agree that the map/reduce is faster, but my point is that a lookup is required on each department for employees either way.
Also, you are using Map to iterate over departments, but only finding one employee per department (findOne). If you really want to find all employees (plural) then you need to use find() instead of findOne(), which will complicate the map/reduce considerably — but it’s possible and a reasonable way to do it in my opinion, that is if you really need to join something. Aggregation uses Map/Reduce internally anyways.
As far as Adreas’s comment about threading – that’s not correct. You *should* use Map/Reduce in production since it’s way faster than doing it in code, and in general Map/Reduce does not lock the entire database. See http://docs.mongodb.org/manual/applications/map-reduce/#map-reduce-concurrency
When the “department” entity has no aggregate-able values (just name or some property) the reporting really just aggregates the “employee” entity. What I like to do for these is aggregate the “employee” only, and resolve names from department id in memory or secondary look up. A materialized join may not be necessary unless the product of the join contains aggregate-able items from both entities.
Probably you are right . This is a simple example to show how it can be done in such scenario .
It can be used in complex scenario .
Nice clear example, well laid out.
Hi,
I hve query that in this particuaar exampe, there is a common key to join on i.e. departement in both the collections.
But if i have the requirement to apply map reduce for the scenario where i have two input files one is multidimensional input data file with time stamps and other is meta data configuration and they dont have anything in common, then how to run map reduce over multiple files?
> db.config.find();
{ “_id” : ObjectId(“51419da08366ded56c483ac5″), “_dim_id” : 2, “Type” : “categor
ical”, “gran” : “4″, “value1″ : “B”, “value2″ : “G”, “value3″ : “R”, “value4″ :
“Y” }
{ “_id” : ObjectId(“51419dc78366ded56c483ac6″), “_dim_id” : 1, “Type” : “Numeric
“, “gran” : “2″, “value1″ : “0″, “value2″ : “50″ }
{ “_id” : ObjectId(“51419ddb8366ded56c483ac7″), “_dim_id” : 0, “Type” : “Numeric
“, “gran” : “4″, “value1″ : “0″, “value2″ : “100″ }
>
> db.datafile.find();
{ “_id” : ObjectId(“51419f268366ded56c483ac8″), “_TS_id” : “6″, “data” : [ "46",
"26", "Y" ] }
{ “_id” : ObjectId(“51419f4b8366ded56c483ac9″), “_TS_id” : 7, “data” : [ "90", "
45", "B" ] }
{ “_id” : ObjectId(“51419f5c8366ded56c483aca”), “_TS_id” : 8, “data” : [ "23", "
11", "R" ] }
{ “_id” : ObjectId(“51419f768366ded56c483acb”), “_TS_id” : 9, “data” : [ "22", "
34", "G" ] }
{ “_id” : ObjectId(“51419f9b8366ded56c483acc”), “_TS_id” : 10, “data” : [ "78",
"45", "B" ] }
{ “_id” : ObjectId(“51419faf8366ded56c483acd”), “_TS_id” : 11, “data” : [ "46",
"26", "Y" ] }
{ “_id” : ObjectId(“51419fc28366ded56c483ace”), “_TS_id” : 12, “data” : [ "56",
"33", "R" ] }
>
i hve to use map reduce to map data points coming wrt to the config file.
Thnks in advance for ur guidance.
Let’s suppose that there is a doc in db.employe with department = 3.
Now suppose that there isn’t the corresponding _id =3 in db.department .
Doing map-reduce, in this case, causes the rise of an error.
How avoid the map reduce error?
Hi, I’ve got a problem with this kind of map reduce joining.
Suppose that an employee is recorded as department 3.
Suppose also that in the db.employee there isn’t department _id = 3.
With SQL this kind of problem doesn’t affect the query.
With map reduce this kind of problem give the rise of an error (9014).
How to solve the problem?
You can use this :
var obj = db.department.findOne({_id:3});
if(obj && obj.department) { print(obj.department); }