I have updated this blog . As last description was creating some misunderstanding and lot of confusion .
In one of my project , there was a requirement to generate report in .csv format from MongoDB without using any reporting tool and any language .
I had to use only MongoDB query .
In that report , I had to use multiple collection . Since there was not a well structured DB schema .
We are using MongoDB , which is non -relational database . Despite of this fact , there were related collections .
As we know MongoDB is a no-SQL database and doesn’t support joins. But I found an alternate to implement JOINs in MongoDb using Map Reduce.
For Versions below Mongo 2.4:-
Let’s go with a small example :
We have two collections Employee and Department . We have to fetch details of all employees with their department .
1) Create Employee and Department collection .
2) Create a Map . This is a JavaScript function which emits key and value pair and processes for each input document .
3) Create Reduce . This is a JavaScript function which accepts two arguments Key and Value .
4) MongoDB provides mapReduce commmand for Map Reduce operation .
5) ’emp_dept’ is new resulted collection where result will be stored . The Result will be :
For Version Mongo 2.4:-
Go to http://blog.knoldus.com/2014/03/12/easiest-way-to-implement-joins-in-mongodb-2-4/
You know that Map-Reduce and single-thread and can not be considered for production purposes. Single-threaded and blocking is a no-go. EIther use the aggregation framework or perform multiple queries. This is likely much faster than using MR here.
Hey Andreas ,
Thanks for your suggestion . Map Reduce is also part of MongoDB Aggregation and used to handle complex aggregation tasks . In our case it was batch processing where we need to do analytics
at defined intervals . I did not find any way to implement JOINs using multiple queries .
Any pointers would be helpful . Also let me know if I understood your point correctly .
Hi Ayshmishra,
The title of your article is misleading, i.e. “now possible”, since there is nothing new here. Also, you could have just as easily implemented the same function using some code in Ruby, Python, etc. I agree that the map/reduce is faster, but my point is that a lookup is required on each department for employees either way.
Also, you are using Map to iterate over departments, but only finding one employee per department (findOne). If you really want to find all employees (plural) then you need to use find() instead of findOne(), which will complicate the map/reduce considerably — but it’s possible and a reasonable way to do it in my opinion, that is if you really need to join something. Aggregation uses Map/Reduce internally anyways.
As far as Adreas’s comment about threading – that’s not correct. You *should* use Map/Reduce in production since it’s way faster than doing it in code, and in general Map/Reduce does not lock the entire database. See http://docs.mongodb.org/manual/applications/map-reduce/#map-reduce-concurrency
When the “department” entity has no aggregate-able values (just name or some property) the reporting really just aggregates the “employee” entity. What I like to do for these is aggregate the “employee” only, and resolve names from department id in memory or secondary look up. A materialized join may not be necessary unless the product of the join contains aggregate-able items from both entities.
Probably you are right . This is a simple example to show how it can be done in such scenario .
It can be used in complex scenario .
Nice clear example, well laid out.
Hi,
I hve query that in this particuaar exampe, there is a common key to join on i.e. departement in both the collections.
But if i have the requirement to apply map reduce for the scenario where i have two input files one is multidimensional input data file with time stamps and other is meta data configuration and they dont have anything in common, then how to run map reduce over multiple files?
> db.config.find();
{ “_id” : ObjectId(“51419da08366ded56c483ac5”), “_dim_id” : 2, “Type” : “categor
ical”, “gran” : “4”, “value1” : “B”, “value2” : “G”, “value3” : “R”, “value4” :
“Y” }
{ “_id” : ObjectId(“51419dc78366ded56c483ac6”), “_dim_id” : 1, “Type” : “Numeric
“, “gran” : “2”, “value1” : “0”, “value2” : “50” }
{ “_id” : ObjectId(“51419ddb8366ded56c483ac7”), “_dim_id” : 0, “Type” : “Numeric
“, “gran” : “4”, “value1” : “0”, “value2” : “100” }
>
> db.datafile.find();
{ “_id” : ObjectId(“51419f268366ded56c483ac8”), “_TS_id” : “6”, “data” : [ “46”,
“26”, “Y” ] }
{ “_id” : ObjectId(“51419f4b8366ded56c483ac9”), “_TS_id” : 7, “data” : [ “90”, ”
45″, “B” ] }
{ “_id” : ObjectId(“51419f5c8366ded56c483aca”), “_TS_id” : 8, “data” : [ “23”, ”
11″, “R” ] }
{ “_id” : ObjectId(“51419f768366ded56c483acb”), “_TS_id” : 9, “data” : [ “22”, ”
34″, “G” ] }
{ “_id” : ObjectId(“51419f9b8366ded56c483acc”), “_TS_id” : 10, “data” : [ “78”,
“45”, “B” ] }
{ “_id” : ObjectId(“51419faf8366ded56c483acd”), “_TS_id” : 11, “data” : [ “46”,
“26”, “Y” ] }
{ “_id” : ObjectId(“51419fc28366ded56c483ace”), “_TS_id” : 12, “data” : [ “56”,
“33”, “R” ] }
>
i hve to use map reduce to map data points coming wrt to the config file.
Thnks in advance for ur guidance.
Let’s suppose that there is a doc in db.employe with department = 3.
Now suppose that there isn’t the corresponding _id =3 in db.department .
Doing map-reduce, in this case, causes the rise of an error.
How avoid the map reduce error?
Hi, I’ve got a problem with this kind of map reduce joining.
Suppose that an employee is recorded as department 3.
Suppose also that in the db.employee there isn’t department _id = 3.
With SQL this kind of problem doesn’t affect the query.
With map reduce this kind of problem give the rise of an error (9014).
How to solve the problem?
You can use this :
var obj = db.department.findOne({_id:3});
if(obj && obj.department) { print(obj.department); }
First of all, it will not work for sharded environment. Secondly, It is completely discouraged to use db calls from map/reduce functions. You can refer to requirements for map/reduce functions -http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#db.collection.mapReduce. Also see discussions about the subject in StackOverflow – http://stackoverflow.com/questions/9618711/accessing-another-collection-in-mongodbs-map-reduce
> You know that Map-Reduce and single-thread and can not be considered for production purposes.
This is incorrect; as of 2.4, JS operations are no longer limited to a single thread.
But, you also can’t access db from MR any longer, so this doesn’t work.
Now I also have solution for Mongo2.4. Please refer my another blog http://www.dzone.com/links/r/easiest_way_to_implement_joins_in_mongodb_24.html
I try this example but it error please suggest me, thank in the advance.
Command ‘mapreduce’ failed: exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2) (response: { “errmsg” : “exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2)”, “code” : 16722, “ok” : 0.0 })
Hi Did you get the answer of your question? I did not found any solution. thank you
Please refer my another blog http://www.dzone.com/links/r/easiest_way_to_implement_joins_in_mongodb_24.html to avoid this error.
I see the same issue.
> db.employee.mapReduce(map,reduce,{out: ’emp_dept’});
Tue Dec 3 23:14:26.706 map reduce failed:{
“errmsg” : “exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2)”,
“code” : 16722,
“ok” : 0
} at src/mongo/shell/collection.js:970
Install Mongo 2.2.3 to avoid this exception.
Please refer my another blog http://www.dzone.com/links/r/easiest_way_to_implement_joins_in_mongodb_24.html to avoid this error. Now I have solution for Mongo 2.4.
Since mongo 2.4, this functionality was removed. To use this functionality, you have to install Mongo 2.2.3.
db.employee.mapReduce(map,reduce,{out: ’emp_dept’});
Mon Feb 24 16:46:51.560 map reduce failed:{
“errmsg” : “exception: ReferenceError: db is not defined near ‘epartment:db.department.findOne({_id:this’ (line 2)”,
“code” : 16722,
“ok” : 0
} at src/mongo/shell/collection.js:970
This error is occuring how should i resolve this ?
Install Mongo 2.2.3 to avoid this exception. Since mongo 2.4, this functionality was removed. To use this functionality, you have to install Mongo 2.2.3.
Same error mate!!! Could you suggest how to implement it in 2.4, rather than downgrading the installation? Also, some worthwhile links on “alternative of joins in mongodb” kind of link will be hugely appreciated. Breaking my head on it for long now!! Thanks
Please refer my another blog http://www.dzone.com/links/r/easiest_way_to_implement_joins_in_mongodb_24.html to avoid this error. Now I have solution for Mongo 2.4.
Hi,
hmm I was thinking about and I think there is a more easy way:
db.employee.find().forEach(
function (newEmployee) {
newEmployee.department = db.department.findOne( { _id: employee.department } );
db.emp_dep.insert(newEmployee);
}
);
Regards,
Hi,
db.employee.find().forEach(
function (newEmployee) {
newEmployee.department = db.department.findOne( { _id: employee.department } );
db.emp_dep.insert(newEmployee);
}
);
Reblogged this on Play!ng with Scala.
Reblogged this on malti yadav.
2015-08-25T10:34:53.739+0530 E QUERY Error: map reduce failed:{
“errmsg” : “exception: ReferenceError: db is not defined\n at _funcs1
(_funcs1:2:94) near ‘epartment:db.department.findOne({_id:this’ (line 2)”,
“code” : 16722,
“ok” : 0
this error is occurring. can anybody tell me how to resolve it?