An alternate way to implement JOINs in MongoDB [Update]

I have updated this post because the previous description caused some misunderstanding and a lot of confusion.

In one of my projects, there was a requirement to generate a report in .csv format from MongoDB without using any reporting tool or programming language.
I had to use only MongoDB queries.
The report had to draw on multiple collections, because the DB schema was not well structured.
We are using MongoDB, which is a non-relational database. Despite this, there were related collections.

As we know, MongoDB is a NoSQL database and doesn’t support joins. But I found an alternative way to implement JOINs in MongoDB using Map Reduce.

For versions below MongoDB 2.4:

Let’s go with a small example:
We have two collections, Employee and Department. We have to fetch the details of all employees along with their departments.

1) Create the Employee and Department collections.

2) Create a Map function. This is a JavaScript function that is called for each input document and emits key-value pairs.

3) Create a Reduce function. This is a JavaScript function that accepts two arguments: a key and the array of values emitted for that key.

4) MongoDB provides the mapReduce command for Map Reduce operations.

5) 'emp_dept' is the new collection in which the result is stored.
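
The steps above can be sketched in plain JavaScript as follows. This is only a minimal, in-memory illustration and not the original code from this post: the sample documents, the `name` field, and the helpers `findDepartment` and `runMapReduce` are assumptions made up for the example. The only parts taken from this page are the collection names, the `department: db.department.findOne({_id: this...` fragment and the `db.employee.mapReduce(map, reduce, {out: 'emp_dept'})` call that appear in the error messages quoted in the comments.

```javascript
// In-memory stand-ins for the two collections (sample data is made up).
const department = [
  { _id: 1, name: "Engineering" },
  { _id: 2, name: "Sales" },
];
const employee = [
  { _id: 101, name: "Asha", department: 1 },
  { _id: 102, name: "Ravi", department: 2 },
  { _id: 103, name: "Meera", department: 1 },
];

// Hypothetical stand-in for db.department.findOne({ _id: id }).
// Returns null when no department matches.
function findDepartment(id) {
  return department.find((d) => d._id === id) || null;
}

// MongoDB provides emit() inside map; here runMapReduce sets it up.
let emit;

// Map: runs once per employee, with `this` bound to the document.
// It emits the employee _id as key and the joined record as value.
function map() {
  emit(this._id, {
    name: this.name,
    department: findDepartment(this.department),
  });
}

// Reduce: every employee _id is emitted only once, so there is
// nothing to merge; return the first value.
function reduce(key, values) {
  return values[0];
}

// Small simulation of mapReduce: group emitted values by key and
// reduce only the keys that received more than one value.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc));
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length > 1 ? reduceFn(key, values) : values[0],
  }));
}

// Stand-in for the 'emp_dept' output collection.
const emp_dept = runMapReduce(employee, map, reduce);
console.log(JSON.stringify(emp_dept, null, 2));
```

In the mongo shell the lookup inside map would presumably be `db.department.findOne({_id: this.department})` and the run would be `db.employee.mapReduce(map, reduce, {out: 'emp_dept'})`. As the comments on this post report, `db` is no longer accessible inside map from MongoDB 2.4 onwards, so this pattern applies only to earlier versions.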

For MongoDB version 2.4:

Go to http://blog.knoldus.com/2014/03/12/easiest-way-to-implement-joins-in-mongodb-2-4/

Written by 

Ayush is the Sr. Software Consultant @ Knoldus Software LLP. In his 5 years of experience he has become a developer with proven experience in architecting and developing web applications. Ayush has a Masters in Computer Application from U.P. Technical University. He is a strong-willed and self-motivated professional who takes great care in adhering to quality norms within projects. He is capable of managing challenging projects with remarkable deadline sensitivity without compromising code quality.

35 thoughts on “An alternate way to implement JOINs in MongoDB [Update]”

  1. You know that Map-Reduce is single-threaded and cannot be considered for production purposes. Single-threaded and blocking is a no-go. Either use the aggregation framework or perform multiple queries. This is likely much faster than using MR here.

  2. Hey Andreas,
    Thanks for your suggestion. Map Reduce is also part of MongoDB aggregation and is used to handle complex aggregation tasks. In our case it was batch processing, where we need to run analytics at defined intervals. I did not find any way to implement JOINs using multiple queries.
    Any pointers would be helpful. Also, let me know if I understood your point correctly.

  3. Hi Ayshmishra,
    The title of your article is misleading, i.e. “now possible”, since there is nothing new here. Also, you could have just as easily implemented the same function using some code in Ruby, Python, etc. I agree that the map/reduce is faster, but my point is that a lookup is required on each department for employees either way.

    Also, you are using Map to iterate over departments, but only finding one employee per department (findOne). If you really want to find all employees (plural) then you need to use find() instead of findOne(), which will complicate the map/reduce considerably — but it’s possible and a reasonable way to do it in my opinion, that is if you really need to join something. Aggregation uses Map/Reduce internally anyways.

    As far as Andreas’s comment about threading – that’s not correct. You *should* use Map/Reduce in production since it’s way faster than doing it in code, and in general Map/Reduce does not lock the entire database. See http://docs.mongodb.org/manual/applications/map-reduce/#map-reduce-concurrency

  4. When the “department” entity has no aggregate-able values (just name or some property) the reporting really just aggregates the “employee” entity. What I like to do for these is aggregate the “employee” only, and resolve names from department id in memory or secondary look up. A materialized join may not be necessary unless the product of the join contains aggregate-able items from both entities.

  5. Hi,
    I have a question. In this particular example, there is a common key to join on, i.e. department, in both collections.
    But suppose I need to apply map reduce where I have two input files, one a multidimensional input data file with timestamps and the other a metadata configuration, and they don’t have anything in common. How do I run map reduce over multiple files?
    > db.config.find();
    { "_id" : ObjectId("51419da08366ded56c483ac5"), "_dim_id" : 2, "Type" : "categorical", "gran" : "4", "value1" : "B", "value2" : "G", "value3" : "R", "value4" : "Y" }
    { "_id" : ObjectId("51419dc78366ded56c483ac6"), "_dim_id" : 1, "Type" : "Numeric", "gran" : "2", "value1" : "0", "value2" : "50" }
    { "_id" : ObjectId("51419ddb8366ded56c483ac7"), "_dim_id" : 0, "Type" : "Numeric", "gran" : "4", "value1" : "0", "value2" : "100" }
    >
    > db.datafile.find();
    { "_id" : ObjectId("51419f268366ded56c483ac8"), "_TS_id" : "6", "data" : [ "46", "26", "Y" ] }
    { "_id" : ObjectId("51419f4b8366ded56c483ac9"), "_TS_id" : 7, "data" : [ "90", "45", "B" ] }
    { "_id" : ObjectId("51419f5c8366ded56c483aca"), "_TS_id" : 8, "data" : [ "23", "11", "R" ] }
    { "_id" : ObjectId("51419f768366ded56c483acb"), "_TS_id" : 9, "data" : [ "22", "34", "G" ] }
    { "_id" : ObjectId("51419f9b8366ded56c483acc"), "_TS_id" : 10, "data" : [ "78", "45", "B" ] }
    { "_id" : ObjectId("51419faf8366ded56c483acd"), "_TS_id" : 11, "data" : [ "46", "26", "Y" ] }
    { "_id" : ObjectId("51419fc28366ded56c483ace"), "_TS_id" : 12, "data" : [ "56", "33", "R" ] }
    >

    I have to use map reduce to map the incoming data points with respect to the config file.

    Thanks in advance for your guidance.

  6. Let’s suppose that there is a doc in db.employee with department = 3.
    Now suppose that there is no corresponding _id = 3 in db.department.
    Running map-reduce in this case raises an error.
    How can the map reduce error be avoided?

  7. Hi, I’ve got a problem with this kind of map reduce joining.
    Suppose that an employee is recorded as department 3.
    Suppose also that in db.department there is no document with _id = 3.
    With SQL this kind of problem doesn’t affect the query.
    With map reduce this kind of problem raises an error (9014).
    How can I solve the problem?

  8. First of all, it will not work in a sharded environment. Secondly, making db calls from map/reduce functions is strongly discouraged. You can refer to the requirements for map/reduce functions: http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#db.collection.mapReduce. Also see the discussion on the subject on StackOverflow: http://stackoverflow.com/questions/9618711/accessing-another-collection-in-mongodbs-map-reduce

  9. > You know that Map-Reduce is single-threaded and cannot be considered for production purposes.

    This is incorrect; as of 2.4, JS operations are no longer limited to a single thread.

    But, you also can’t access db from MR any longer, so this doesn’t work.

  10. I tried this example but got an error. Please advise; thanks in advance.
    Command 'mapreduce' failed: exception: ReferenceError: db is not defined near 'epartment:db.department.findOne({_id:this' (line 2) (response: { "errmsg" : "exception: ReferenceError: db is not defined near 'epartment:db.department.findOne({_id:this' (line 2)", "code" : 16722, "ok" : 0.0 })

  11. I see the same issue.

    > db.employee.mapReduce(map,reduce,{out: 'emp_dept'});
    Tue Dec 3 23:14:26.706 map reduce failed:{
    "errmsg" : "exception: ReferenceError: db is not defined near 'epartment:db.department.findOne({_id:this' (line 2)",
    "code" : 16722,
    "ok" : 0
    } at src/mongo/shell/collection.js:970

  12. db.employee.mapReduce(map,reduce,{out: 'emp_dept'});
    Mon Feb 24 16:46:51.560 map reduce failed:{
    "errmsg" : "exception: ReferenceError: db is not defined near 'epartment:db.department.findOne({_id:this' (line 2)",
    "code" : 16722,
    "ok" : 0
    } at src/mongo/shell/collection.js:970

    This error is occurring. How should I resolve it?

      1. Same error, mate!!! Could you suggest how to implement it in 2.4, rather than downgrading the installation? Also, some worthwhile links on alternatives to joins in MongoDB would be hugely appreciated. I have been breaking my head over this for a long time now!! Thanks

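  13. Hi,
    Hmm, I was thinking about it, and I think there is an easier way:

    db.employee.find().forEach(
      function (newEmployee) {
        // Replace the department id with the full department document
        newEmployee.department = db.department.findOne( { _id: newEmployee.department } );
        // Store the joined document in the emp_dep collection
        db.emp_dep.insert(newEmployee);
      }
    );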

    Regards,

  14. Hi,

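    db.employee.find().forEach(
      function (newEmployee) {
        // Replace the department id with the full department document
        newEmployee.department = db.department.findOne( { _id: newEmployee.department } );
        // Store the joined document in the emp_dep collection
        db.emp_dep.insert(newEmployee);
      }
    );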

  15. 2015-08-25T10:34:53.739+0530 E QUERY Error: map reduce failed:{
    "errmsg" : "exception: ReferenceError: db is not defined\n at _funcs1
    (_funcs1:2:94) near 'epartment:db.department.findOne({_id:this' (line 2)",
    "code" : 16722,
    "ok" : 0

    This error is occurring. Can anybody tell me how to resolve it?
