Why Use MongoDB in Machine Learning? And How to Use MongoDB in Python?


In this blog, we will learn why to use MongoDB in machine learning and how we can use MongoDB in Python with PyMongo.

MongoDB is a document-oriented NoSQL database used for high-volume data storage. Instead of the tables and rows of traditional relational databases, MongoDB uses collections and documents. It is cross-platform, written in C++, and the source code of its Community Edition is publicly available.

Installing MongoDB

The Architecture of a MongoDB Database

The information in MongoDB is stored in documents. Here, a document is analogous to a row in a structured database.

  • Each document is a collection of key-value pairs.
  • Each key-value pair is called a field.
  • Every document has an _id field, which uniquely identifies the document within its collection.
  • A document may also contain nested documents.
  • Documents may have a varying number of fields (they can be blank as well).
  • These documents are stored in a collection. A collection is literally a collection of documents in MongoDB. This is analogous to tables in traditional databases.
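
As a rough illustration of the structure described above, here is how two documents from the same collection might look when written as Python dictionaries. The field names and values are invented for this sketch; the only field MongoDB itself requires is _id.

```python
# Two hypothetical documents from one collection, written as Python dicts.
# All field names and values are made up for illustration.
employee_a = {
    "_id": 1,                     # unique identifier within the collection
    "firstname": "John",
    "department": "Analytics",
    "address": {                  # a nested document
        "city": "Springfield",
        "country": "USA",
    },
}

employee_b = {
    "_id": 2,
    "firstname": "Hope",
    "skills": ["Statistics"],     # a field the first document does not have
}

# Documents in the same collection do not need to share the same fields.
shared_fields = set(employee_a) & set(employee_b)
print(sorted(shared_fields))
```

Only _id and firstname are common to both documents here, which is exactly the kind of variation a collection tolerates.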

Why Use MongoDB in Machine Learning

MongoDB is well suited to machine learning work for several reasons.

1. Flexible Data Model

The first reason is that MongoDB stores JSON-like documents and has a flexible schema. Unlike a relational database, where you have to define a schema and tables with column definitions, MongoDB allows you to load data directly without any upfront schema design. This means that you can load data from a new source and get to work immediately.

2. Powerful Query Language

Once the data is loaded, MongoDB provides a powerful query language and secondary indexes that give you fast access to the specific values you want to use. You can filter, sort, and aggregate the data, selecting and transforming the fields you need. This is a necessary step in preparing data for machine learning. Many other NoSQL datastores do not offer this level of query sophistication.
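
To make the filter, aggregate, and sort steps more concrete, here is a minimal sketch of an aggregation pipeline built as plain Python data. The helper name build_department_pipeline and the fields age and department are assumptions chosen for illustration; the stages $match, $group, and $sort are standard MongoDB aggregation stages.

```python
def build_department_pipeline(min_age):
    """Return a pipeline that filters, groups, and then sorts documents.

    The field names "age" and "department" are assumptions for this sketch.
    """
    return [
        # Filter: keep only documents whose age is at least min_age
        {"$match": {"age": {"$gte": min_age}}},
        # Aggregate: one output row per department
        {"$group": {
            "_id": "$department",
            "avg_age": {"$avg": "$age"},
            "headcount": {"$sum": 1},
        }},
        # Sort: largest departments first
        {"$sort": {"headcount": -1}},
    ]


pipeline = build_department_pipeline(30)

# With a live PyMongo collection called `info` (an assumption), this could run as:
#     info.create_index("department")      # secondary index on one field
#     for row in info.aggregate(pipeline):
#         print(row)
```

Building the pipeline as ordinary lists and dictionaries keeps the data preparation step easy to inspect and reuse before it is sent to the database.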

3. Store and Retrieve Trained Models as JSON Documents

MongoDB is a convenient place to store, share, and retrieve trained models. We can not only store our models but also keep a history of them in the database, allowing us to restore a trained model from a previous version if we choose to do so. More importantly, sharing trained models reduces the time it takes to use those models for machine learning predictions. If I want to use an existing model for a prediction or for reinforcement learning, I simply query MongoDB for the model and load it, saving all the time it originally took to train the model.
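
One possible way to version models is sketched below. It is not the only approach: instead of a pure JSON representation, this sketch pickles the model and keeps the resulting bytes inside the document next to a name and version number. The helper names, the field names, and the toy model are all assumptions made for illustration.

```python
import pickle
from datetime import datetime, timezone


def model_to_document(model, name, version):
    """Wrap a trained model in a dict that can be inserted into a collection."""
    return {
        "name": name,
        "version": version,
        "created_at": datetime.now(timezone.utc),
        "payload": pickle.dumps(model),   # raw bytes, stored as BSON binary
    }


def document_to_model(document):
    """Rebuild the model from a stored document (only unpickle trusted data)."""
    return pickle.loads(document["payload"])


# A stand-in for a trained model; any picklable object would work here.
toy_model = {"weights": [0.25, -1.5], "bias": 0.1}

doc = model_to_document(toy_model, "demo-model", 1)
restored = document_to_model(doc)
```

With a PyMongo collection named models (hypothetical), the document could be saved with models.insert_one(doc), and the latest version fetched with models.find_one({"name": "demo-model"}, sort=[("version", -1)]). A single document is limited to 16 MB, so larger models would need GridFS or external storage.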

4. MongoDB Atlas Offers Database as a Service Across Major Cloud Providers

We can deploy applications across multiple cloud providers without the added operational complexity of managing data replication and migration across clouds. For instance, you can run your online store in Azure and have the data replicated to GCP in near real time for machine learning.

CRUD commands of MongoDB

Now, let's have a look at the basic CRUD commands of MongoDB:

  • To show all the databases we have: show dbs
  • To switch to a database (it is created once data is first stored in it): use database_name
  • To create a collection: db.createCollection("Name_of_collection")
  • To show the collections: show collections
  • Inserting one document at a time: db.collection_name.insertOne({})
  • Inserting many documents at a time: db.collection_name.insertMany([ {}, {}, {}, … ])
  • View the documents: db.collection_name.find()
  • Update one document at a time: db.collection_name.updateOne( <filter>, <update> )
  • Update many documents at a time: db.collection_name.updateMany( <filter>, <update> )
  • Delete the documents: db.collection_name.deleteMany( <filter> )

MongoDB in Python using PyMongo

PyMongo is a Python library that enables us to connect with MongoDB. It allows us to perform basic operations on the MongoDB database.

Installing PyMongo

  • pip install pymongo (in the Anaconda terminal)
  • !pip install pymongo (in a Jupyter notebook cell)

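CRUD commands of PyMongo

Now, let's have a look at the basic CRUD commands of PyMongo. Note that PyMongo uses snake_case method names (insert_one, update_many), unlike the camelCase names of the MongoDB shell.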

  • For making a connection : client = pymongo.MongoClient('protocol://ip_address:port/')
import pymongo

# URI format: protocol://ip_address:port/
client = pymongo.MongoClient('mongodb://127.0.0.1:27017/')

# Select a database (MongoDB creates it lazily, on the first insert)
mydb = client["emp"]

# Select a collection (in SQL terms, a table); it is also created on the first insert
info = mydb.employeeinformation
  • insert one document at a time: collection_name.insert_one()
# Create a record (a Python dict, stored as a JSON-like document)
record = {
    "firstname":"John",
    "lastname":"Doe",
    "department":"Analytics"
    }

# Insert a single record in collection
info.insert_one(record)
  • insert many documents at a time: collection_name.insert_many()
# Insert multiple records in collection
records = [
    {
        "firstname":"Hope",
        "lastname":"Marshall",
        "department":"Development"
    },{
        "firstname":"Hayley",
        "lastname":"Johnson",
        "department":"Analytics"
    },{
        "firstname":"Klaus",
        "lastname":"Mikalson",
        "department":"R&D"
    }
]
info.insert_many(records)

records1 = [
    {
        "firstname": "Jacob",
        "lastname":"Smith",
        "department":"Development",
        "age":32
    },{
        "firstname":"Hasel",
        "lastname":"Shah",
        "department":"Analytics",
        "age":29
    },{
        "firstname":"Elijah",
        "lastname":"Mikalson",
        "department":"R&D",
        "age":34
    }
]
info.insert_many(records1)
  • view the documents : collection_name.find() or collection_name.find_one()
# Simple way to query a JSON document
# View the first record
print(info.find_one())
  • update one document at a time : collection_name.update_one()
# Update a single record: set age and stamp lastModified with the current date
info.update_one(
    {'firstname':'John'},
    {'$set':{'age':30},
     '$currentDate':{'lastModified':True}}
)
  • update many documents at a time : collection_name.update_many()
# Update multiple records
info.update_many({'department':'Analytics'},
                {'$set':{'skills':'Statistics'},
                "$currentDate":{"lastModified":True}})
  • replace one whole document at a time : collection_name.replace_one({}, {})
# replace_one swaps the entire document (except _id); fields left out, such as age, are dropped
info.replace_one({'firstname':'John'},
                 {'firstname':'John',
                  'lastname':'Stalin',
                  'qualifications':'BE',
                  'skills':['Statistics','Machine Learning','Data Science'],
                  'department':'Data Science'})
  • delete the documents: collection_name.delete_many()
# Delete every document whose age is 29 or less
result = info.delete_many({'age':{'$lte':29}})
print(result.deleted_count)

for record in info.find():
    print(record)
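
Since the documents inserted above do not all carry the same fields (only some have an age), it can help to normalise them before handing them to a machine learning library. The following is a minimal sketch of that idea; the helper name documents_to_rows and the sample data are assumptions for illustration, not part of PyMongo.

```python
def documents_to_rows(documents, fields, default=None):
    """Turn documents into fixed-width rows, filling missing fields with a default."""
    return [[doc.get(field, default) for field in fields] for doc in documents]


# Stand-ins for documents returned by a query; the second one has no "age" field.
sample_documents = [
    {"firstname": "Jacob", "department": "Development", "age": 32},
    {"firstname": "Hope", "department": "Development"},
]

rows = documents_to_rows(sample_documents, ["department", "age"])
print(rows)
```

With the info collection from above, the same helper could be fed directly from a query, for example documents_to_rows(info.find({}, {"_id": 0}), ["department", "age"]).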

Conclusion

In this blog, we covered why we should use MongoDB in machine learning and how we can use MongoDB in Python with the PyMongo library, along with the installation and CRUD commands.

Happy Learning !! 🙂

Written by 

Tanishka Garg is a Software Consultant working in the AI/ML domain.