In this blog, we will learn why MongoDB is a good fit for machine learning and how to use MongoDB from Python with PyMongo.
MongoDB is a document-oriented NoSQL database used for high-volume data storage. Instead of the tables and rows of traditional relational databases, MongoDB uses collections and documents. It is a cross-platform database written in C++; its Community Server edition is source-available under the Server Side Public License (SSPL).
Installing MongoDB
- To install MongoDB on your local computer (this guide covers Ubuntu), refer to the following link – https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
- To connect to MongoDB Atlas, refer to the following YouTube video – https://www.youtube.com/watch?v=esKNjzDZItQ
The Architecture of a MongoDB Database
The information in MongoDB is stored in documents. A document is analogous to a row in a relational database.
- Each document is a set of key-value pairs.
- Each key-value pair is called a field.
- Every document has an _id field, which uniquely identifies the document within its collection.
- A document may also contain nested documents.
- Documents may have a varying number of fields, and a document can even be empty apart from its _id.
- Documents are stored in a collection, which is simply a group of documents. A collection is analogous to a table in a traditional database.
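In PyMongo, a document is simply a Python dict, and a nested document is a dict inside it. MongoDB queries reach into nested documents with dot notation (for example "address.city"). The minimal sketch below uses two made-up employee documents with different fields and flattens each one into the dot-notation paths a query could target. The helper name `dot_paths` and the sample data are hypothetical, not part of PyMongo.

```python
from typing import Any, Dict


def dot_paths(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into MongoDB-style dot-notation paths.

    Hypothetical helper for illustration; arrays are left as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into a non-empty nested document
            flat.update(dot_paths(value, path))
        else:
            flat[path] = value
    return flat


# Two documents from the same (hypothetical) collection with different fields
employees = [
    {"_id": 1, "firstname": "John", "address": {"city": "Pune", "zip": "411001"}},
    {"_id": 2, "firstname": "Hope", "skills": ["Python", "SQL"]},
]

for emp in employees:
    print(dot_paths(emp))
```

With a live collection, a filter such as {"address.city": "Pune"} passed to find() would match the first document only, because the second has no address field.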
Why Use MongoDB in Machine Learning
MongoDB is well suited to machine learning workloads for several reasons.
1. Flexible Data Model
The first reason is that MongoDB stores JSON-like documents (internally in a binary format called BSON) and has a flexible schema. In a relational database, you have to define a schema and tables with column definitions before loading any data; MongoDB lets you load data directly without any upfront schema design. This means that you can load data from a new source and start working with it immediately.
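As a minimal sketch of loading data from a new source without designing a schema first, the code below turns CSV rows into documents. It assumes a small inline CSV (made-up sample data) and drops empty cells, so documents from the same file end up with different fields, which MongoDB accepts. `rows_to_documents` and `insert_rows` are hypothetical helpers; `insert_rows` assumes a PyMongo collection and a running server, while the parsing part runs on its own.

```python
import csv
import io
from typing import Dict, List

# Hypothetical sample data: some cells are intentionally empty
RAW_CSV = """firstname,lastname,department,age
John,Doe,Analytics,
Hope,Marshall,Development,27
Klaus,Mikalson,,34
"""


def rows_to_documents(text: str) -> List[Dict[str, object]]:
    """Parse CSV text into dicts, skipping empty cells and converting ages."""
    docs = []
    for row in csv.DictReader(io.StringIO(text)):
        doc: Dict[str, object] = {k: v for k, v in row.items() if v != ""}
        if "age" in doc:
            doc["age"] = int(doc["age"])  # store numbers as numbers, not strings
        docs.append(doc)
    return docs


def insert_rows(collection, docs):
    """Insert the documents; `collection` is assumed to be a PyMongo Collection."""
    result = collection.insert_many(docs)
    return result.inserted_ids


documents = rows_to_documents(RAW_CSV)
print(documents)
```

No table definition is needed: the first document has no age and the third has no department, and both can be inserted as they are.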
2. Powerful Query Language
Once the data is loaded, MongoDB provides a powerful query language and secondary indexes that give you fast access to the specific values you need. You can filter, sort, and aggregate the data, selecting and transforming only the fields you need. This is a necessary step in preparing data for machine learning, and many NoSQL datastores do not offer this level of query sophistication.
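Filtering, grouping, sorting, and reshaping fields correspond to aggregation pipeline stages such as $match, $group, $sort, and $project. The sketch below builds a pipeline that computes the average age and head count per department, using the employee fields from the PyMongo section of this post. The function names are hypothetical; the pipeline itself is a plain list of dicts, so it can be inspected without a server, and `run` assumes a PyMongo collection.

```python
from typing import Any, Dict, List


def department_stats_pipeline(min_age: int = 0) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline: filter, group, sort, then reshape."""
    return [
        # Keep documents whose age is at least min_age (no age field: no match)
        {"$match": {"age": {"$gte": min_age}}},
        # One output document per department, with average age and head count
        {"$group": {"_id": "$department", "avg_age": {"$avg": "$age"}, "count": {"$sum": 1}}},
        # Largest departments first
        {"$sort": {"count": -1}},
        # Expose the group key as "department" and hide _id
        {"$project": {"_id": 0, "department": "$_id", "avg_age": 1, "count": 1}},
    ]


def run(collection, min_age: int = 0):
    """Run the pipeline; `collection` is assumed to be a PyMongo Collection."""
    # A secondary index on the filtered field can speed up the $match stage
    collection.create_index("age")
    return list(collection.aggregate(department_stats_pipeline(min_age)))


pipeline = department_stats_pipeline(min_age=25)
print([list(stage)[0] for stage in pipeline])
```

Placing $match first keeps later stages working on fewer documents, and it is the stage that can use the index.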
3. Store and Retrieve Trained Models as JSON Documents
MongoDB is a convenient place to store, share, and retrieve trained models. We can store not only the current model but also a history of models in the database, which lets us restore a previous version of a trained model if we choose to. More importantly, sharing trained models reduces the time it takes to use them for predictions. If I want to use an existing model for a prediction or for reinforcement learning, I simply query MongoDB for the model and load it, saving all the time it took to train the model originally.
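A minimal sketch of keeping a model history, assuming a small model whose parameters fit comfortably in one document (a single BSON document is limited to 16 MB; larger artifacts are usually stored with GridFS). It fits a one-variable least-squares line in pure Python, wraps the coefficients in a document with a name and version number, and retrieves the newest version with `find_one` sorted by version. The model name, field names, and helper functions are hypothetical; only `insert_one` and `find_one` are PyMongo calls, and they assume a running server.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def model_document(name: str, version: int, params: Dict[str, float]) -> Dict:
    """Wrap trained parameters in a document; each version is its own document."""
    return {
        "name": name,
        "version": version,
        "params": params,
        "trained_at": datetime.now(timezone.utc),
    }


def save_model(collection, doc: Dict) -> None:
    collection.insert_one(doc)  # older versions stay in the collection as history


def load_latest(collection, name: str) -> Optional[Dict]:
    # find_one passes sort= on to find(); -1 returns the highest version first
    return collection.find_one({"name": name}, sort=[("version", -1)])


def predict(params: Dict[str, float], x: float) -> float:
    return params["slope"] * x + params["intercept"]


params = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
doc = model_document("salary_by_experience", 1, params)
print(doc["params"], predict(doc["params"], 5))
```

Loading a stored model only requires reading its params back and calling predict, so no retraining is needed; restoring an older version is a find_one on {"name": ..., "version": ...}.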
4. MongoDB Atlas Offers Database as a Service Across Major Cloud Providers
MongoDB Atlas runs on AWS, Google Cloud, and Microsoft Azure, so we can deploy applications across multiple cloud providers without the added operational complexity of managing data replication and migration between clouds. For instance, you can run your online store on Azure and have its data replicated to GCP in real time for machine learning.
CRUD commands of MongoDB
Now, let's look at the basic CRUD commands in the MongoDB shell:
- To show all the databases: show dbs
- To switch to a database: use database_name (the database is created when you first store data in it)
- To create a collection: db.createCollection("Name_of_collection")
- To insert one document: db.collection_name.insertOne({})
- To show the collections: show collections
- To insert many documents at once: db.collection_name.insertMany([ {}, {}, {}, … ])
- To view the documents: db.collection_name.find()
- To update one document: db.collection_name.updateOne(<filter>, <update>)
- To update many documents: db.collection_name.updateMany(<filter>, <update>)
- To delete documents: db.collection_name.deleteMany(<filter>)
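The <filter> and <update> arguments above are themselves documents, built from query operators ($gte, $lte, ...) and update operators ($set, $inc, ...). PyMongo accepts the same documents as Python dicts, so a short Python sketch shows their shape for both the shell and the driver. The helpers `range_filter` and `set_update` are hypothetical; `range_filter` simply omits a bound that is None.

```python
from typing import Any, Dict, Optional


def range_filter(field: str, low: Optional[float] = None,
                 high: Optional[float] = None) -> Dict[str, Any]:
    """Build a filter matching low <= field <= high; a None bound is omitted."""
    condition: Dict[str, Any] = {}
    if low is not None:
        condition["$gte"] = low
    if high is not None:
        condition["$lte"] = high
    # With no bounds at all, the empty filter {} matches every document
    return {field: condition} if condition else {}


def set_update(fields: Dict[str, Any], bump: Optional[str] = None) -> Dict[str, Any]:
    """Build an update that sets fields and optionally increments a counter field."""
    update: Dict[str, Any] = {"$set": dict(fields)}
    if bump is not None:
        update["$inc"] = {bump: 1}
    return update


print(range_filter("age", high=29))
print(set_update({"department": "R&D"}, bump="revision"))
```

In the shell, db.collection_name.deleteMany({ age: { $lte: 29 } }) uses the same filter as the first printed dict. Be careful with empty filters: deleteMany({}) removes every document in the collection.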
MongoDB in Python using PyMongo
PyMongo is the official MongoDB driver for Python. It lets us connect to a MongoDB server and perform CRUD and other operations on its databases from Python code.
Installing PyMongo
- pip install pymongo – in an Anaconda (or any other) terminal
- !pip install pymongo – in a Jupyter notebook cell
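After installing, a quick way to check the setup is to connect and send a ping command. This sketch assumes a server that requires a username and password; special characters in credentials must be percent-escaped in the URI, which `urllib.parse.quote_plus` does. The user name, password, host, and the helpers `build_uri` and `check_connection` are placeholders. `serverSelectionTimeoutMS` makes the client give up quickly when no server is reachable.

```python
from urllib.parse import quote_plus


def build_uri(user: str, password: str, host: str = "127.0.0.1", port: int = 27017) -> str:
    """Build a mongodb:// URI with percent-escaped credentials."""
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"


def check_connection(uri: str) -> bool:
    """Return True if the server answers a ping (needs pymongo and a running server)."""
    # Imported here so build_uri works even where pymongo is not installed
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    client = MongoClient(uri, serverSelectionTimeoutMS=2000)
    try:
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()


print(build_uri("analyst", "p@ss:word"))
```

Without escaping, the "@" and ":" in the password would be read as URI delimiters and the connection string would be invalid.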
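CRUD commands in PyMongo
Now, let's look at the basic CRUD operations in PyMongo. Note that PyMongo method names use snake_case (insert_one, update_many, and so on) instead of the shell's camelCase (insertOne, updateMany).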
- To make a connection: client = pymongo.MongoClient('protocol://ip_address:port/')
import pymongo

# Connect to a local MongoDB server (protocol://ip_address:port/)
client = pymongo.MongoClient('mongodb://127.0.0.1:27017/')

# Get a database handle; MongoDB creates the database on the first write
mydb = client["emp"]

# Get a collection handle (in SQL terms, a table); also created on the first write
info = mydb.employeeinformation
- Insert one document at a time: collection_name.insert_one()
# Create a record (a Python dict, stored as a BSON document)
record = {
    "firstname": "John",
    "lastname": "Doe",
    "department": "Analytics"
}

# Insert a single record into the collection
info.insert_one(record)
- Insert many documents at a time: collection_name.insert_many()
# Insert multiple records into the collection
records = [
    {"firstname": "Hope", "lastname": "Marshall", "department": "Development"},
    {"firstname": "Hayley", "lastname": "Johnson", "department": "Analytics"},
    {"firstname": "Klaus", "lastname": "Mikalson", "department": "R&D"}
]
info.insert_many(records)

# Documents in the same collection can carry extra fields, such as age
records1 = [
    {"firstname": "Jacob", "lastname": "Smith", "department": "Development", "age": 32},
    {"firstname": "Hasel", "lastname": "Shah", "department": "Analytics", "age": 29},
    {"firstname": "Elijah", "lastname": "Mikalson", "department": "R&D", "age": 34}
]
info.insert_many(records1)
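- View the documents: collection_name.find() returns a cursor over all matching documents, and collection_name.find_one() returns a single document (or None)
# View the first record
print(info.find_one())

# View every record
for record in info.find():
    print(record)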
- Update one document at a time: collection_name.update_one()
# Update a single record: set age to 30 and record the modification time
info.update_one(
    {'firstname': 'John'},
    {'$set': {'age': 30}, '$currentDate': {'lastModified': True}}
)
- Update many documents at a time: collection_name.update_many()
# Update multiple records: every Analytics employee gets a skills field
info.update_many(
    {'department': 'Analytics'},
    {'$set': {'skills': 'Statistics'}, '$currentDate': {'lastModified': True}}
)
- Replace a whole document (all fields except _id) at a time: collection_name.replace_one(<filter>, <replacement>)
# Replace John's document; fields not listed here (such as age) are removed
info.replace_one(
    {'firstname': 'John'},
    {'firstname': 'John', 'lastname': 'Stalin', 'qualifications': 'BE',
     'skills': ['Statistics', 'Machine Learning', 'Data Science'],
     'department': 'Data Science'}
)
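- Delete documents: collection_name.delete_many(<filter>)
# Delete every record whose age is 29 or less
result = info.delete_many({'age': {'$lte': 29}})
print(result.deleted_count)

# Print the remaining records
for record in info.find():
    print(record)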
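To connect these operations back to machine learning, the sketch below turns query results into a feature matrix and a label list. `fetch` assumes the `info` collection created above and uses a projection to return only the needed fields (excluding `_id`); `to_xy` is plain Python, so it can be tried on sample documents shaped like the employee records inserted earlier. Both helper names, and the choice of age as the feature and department as the label, are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def fetch(collection, fields: Sequence[str]) -> List[Dict]:
    """Query documents that have all `fields`, returning only those fields."""
    query = {f: {"$exists": True} for f in fields}
    projection = {f: 1 for f in fields}
    projection["_id"] = 0  # the ObjectId is not a useful feature
    return list(collection.find(query, projection))


def to_xy(docs: Iterable[Dict], features: Sequence[str], label: str
          ) -> Tuple[List[List[float]], List[str]]:
    """Split documents into a feature matrix X and a label list y."""
    X, y = [], []
    for doc in docs:
        X.append([float(doc[f]) for f in features])
        y.append(doc[label])
    return X, y


# Sample documents shaped like records1 above
sample = [
    {"age": 32, "department": "Development"},
    {"age": 29, "department": "Analytics"},
    {"age": 34, "department": "R&D"},
]
X, y = to_xy(sample, features=["age"], label="department")
print(X, y)
```

With a live server, to_xy(fetch(info, ["age", "department"]), ["age"], "department") would produce the same structure from the stored documents, ready to pass to a machine learning library.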
Conclusion
In this blog, we covered why MongoDB is a good fit for machine learning and how to use MongoDB from Python with the PyMongo library, along with installation and the basic CRUD commands.
Happy Learning !! 🙂