Advanced MongoDB Operations: Aggregation, Security, and Replication
Advanced MongoDB Operations
Aggregation Pipeline
Aggregation in MongoDB processes documents and returns computed results, similar to SQL aggregate functions such as SUM() and AVG() used with GROUP BY.
Syntax:
db.collection.aggregate([{<stage>:{<expression>}}, ...])
Pipelines
In Unix and Linux, pipelines typically pass the output of one command as input to the next command:
ps ajx | grep mongo
In MongoDB, pipelines serve a similar purpose, processing documents sequentially through each stage.
Common pipeline stages:
- $group: Groups documents by a key and computes aggregate values for each group
- $match: Filters documents, passing on only those that match the specified conditions
- $project: Reshapes each document by renaming, adding, or removing fields and by creating computed fields
- $sort: Sorts the input documents
- $limit: Restricts the number of documents passed to the next stage
- $skip: Skips a specified number of documents and passes on the rest
- $unwind: Deconstructs an array field, outputting one document per array element
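As a rough mental model of how stages chain, the Python sketch below treats each stage as a function from a list of documents to a new list and feeds each output into the next stage. The helper names (match, sort_by, limit, run_pipeline) and the sample data are hypothetical; a real pipeline runs on the server, which may also reorder or merge stages for optimization.

```python
from functools import reduce

# Hypothetical sample documents standing in for a students collection.
students = [
    {"name": "Ann", "gender": "F", "age": 22},
    {"name": "Bob", "gender": "M", "age": 19},
    {"name": "Cid", "gender": "M", "age": 25},
]

def match(predicate):
    """Rough analogue of $match: keep documents for which predicate(doc) is true."""
    return lambda docs: [d for d in docs if predicate(d)]

def sort_by(field, direction=1):
    """Rough analogue of $sort on one field (1 = ascending, -1 = descending)."""
    return lambda docs: sorted(docs, key=lambda d: d[field], reverse=(direction == -1))

def limit(n):
    """Rough analogue of $limit."""
    return lambda docs: docs[:n]

def run_pipeline(docs, stages):
    """Feed each stage's output into the next one, like a Unix pipe."""
    return reduce(lambda current, stage: stage(current), stages, docs)

result = run_pipeline(students, [match(lambda d: d["age"] > 20), sort_by("age", -1), limit(1)])
print(result)  # [{'name': 'Cid', 'gender': 'M', 'age': 25}]
```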
Expressions
Expressions process input documents and produce output values. A field's value is referenced by prefixing the field name with $:
Syntax:
expression:'$field_name'
Common accumulator expressions (used in $group):
- $sum: Calculates the sum; $sum:1 adds 1 per document, acting as a counter
- $avg: Calculates the average value
- $min: Returns the minimum value
- $max: Returns the maximum value
- $push: Appends values to an array in the resulting document
- $first: Returns the value from the first document in each group, based on document order (sort beforehand for a meaningful result)
- $last: Returns the value from the last document in each group, based on document order (sort beforehand for a meaningful result)
$group Operator
Groups documents and computes statistics for each group. The _id field specifies the group key; use the format '$field' to group by a specific field.
Example 1: Count total number of male and female students
db.students.aggregate([
{$group:
{
_id:'$gender',
count:{$sum:1}
}
}
])
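A plain-Python sketch of what this stage computes, assuming a small hypothetical students list: $sum:1 adds 1 for each document in a group, which amounts to a count per key. MongoDB does not guarantee the order of $group output, so add a $sort stage if order matters.

```python
from collections import Counter

# Hypothetical sample documents standing in for the students collection.
students = [
    {"name": "Ann", "gender": "F", "age": 22},
    {"name": "Bob", "gender": "M", "age": 19},
    {"name": "Cid", "gender": "M", "age": 25},
]

# Equivalent of {$group: {_id: '$gender', count: {$sum: 1}}}.
# doc.get() mirrors MongoDB, where documents without the field fall into a null (None) group.
counts = Counter(doc.get("gender") for doc in students)
grouped = [{"_id": gender, "count": n} for gender, n in counts.items()]
print(grouped)  # [{'_id': 'F', 'count': 1}, {'_id': 'M', 'count': 2}]
```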
Group by null
Setting _id to null groups all documents in the collection into a single group.
Example 2: Find total number of students and average age
db.students.aggregate([
{$group:
{
_id:null,
totalCount:{$sum:1},
avgAge:{$avg:'$age'}
}
}
])
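The same idea in plain Python, with hypothetical data in which one student has no age field. It shows a detail worth knowing: $avg ignores missing and non-numeric values (and returns null when none are numeric), while $sum:1 still counts every document.

```python
# Hypothetical sample data; the third student has no age field.
students = [
    {"name": "Ann", "gender": "F", "age": 22},
    {"name": "Bob", "gender": "M", "age": 19},
    {"name": "Cid", "gender": "M"},
]

def is_number(value):
    """BSON booleans are not numeric, so exclude Python bools explicitly."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

# _id: null puts every document into one group; {$sum: 1} counts all of them.
total_count = len(students)

# {$avg: '$age'} skips documents whose age is missing or non-numeric.
ages = [d["age"] for d in students if is_number(d.get("age"))]
avg_age = sum(ages) / len(ages) if ages else None

summary = {"_id": None, "totalCount": total_count, "avgAge": avg_age}
print(summary)  # {'_id': None, 'totalCount': 3, 'avgAge': 20.5}
```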
Data Pivoting
Example 3: Group students by gender and collect their names
db.students.aggregate([
{$group:
{
_id:'$gender',
names:{$push:'$name'}
}
}
])
$$ROOT refers to the entire current document, so pushing it collects whole documents instead of a single field:
db.students.aggregate([
{$group:
{
_id:'$gender',
studentData:{$push:'$$ROOT'}
}
}
])
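A plain-Python sketch of both forms of $push, using hypothetical sample data: '$name' collects one field per document, while '$$ROOT' collects the whole document. Element order inside each array follows the order in which documents reach $group, so sort first if order matters.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for the students collection.
students = [
    {"name": "Ann", "gender": "F", "age": 22},
    {"name": "Bob", "gender": "M", "age": 19},
    {"name": "Cid", "gender": "M", "age": 25},
]

names_by_gender = defaultdict(list)  # like {$push: '$name'}
docs_by_gender = defaultdict(list)   # like {$push: '$$ROOT'}
for doc in students:
    key = doc.get("gender")
    names_by_gender[key].append(doc["name"])
    docs_by_gender[key].append(doc)

print(dict(names_by_gender))  # {'F': ['Ann'], 'M': ['Bob', 'Cid']}
```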
$match Operator
Filters documents to output only those that match specified conditions.
Example 1: Find students older than 20
db.students.aggregate([
{$match:{age:{$gt:20}}}
])
Example 2: Count male and female students older than 20
db.students.aggregate([
{$match:{age:{$gt:20}}},
{$group:{_id:'$gender',count:{$sum:1}}}
])
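From Python (for example with PyMongo), the same pipeline is a list of dicts whose operator names and field paths are quoted strings. The builder function below is hypothetical; it only constructs the pipeline, so it runs without a server. Putting $match first means $group receives fewer documents, and a leading $match can use an index on age if one exists.

```python
def adults_by_gender_pipeline(min_age=20):
    """Hypothetical helper: the $match + $group pipeline above as PyMongo-ready dicts."""
    return [
        # Filter first so $group only sees matching documents.
        {"$match": {"age": {"$gt": min_age}}},
        # Count the remaining documents per gender.
        {"$group": {"_id": "$gender", "count": {"$sum": 1}}},
    ]

pipeline = adults_by_gender_pipeline()
print(pipeline[0])  # {'$match': {'age': {'$gt': 20}}}
# With a live server and PyMongo: results = list(db.students.aggregate(pipeline))
```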
$project Operator
Modifies the structure of input documents, including renaming, adding, deleting fields, and creating computed results.
Example 1: Display student names and ages
db.students.aggregate([
{$project:{_id:0,name:1,age:1}}
])
Example 2: Count students by gender and display only the count
db.students.aggregate([
{$group:{_id:'$gender',totalCount:{$sum:1}}},
{$project:{_id:0,totalCount:1}}
])
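A simplified Python model of an inclusion-only $project, assuming hypothetical data. It captures the rule both examples rely on: _id is kept unless it is explicitly set to 0, and fields set to 1 are kept. Real $project also supports exclusion, renaming, and computed fields, which this sketch omits.

```python
def project(doc, spec):
    """Hypothetical model of an inclusion-only $project such as {_id: 0, name: 1, age: 1}."""
    out = {}
    # _id is included by default and only dropped when explicitly set to 0.
    if spec.get("_id", 1) != 0 and "_id" in doc:
        out["_id"] = doc["_id"]
    # Keep every other field marked with 1 that exists in the document.
    for field, flag in spec.items():
        if field != "_id" and flag == 1 and field in doc:
            out[field] = doc[field]
    return out

doc = {"_id": 7, "name": "Ann", "gender": "F", "age": 22}
print(project(doc, {"_id": 0, "name": 1, "age": 1}))  # {'name': 'Ann', 'age': 22}
print(project(doc, {"name": 1}))  # {'_id': 7, 'name': 'Ann'}
```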
$sort Operator
Sorts the input documents. Use 1 for ascending and -1 for descending order.
Example 1: Display student information sorted by age in ascending order
db.students.aggregate([{$sort:{age:1}}])
Example 2: Count students by gender and sort by count in descending order
db.students.aggregate([
{$group:{_id:'$gender',count:{$sum:1}}},
{$sort:{count:-1}}
])
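In MongoDB, $sort is not a stable sort, so documents with equal counts may come back in any order; adding a unique field such as _id as a secondary key makes the order deterministic. The Python sketch below shows the equivalent ordering on hypothetical grouped results. In the matching stage, {'$sort': {'count': -1, '_id': 1}}, key order matters, and Python dicts preserve insertion order.

```python
# Hypothetical output of the $group stage.
grouped = [
    {"_id": "F", "count": 2},
    {"_id": "M", "count": 3},
    {"_id": "U", "count": 2},
]

# Same ordering as {$sort: {count: -1, _id: 1}}: count descending, ties broken by _id ascending.
ordered = sorted(grouped, key=lambda d: (-d["count"], d["_id"]))

# PyMongo form of the stage; sort keys are applied in the dict's insertion order.
sort_stage = {"$sort": {"count": -1, "_id": 1}}

print([d["_id"] for d in ordered])  # ['M', 'F', 'U']
```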
$limit Operator
Restricts the number of documents returned by the aggregation pipeline.
Example: Display 2 student records
db.students.aggregate([{$limit:2}])
$skip Operator
Skips a specified number of documents and returns the remaining ones.
Example 1: Display student records starting from the third one
db.students.aggregate([{$skip:2}])
Example 2: Count students by gender, sort by count in ascending order, and return the second result
db.students.aggregate([
{$group:{_id:'$gender',count:{$sum:1}}},
{$sort:{count:1}},
{$skip:1},
{$limit:1}
])
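Note: Stage order matters. $skip must come before $limit here: reversing them ({$limit:1} then {$skip:1}) would keep one document and then skip it, returning nothing.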
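List slicing in Python behaves like $skip and $limit, which makes the ordering rule easy to check. The sketch uses hypothetical group results already sorted by count, plus a hypothetical page() helper showing the usual pagination pattern: skip (page - 1) * size documents, then limit to size.

```python
def skip(docs, n):
    """Like $skip: drop the first n documents."""
    return docs[n:]

def limit(docs, n):
    """Like $limit: keep at most n documents."""
    return docs[:n]

def page(docs, page_number, page_size):
    """Pagination: $skip (page_number - 1) * page_size, then $limit page_size."""
    return limit(skip(docs, (page_number - 1) * page_size), page_size)

counts = [{"_id": "F", "count": 4}, {"_id": "M", "count": 6}]  # hypothetical, sorted ascending

print(limit(skip(counts, 1), 1))  # [{'_id': 'M', 'count': 6}]
print(skip(limit(counts, 1), 1))  # []
```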
$unwind Operator
Destructures an array field from the input document to output a document for each element.
Syntax 1
Unwind a specific field:
db.collection.aggregate([{$unwind:'$field_name'}])
Create test data:
db.products.insertOne({_id:1,item:'t-shirt',sizes:['S','M','L']})
Query:
db.products.aggregate([{$unwind:'$sizes'}])
Syntax 2
Control whether documents whose field is null, missing, or an empty array are kept:
db.collection.aggregate([{
$unwind:{
path:'$field_name',
preserveNullAndEmptyArrays:<boolean>
}
}])
Create test data:
db.inventory.insertMany([
{ "_id" : 1, "item" : "a", "sizes": [ "S", "M", "L"] },
{ "_id" : 2, "item" : "b", "sizes" : [ ] },
{ "_id" : 3, "item" : "c", "sizes": "M" },
{ "_id" : 4, "item" : "d" },
{ "_id" : 5, "item" : "e", "sizes" : null }
])
Using Syntax 1:
db.inventory.aggregate([{$unwind:'$sizes'}])
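Notice that the documents whose sizes field is an empty array, missing, or null (_id 2, 4, and 5) are discarded. Document 3 is kept because $unwind treats a non-array value such as "M" as a single-element array.
To keep the discarded documents in the output, use Syntax 2: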
db.inventory.aggregate([{$unwind:{path:'$sizes',preserveNullAndEmptyArrays:true}}])
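The following simplified Python model of $unwind on a top-level field reproduces the behavior described above for the inventory data. The unwind() helper is hypothetical and ignores options such as includeArrayIndex. It follows MongoDB's documented rules: one output per array element, a non-array value treated as a one-element array, and null, missing, or empty-array values dropped unless preserved (a preserved empty array is omitted from the output document).

```python
MISSING = object()  # sentinel distinguishing "field absent" from "field is None"

def unwind(docs, field, preserve_null_and_empty=False):
    """Hypothetical model of {$unwind: {path: '$field', preserveNullAndEmptyArrays: ...}}."""
    out = []
    for doc in docs:
        value = doc.get(field, MISSING)
        if isinstance(value, list) and value:
            # One output document per array element.
            for element in value:
                out.append({**doc, field: element})
        elif isinstance(value, list) or value is None or value is MISSING:
            # Empty array, null, or missing: dropped unless preserved.
            if preserve_null_and_empty:
                kept = dict(doc)
                if isinstance(value, list):
                    kept.pop(field)  # a preserved empty array is omitted
                out.append(kept)
        else:
            # Any other value behaves like a single-element array.
            out.append(dict(doc))
    return out

inventory = [
    {"_id": 1, "item": "a", "sizes": ["S", "M", "L"]},
    {"_id": 2, "item": "b", "sizes": []},
    {"_id": 3, "item": "c", "sizes": "M"},
    {"_id": 4, "item": "d"},
    {"_id": 5, "item": "e", "sizes": None},
]

print([d["_id"] for d in unwind(inventory, "sizes")])        # [1, 1, 1, 3]
print([d["_id"] for d in unwind(inventory, "sizes", True)])  # [1, 1, 1, 2, 3, 4, 5]
```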
User Management
Super Administrator
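To secure MongoDB, enable authentication so that clients must provide a username and password. MongoDB uses role-based access control: each user is created in a database and granted roles, and each role grants privileges on a specific database.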
Common built-in roles:
- root: Superuser privileges; available only in the admin database
- read: Allows reading data in the specified database
- readWrite: Allows reading and writing data in the specified database
Create a super admin user:
use admin
db.createUser({
user:'admin',
pwd:'secure_password',
roles:[{role:'root',db:'admin'}]
})
Enable Authentication
Edit the configuration file:
sudo vi /etc/mongod.conf
Enable authentication. The file is YAML, so indent authorization under security and put a space after each colon:
security:
  authorization: enabled
Restart the service:
sudo service mongod stop
sudo service mongod start
Connect from the terminal (the legacy mongo shell was removed in MongoDB 6.0; with mongosh, the same options apply):
mongo -u 'admin' -p 'secure_password' --authenticationDatabase 'admin'
Regular User Management
Log in as the super admin and manage users:
View current database users:
use company_db
show users
Create a regular user:
db.createUser({
user:'app_user',
pwd:'user_password',
roles:[{role:'readWrite',db:'company_db'}]
})
Connect as a regular user:
mongo -u app_user -p user_password --authenticationDatabase company_db
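Drivers usually express the same credentials as a connection string, where --authenticationDatabase becomes the authSource option. The hypothetical mongo_uri() helper below builds one. The username and password are percent-escaped with urllib.parse.quote_plus, which PyMongo's documentation recommends for credentials containing characters such as @ or :.

```python
from urllib.parse import quote_plus

def mongo_uri(user, password, host, port, auth_db):
    """Build a URI equivalent to -u/-p/--authenticationDatabase on the command line."""
    # Escape credentials so characters like @ : / % do not break the URI.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
        f"?authSource={quote_plus(auth_db)}"
    )

uri = mongo_uri("app_user", "user_password", "localhost", 27017, "company_db")
print(uri)  # mongodb://app_user:user_password@localhost:27017/?authSource=company_db
# With PyMongo installed: client = pymongo.MongoClient(uri)
```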
Switch to another database and try to read or write there to confirm that app_user only has access to company_db.
Change a user's password (logged in as the super admin, in the database where the user was created):
db.updateUser('app_user',{pwd:'new_password'})
Replication (Replica Sets)
What is Replication?
Replication provides redundant data copies across multiple servers, increasing data availability and ensuring data safety. It also enables recovery from hardware failures and service interruptions.
Why Use Replication?
- Data backup
- Disaster recovery
- Read/write separation
- High (24/7) data availability
- Downtime-free maintenance
- Transparent to applications
How Replication Works
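Replication requires at least two nodes. One node acts as the primary and handles client requests (all writes go to it), while the others are secondaries that replicate the primary's data. MongoDB recommends at least three members so that the set can still elect a primary when one member fails. Common configurations include one primary with one secondary, or one primary with multiple secondaries.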
The primary records every write operation in its operation log (oplog). Secondaries continuously copy new oplog entries from the primary and apply them to their own data sets asynchronously, which keeps their copies consistent with the primary.
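A toy Python model of this mechanism: the primary appends each write to an ordered log, and a secondary applies only the entries after the last one it has applied. The classes are hypothetical and greatly simplified; the real oplog is the capped collection local.oplog.rs, its entries are timestamped and idempotent, and secondaries tail it continuously rather than on demand.

```python
class Primary:
    """Holds the data and an ordered log of every write (a stand-in for the oplog)."""

    def __init__(self):
        self.data = {}
        self.oplog = []  # list of (sequence_number, op, doc_id, doc)

    def insert(self, doc_id, doc):
        self.data[doc_id] = doc
        self.oplog.append((len(self.oplog) + 1, "insert", doc_id, doc))


class Secondary:
    """Copies new log entries from the primary and replays them in order."""

    def __init__(self):
        self.data = {}
        self.last_applied = 0

    def sync_from(self, primary):
        for seq, op, doc_id, doc in primary.oplog:
            if seq <= self.last_applied:
                continue  # already applied in an earlier sync
            if op == "insert":
                self.data[doc_id] = doc
            self.last_applied = seq


primary, secondary = Primary(), Secondary()
for i in range(3):
    primary.insert(i, {"_id": i})
secondary.sync_from(primary)
primary.insert(3, {"_id": 3})
secondary.sync_from(primary)  # applies only the new entry
print(secondary.data == primary.data, secondary.last_applied)  # True 4
```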
Replication Features
- N-node cluster
- Any eligible node can become primary
- All write operations occur on the primary
- Automatic failover
- Automatic recovery
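Automatic failover relies on an election, and a candidate needs votes from a strict majority of the voting members. The short Python calculation below (the function names are illustrative) shows why a two-node set cannot fail over: once one node is lost, the remaining node cannot form a majority. This is why MongoDB recommends at least three members.

```python
def majority(voting_members):
    """Votes needed to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1

def survives_failures(voting_members, failed):
    """True if the remaining members can still elect a primary."""
    return voting_members - failed >= majority(voting_members)

for n in (2, 3, 5):
    print(n, majority(n), survives_failures(n, 1))
# 2 2 False
# 3 2 True
# 5 3 True
```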
Setting Up Replication Nodes
Step 1: Create database directories
mkdir -p ~/data/node1
mkdir -p ~/data/node2
Step 2: Start two MongoDB instances with the same replica set name (each mongod runs in the foreground, so use a separate terminal for each)
mongod --bind_ip 192.168.1.100 --port 27017 --dbpath ~/data/node1 --replSet myReplicaSet
mongod --bind_ip 192.168.1.100 --port 27018 --dbpath ~/data/node2 --replSet myReplicaSet
Step 3: Connect to the first instance (it becomes the primary once the replica set is initialized)
mongo --host 192.168.1.100 --port 27017
Step 4: Initialize the replica set
rs.initiate()
Step 5: Check current status
rs.status()
Step 6: Add a secondary node
rs.add('192.168.1.100:27018')
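Instead of rs.initiate() followed by rs.add(), both members can be listed in a single configuration document passed to rs.initiate(config). The Python sketch below builds that document with a hypothetical helper; its _id must match the --replSet name. With PyMongo, the same dict could be sent as client.admin.command('replSetInitiate', config) while connected to one of the members.

```python
def replica_set_config(name, hosts):
    """Build a replica set config document of the form rs.initiate(config) accepts."""
    return {
        "_id": name,  # must equal the --replSet value given to every mongod
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }

config = replica_set_config("myReplicaSet", ["192.168.1.100:27017", "192.168.1.100:27018"])
print(config["members"][1])  # {'_id': 1, 'host': '192.168.1.100:27018'}
```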
Step 7: Connect to the secondary node
mongo --host 192.168.1.100 --port 27018
Step 8: Insert data on the primary node
use products
for(let i=0;i<10;i++){db.items.insertOne({_id:i})}
db.items.find()
Step 9: Query data from the secondary node
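Note: Secondaries reject reads by default, so allow them on the secondary's connection first. Older mongo shells use rs.slaveOk(); newer ones replace it with rs.secondaryOk(). In mongosh, set a read preference instead, for example db.getMongo().setReadPref('secondaryPreferred').
rs.secondaryOk()
use products
db.items.find()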
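Applications usually choose secondary reads through the connection string rather than shell helpers. The hypothetical replica_set_uri() helper below lists the members and sets the replicaSet and readPreference options, which PyMongo's MongoClient accepts in a URI. Because replication is asynchronous, reads from a secondary may return slightly stale data.

```python
from urllib.parse import urlencode

def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a connection string for a replica set with a read preference."""
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"

uri = replica_set_uri(["192.168.1.100:27017", "192.168.1.100:27018"], "myReplicaSet")
print(uri)
# mongodb://192.168.1.100:27017,192.168.1.100:27018/?replicaSet=myReplicaSet&readPreference=secondaryPreferred
```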
Additional Operations
Remove a secondary node (run on the primary):
rs.remove('192.168.1.100:27018')
Backup and Recovery
Backup
Syntax:
mongodump -h host -d database -o output_directory
- -h: Server address, optionally with port (host:port)
- -d: Name of the database to back up
- -o: Output directory for the backup data
Example:
mkdir ~/backups
mongodump -h 127.0.0.1:27017 -d inventory -o ~/backups
Recovery
Syntax:
mongorestore -h host -d database --dir backup_directory
- -h: Server address
- -d: Name of the database to restore into
- --dir: Location of the backup data
Example:
mongorestore -h 127.0.0.1:27017 -d new_inventory --dir ~/backups/inventory
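To script a backup and restore, the two commands can be run from Python with subprocess. This is a minimal sketch under stated assumptions: mongodump and mongorestore are on PATH, the helper names are hypothetical, and each backup goes into a dated directory. It relies on mongodump writing each database into a subdirectory named after it, which is why the restore points at out_dir / db, matching ~/backups/inventory above.

```python
import subprocess
from datetime import date
from pathlib import Path

def dump_command(host, db, out_dir):
    """Argument list for: mongodump -h host -d db -o out_dir"""
    return ["mongodump", "-h", host, "-d", db, "-o", str(out_dir)]

def restore_command(host, target_db, dump_dir):
    """Argument list for: mongorestore -h host -d target_db --dir dump_dir"""
    return ["mongorestore", "-h", host, "-d", target_db, "--dir", str(dump_dir)]

def backup_and_restore(host, db, target_db, root="~/backups"):
    """Dump db into root/<today>/ and restore it as target_db (hypothetical workflow)."""
    out_dir = Path(root).expanduser() / date.today().isoformat()
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(dump_command(host, db, out_dir), check=True)
    # mongodump writes the database into out_dir/<db>/, so restore from there.
    subprocess.run(restore_command(host, target_db, out_dir / db), check=True)
```

Usage, against a running server: backup_and_restore("127.0.0.1:27017", "inventory", "new_inventory"). Passing argument lists (rather than a shell string) avoids quoting problems with paths.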
Interacting with Python
Install the Python driver:
pip install pymongo
Import the package:
import pymongo
Connect and create a client:
client = pymongo.MongoClient("localhost", 27017)
Access a database:
db = client.inventory
Access a collection:
products = db.products
Add a document:
item = {'name': 'laptop', 'price': 1200}
item_id = products.insert_one(item).inserted_id
Find one document:
result = products.find_one()
Find multiple documents:
for doc in products.find():
    print(doc)
Get document count:
print(products.count_documents({}))
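The aggregation pipelines from earlier run unchanged through PyMongo's Collection.aggregate(), which returns a cursor of result documents. The count_by_gender() function below is a hypothetical example that accepts any object with an aggregate() method; the small FakeCollection stand-in (also hypothetical) lets it run without a server. With a real connection you would pass a collection such as client.school.students, where the database name is an assumption.

```python
def count_by_gender(collection, min_age=None):
    """Return {gender: count}, optionally only for students older than min_age."""
    pipeline = []
    if min_age is not None:
        pipeline.append({"$match": {"age": {"$gt": min_age}}})
    pipeline.append({"$group": {"_id": "$gender", "count": {"$sum": 1}}})
    pipeline.append({"$sort": {"count": -1, "_id": 1}})
    # aggregate() yields result documents; fold them into a plain dict.
    return {doc["_id"]: doc["count"] for doc in collection.aggregate(pipeline)}


class FakeCollection:
    """Stand-in for a pymongo Collection that records the pipeline it receives."""

    def __init__(self, results):
        self.results = results
        self.last_pipeline = None

    def aggregate(self, pipeline):
        self.last_pipeline = pipeline
        return iter(self.results)


fake = FakeCollection([{"_id": "M", "count": 2}, {"_id": "F", "count": 1}])
print(count_by_gender(fake, min_age=20))  # {'M': 2, 'F': 1}
print(fake.last_pipeline[0])  # {'$match': {'age': {'$gt': 20}}}
```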