Fading Coder

Advanced MongoDB Operations: Aggregation, Security, and Replication

Tech May 14 1

Advanced MongoDB Operations

Aggregation Pipeline

Aggregation in MongoDB processes documents and returns computed results, much like SQL aggregate functions such as SUM() and AVG() used together with GROUP BY.

Syntax:

db.collection.aggregate([{<stage>:{<expression>}}, ...])

Pipelines

In Unix and Linux, pipelines typically pass the output of one command as input to the next command:

ps ajx | grep mongo

In MongoDB, pipelines serve a similar purpose, processing documents sequentially through each stage.
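
The analogy can be made concrete with a minimal Python sketch that uses only plain lists and functions, no database. It assumes each stage is a function that takes a list of documents and returns a new list; run_pipeline feeds each stage's output into the next. older_than_20 and names_only are hypothetical stand-ins for $match and $project, not MongoDB APIs.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Stage = Callable[[List[Doc]], List[Doc]]


def run_pipeline(docs: List[Doc], stages: List[Stage]) -> List[Doc]:
    """Pass the output of each stage as the input of the next, like `cmd1 | cmd2`."""
    return reduce(lambda current, stage: stage(current), stages, docs)


# Hypothetical toy stages (plain Python, not MongoDB APIs)
def older_than_20(docs: List[Doc]) -> List[Doc]:
    return [d for d in docs if d["age"] > 20]


def names_only(docs: List[Doc]) -> List[Doc]:
    return [{"name": d["name"]} for d in docs]


students = [{"name": "Ann", "age": 22}, {"name": "Bo", "age": 19}]
print(run_pipeline(students, [older_than_20, names_only]))  # [{'name': 'Ann'}]
```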

Common pipeline operators:

  • $group: Groups documents by a key and computes aggregate values (counts, sums, averages) for each group
  • $match: Filters documents, passing on only those that match the specified conditions
  • $project: Reshapes documents: renames, adds, or removes fields and creates computed fields
  • $sort: Sorts documents before passing them to the next stage
  • $limit: Restricts the number of documents passed to the next stage
  • $skip: Skips a specified number of documents and passes on the rest
  • $unwind: Deconstructs an array field, outputting one document per array element

Expressions

Expressions compute values from the input documents. Inside $group they act as accumulators, and a field is referenced by prefixing its name with $:

Syntax:

{<accumulator>:'$field_name'}

Common expressions:

  • $sum: Calculates the sum; {$sum:1} adds 1 per document and therefore acts as a counter
  • $avg: Calculates the average value
  • $min: Retrieves the minimum value
  • $max: Retrieves the maximum value
  • $push: Appends each value to an array in the output document
  • $first: Returns the value from the first document in each group; only meaningful if the documents were sorted beforehand
  • $last: Returns the value from the last document in each group; likewise depends on a preceding sort
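
As a rough mental model, the sketch below computes in plain Python what each accumulator would produce for the values of a single group. It assumes all values are present and numeric (the real $sum and $avg skip non-numeric values), and accumulate is a hypothetical helper name.

```python
from statistics import mean
from typing import Any, Dict, List


def accumulate(values: List[float]) -> Dict[str, Any]:
    """Approximate what each accumulator returns for the values of one group.

    Assumes every value is numeric and present; real $sum/$avg skip
    non-numeric values.
    """
    return {
        "count": len(values),   # like {$sum: 1}
        "sum": sum(values),     # like {$sum: '$field'}
        "avg": mean(values),    # like {$avg: '$field'}
        "min": min(values),     # like {$min: '$field'}
        "max": max(values),     # like {$max: '$field'}
        "push": list(values),   # like {$push: '$field'}
        "first": values[0],     # like {$first: '$field'}: depends on input order
        "last": values[-1],     # like {$last: '$field'}: depends on input order
    }


print(accumulate([18, 21, 19]))
```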

$group Operator

Groups documents by a key and computes aggregate values for each group. The _id field defines the group key; use '$field' to group by the value of a specific field.

Example 1: Count the number of male and female students

db.students.aggregate([
    {$group:
        {
            _id:'$gender',
            count:{$sum:1}
        }
    }
])
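
To show what $group computes, here is a plain-Python approximation of Example 1. group_count and the sample students list are hypothetical; a Counter keyed on the field value plays the role of the group key, and a document without the field falls into the None group, much as MongoDB groups it under null.

```python
from collections import Counter
from typing import Any, Dict, List

students: List[Dict[str, Any]] = [
    {"name": "Ann", "gender": "F", "age": 22},
    {"name": "Bo", "gender": "M", "age": 19},
    {"name": "Cy", "gender": "M", "age": 24},
]


def group_count(docs: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Rough equivalent of {$group: {_id: '$<key>', count: {$sum: 1}}}.

    A document without the key lands in the None group.
    """
    counts = Counter(d.get(key) for d in docs)
    return [{"_id": k, "count": n} for k, n in counts.items()]


print(group_count(students, "gender"))
```

MongoDB does not guarantee the order of $group output, so add a $sort stage when the order matters.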

Group by null

Setting _id to null puts every document in the collection into a single group, so the accumulators run over the whole collection.

Example 2: Find the total number of students and their average age

db.students.aggregate([
    {$group:
        {
            _id:null,
            totalCount:{$sum:1},
            avgAge:{$avg:'$age'}
        }
    }
])
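
A plain-Python approximation of Example 2 follows; group_all and the sample data are hypothetical. Because _id is null there is only one group, and the average skips missing or non-numeric ages the way $avg does.

```python
from statistics import mean
from typing import Any, Dict, List

students: List[Dict[str, Any]] = [
    {"name": "Ann", "gender": "F", "age": 22},
    {"name": "Bo", "gender": "M", "age": 19},
    {"name": "Cy", "gender": "M", "age": 24},
]


def group_all(docs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Rough equivalent of the single output document of
    {$group: {_id: null, totalCount: {$sum: 1}, avgAge: {$avg: '$age'}}}."""
    # Keep only numeric ages (bool is excluded because it subclasses int)
    ages = [a for a in (d.get("age") for d in docs)
            if isinstance(a, (int, float)) and not isinstance(a, bool)]
    return {
        "_id": None,
        "totalCount": len(docs),
        "avgAge": mean(ages) if ages else None,  # null when nothing is numeric
    }


print(group_all(students))
```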

Data Pivoting

Example 3: Group students by gender and collect their names

db.students.aggregate([
    {$group:
        {
            _id:'$gender',
            names:{$push:'$name'}
        }
    }
])

The system variable $$ROOT refers to the entire input document, so pushing it collects whole documents rather than a single field:

db.students.aggregate([
    {$group:
        {
            _id:'$gender',
            studentData:{$push:'$$ROOT'}
        }
    }
])
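
Both $push variants can be approximated in plain Python with a dictionary of lists. In this sketch, group_push and the output field name items are hypothetical; passing field=None stands in for '$$ROOT', and every document is assumed to have the pushed field.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

students: List[Dict[str, Any]] = [
    {"_id": 1, "name": "Ann", "gender": "F"},
    {"_id": 2, "name": "Bo", "gender": "M"},
    {"_id": 3, "name": "Cy", "gender": "M"},
]


def group_push(docs: List[Dict[str, Any]], key: str,
               field: Optional[str] = None) -> List[Dict[str, Any]]:
    """Sketch of {$group: {_id: '$<key>', items: {$push: '$<field>'}}}.

    field=None pushes the whole document, like {$push: '$$ROOT'}.
    Assumes every document has `field`.
    """
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for d in docs:
        groups[d.get(key)].append(d if field is None else d[field])
    return [{"_id": k, "items": v} for k, v in groups.items()]


print(group_push(students, "gender", "name"))
print(group_push(students, "gender"))
```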

$match Operator

Filters documents to output only those that match specified conditions.

Example 1: Find students older than 20

db.students.aggregate([
    {$match:{age:{$gt:20}}}
])

Example 2: Count male and female students older than 20

db.students.aggregate([
    {$match:{age:{$gt:20}}},
    {$group:{_id:'$gender',count:{$sum:1}}}
])
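
The effect of putting $match before $group can be sketched in plain Python: filter first, then count what remains. match_gt is a hypothetical helper that covers only numeric $gt comparisons, and the extra document without an age shows that a missing field does not match. Filtering early also means the later stages process fewer documents.

```python
from collections import Counter
from typing import Any, Dict, List

students: List[Dict[str, Any]] = [
    {"name": "Ann", "gender": "F", "age": 22},
    {"name": "Bo", "gender": "M", "age": 19},
    {"name": "Cy", "gender": "M", "age": 24},
    {"name": "Di", "gender": "F"},  # no age field
]


def match_gt(docs: List[Dict[str, Any]], field: str,
             threshold: float) -> List[Dict[str, Any]]:
    """Sketch of {$match: {<field>: {$gt: threshold}}} for numeric values.

    Documents that lack the field do not match.
    """
    return [d for d in docs
            if isinstance(d.get(field), (int, float)) and d[field] > threshold]


older = match_gt(students, "age", 20)
print([d["name"] for d in older])           # ['Ann', 'Cy']
print(Counter(d["gender"] for d in older))  # per-gender counts, as in Example 2
```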

$project Operator

Modifies the structure of input documents, including renaming, adding, deleting fields, and creating computed results.

Example 1: Display student names and ages

db.students.aggregate([
    {$project:{_id:0,name:1,age:1}}
])

Example 2: Count students by gender and display only the count

db.students.aggregate([
    {$group:{_id:'$gender',totalCount:{$sum:1}}},
    {$project:{_id:0,totalCount:1}}
])
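
The inclusion rules used in these examples can be sketched in plain Python. project is a hypothetical helper that handles only inclusion specs (fields set to 1) plus the _id:0 exception; exclusion-only specs and computed fields are left out.

```python
from typing import Any, Dict, List


def project(docs: List[Dict[str, Any]], spec: Dict[str, int]) -> List[Dict[str, Any]]:
    """Sketch of an inclusion-style $project such as {_id: 0, name: 1, age: 1}.

    Fields set to 1 are kept; _id is kept unless the spec sets it to 0.
    """
    keep = {field for field, flag in spec.items() if flag == 1}
    if spec.get("_id", 1) != 0:  # _id is included by default
        keep.add("_id")
    return [{k: v for k, v in d.items() if k in keep} for d in docs]


students = [{"_id": 1, "name": "Ann", "gender": "F", "age": 22}]
print(project(students, {"_id": 0, "name": 1, "age": 1}))  # [{'name': 'Ann', 'age': 22}]
print(project(students, {"name": 1}))                      # [{'_id': 1, 'name': 'Ann'}]
```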

$sort Operator

Sorts input documents before output.

Example 1: Display student information sorted by age in ascending order

db.students.aggregate([{$sort:{age:1}}])

Example 2: Count students by gender and sort by count in descending order

db.students.aggregate([
    {$group:{_id:'$gender',count:{$sum:1}}},
    {$sort:{count:-1}}
])
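
Here is a plain-Python sketch of $sort with one or more keys. sort_docs is a hypothetical helper; it assumes every sort field exists and that the values are comparable, which real MongoDB does not require because it defines an ordering across BSON types.

```python
from operator import itemgetter
from typing import Any, Dict, List


def sort_docs(docs: List[Dict[str, Any]], spec: Dict[str, int]) -> List[Dict[str, Any]]:
    """Sketch of {$sort: {field1: 1|-1, field2: 1|-1, ...}}.

    Keys are applied from last to first; because Python's sort is stable,
    the first key in the spec ends up with the highest priority.
    """
    result = list(docs)
    for field, direction in reversed(list(spec.items())):
        result.sort(key=itemgetter(field), reverse=(direction == -1))
    return result


counts = [{"_id": "F", "count": 1}, {"_id": "M", "count": 2}]
print(sort_docs(counts, {"count": -1}))  # [{'_id': 'M', 'count': 2}, {'_id': 'F', 'count': 1}]
```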

$limit Operator

Restricts the number of documents returned by the aggregation pipeline.

Example: Display 2 student records

db.students.aggregate([{$limit:2}])

$skip Operator

Skips a specified number of documents and returns the remaining ones.

Example 1: Display student records starting from the third one

db.students.aggregate([{$skip:2}])

Example 2: Count students by gender, sort by count in ascending order, and return the second result

db.students.aggregate([
    {$group:{_id:'$gender',count:{$sum:1}}},
    {$sort:{count:1}},
    {$skip:1},
    {$limit:1}
])

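Note: Stage order matters. For pagination, put $skip before $limit; with the order reversed, $limit runs first and $skip then skips within that smaller set (in this example it would return nothing).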
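
The effect of stage order is easy to see with list slicing. In this plain-Python sketch, skip and limit mimic the two stages, and page_stages is a hypothetical helper that builds the usual pagination pair as pipeline stage documents (the same dictionaries could be passed to pymongo's aggregate()).

```python
from typing import Any, Dict, List

Docs = List[Dict[str, Any]]


def skip(docs: Docs, n: int) -> Docs:
    return docs[n:]    # like {$skip: n}


def limit(docs: Docs, n: int) -> Docs:
    return docs[:n]    # like {$limit: n}


def page_stages(page: int, page_size: int) -> List[Dict[str, int]]:
    """Build the $skip/$limit stages for a 1-based page number."""
    return [{"$skip": (page - 1) * page_size}, {"$limit": page_size}]


counts = [{"_id": "F", "count": 1}, {"_id": "M", "count": 2}]  # sorted by count ascending
print(limit(skip(counts, 1), 1))  # [{'_id': 'M', 'count': 2}]
print(skip(limit(counts, 1), 1))  # [] because limiting first leaves nothing after the skip
print(page_stages(3, 10))         # [{'$skip': 20}, {'$limit': 10}]
```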

$unwind Operator

Deconstructs an array field from the input documents, outputting one document for each array element.

Syntax 1

Unwind a specific field:

db.collection.aggregate([{$unwind:'$field_name'}])

Create test data:

db.products.insertOne({_id:1,item:'t-shirt',sizes:['S','M','L']})

Query:

db.products.aggregate([{$unwind:'$sizes'}])

Syntax 2

Control what happens to documents whose field is an empty array, null, or missing:

db.collection.aggregate([{
    $unwind:{
        path:'$field_name',
        preserveNullAndEmptyArrays:<boolean>
    }
}])

Create test data:

db.inventory.insertMany([
{ "_id" : 1, "item" : "a", "sizes": [ "S", "M", "L"] },
{ "_id" : 2, "item" : "b", "sizes" : [ ] },
{ "_id" : 3, "item" : "c", "sizes": "M" },
{ "_id" : 4, "item" : "d" },
{ "_id" : 5, "item" : "e", "sizes" : null }
])

Using Syntax 1:

db.inventory.aggregate([{$unwind:'$sizes'}])

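Notice that the documents with an empty array, a missing field, or a null value are discarded. Document 3 is still returned, because a non-array value is treated as a single-element array.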

To prevent data loss, use Syntax 2:

db.inventory.aggregate([{$unwind:{path:'$sizes',preserveNullAndEmptyArrays:true}}])
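
The rules above can be summarized in a plain-Python sketch of $unwind on a top-level field, using the same inventory data. unwind is a hypothetical helper; it is meant to follow the behaviour described in the MongoDB documentation: with preserveNullAndEmptyArrays, a document with an empty array is kept without the field, while null and missing values are kept unchanged.

```python
from typing import Any, Dict, List

Docs = List[Dict[str, Any]]

inventory: Docs = [
    {"_id": 1, "item": "a", "sizes": ["S", "M", "L"]},
    {"_id": 2, "item": "b", "sizes": []},
    {"_id": 3, "item": "c", "sizes": "M"},
    {"_id": 4, "item": "d"},
    {"_id": 5, "item": "e", "sizes": None},
]


def unwind(docs: Docs, field: str, preserve_null_and_empty: bool = False) -> Docs:
    """Sketch of $unwind on a top-level field."""
    out: Docs = []
    for d in docs:
        value = d.get(field)
        if isinstance(value, list) and value:
            # one output document per array element
            out.extend({**d, field: element} for element in value)
        elif isinstance(value, list):
            # empty array: dropped, or kept without the field when preserving
            if preserve_null_and_empty:
                out.append({k: v for k, v in d.items() if k != field})
        elif value is None:
            # null or missing: dropped, or kept unchanged when preserving
            if preserve_null_and_empty:
                out.append(dict(d))
        else:
            # a non-array, non-null value behaves like a one-element array
            out.append(dict(d))
    return out


print([d["_id"] for d in unwind(inventory, "sizes")])        # [1, 1, 1, 3]
print([d["_id"] for d in unwind(inventory, "sizes", True)])  # [1, 1, 1, 2, 3, 4, 5]
```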

User Management

Super Administrator

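To secure MongoDB, enable authentication so that clients must log in with a username and password. MongoDB uses role-based access control: each user is created in a specific database and is granted roles that define what it may do in one or more databases.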

Common system roles:

  • root: Available only in the admin database, with superuser privileges
  • read: Allows users to read from the specified database
  • readWrite: Allows users to read and write to the specified database

Create a super admin user:

use admin
db.createUser({
    user:'admin',
    pwd:'secure_password',
    roles:[{role:'root',db:'admin'}]
})

Enable Authentication

Edit the configuration file:

sudo vi /etc/mongod.conf

Enable authentication (YAML requires a space after each colon, and the nested key must be indented with spaces):

security:
  authorization: enabled

Restart the service:

sudo service mongod stop
sudo service mongod start

Connect via terminal:

mongo -u 'admin' -p 'secure_password' --authenticationDatabase 'admin'

Regular User Management

Log in as the super admin and manage users:

View current database users:

use company_db
show users

Create a regular user:

db.createUser({
    user:'app_user',
    pwd:'user_password',
    roles:[{role:'readWrite',db:'company_db'}]
})

Connect as a regular user:

mongo -u app_user -p user_password --authenticationDatabase company_db
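
Applications connect with a connection string instead of shell flags. The minimal Python sketch below builds one with the standard library; build_uri is a hypothetical helper, and the authSource option corresponds to --authenticationDatabase. Credentials containing characters such as @ or : must be percent-escaped, which quote_plus handles.

```python
from urllib.parse import quote_plus


def build_uri(user: str, password: str, auth_db: str,
              host: str = "localhost", port: int = 27017) -> str:
    """Build a MongoDB connection string.

    authSource plays the role of the shell's --authenticationDatabase;
    quote_plus escapes characters such as '@' or ':' in credentials.
    """
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/?authSource={quote_plus(auth_db)}")


print(build_uri("app_user", "user_password", "company_db"))
# mongodb://app_user:user_password@localhost:27017/?authSource=company_db
```

The resulting string can be passed to pymongo.MongoClient().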

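Switch to another database and try to read or write there; the request should be rejected, because app_user only has the readWrite role on company_db.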

Change the user's password (run this in company_db, the database where the user was created):

db.updateUser('app_user',{pwd:'new_password'})

Replication (Replica Sets)

What is Replication?

Replication provides redundant data copies across multiple servers, increasing data availability and ensuring data safety. It also enables recovery from hardware failures and service interruptions.

Why Use Replication?

  • Data backup
  • Disaster recovery
  • Read/write separation
  • High (24/7) data availability
  • Downtime-free maintenance
  • Transparent to applications

How Replication Works

A replica set needs at least two nodes; three or more are recommended, because a new primary can only be elected automatically while a majority of members is still reachable. One node acts as the primary and handles all write requests, while the others are secondaries that replicate data from the primary. Common configurations are one primary with one secondary, or one primary with several secondaries.

The primary records every write operation in its operation log (oplog). Secondaries continuously fetch new oplog entries from the primary and apply them to their own copies of the data, which keeps the copies consistent.
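
The oplog mechanism can be illustrated with a toy model in plain Python. This is a deliberate simplification: real oplog entries carry timestamps and are idempotent, and secondaries tail the oplog rather than copying a Python list. Node, write and pull_from are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Simplified oplog entry: (operation, _id, document)
Op = Tuple[str, int, Dict[str, Any]]


@dataclass
class Node:
    data: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    oplog: List[Op] = field(default_factory=list)

    def apply(self, op: Op) -> None:
        kind, _id, doc = op
        if kind == "insert":
            self.data[_id] = doc
        elif kind == "delete":
            self.data.pop(_id, None)

    def write(self, op: Op) -> None:
        """Primary-side write: change the data, then record it in the oplog."""
        self.apply(op)
        self.oplog.append(op)

    def pull_from(self, primary: "Node") -> int:
        """Secondary-side sync: copy and apply oplog entries not seen yet."""
        new_ops = primary.oplog[len(self.oplog):]
        for op in new_ops:
            self.apply(op)
            self.oplog.append(op)
        return len(new_ops)


primary, secondary = Node(), Node()
for i in range(3):
    primary.write(("insert", i, {"_id": i}))
primary.write(("delete", 1, {}))
print(secondary.pull_from(primary), secondary.data == primary.data)  # 4 True
```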

Replication Features

  • N-node cluster
  • Any node can act as primary
  • All write operations occur on the primary
  • Automatic failover
  • Automatic recovery

Setting Up Replication Nodes

Step 1: Create database directories

mkdir -p ~/data/node1
mkdir -p ~/data/node2

Step 2: Start two MongoDB instances with the same replica set name (each in its own terminal, since mongod runs in the foreground)

mongod --bind_ip 192.168.1.100 --port 27017 --dbpath ~/data/node1 --replSet myReplicaSet
mongod --bind_ip 192.168.1.100 --port 27018 --dbpath ~/data/node2 --replSet myReplicaSet

Step 3: Connect to the first node (it becomes the primary after initialization)

mongo --host 192.168.1.100 --port 27017

Step 4: Initialize the replica set

rs.initiate()

Step 5: Check current status

rs.status()

Step 6: Add a secondary node

rs.add('192.168.1.100:27018')

Step 7: Connect to the secondary node

mongo --host 192.168.1.100 --port 27018

Step 8: Insert data on the primary node

use products
for(let i=0;i<10;i++){db.items.insertOne({_id:i})}
db.items.find()

Step 9: Query data from the secondary node

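Note: By default a secondary rejects read operations. In the legacy mongo shell, allow reads on the current connection with rs.slaveOk() (newer shells name it rs.secondaryOk()), then switch to the database:

rs.slaveOk()
use products
db.items.find()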
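
From application code, reads are routed to secondaries through the connection string rather than rs.slaveOk(). The sketch below builds such a string for the two nodes used above; replica_set_uri is a hypothetical helper, while replicaSet and readPreference are standard connection-string options. The result can be passed to pymongo.MongoClient(), and secondaryPreferred reads from a secondary when one is available.

```python
from typing import List
from urllib.parse import urlencode


def replica_set_uri(hosts: List[str], replica_set: str,
                    read_preference: str = "primary") -> str:
    """Build a connection string that lists the members of a replica set."""
    query = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{query}"


uri = replica_set_uri(["192.168.1.100:27017", "192.168.1.100:27018"],
                      "myReplicaSet", "secondaryPreferred")
print(uri)
# mongodb://192.168.1.100:27017,192.168.1.100:27018/?replicaSet=myReplicaSet&readPreference=secondaryPreferred
```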

Additional Operations

Remove a secondary node (run this on the primary):

rs.remove('192.168.1.100:27018')

Backup and Recovery

Backup

Syntax:

mongodump -h host -d database -o output_directory
  • -h: Server address, can include port
  • -d: Database name to backup
  • -o: Output directory for backup data

Example:

mkdir ~/backups
mongodump -h 127.0.0.1:27017 -d inventory -o ~/backups

Recovery

Syntax:

mongorestore -h host -d database --dir backup_directory
  • -h: Server address
  • -d: Name of the database to restore into
  • --dir: Location of backup data

Example:

mongorestore -h 127.0.0.1:27017 -d new_inventory --dir ~/backups/inventory
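
Backups are often scripted. The Python sketch below builds the two commands as argument lists and, by default, only prints them; dump_cmd, restore_cmd and run are hypothetical helpers, and the flags are exactly the ones shown above. Because subprocess.run() does not go through a shell, ~ is expanded in Python.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def dump_cmd(host: str, db: str, out_dir: Path) -> List[str]:
    return ["mongodump", "-h", host, "-d", db, "-o", str(out_dir)]


def restore_cmd(host: str, target_db: str, backup_dir: Path) -> List[str]:
    return ["mongorestore", "-h", host, "-d", target_db, "--dir", str(backup_dir)]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# No shell is involved, so expand '~' in Python rather than relying on the shell.
backups = Path("~/backups").expanduser()
run(dump_cmd("127.0.0.1:27017", "inventory", backups))
# mongodump writes each database into its own sub-directory of the output directory
run(restore_cmd("127.0.0.1:27017", "new_inventory", backups / "inventory"))
```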

Interacting with Python

Install the Python driver:

pip install pymongo

Import the package:

import pymongo

Connect and create a client:

client = pymongo.MongoClient("localhost", 27017)

Access a database:

db = client.inventory

Access a collection:

products = db.products

Add a document:

item = {'name': 'laptop', 'price': 1200}
item_id = products.insert_one(item).inserted_id

Find one document:

result = products.find_one()

Find multiple documents:

for item in products.find():
    print(item)

Get document count:

print(products.count_documents({}))
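
The aggregation pipelines from earlier sections can be run from Python by writing each stage as a dictionary and passing the list to Collection.aggregate(). In this sketch, gender_count_pipeline and gender_counts are hypothetical helpers, and the 'school' database in the commented usage is an assumed name; building the pipeline separately keeps it testable without a server.

```python
from typing import Any, Dict, List, Optional


def gender_count_pipeline(min_age: Optional[int] = None) -> List[Dict[str, Any]]:
    """Pipeline from the $match/$group/$sort examples, written as Python dicts."""
    pipeline: List[Dict[str, Any]] = []
    if min_age is not None:
        pipeline.append({"$match": {"age": {"$gt": min_age}}})
    pipeline.append({"$group": {"_id": "$gender", "count": {"$sum": 1}}})
    pipeline.append({"$sort": {"count": -1}})
    return pipeline


def gender_counts(collection: Any, min_age: Optional[int] = None) -> List[Dict[str, Any]]:
    """Run the pipeline; pymongo's Collection.aggregate() returns a cursor."""
    return list(collection.aggregate(gender_count_pipeline(min_age)))


# With a running server (the 'school' database name is hypothetical):
# client = pymongo.MongoClient("localhost", 27017)
# print(gender_counts(client.school.students, min_age=20))
```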
