MongoDB Disk Space Reclamation and Defragmentation
MongoDB Disk Space Management
Understanding MongoDB Fragmentation
When documents or collections are deleted from MongoDB, the database does not immediately return the freed disk space to the operating system. Instead, MongoDB maintains a list of empty records within the data files. When new data is inserted, MongoDB allocates storage from this empty records list rather than requesting additional space from the filesystem.
Defragmentation Strategies
Several approaches exist for reclaiming disk space and organizing data:
Approach 1: The compact Command
This command rewrites and reorganizes both data and indexes for a collection.
Approach 2: Secondary Node Rotation
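Add a new secondary node to the replica set, wait for it to finish its initial sync, then promote it to primary and retire or resync the old node. Because initial sync writes a fresh copy of the data, the new node's files start out compact. This method is particularly useful for large datasets.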
Approach 3: Collection Rebuilding
Create a new collection and migrate data with freshly built indexes.
Monitoring Collection Storage
Before performing defragmentation, examine current storage metrics:
// Retrieve total storage allocated for the collection
db.myCollection.storageSize()
// Get the size of all indexes associated with the collection
db.myCollection.totalIndexSize()
// Check available reclaimable space in WiredTiger
db.myCollection.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
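To put the metrics above in context, the following mongosh-style sketch computes what fraction of a collection's allocated storage WiredTiger reports as reusable. The helper name reusableRatio and the 20% threshold in the usage lines are illustrative assumptions, not MongoDB features, and the sketch assumes the stats values arrive as plain numbers.

```javascript
// Hypothetical helper: fraction of the collection's allocated storage that
// WiredTiger reports as available for reuse (0 when nothing is allocated).
function reusableRatio(stats) {
  const reusable = stats.wiredTiger["block-manager"]["file bytes available for reuse"];
  const allocated = stats.storageSize;
  return allocated > 0 ? reusable / allocated : 0;
}

// Usage in mongosh (the 0.2 threshold is an arbitrary assumption):
// const ratio = reusableRatio(db.myCollection.stats());
// if (ratio > 0.2) print("myCollection may be worth compacting");
```

A high ratio suggests that defragmentation could recover a meaningful amount of space; a low one suggests the effort is unlikely to pay off.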
Implementing the compact Command
The compact command reorganizes collection data and indexes. Key considerations:
- Compaction can block operations while it runs (before MongoDB 4.4 it blocked every operation on the database that holds the collection), so schedule it during off-peak hours
- With WiredTiger storage engine, compact reorganizes data and releases unused space back to the operating system
- With MMAPv1 storage engine, compact reorganizes data and rebuilds indexes, but does not release space to the OS; instead, the freed space is reused for new inserts
Command syntax:
db.runCommand({compact: "myCollection", force: false})
Parameters:
compact: Name of the target collection
force: Boolean flag that must be set to true to run compact on the primary node of a replica set; without it, the command returns an error when issued against a primary
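When several collections need compacting, the command can be issued in a loop. The sketch below is a minimal example; the helper name compactCollections and the collection names in the usage line are hypothetical, and the function only assumes that its first argument behaves like the mongosh db object.

```javascript
// Hypothetical helper: run compact on each named collection in turn.
// `database` is expected to behave like the mongosh `db` object.
function compactCollections(database, names, force) {
  const results = {};
  for (const name of names) {
    // Same command shape as in the syntax above; `force` is passed through unchanged.
    results[name] = database.runCommand({ compact: name, force: force });
  }
  return results;
}

// Usage in mongosh, connected to a secondary during off-peak hours
// (collection names are placeholders):
// printjson(compactCollections(db, ["myCollection", "myOtherCollection"], false));
```

Taking an explicit list of names, rather than every collection in the database, keeps views and system collections out of the loop.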
Replica Set Secondary Node Rotation
This technique offers several advantages, along with one limitation (the last item):
- Does not interrupt read/write operations on the replica set
- Generally faster than in-place compaction
- Indexes are automatically created on the new node
- Not suitable for databases containing sharded collections
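The rotation only makes sense once the new member has finished its initial sync. As a rough sketch, the check below inspects the document returned by rs.status() and reports whether a given member has reached the SECONDARY state. The helper name isSecondaryReady and the host name in the usage lines are placeholders, not part of MongoDB.

```javascript
// Hypothetical helper: true when the member named `host` appears in the
// replica set status document with state SECONDARY (initial sync finished).
function isSecondaryReady(status, host) {
  const member = status.members.find((m) => m.name === host);
  return member !== undefined && member.stateStr === "SECONDARY";
}

// Usage in mongosh, connected to the current primary
// ("mongo-new.example.net:27017" is a placeholder host):
// rs.add("mongo-new.example.net:27017");
// // ...poll until isSecondaryReady(rs.status(), "mongo-new.example.net:27017") is true...
// rs.stepDown();  // the current primary steps down so that an election can take place
```

Which member wins the election after the step-down depends on member priorities and the rest of the replica set configuration, so adjust those first if the new node must become primary.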
Collection Reconstruction Method
When compaction is not feasible, rebuild the collection using an aggregation pipeline:
// Create the new, empty collection
db.createCollection("items_new")
// Build indexes on the new collection ($out keeps the indexes of an existing target)
db.items_new.createIndex({"productId": 1})
// Copy all documents from the existing collection into the new one
db.items.aggregate([
  {$match: {}},
  {$out: "items_new"}
])
// Swap collection names: renameCollection must run against the admin database,
// and dropTarget: true drops the old "items" collection as part of the rename
db.adminCommand({renameCollection: "mydb.items_new", to: "mydb.items", dropTarget: true})
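This approach creates a fresh collection with defragmented data and newly built indexes, then renames it over the original to complete the migration. Writes made to the original collection after the copy starts are not carried over, so pause writers until the rename has finished.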
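Before swapping names, it can be worth confirming that the new collection holds as many documents as the one it replaces. The following is a minimal sketch of such a check; the helper name countsMatch is hypothetical, and the function only assumes that its first argument behaves like the mongosh db object.

```javascript
// Hypothetical helper: compare the document counts of two collections.
// `database` is expected to behave like the mongosh `db` object.
function countsMatch(database, sourceName, targetName) {
  const source = database.getCollection(sourceName).countDocuments({});
  const target = database.getCollection(targetName).countDocuments({});
  return source === target;
}

// Usage in mongosh, run after the copy and before the rename:
// if (!countsMatch(db, "items", "items_new")) {
//   throw new Error("Document counts differ; do not rename yet");
// }
```

Matching counts do not prove that the contents are identical, but a mismatch is a cheap early signal that the copy was incomplete or that writes arrived during the migration.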