Object Detection with MMDetection: Setup, Inference, and Custom Training
MMDetection is a comprehensive deep learning toolkit designed for object detection and instance segmentation. It features an extensive library of over 440 pre-trained models and reproduces more than 60 academic papers. The framework supports a diverse range of architectures, including two-stage, single-stage, cascade, anchor-free, and transformer-based detectors. It provides streamlined utilities for training, testing, and inference.
Environment Configuration
The OpenMMLab ecosystem relies on foundational libraries like MMEngine and MMCV. These must be installed prior to setting up MMDetection.
```bash
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
```
Install the MMDetection package using MIM:
```bash
mim install mmdet
```
Verify the installation:
```python
import mmdet
print(mmdet.__version__)
```
Model Zoo and Inference
MMDetection offers out-of-the-box inference via Python APIs. You can search for models using the MIM tool:
```bash
mim search mmdet --model "mask r-cnn"
```
Download a specific configuration and its corresponding weights:
```bash
mim download mmdet --config mask-rcnn_r50_fpn_2x_coco --dest ./weights
```
Perform inference on an image using the downloaded assets:
```python
import mmcv
from mmdet.apis import init_detector, inference_detector
from mmdet.registry import VISUALIZERS

cfg_path = 'mask-rcnn_r50_fpn_2x_coco.py'
weights_path = 'mask_rcnn_r50_fpn_2x_coco_bbox_mAP-0.392__segm_mAP-0.354_20200505_003907-3e542a40.pth'

# Build the model and run inference on a single image
detector = init_detector(cfg_path, weights_path, device='cpu')
detections = inference_detector(detector, 'sample_image.jpg')

# Build the visualizer from the model's config and attach dataset metadata
vis = VISUALIZERS.build(detector.cfg.visualizer)
vis.dataset_meta = detector.dataset_meta

# Draw predictions above the score threshold and save the result
input_img = mmcv.imread('sample_image.jpg')
vis.add_datasample(
    name='prediction',
    image=input_img,
    data_sample=detections,
    draw_gt=False,
    pred_score_thr=0.3,
    show=False,
    out_file='output_visualization.png'
)
```
Configuration System
Deep learning experiments require defining several components: model architecture, dataset pipelines, training schedules (optimizers, learning rates, epochs), runtime environments (GPUs, distributed setups), and hooks (logging, checkpointing). In MMDetection, all these elements are consolidated into a single Python configuration file.
Key fields include:
- `model`: Defines the network structure.
- `data`: Specifies dataset paths and augmentation strategies.
- `optimizer` and `lr_config`: Manage the training strategy.
- `load_from`: Points to pre-trained weight files (`.pth` files storing PyTorch parameters).
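As a rough sketch of how these fields fit together, a config file is ordinary Python made up of top-level assignments. The values below are abbreviated placeholders for illustration, not a complete working config:

```python
# Abbreviated, illustrative config sketch -- placeholder values only
model = dict(
    type='MaskRCNN',
    backbone=dict(type='ResNet', depth=50),
)
data = dict(
    samples_per_gpu=2,
    train=dict(type='CocoDataset', ann_file='annotations/train.json'),
)
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
lr_config = dict(policy='step', step=[16, 22])
load_from = 'checkpoints/mask_rcnn_r50_fpn_2x_coco.pth'
```

Because the file is plain Python, the framework can execute it and read these names as a nested dictionary of settings.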
Custom Training and Fine-tuning
Custom training typically involves fine-tuning a model pre-trained on datasets like COCO. Since the model already has converged weights, the learning rate must be reduced to prevent catastrophic forgetting.
To avoid duplicating configurations, MMDetection uses an inheritance mechanism. A custom config can inherit from a base config:
```python
# custom_detector.py
_base_ = 'mask-rcnn_r50_fpn_2x_coco.py'
```
When loaded, the framework parses the base configuration and merges it with the custom settings.
```python
from mmengine.config import Config

parsed_cfg = Config.fromfile('custom_detector.py')
print(parsed_cfg.pretty_text)
```
Launch the training job using the MIM command-line interface:
```bash
mim train mmdet custom_detector.py
```
COCO Dataset Format
When preparing custom data, the COCO format is widely adopted. It consists of a JSON annotation file containing three primary keys:
- `images`: Metadata for all images in the dataset.
- `annotations`: Bounding boxes, segmentation masks, and labels for every object instance.
- `categories`: Class definitions and ID mappings.
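A minimal annotation file with one image, one instance, and one class might look like the following. The file name, IDs, and coordinates are invented for illustration; the snippet builds the structure in Python and writes it to disk:

```python
import json

# Minimal illustrative COCO annotation file (IDs and values are made up)
coco = {
    "images": [
        {"id": 1, "file_name": "sample_image.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,          # links the instance to an entry in "images"
            "category_id": 1,       # links the instance to an entry in "categories"
            "bbox": [100, 120, 200, 150],  # [x, y, width, height]
            "area": 200 * 150,
            "iscrowd": 0,
            "segmentation": [[100, 120, 300, 120, 300, 270, 100, 270]],
        }
    ],
    "categories": [{"id": 1, "name": "cat", "supercategory": "animal"}],
}

with open("instances_train.json", "w") as f:
    json.dump(coco, f)
```

Note that COCO bounding boxes use `[x, y, width, height]` with the origin at the top-left corner, not `[x1, y1, x2, y2]`.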