Real-Time Driver Fatigue Monitoring System Using Dlib and Facial Landmark Analysis
Real-time driver fatigue detection is a critical safety application of computer vision. This implementation uses facial landmark detection to monitor physical indicators of exhaustion, focusing on three signals: eye closure patterns (Eye Aspect Ratio, EAR), yawning frequency (Mouth Aspect Ratio, MAR), and head pose stability.
Facial Feature Extraction with Dlib
Dlib provides robust pre-trained models for identifying 68 specific facial landmarks. These points serve as the coordinate system for calculating geometric ratios that indicate the driver's state. The process first locates the bounding box of the face with a HOG-based detector, then applies a shape predictor to localize features such as the eyes, mouth, and nose.
import dlib
import cv2
import numpy as np
# Initialize detection components
face_finder = dlib.get_frontal_face_detector()
landmark_locator = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
def capture_landmarks(frame):
    """Return one (68, 2) array of landmark coordinates per detected face."""
    gray_img = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detected_faces = face_finder(gray_img, 0)  # 0 = no upsampling
    all_points = []
    for face in detected_faces:
        shape = landmark_locator(gray_img, face)
        coords = np.array([[p.x, p.y] for p in shape.parts()])
        all_points.append(coords)
    return all_points
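Because the 68 landmarks follow a fixed ordering, individual facial regions can be recovered by simple index slicing. The helper below is a minimal sketch: the index ranges are the standard dlib 68-point convention, while the function and dictionary names are chosen here for illustration.

```python
# Standard index ranges of dlib's 68-point model (0-indexed).
LANDMARK_REGIONS = {
    "jaw": slice(0, 17),
    "right_eyebrow": slice(17, 22),
    "left_eyebrow": slice(22, 27),
    "nose": slice(27, 36),
    "right_eye": slice(36, 42),
    "left_eye": slice(42, 48),
    "mouth": slice(48, 68),
}

def slice_regions(points):
    """Split a (68, 2) landmark array into named facial regions."""
    return {name: points[idx] for name, idx in LANDMARK_REGIONS.items()}
```

The downstream ratio functions can then operate on `slice_regions(pts)["right_eye"]` and similar slices instead of hard-coded index ranges.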
Eye Aspect Ratio (EAR) for Blink Detection
The Eye Aspect Ratio (EAR) is a scalar value that describes the openness of the eye. By calculating the ratio of distances between vertical landmarks and horizontal landmarks, the system can distinguish between normal blinking and prolonged eye closure (micro-sleep).
The formula for EAR is: $$EAR = \frac{||p2 - p6|| + ||p3 - p5||}{2||p1 - p4||}$$
from scipy.spatial import distance as dist
def compute_ear(eye_landmarks):
    """Eye Aspect Ratio for a (6, 2) array of eye landmarks (p1..p6)."""
    # Vertical distances: ||p2 - p6|| and ||p3 - p5||
    v1 = dist.euclidean(eye_landmarks[1], eye_landmarks[5])
    v2 = dist.euclidean(eye_landmarks[2], eye_landmarks[4])
    # Horizontal distance: ||p1 - p4||
    h_dist = dist.euclidean(eye_landmarks[0], eye_landmarks[3])
    ear_value = (v1 + v2) / (2.0 * h_dist)
    return ear_value
When the EAR drops below a specific threshold (typically around 0.2) for a sustained number of frames, the system triggers a fatigue alert based on the PERCLOS (Percentage of Eye Closure) metric.
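PERCLOS can be approximated as the fraction of frames in a recent window during which the EAR sits below the closure threshold. The sketch below is one possible implementation: the 0.2 threshold matches the text, while the window size and class name are illustrative assumptions.

```python
from collections import deque

class PerclosTracker:
    """Tracks the fraction of recent frames with eyes closed (PERCLOS).

    The default window of 90 frames (~3 s at 30 fps) is an
    illustrative choice, not a value from the original system.
    """
    def __init__(self, ear_threshold=0.2, window_size=90):
        self.ear_threshold = ear_threshold
        self.closed_flags = deque(maxlen=window_size)

    def update(self, ear):
        """Record one frame's EAR and return the current PERCLOS value."""
        self.closed_flags.append(ear < self.ear_threshold)
        return self.perclos()

    def perclos(self):
        if not self.closed_flags:
            return 0.0
        return sum(self.closed_flags) / len(self.closed_flags)
```

A fatigue alert would then fire when `perclos()` exceeds a calibrated fraction (commonly cited figures are in the 0.15–0.30 range) rather than on any single low-EAR frame.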
Mouth Aspect Ratio (MAR) for Yawn Detection
Yawning is identified by the Mouth Aspect Ratio (MAR). This metric tracks the vertical expansion of the lips. Similar to EAR, it uses the Euclidean distance between the upper and lower lip points relative to the mouth width.
def compute_mar(mouth_points):
    """Mouth Aspect Ratio for the 20-point mouth region (pts[48:68])."""
    # Vertical distances on the outer lip contour
    vert_a = dist.euclidean(mouth_points[2], mouth_points[10])  # landmarks 50 & 58
    vert_b = dist.euclidean(mouth_points[4], mouth_points[8])   # landmarks 52 & 56
    # Horizontal distance between the mouth corners
    horiz = dist.euclidean(mouth_points[0], mouth_points[6])    # landmarks 48 & 54
    mar_value = (vert_a + vert_b) / (2.0 * horiz)
    return mar_value
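A single high-MAR frame can be caused by speech, so yawns are usually counted only when the mouth stays open for a sustained run of frames. A minimal sketch, where both the MAR threshold of 0.6 and the minimum duration are illustrative defaults rather than values from the original system:

```python
def count_yawns(mar_series, mar_threshold=0.6, min_frames=10):
    """Count yawn events in a sequence of per-frame MAR values.

    A yawn is registered when MAR stays above mar_threshold for at
    least min_frames consecutive frames; each sustained opening is
    counted once.
    """
    yawns = 0
    run = 0
    for mar in mar_series:
        if mar > mar_threshold:
            run += 1
            if run == min_frames:
                yawns += 1
        else:
            run = 0
    return yawns
```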
Head Pose Estimation (HPE)
To detect if a driver is nodding off or distracted, Head Pose Estimation calculates the orientation of the head in 3D space. This involves mapping 2D image points to a generic 3D facial model and using the Perspective-n-Point (PnP) algorithm to determine the rotation matrix. The rotation matrix is then converted into Euler angles (Pitch, Yaw, and Roll).
If the pitch angle (nodding forward) exceeds a threshold (e.g., 20 degrees) for a significant duration, it suggests the driver's head is drooping, a strong sign of microsleep or deep fatigue.
System Integration and UI
The backend logic is integrated into a graphical interface built with PyQt5, which handles video stream processing and real-time visualization of the calculated metrics. The UI allows threshold adjustment and logs fatigue events to a local database for later analysis.
from PyQt5.QtWidgets import QMainWindow, QApplication
from PyQt5.QtCore import QTimer
class FatigueMonitorUI(QMainWindow):
    def __init__(self):
        super().__init__()
        self.video_source = cv2.VideoCapture(0)
        self.closed_frames = 0  # consecutive frames with low EAR
        self.refresh_timer = QTimer()
        self.refresh_timer.timeout.connect(self.process_stream)
        self.refresh_timer.start(30)  # poll roughly every 30 ms

    def process_stream(self):
        ret, frame = self.video_source.read()
        if not ret:
            return
        for pts in capture_landmarks(frame):
            left_eye = pts[36:42]
            right_eye = pts[42:48]
            ear = (compute_ear(left_eye) + compute_ear(right_eye)) / 2.0
            # Update UI with current EAR; alarm only on sustained closure
            if ear < 0.2:
                self.closed_frames += 1
                if self.closed_frames >= 15:  # ~0.5 s at 30 ms per tick
                    self.trigger_alarm()
            else:
                self.closed_frames = 0

    def trigger_alarm(self):
        # Placeholder: replace with an audible or on-screen alert
        print("Fatigue alert!")