Troubleshooting Persistent High CPU Load in Linux kworker Threads
Kworker threads are the kernel's workqueue workers: they run deferred tasks in kernel space, typically without any noticeable effect on system performance. They handle background operations such as flushing the page cache, running the deferred (bottom-half) portion of interrupt handling, executing timer-driven work, and completing I/O. While generally benign, specific conditions, often a misbehaving driver, can cause a kworker thread to consume excessive CPU, with utilization sometimes exceeding 50% of a core.
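Before any stack can be inspected, the busy worker's PID has to be found. The following is a minimal sketch, assuming the procps version of ps; the function names filter_kworkers and busy_kworkers are illustrative, not part of any tool. Note that the %CPU column from ps is averaged over the thread's lifetime, so a short spike is easier to see in top.

```shell
# Keep only kworker rows whose %CPU is at or above a threshold.
# Reads "PID %CPU COMMAND" rows on stdin; the threshold defaults to 10.
filter_kworkers() {
    awk -v t="${1:-10}" '$3 ~ /^kworker/ && $2 + 0 >= t + 0 { print $1, $2, $3 }'
}

# List busy kworker threads, highest CPU first (procps ps assumed).
busy_kworkers() {
    ps -eo pid=,pcpu=,comm= --sort=-pcpu | filter_kworkers "$1"
}
```

Calling busy_kworkers 20 would print the PID, CPU percentage, and name of every kworker at or above 20%.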
To identify the specific function causing the bottleneck, inspect the stack trace of the offending process. Replace <pid> with the actual process identifier:
sudo cat /proc/<pid>/stack
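A single snapshot can land on an unrepresentative moment, so it may help to sample the stack several times and tally which functions appear most often. This is a sketch under a few assumptions: it runs as root, sleep accepts fractional seconds (GNU coreutils), and the helper names sample_stack and count_frames are made up for illustration.

```shell
# Print the stack of a PID n times, 0.1 s apart.
sample_stack() {
    pid="$1"
    n="${2:-20}"
    i=0
    while [ "$i" -lt "$n" ]; do
        cat "/proc/$pid/stack" 2>/dev/null
        sleep 0.1
        i=$((i + 1))
    done
}

# Tally function names from lines shaped like "[<0>] symbol+0x50/0x3b0",
# most frequent first.
count_frames() {
    sed -n 's/^\[<[^>]*>] \([A-Za-z0-9_.]*\).*/\1/p' | sort | uniq -c | sort -rn
}
```

From a root shell, sample_stack <pid> 50 | count_frames | head would show the functions the worker spends most of its samples in.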
For a more comprehensive analysis, perf can sample activity across all CPUs over a defined interval. The following sequence sets the console log level to 1 through the SysRq trigger (so that only the most severe messages reach the console), records call graphs system-wide for 15 seconds, and prints a text report:
sudo sh -c 'echo "1" > /proc/sysrq-trigger'
sudo perf record -g -a -- sleep 15
sudo perf report --stdio
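A system-wide recording includes every process, which can bury the worker of interest. One possible variation, sketched below, records only the suspect PID. The wrapper name profile_pid and the output file name are assumptions made for this example; the perf options used (-g, -p, -o, -i, --stdio, --no-children) are standard, and the function is meant to be run from a root shell.

```shell
# Record call graphs for one PID for a number of seconds, then print a report.
profile_pid() {
    pid="$1"
    secs="${2:-15}"
    # Reject anything that is not a plain decimal PID.
    case "$pid" in
        ''|*[!0-9]*) echo "usage: profile_pid <pid> [seconds]" >&2; return 2 ;;
    esac
    data="perf-$pid.data"
    # perf stops sampling the PID when the sleep command exits.
    perf record -g -p "$pid" -o "$data" -- sleep "$secs" &&
        perf report -i "$data" --stdio --no-children
}
```

With --no-children, each function is charged only for its own samples, which tends to make a tight polling loop stand out near the top of the report.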
The SysRq interface supports various diagnostic commands, such as triggering a crash dump, displaying held locks, or dumping task lists. However, for this specific high-load scenario, analyzing kernel logs often yields faster results. Searching the ring buffer for I2C communication errors can reveal hardware polling issues:
sudo dmesg -T | grep -i i2c
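A single search shows that matching messages exist, but not whether they are still being produced. A rough way to check is to count matches twice, some seconds apart. The helper names below are hypothetical, reading the ring buffer may require root, and the result is only approximate because old messages drop out of the buffer as it wraps.

```shell
# Count lines on stdin that match a case-insensitive pattern (default: i2c).
count_matches() {
    grep -i -c -e "${1:-i2c}" || true
}

# Print how many additional matching kernel messages appeared in an interval.
new_matches() {
    secs="${1:-30}"
    pat="${2:-i2c}"
    before=$(dmesg | count_matches "$pat")
    sleep "$secs"
    after=$(dmesg | count_matches "$pat")
    echo $((after - before))
}
```

A steadily positive result from new_matches 30 suggests the errors are ongoing rather than left over from boot.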
In cases involving integrated graphics, repeated failures to read EDID data via the I2C bus often indicate a driver defect. Comparing logs between a stable system and the affected machine usually shows constant scheduling of EDID reads that time out. To confirm the graphics adapter is the source, identify the PCI address associated with the VGA controller:
lspci -D -nn | grep -i vga
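It can also be useful to see which kernel driver is bound to the controller before touching it. The sketch below assumes lspci -D (which prints the PCI domain) and the standard sysfs driver symlink; vga_addrs and vga_drivers are illustrative names.

```shell
# Extract the address column of VGA controllers from `lspci -D` output.
vga_addrs() {
    awk 'tolower($0) ~ /vga/ { print $1 }'
}

# Print each VGA controller address with the driver currently bound to it.
vga_drivers() {
    lspci -D | vga_addrs | while read -r addr; do
        drv=$(readlink "/sys/bus/pci/devices/$addr/driver" 2>/dev/null) || drv="(none)"
        echo "$addr ${drv##*/}"
    done
}
```

The driver name printed by vga_drivers identifies the module to look for in the kernel log and, later, in driver bug reports.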
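Once the device address is identified (sysfs expects the full domain:bus:slot.func form, for example 0000:00:02.0), remove the device from the PCI bus to stop the polling loop. Removal detaches the device from its kernel driver without a reboot, but it also disables any display output driven by that adapter, so run the command over SSH or be prepared to reboot: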
echo 1 | sudo tee /sys/bus/pci/devices/<device_id>/remove
Monitoring system load after executing this command typically shows CPU usage returning to baseline levels. This behavior confirms the integrated graphics driver is initiating faulty hardware polls. Resolution requires updating the driver or coordinating with the hardware vendor to address the underlying polling logic.
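Because the removal is only a diagnostic step, the device can usually be brought back without a reboot by asking the kernel to rescan the PCI bus. The sketch below combines both steps with a basic sanity check on the address. The function names are invented for this example, the script must run as root, and whether the display recovers cleanly after the rescan depends on the driver.

```shell
# Succeed only for a full PCI address such as 0000:00:02.0.
is_pci_addr() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{4,}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
}

# Remove a device, wait so the load can be observed, then rescan the bus
# so the kernel rediscovers the device.
remove_then_rescan() {
    addr="$1"
    secs="${2:-60}"
    is_pci_addr "$addr" || { echo "bad PCI address: $addr" >&2; return 2; }
    echo 1 > "/sys/bus/pci/devices/$addr/remove" || return 1
    sleep "$secs"
    echo 1 > /sys/bus/pci/rescan
}
```

If the kworker load drops during the wait and returns after the rescan, that is further evidence that the adapter's driver is responsible.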