Zabbix Troubleshooting Guide: Frequent Errors and Practical Fixes
1. Agent fails to start, PID file missing, semaphore errors
Sample logs:
PID file /run/zabbix/zabbix_agentd.pid not readable (yet?) after start
zabbix-agent.service never wrote its PID file. Failing
Agent log indicates IPC/semaphore allocation failure:
zabbix_agentd[5922]: cannot open log: cannot create semaphore set: [28] No space left on device
Fix: increase System V semaphores and reload kernel parameters.
# /etc/sysctl.conf
kernel.sem = 500 64000 64 256
# apply changes
sysctl -p /etc/sysctl.conf
Parameter reference:
- SEMMSL: maximum semaphores per set
- SEMMNS: total semaphores system-wide
- SEMOPM: max operations per semop call
- SEMMNI: max semaphore sets system-wide
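To see how the four values map onto these names on a running host, a small helper can label them. This is a minimal sketch: describe_sem is a hypothetical name, and it assumes the usual Linux layout of /proc/sys/kernel/sem (four integers in the order listed above).

```shell
# Hypothetical helper: label the four kernel.sem fields.
describe_sem() {
  # $1: four whitespace-separated integers, e.g. the content of /proc/sys/kernel/sem
  set -- $1
  if [ "$#" -ne 4 ]; then
    echo "expected 4 fields, got $#" >&2
    return 1
  fi
  printf 'SEMMSL=%s SEMMNS=%s SEMOPM=%s SEMMNI=%s\n' "$1" "$2" "$3" "$4"
}

# Usage on a live host:
#   describe_sem "$(cat /proc/sys/kernel/sem)"
```

Comparing the output before and after sysctl -p is a quick way to confirm the new limits were applied.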
Also verify SELinux if systemctl status mentions it. Temporarily set permissive to validate:
setenforce 0
If the PID path is missing or has wrong permissions:
mkdir -p /run/zabbix
chown zabbix:zabbix /run/zabbix
systemctl restart zabbix-agent
2. "Unreachable poller processes more than 75% busy"
Meaning: internal unreachable pollers are saturated. Typical causes:
- Agents went down (crash/agent died) while hosts are still monitored
- Network latency/timeouts between server and agents
- Server-side resource contention (DB/IO/memory)
Mitigations:
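- Increase unreachable poller capacity (StartPollersUnreachable, not StartPollers)
- Review timeouts and network reliability
- Validate DB and disk IO
# /etc/zabbix/zabbix_server.conf
# example value; raise gradually and watch the busy percentage
StartPollersUnreachable=10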
# then
systemctl restart zabbix-server
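Before changing a Start* value it helps to know what is currently set. The following is a minimal sketch, not an official tool: conf_get is a hypothetical helper that prints the last uncommented assignment of a parameter in a key=value file such as zabbix_server.conf.

```shell
# Hypothetical helper: print the last uncommented value of a parameter.
conf_get() {
  # $1: config file, $2: parameter name
  awk -v key="$2" '
    /^[ \t]*#/ { next }                 # skip comment lines
    index($0, "=") == 0 { next }        # skip lines without an assignment
    {
      k = substr($0, 1, index($0, "=") - 1)
      v = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim spaces around the name
      gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim spaces around the value
      if (k == key) { val = v; found = 1 }
    }
    END { if (!found) exit 1; print val }
  ' "$1"
}

# Usage:
#   conf_get /etc/zabbix/zabbix_server.conf StartPollers || echo "not set, default applies"
```

A non-zero exit status means the parameter is absent or commented out, so the built-in default is in effect.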
3. "Zabbix alerter processes more than 75% busy"
Possible reasons:
- Back-end database slowness
- High IO wait on the Zabbix server
- Insufficient memory allocated to Zabbix processes
- Network delays impacting media delivery APIs
Actions:
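- Scale alerter processes first; add pollers or discoverers only if they are also reported busy
- Optimize DB and storage
# /etc/zabbix/zabbix_server.conf
# Note: Zabbix config files accept comments only at the start of a line.
# StartAlerters applies if present in your version; otherwise scale trappers/pollers
StartAlerters=20
# ensure polling capacity
StartPollers=500
# raise only if slow discovery cycles cause backlog
StartDiscoverers=100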
systemctl restart zabbix-server
To reduce outgoing alert pressure temporarily, you can modify alert scripts to queue/log instead of sending.
# /usr/lib/zabbix/alertscripts/sms
#!/usr/bin/env bash
printf '%(%F %T)T\n' -1 >>/tmp/sms.log
4. "Value cache/config cache working in low memory mode"; server exits with out-of-memory
Server log:
[file:dbconfig.c,line:653] zbx_mem_malloc(): out of memory (requested 136 bytes)
[file:dbconfig.c,line:653] zbx_mem_malloc(): please increase CacheSize configuration parameter
Increase cache regions in the server config and restart.
# /etc/zabbix/zabbix_server.conf
CacheSize=2048M
ValueCacheSize=2048M
systemctl restart zabbix-server
5. DB error: connection failed [1040] Too many connections
Symptom in server log while MariaDB appears otherwise healthy. Increase DB connection limits and systemd resource limits.
# check current
mysql -uroot -p -e "show variables like 'max_connections';"
# /etc/my.cnf (mysqld section)
max_connections=1000
# systemd unit for MariaDB: inspect the current limits
sed -n '1,200p' /usr/lib/systemd/system/mariadb.service
# add under [Service], preferably in a drop-in created with `systemctl edit mariadb`
LimitNOFILE=10000
LimitNPROC=10000
systemctl daemon-reload
systemctl restart mariadb
mysql -uroot -p -e "show variables like 'max_connections';" # expect 1000
6. "More than 100 items missing data for more than 10 minutes" and pollers >75% busy
Increase parallelism and caches:
# /usr/local/zabbix/etc/zabbix_server.conf
StartPollers=500
StartPollersUnreachable=50
StartTrappers=30
StartDiscoverers=6
StartDBSyncers=20
CacheSize=1G
CacheUpdateFrequency=300
HistoryCacheSize=512M
TrendCacheSize=256M
HistoryTextCacheSize=80M
ValueCacheSize=1G
systemctl restart zabbix-server
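The cache sizes are reserved when the server starts, so it can be worth totalling them against available RAM before restarting. A rough sketch follows; to_bytes and sum_sizes are hypothetical helper names, and only the K, M and G suffixes used in this guide are handled.

```shell
# Hypothetical helpers: convert Zabbix-style sizes to bytes and add them up.
to_bytes() {
  case "$1" in
    *K) echo $(( ${1%K} * 1024 )) ;;
    *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
    *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
    *)  echo $(( $1 )) ;;               # plain number of bytes
  esac
}

sum_sizes() {
  total=0
  for size in "$@"; do
    total=$(( total + $(to_bytes "$size") ))
  done
  echo "$total"
}

# Usage with the values from this section:
#   sum_sizes 1G 512M 256M 80M 1G
```

If the total approaches physical memory, lower the largest caches first rather than letting the host swap.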
7. "first network error, wait for 15 seconds"
Increase Zabbix server Timeout to account for slow endpoints:
# /etc/zabbix/zabbix_server.conf
Timeout=30
systemctl restart zabbix-server
8. "Zabbix poller processes more than 75% busy" (general)
Common triggers:
- Hung/zombie data collection subprocesses
- Large number of monitored items with slow responses
- Network latency
- Memory pressure causing stalls
Quick remediation:
# Restart periodically to clear stuck workers (optional)
service zabbix-server restart
# or via cron
@daily service zabbix-server restart >/dev/null 2>&1
Scale pollers based on host/item count and available memory:
# /etc/zabbix/zabbix_server.conf
StartPollers=12
systemctl restart zabbix-server
9. "No route to host"
If agent appears red (ZBX) and server-to-agent TCP test fails:
# from server to agent
nc -vz <agent_ip> 10050
# Error: No route to host
Check the host firewall and network ACLs on the agent side: either add a rule allowing TCP/10050 from the server or, for a quick test only, stop the firewall temporarily.
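The exact error text narrows down where to look. The sketch below maps the common connect errors to a likely cause; classify_connect_error is a hypothetical helper and the labels are only a first guess, not a diagnosis.

```shell
# Hypothetical helper: map a connect error message to a likely cause.
classify_connect_error() {
  case "$1" in
    *"No route to host"*)
      # ICMP unreachable/prohibited: routing problem or a firewall REJECT rule
      echo "routing-or-firewall-reject" ;;
    *"Connection refused"*)
      # host answered with a reset: nothing listening on the port, or a reset-style reject
      echo "no-listener-or-reset" ;;
    *"timed out"*|*"Timed out"*)
      # no answer at all: packets dropped somewhere on the path
      echo "silently-dropped" ;;
    *)
      echo "unknown" ;;
  esac
}

# Usage:
#   classify_connect_error "$(nc -vz 192.0.2.20 10050 2>&1)"
```

The address 192.0.2.20 in the usage line is a placeholder from the documentation range, not a value from this guide.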
10. Active checks timeout: "ZBX_TCP_READ() timed out"
Agent log:
active check configuration update from [<server_ip>:10051] started to fail (ZBX_TCP_READ() timed out)
Open TCP/10051 on the server firewall and network path to allow active agents to reach the server.
11. Server fails to start: missing libmysqlclient.so
Error:
... zabbix_server: error while loading shared libraries: libmysqlclient.so.16: cannot open shared object file
Either install the package that provides this exact client library version, or point the runtime linker at an existing copy:
# Example link (adjust paths/version to your environment)
ln -s /usr/local/mysql/lib/mysql/libmysqlclient.so.16 /usr/lib64/
# or add library path
echo "/usr/local/mysql/lib" >/etc/ld.so.conf.d/mysql.conf
ldconfig
12. "Received empty response from Zabbix Agent at [127.0.0.1]. Assuming access permission issue."
Likely Server/ServerActive mismatch or agent ListenIP mismatch. Ensure server connects to correct agent IP and the agent trusts the server.
# /etc/zabbix/zabbix_agentd.conf
Server=<zabbix_server_ip>
ServerActive=<zabbix_server_ip>
ListenIP=<agent_ip>
systemctl restart zabbix-agent
systemctl restart zabbix-server
13. "Zabbix discoverer processes more than 75% busy"
Increase discovery workers to match discovery rules volume.
# /etc/zabbix/zabbix_server.conf
StartDiscoverers=5
systemctl restart zabbix-server
Also avoid setting discovery Delay too low (e.g., 60s) unless needed.
14. Agent cannot create PID file
Logs:
zabbix_agentd[1232]: cannot create PID file [/var/run/zabbix/zabbix_agentd.pid]: [2] No such file or directory
zabbix_agentd[1724]: cannot create PID file ...: [13] Permission denied
Create the runtime directory and set ownership:
mkdir -p /var/run/zabbix
chown zabbix:zabbix /var/run/zabbix
systemctl restart zabbix-agent
15. Web-related busy process warnings and tuning
- Alerter >75% busy: likely action interval too short or alert storm. Throttle actions and/or temporarily change alert script to log timestamps (see section 3). Scale StartAlerters if using newer versions.
- Discoverer >75% busy: increase StartDiscoverers (e.g., 5–20 depending on hardware) and avoid very frequent discovery cycles.
# /etc/zabbix/zabbix_server.conf
StartDiscoverers=5
systemctl restart zabbix-server
- Poller >75% busy: increase StartPollers and/or set "Keep lost resources period" for discovery to 0 to prune unreachable entities.
StartPollers=10
- Housekeeper >75% busy: tune housekeeping to smaller regular batches.
# run every hour (Zabbix config files accept comments only at the start of a line)
HousekeepingFrequency=1
MaxHousekeeperDelete=1000000
- Server OOM on start: increase CacheSize
CacheSize=1024M
- PHP memory exhausted:
# e.g., Apache with mod_php (php_value is a mod_php directive; with PHP-FPM set memory_limit in the pool config or php.ini instead)
# /etc/httpd/conf.d/zabbix.conf
php_value memory_limit 512M
16. "cannot connect to [[]:10050]: [113] No route to host"
Validate server-to-agent connectivity and host firewall/SELinux.
nc -vz <agent_ip> 10050
# check iptables/firewalld and SELinux settings
17. Web says "Zabbix server is not running: information may not be current."
Ensure frontend points to the actual server IP.
# /etc/zabbix/web/zabbix.conf.php
$ZBX_SERVER = '192.0.2.10';
Verify the zabbix-server service is active.
18. Miscellaneous UI/backend issues
- PHP error: scandir() disabled. Remove it from disable_functions in php.ini and restart php-fpm/nginx.
- Windows item "ZBX_TCP_READ() failed: [104] Connection reset by peer": fix the ServerActive/Server in the Windows agent config.
- Browser rendering issues: try a different browser; some older browsers/extensions may block scripts.
- IPMI build error: "Invalid OPENIPMI directory - unable to find ipmiif.h". Install dependencies first:
yum install -y net-snmp-devel OpenIPMI OpenIPMI-devel rpm-build
19. "zabbix_server dead but subsys locked"; value cache low-memory
Logs show cache exhaustion, e.g., zbx_mem_malloc out of memory. Increase core caches and restart.
# /etc/zabbix/zabbix_server.conf
# choose a value between 512M and 2G based on scale
CacheSize=1G
# adjust for heavy history reads
ValueCacheSize=2048M
systemctl restart zabbix-server
20. Common installation errors
- GCC not found building from source:
yum -y groupinstall "Development Tools"
- mysqlclient not found:
yum -y install mysql-devel # or mariadb-connector-c-devel on newer distros
- Web installer 403 on setup.php: set SELinux permissive/disabled or configure proper contexts.
setenforce 0
# or edit /etc/selinux/config -> SELINUX=disabled then reboot
- Agent unreachable due to hostname mismatch: ensure the agent Hostname matches the "Host name" in the Zabbix frontend and Server points to the server IP.
- Low free swap warnings: create, format, and enable swap.
# 2 GB swap example
dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# persist
echo "/swapfile swap swap defaults 0 0" >> /etc/fstab
- "bad interpreter" in custom script: remove Windows CRLF (^M) line endings.
sed -i 's/\r$//' your_script.sh
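To find out which scripts actually need the conversion, a check for carriage returns can run first. A minimal sketch, with has_crlf as a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the file contains a carriage return (CR).
has_crlf() {
  # $1: path to the script to inspect
  grep -q "$(printf '\r')" "$1"
}

# Usage: list alert scripts that still have Windows line endings
#   for f in /usr/lib/zabbix/alertscripts/*; do
#     has_crlf "$f" && echo "CRLF: $f"
#   done
```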
21. Deployment pitfalls and localization
- PHP missing mysqli when using source builds: configure PHP with mysqlnd, e.g., --with-mysqli=mysqlnd.
- configure error: invalid net-snmp dir: install net-snmp-devel libxml2-devel libcurl-devel.
- "frontend does not match Zabbix database" after installer: ensure DB schema initialized/imported successfully.
- "Unable to create configuration file" at web installer: grant web server user write access to Zabbix conf/ directory.
- Enable Chinese in UI by toggling locales.inc.php and ensuring system locales are installed.
# /usr/share/zabbix/include/locales.inc.php
# set display to true for zh_CN if needed
# Install locales (Ubuntu example)
apt-get install language-pack-zh-hans language-pack-zh-hant
update-locale LANG=zh_CN.UTF-8
- Graphs show squares for CJK: replace graph font with a CJK-capable TTF.
cd /usr/share/zabbix/fonts
cp DejaVuSans.ttf DejaVuSans.ttf.bak
cp /path/to/cjk-font.ttf DejaVuSans.ttf
22. Dashboard "Zabbix server is running: No" and item type mismatch
- If frontend shows "No", verify the DB user privileges used by zabbix-server and the service status.
- Item error: "Received value [...] is not suitable for value type [Numeric (unsigned)]" → adjust item "Type of information" or use preprocessing to cast, and/or validate cache sizes.
23. Percona template import error: invalid XML tag date
Workaround: import the template into a 2.4 server, export it, then import the re-exported XML into 3.0.
24. Server aborts with "please increase CacheSize"
Increase CacheSize to match environment size and restart.
# /etc/zabbix/zabbix_server.conf
CacheSize=2048M
systemctl restart zabbix-server
25. Agent and server log errors: quick references
- Agent: "no active checks on server [x.x.x.x:10051]: host [name] not found" → Hostname mismatch between agent config and frontend host name.
- Agent: "active check configuration update ... connection refused" → ServerActive/Server IP wrong or server-side 10051 not reachable.
# /etc/zabbix/zabbix_agentd.conf
Hostname=<frontend_host_name>
Server=<server_ip>
ServerActive=<server_ip>
- Agent: "failed to accept an incoming connection: connection from ... rejected, allowed hosts: "127.0.0.1"" → add the server IP to the agent's Server parameter (a comma-separated allow list) and restart the agent.
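A quick way to confirm the "allowed hosts" case is to test whether the server's address appears in the agent's Server list. The sketch below does a literal match only; server_allowed is a hypothetical helper, and hostnames or CIDR ranges in Server are not resolved.

```shell
# Hypothetical helper: is an address literally present in a comma-separated Server= value?
server_allowed() {
  # $1: value of the agent Server parameter, $2: address the server connects from
  list=$(printf '%s' "$1" | tr -d ' \t')   # drop spaces around the commas
  case ",$list," in
    *",$2,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage:
#   server_allowed "127.0.0.1,192.0.2.10" 192.0.2.10 && echo "allowed"
```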
26. MySQL won’t start after power failure; InnoDB metadata errors
Symptoms include missing mysql.innodb_table_stats/index_stats, or space ID conflicts.
Options:
- If certain tables are dispensable or restorable from backup, remove the affected .ibd files and let InnoDB recover (use with caution)
- Temporarily start with forced recovery to dump/repair:
# /etc/my.cnf
[mysqld]
innodb_force_recovery=1 # increase carefully up to 6 as last resort
systemctl start mysqld
Migrate data off, rebuild system tables if necessary, and switch to InnoDB for Zabbix for better safety and performance.
27. Additional dashboard/UI localization and SNMP MIBs
- If frontend language selection complains about missing locales, install OS language packs and restart web + zabbix-server.
- SNMP MIBs for devices: install snmp-mibs-downloader (Debian/Ubuntu) or vendor packages as needed.
apt-get install snmp-mibs-downloader
28. APT update 403 using proxy
Error:
Failed to fetch http://ubuntu.kurento.org/... 403 Forbidden
Remove or fix proxy settings in /etc/apt/apt.conf if not intended.
29. WeChat (or other) alert not delivered via script
Ensure the alert script has the proper shebang and executable permission.
#!/usr/bin/env bash
# ...
chmod +x /usr/lib/zabbix/alertscripts/your_script
30. Upgrade 3.2 → 3.4 shows "frontend does not match Zabbix database"
Proper approach is to run the upgraded zabbix-server and let it perform the DB schema upgrade. If you must unblock the frontend temporarily (not recommended), updating dbversion may suppress the message but does not update the schema:
mysql> use zabbix;
mysql> update dbversion set mandatory=3040000;
mysql> flush privileges;
Always back up and run the official upgrade procedure.
31. "cannot connect to [[x.x.x.x]:10050]: [111] Connection refused"
Typical causes:
- Agent not running, or not listening on the expected IP/port
- Network blocked
- Host firewall blocks 10050
- Perimeter firewall blocks the segment
Check logs and connectivity, then allow ports:
# example iptables rule
iptables -I INPUT -p tcp -m multiport --dports 10050,10051 -j ACCEPT
32. Zombie processes due to sudo bug affecting custom checks
Symptom: missing item data; agent log shows stuck command run via sudo and inability to kill it; zombie processes observed.
Root cause: Old sudo versions had a race near select()/SIGCHLD (e.g., < 1.7.5/1.8.0), leaving the child as zombie and blocking the agent.
Mitigations:
- Avoid invoking sudo directly in Zabbix keys; wrap privilege escalation inside scripts with proper timeouts
- Upgrade sudo to a version with the fix
- Ensure custom scripts return promptly and handle timeouts cleanly
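One way to make a custom check return promptly is to run it under the coreutils timeout command. This is a sketch under assumptions: run_bounded is a hypothetical wrapper name, and the limit should be chosen below the agent's own Timeout setting.

```shell
# Hypothetical wrapper: run a command with a hard time limit.
run_bounded() {
  # $1: limit in seconds, remaining arguments: the command to run
  limit=$1
  shift
  rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # GNU timeout exits with 124 when the limit was reached
    echo "timed out after ${limit}s" >&2
  fi
  return "$rc"
}

# Usage inside a custom check script (the script path is a placeholder):
#   run_bounded 3 /usr/local/bin/my_check.sh
```

Because the wrapper kills the command itself, a hung child is reaped by the wrapper instead of lingering under the agent.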
Diagnosis helpers:
ps -ef | grep <script>
strace -p <parent_pid>
lsof -p <pid>
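To list zombies and their parents in one pass, the ps output can be filtered on the process state column. A minimal sketch, with find_zombies as a hypothetical helper that reads "pid ppid stat comm" lines on standard input:

```shell
# Hypothetical helper: print "pid ppid command" for every zombie process.
find_zombies() {
  # expects a header line followed by: PID PPID STAT COMMAND
  awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage on a live host:
#   ps -eo pid,ppid,stat,comm | find_zombies
```

The second column is the parent to inspect with strace or lsof, since a zombie only disappears once its parent collects it.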
33. "Can't open PID file /run/zabbix/zabbix_server.pid (yet?) after start: No such file or directory"
Ensure the runtime path exists and is writable by the zabbix user, then start the service. If the environment is inconsistent (e.g., tmpfs cleared), a full reboot may recreate /run with correct permissions.
mkdir -p /run/zabbix
chown zabbix:zabbix /run/zabbix
systemctl restart zabbix-server
# if still failing, reboot the VM
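On systemd hosts /run is a tmpfs, so a directory created by hand disappears at the next boot. A tmpfiles.d entry can recreate it automatically. The sketch below only generates the entry; tmpfiles_line is a hypothetical helper, the 0755 mode and the /etc/tmpfiles.d/zabbix.conf file name are assumptions, and packaged installs may already ship an equivalent file.

```shell
# Hypothetical helper: emit a tmpfiles.d line that creates a runtime directory.
tmpfiles_line() {
  # $1: runtime directory, $2: owner (used for both user and group)
  printf 'd %s 0755 %s %s -\n' "$1" "$2" "$2"
}

# Usage (as root):
#   tmpfiles_line /run/zabbix zabbix > /etc/tmpfiles.d/zabbix.conf
#   systemd-tmpfiles --create /etc/tmpfiles.d/zabbix.conf
```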