Zabbix Troubleshooting Guide: Frequent Errors and Practical Fixes

1. Agent fails to start, PID file missing, semaphore errors

Sample logs:

PID file /run/zabbix/zabbix_agentd.pid not readable (yet?) after start
zabbix-agent.service never wrote its PID file. Failing

Agent log indicates IPC/semaphore allocation failure:

zabbix_agentd[5922]: cannot open log: cannot create semaphore set: [28] No space left on device

Fix: increase System V semaphores and reload kernel parameters.

# /etc/sysctl.conf
kernel.sem = 500 64000 64 256

# apply changes
sysctl -p /etc/sysctl.conf

Parameter reference:

  • SEMMSL: maximum semaphores per set
  • SEMMNS: total semaphores system-wide
  • SEMOPM: max operations per semop call
  • SEMMNI: max semaphore sets system-wide
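Before editing sysctl.conf it helps to see what the running kernel currently allows. On Linux, /proc/sys/kernel/sem lists the four fields in the same order as above. A minimal sketch; the helper name describe_sem is made up for illustration:

```shell
#!/usr/bin/env bash
# describe_sem is a hypothetical helper: it labels the four kernel.sem fields
# in the order the kernel reports them (SEMMSL SEMMNS SEMOPM SEMMNI).
describe_sem() {
    local semmsl semmns semopm semmni
    read -r semmsl semmns semopm semmni <<<"$1"
    printf 'SEMMSL=%s\nSEMMNS=%s\nSEMOPM=%s\nSEMMNI=%s\n' \
        "$semmsl" "$semmns" "$semopm" "$semmni"
}

# Show the live values when running on Linux.
if [ -r /proc/sys/kernel/sem ]; then
    describe_sem "$(cat /proc/sys/kernel/sem)"
fi
```

ipcs -ls prints the same limits, and ipcs -s lists the semaphore sets currently allocated, which shows how close the host is to SEMMNI.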

Also verify SELinux if systemctl status mentions it. Temporarily set permissive to validate:

setenforce 0

If the PID path is missing or has wrong permissions:

mkdir -p /run/zabbix
chown zabbix:zabbix /run/zabbix
systemctl restart zabbix-agent

2. "Unreachable poller processes more than 75% busy"

Meaning: internal unreachable pollers are saturated. Typical causes:

  • Agents went down (crash/agent died) while hosts are still monitored
  • Network latency/timeouts between server and agents
  • Server-side resource contention (DB/IO/memory)

Mitigations:

  • Increase poller capacity
  • Review timeouts and network reliability
  • Validate DB and disk IO
# /etc/zabbix/zabbix_server.conf
# dedicated pool for hosts currently marked unreachable
StartPollersUnreachable=10
StartPollers=500
# then
systemctl restart zabbix-server

3. "Zabbix alerter processes more than 75% busy"

Possible reasons:

  • Back-end database slowness
  • High IO wait on the Zabbix server
  • Insufficient memory allocated to Zabbix processes
  • Network delays impacting media delivery APIs

Actions:

  • Scale out sender/poller/discoverer processes as needed
  • Optimize DB and storage
# /etc/zabbix/zabbix_server.conf
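# Zabbix config files only accept comments at the start of a line
# StartAlerters exists in newer versions; otherwise scale trappers/pollers
StartAlerters=20
# ensure polling capacity
StartPollers=500
# raise if slow discovery cycles cause a backlog
StartDiscoverers=100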
systemctl restart zabbix-server

To reduce outgoing alert pressure temporarily, you can modify alert scripts to queue/log instead of sending.

# /usr/lib/zabbix/alertscripts/sms
#!/usr/bin/env bash
printf '%(%F %T)T\n' -1 >>/tmp/sms.log

4. "Value cache/config cache working in low memory mode"; server exits with out-of-memory

Server log:

[file:dbconfig.c,line:653] zbx_mem_malloc(): out of memory (requested 136 bytes)
[file:dbconfig.c,line:653] zbx_mem_malloc(): please increase CacheSize configuration parameter

Increase cache regions in the server config and restart.

# /etc/zabbix/zabbix_server.conf
CacheSize=2048M
ValueCacheSize=2048M
systemctl restart zabbix-server

5. DB error: connection failed [1040] Too many connections

This appears in the server log while MariaDB otherwise looks healthy. Increase the DB connection limit and the systemd resource limits.

# check current
mysql -uroot -p -e "show variables like 'max_connections';"

# /etc/my.cnf (mysqld section)
max_connections=1000

# systemd limits for MariaDB: inspect the packaged unit
sed -n '1,200p' /usr/lib/systemd/system/mariadb.service
# add under [Service], preferably in a drop-in created with: systemctl edit mariadb
LimitNOFILE=10000
LimitNPROC=10000

systemctl daemon-reload
systemctl restart mariadb

mysql -uroot -p -e "show variables like 'max_connections';"  # expect 1000
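
A rough way to size max_connections is to add up the Start* process counts in zabbix_server.conf, on the assumption that most server processes hold their own DB connection, and then leave headroom for the frontend and admin sessions. A sketch under that assumption; sum_start_procs is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# sum_start_procs (hypothetical helper): add up every uncommented Start*=N
# line in a Zabbix server config file. Parameters left at their defaults
# (commented out) are not counted, so treat the result as a lower bound.
sum_start_procs() {
    awk -F= '/^Start[A-Za-z]+=/ { total += $2 } END { print total + 0 }' "$1"
}

conf=/etc/zabbix/zabbix_server.conf
if [ -r "$conf" ]; then
    echo "configured Start* processes: $(sum_start_procs "$conf")"
fi
```

Comparing that number with the output of show status like 'Max_used_connections'; indicates whether the limit was actually reached or the server is simply configured with more processes than the database allows.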

6. "More than 100 items missing data for more than 10 minutes" and pollers >75% busy

Increase parallelism and caches:

# /usr/local/zabbix/etc/zabbix_server.conf
StartPollers=500
StartPollersUnreachable=50
StartTrappers=30
StartDiscoverers=6
StartDBSyncers=20

CacheSize=1G
CacheUpdateFrequency=300
HistoryCacheSize=512M
TrendCacheSize=256M
HistoryTextCacheSize=80M
ValueCacheSize=1G

systemctl restart zabbix-server

7. "first network error, wait for 15 seconds"

Increase Zabbix server Timeout to account for slow endpoints:

# /etc/zabbix/zabbix_server.conf
Timeout=30
systemctl restart zabbix-server

8. "Zabbix poller processes more than 75% busy" (general)

Common triggers:

  • Hung/zombie data collection subprocesses
  • Large number of monitored items with slow responses
  • Network latency
  • Memory pressure causing stalls

Quick remediation:

# Restart periodically to clear stuck workers (optional)
service zabbix-server restart
# or via cron
@daily service zabbix-server restart >/dev/null 2>&1

Scale pollers based on host/item count and available memory:

# /etc/zabbix/zabbix_server.conf
StartPollers=12
systemctl restart zabbix-server

9. "No route to host"

If agent appears red (ZBX) and server-to-agent TCP test fails:

# from server to agent
nc -vz <agent_ip> 10050
# Error: No route to host

Check the host firewall and any network ACLs between server and agent. On the agent host, either add a rule allowing TCP/10050 from the server or, for a quick test only, stop the firewall.
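
If nc is not installed on the server, bash's built-in /dev/tcp redirection can do the same reachability test. A minimal sketch; port_open is a hypothetical helper and 192.0.2.20 is a placeholder agent address:

```shell
#!/usr/bin/env bash
# port_open (hypothetical helper): succeed if a TCP connection to host:port
# can be opened within 3 seconds, using bash's built-in /dev/tcp redirection.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage, run from the Zabbix server (192.0.2.20 is a placeholder agent IP):
#   port_open 192.0.2.20 10050 && echo reachable || echo blocked
#
# On an agent host that uses firewalld, the port can be opened with:
#   firewall-cmd --permanent --add-port=10050/tcp
#   firewall-cmd --reload
```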

10. Active checks timeout: "ZBX_TCP_READ() timed out"

Agent log:

active check configuration update from [<server_ip>:10051] started to fail (ZBX_TCP_READ() timed out)

Open TCP/10051 on the server firewall and along the network path to allow active agents to reach the server.

11. Server fails to start: missing libmysqlclient.so

Error:

... zabbix_server: error while loading shared libraries: libmysqlclient.so.16: cannot open shared object file

Either install the MySQL/MariaDB client library package that provides this version, or add the library's location to the runtime linker path:

# Example link (adjust paths/version to your environment)
ln -s /usr/local/mysql/lib/mysql/libmysqlclient.so.16 /usr/lib64/

# or add library path
echo "/usr/local/mysql/lib" >/etc/ld.so.conf.d/mysql.conf
ldconfig
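
To confirm which libraries are unresolved before and after the fix, the ldd output can be filtered for "not found" entries. A small sketch; missing_libs is a hypothetical helper and the binary path in the usage comment is an assumption:

```shell
#!/usr/bin/env bash
# missing_libs (hypothetical helper): read ldd output on stdin and print the
# names of shared libraries the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (the binary path is an assumption; adjust to your install):
#   ldd /usr/sbin/zabbix_server | missing_libs
```

An empty result after running ldconfig means the linker can now find every dependency.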

12. "Received empty response from Zabbix Agent at [127.0.0.1]. Assuming access permission issue."

Likely Server/ServerActive mismatch or agent ListenIP mismatch. Ensure server connects to correct agent IP and the agent trusts the server.

# /etc/zabbix/zabbix_agentd.conf
Server=<zabbix_server_ip>
ServerActive=<zabbix_server_ip>
ListenIP=<agent_ip>

systemctl restart zabbix-agent
systemctl restart zabbix-server
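
A quick way to check the "agent trusts the server" half is to look for the server address in the agent's Server= list. The sketch below only does exact string matching (CIDR ranges and DNS names are not evaluated); server_allowed is a hypothetical helper:

```shell
#!/usr/bin/env bash
# server_allowed (hypothetical helper): succeed if the given address appears
# literally in the agent's Server= list. Only exact string matches are
# checked; CIDR ranges and DNS names in Server= are not evaluated.
server_allowed() {
    local conf=$1 addr=$2
    awk -F= -v addr="$addr" '
        /^Server=/ {
            n = split($2, hosts, ",")
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]/, "", hosts[i])
                if (hosts[i] == addr) found = 1
            }
        }
        END { exit !found }
    ' "$conf"
}

# Usage on the agent host (192.0.2.10 is a placeholder server IP):
#   server_allowed /etc/zabbix/zabbix_agentd.conf 192.0.2.10 || echo "server IP not listed"
```

Once the config is right, zabbix_get -s <agent_ip> -k agent.ping run from the server should return 1.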

13. "Zabbix discoverer processes more than 75% busy"

Increase discovery workers to match discovery rules volume.

# /etc/zabbix/zabbix_server.conf
StartDiscoverers=5
systemctl restart zabbix-server

Also avoid setting discovery Delay too low (e.g., 60s) unless needed.

14. Agent cannot create PID file

Logs:

zabbix_agentd[1232]: cannot create PID file [/var/run/zabbix/zabbix_agentd.pid]: [2] No such file or directory
zabbix_agentd[1724]: cannot create PID file ...: [13] Permission denied

Create the runtime directory and set ownership:

mkdir -p /var/run/zabbix
chown zabbix:zabbix /var/run/zabbix
systemctl restart zabbix-agent
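
On systemd hosts /var/run is normally a symlink to /run, which is a tmpfs, so a directory created by hand is gone after the next reboot. A tmpfiles.d entry recreates it at boot. A sketch, assuming the package has not already shipped such an entry; tmpfiles_entry and the file name zabbix-agent.conf are illustrative:

```shell
#!/usr/bin/env bash
# tmpfiles_entry (hypothetical helper): print a systemd-tmpfiles "d" line that
# creates a directory with mode 0755 owned by the given user and group.
tmpfiles_entry() {
    printf 'd %s 0755 %s %s -\n' "$1" "$2" "$3"
}

# Usage as root (the file name under /etc/tmpfiles.d is an assumption):
#   tmpfiles_entry /run/zabbix zabbix zabbix >/etc/tmpfiles.d/zabbix-agent.conf
#   systemd-tmpfiles --create /etc/tmpfiles.d/zabbix-agent.conf
```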

15. Web-related busy process warnings and tuning

  • Alerter >75% busy: likely action interval too short or alert storm. Throttle actions and/or temporarily change alert script to log timestamps (see section 3). Scale StartAlerters if using newer versions.
  • Discoverer >75% busy: increase StartDiscoverers (e.g., 5–20 depending on hardware) and avoid very frequent discovery cycles.
# /etc/zabbix/zabbix_server.conf
StartDiscoverers=5
systemctl restart zabbix-server
  • Poller >75% busy: increase StartPollers and/or set "Keep lost resources period" for discovery to 0 to prune unreachable entities.
StartPollers=10
  • Housekeeper >75% busy: tune housekeeping to smaller regular batches.
# run every hour (comments must start the line in Zabbix config files)
HousekeepingFrequency=1
MaxHousekeeperDelete=1000000
  • Server OOM on start: increase CacheSize
CacheSize=1024M
  • PHP memory exhausted:
# e.g., Apache with mod_php (with PHP-FPM, set the limit in the pool file instead)
# /etc/httpd/conf.d/zabbix.conf
php_value memory_limit 512M

16. "cannot connect to [[]:10050]: [113] No route to host"

Validate server-to-agent connectivity and host firewall/SELinux.

nc -vz <agent_ip> 10050
# check iptables/ firewalld and SELinux settings

17. Web says "Zabbix server is not running: the information displayed may not be current."

Ensure frontend points to the actual server IP.

# /etc/zabbix/web/zabbix.conf.php
$ZBX_SERVER = '192.0.2.10';

Verify the zabbix-server service is active.

18. Miscellaneous UI/backend issues

  • PHP error: scandir() disabled. Remove it from disable_functions in php.ini and restart php-fpm/nginx.

  • Windows item "ZBX_TCP_READ() failed: [104] Connection reset by peer": fix the ServerActive/Server in the Windows agent config.

  • Browser rendering issues: try a different browser; some older browsers/extensions may block scripts.

  • IPMI build error: "Invalid OPENIPMI directory - unable to find ipmiif.h". Install dependencies first:

yum install -y net-snmp-devel OpenIPMI OpenIPMI-devel rpm-build

19. "zabbix_server dead but subsys locked"; value cache low-memory

Logs show cache exhaustion, e.g., zbx_mem_malloc out of memory. Increase core caches and restart.

# /etc/zabbix/zabbix_server.conf
# 512M to 2G, based on scale
CacheSize=512M
# adjust for heavy history reads
ValueCacheSize=2048M
systemctl restart zabbix-server

20. Common installation errors

  • GCC not found building from source:
yum -y groupinstall "Development Tools"
  • mysqlclient not found:
yum -y install mysql-devel   # or mariadb-connector-c-devel on newer distros
  • Web installer 403 on setup.php: set SELinux permissive/disabled or configure proper contexts.
setenforce 0
# or edit /etc/selinux/config -> SELINUX=disabled then reboot
  • Agent unreachable due to hostname mismatch: Ensure agent Hostname matches the "Host name" in Zabbix frontend and Server points to server IP.

  • Low free swap warnings: create, format, and enable swap.

# 2 GB swap example
dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# persist
echo "/swapfile swap swap defaults 0 0" >> /etc/fstab
  • "bad interpreter" in custom script: remove Windows CRLF (^M) line endings.
sed -i 's/\r$//' your_script.sh
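
To find which scripts actually carry CRLF endings before rewriting them, each file can be tested for a trailing carriage return. A minimal sketch; has_crlf is a hypothetical helper:

```shell
#!/usr/bin/env bash
# has_crlf (hypothetical helper): succeed if any line in the file ends with a
# carriage return, which is what produces the "bad interpreter" error.
has_crlf() {
    grep -q $'\r$' "$1"
}

# Usage: list affected alert scripts (directory taken from the examples above).
#   for f in /usr/lib/zabbix/alertscripts/*; do
#       has_crlf "$f" && echo "$f"
#   done
```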

21. Deployment pitfalls and localization

  • PHP missing mysqli when using source builds: configure PHP with mysqlnd, e.g., --with-mysqli=mysqlnd.
  • configure error: invalid net-snmp dir: install net-snmp-devel libxml2-devel libcurl-devel.
  • "frontend does not match Zabbix database" after installer: ensure DB schema initialized/imported successfully.
  • "Unable to create configuration file" at web installer: grant web server user write access to Zabbix conf/ directory.
  • Enable Chinese in UI by toggling locales.inc.php and ensuring system locales are installed.
# /usr/share/zabbix/include/locales.inc.php
# set display to true for zh_CN if needed

# Install locales (Ubuntu example)
apt-get install language-pack-zh-hans language-pack-zh-hant
update-locale LANG=zh_CN.UTF-8
  • Graphs show squares for CJK: replace graph font with a CJK-capable TTF.
cd /usr/share/zabbix/fonts
cp DejaVuSans.ttf DejaVuSans.ttf.bak
cp /path/to/cjk-font.ttf DejaVuSans.ttf

22. Dashboard "Zabbix server is running: No" and item type mismatch

  • If frontend shows "No", verify the DB user privileges used by zabbix-server and the service status.
  • Item error: "Received value [...] is not suitable for value type [Numeric (unsigned)]" → adjust item "Type of information" or use preprocessing to cast, and/or validate cache sizes.

23. Percona template import error: invalid XML tag date

Workaround: import the template into a 2.4 server, export it, then import the re-exported XML into 3.0.

24. Server aborts with "please increase CacheSize"

Increase CacheSize to match environment size and restart.

# /etc/zabbix/zabbix_server.conf
CacheSize=2048M
systemctl restart zabbix-server

25. Agent and server log errors: quick references

  • Agent: "no active checks on server [x.x.x.x:10051]: host [name] not found" → Hostname mismatch between agent config and frontend host name.
  • Agent: "active check configuration update ... connection refused" → ServerActive/Server IP wrong or server-side 10051 not reachable.
# /etc/zabbix/zabbix_agentd.conf
Hostname=<frontend_host_name>
Server=<server_ip>
ServerActive=<server_ip>
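  • Agent: "failed to accept an incoming connection: connection from ... rejected, allowed hosts: "127.0.0.1"" → this is logged by the agent when the connecting server is not in its Server list. Add the server IP to the agent's Server parameter, restart the agent, and verify any allow/deny rules.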

26. MySQL won’t start after power failure; InnoDB metadata errors

Symptoms include missing mysql.innodb_table_stats/index_stats, or space ID conflicts.

Options:

  • If certain tables are dispensable or restorable from backup, remove the affected .ibd files and let InnoDB recover (use with caution)
  • Temporarily start with forced recovery to dump/repair:
# /etc/my.cnf
[mysqld]
innodb_force_recovery=1  # increase carefully up to 6 as last resort

systemctl start mysqld

Migrate data off, rebuild system tables if necessary, and switch to InnoDB for Zabbix for better safety and performance.

27. Additional dashboard/UI localization and SNMP MIBs

  • If frontend language selection complains about missing locales, install OS language packs and restart web + zabbix-server.
  • SNMP MIBs for devices: install snmp-mibs-downloader (Debian/Ubuntu) or vendor packages as needed.
apt-get install snmp-mibs-downloader

28. APT update 403 using proxy

Error:

Failed to fetch http://ubuntu.kurento.org/... 403 Forbidden

Remove or fix proxy settings in /etc/apt/apt.conf if not intended.

29. WeChat (or other) alert not delivered via script

Ensure the alert script has the proper shebang and executable permission.

#!/usr/bin/env bash
# ...
chmod +x /usr/lib/zabbix/alertscripts/your_script

30. Upgrade 3.2 → 3.4 shows "frontend does not match Zabbix database"

Proper approach is to run the upgraded zabbix-server and let it perform the DB schema upgrade. If you must unblock the frontend temporarily (not recommended), updating dbversion may suppress the message but does not update the schema:

mysql> use zabbix;
mysql> update dbversion set mandatory=3040000;
mysql> flush privileges;

Always back up and run the official upgrade procedure.
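
Reading the dbversion table before and after the upgrade shows whether the server really migrated the schema. The sketch below decodes the stored number, assuming the encoding is major * 1000000 + minor * 10000 (inferred from 3040000 standing for 3.4 in the statement above); dbversion_to_release is a hypothetical helper:

```shell
#!/usr/bin/env bash
# dbversion_to_release (hypothetical helper): turn a dbversion number such as
# 3040000 into "3.4", assuming major * 1000000 + minor * 10000 + patch level.
dbversion_to_release() {
    local v=$1
    printf '%d.%d\n' "$(( v / 1000000 ))" "$(( (v / 10000) % 100 ))"
}

# Usage (database name and credentials are placeholders):
#   v=$(mysql -N -uzabbix -p zabbix -e 'select mandatory from dbversion')
#   dbversion_to_release "$v"
```

If the decoded release still shows the old version after the new zabbix-server has started, check the server log for upgrade errors rather than editing the table.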

31. "cannot connect to [[x.x.x.x]:10050]: [111] Connection refused"

Typical causes:

  • Agent not running or not listening on port 10050
  • Network blocked
  • Host firewall blocks 10050
  • Perimeter firewall blocks the segment

Check logs and connectivity, then allow ports:

# example iptables rule
iptables -I INPUT -p tcp -m multiport --dports 10050,10051 -j ACCEPT

32. Zombie processes due to sudo bug affecting custom checks

Symptom: missing item data; agent log shows stuck command run via sudo and inability to kill it; zombie processes observed.

Root cause: Old sudo versions had a race near select()/SIGCHLD (e.g., < 1.7.5/1.8.0), leaving the child as zombie and blocking the agent.

Mitigations:

  • Avoid invoking sudo directly in Zabbix keys; wrap privilege escalation inside scripts with proper timeouts
  • Upgrade sudo to a version with the fix
  • Ensure custom scripts return promptly and handle timeouts cleanly

Diagnosis helpers:

ps -ef | grep <script>
strace -p <parent_pid>
lsof -p <pid>
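
To spot defunct children quickly, the process list can be filtered by state, and custom checks can be wrapped in timeout(1) so a hung command cannot block the agent indefinitely. A sketch; list_zombies and the script path are hypothetical:

```shell
#!/usr/bin/env bash
# list_zombies (hypothetical helper): read "pid ppid stat comm" rows on stdin
# and print pid, parent pid, and command for rows whose state starts with Z.
list_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage:
#   ps -eo pid,ppid,stat,comm | list_zombies
#
# Bounding a custom check in zabbix_agentd.conf (the script path is a placeholder):
#   UserParameter=custom.check,timeout 5 /usr/local/bin/check.sh
```

The parent PID in the second column is the process to inspect with strace or lsof, since a zombie is only reaped when its parent collects it.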

33. "Can't open PID file /run/zabbix/zabbix_server.pid (yet?) after start: No such file or directory"

Ensure the runtime path exists and is writable by the zabbix user, then start the service. If the environment is inconsistent (e.g., tmpfs cleared), a full reboot may recreate /run with correct permissions.

mkdir -p /run/zabbix
chown zabbix:zabbix /run/zabbix
systemctl restart zabbix-server
# if still failing, reboot the VM
