Process Count Monitoring with Zabbix LLD

Goal

  • Monitor, and alert on, the number of running processes of specific services

Approach

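  1. Use Zabbix low-level discovery (LLD) to discover which services each host runs and how many processes each service should have.
  2. Every 30 s, the Zabbix agent reports to the Zabbix server which services the host runs and the current process count of each one.
  3. Developers supply a configuration file, proccessInfo.txt, with one line per service in the form "IP service_name process_count". This file is the reference the monitoring checks against.
  4. proccessInfo.txt must be regenerated automatically every time the configuration changes, so that it always reflects the latest deployment.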
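Each proccessInfo.txt line maps directly onto the two LLD macros the discovery rule will use: "IP-service_name" becomes {#SERVICENAME} and the expected count becomes {#TRIGGER_VALUE}. The sketch below shows that mapping, assuming whitespace-separated fields and that a host keeps only the lines whose first field exactly equals one of its own IPs; parse_process_info and the sample lines are hypothetical.

```python
import json


def parse_process_info(lines, host_ips):
    """Turn "IP service_name expected_count" lines into LLD entries.

    Hypothetical helper: fields are whitespace-separated, and only lines
    whose IP exactly equals one of host_ips are kept.
    """
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] not in host_ips:
            continue  # blank, malformed, or belongs to another host
        ip, service, expected = fields[0], fields[1], int(fields[2])
        entries.append({
            "{#SERVICENAME}": ip + "-" + service,  # "IP-service" naming, as in services.py
            "{#TRIGGER_VALUE}": expected,          # expected process count (the threshold)
        })
    return {"data": entries}


sample = [
    "192.168.1.2 p_q1_server 3",
    "192.168.1.2 p_gate_server 2",
    "192.168.1.20 p_gate_server 4",  # another host: must not match 192.168.1.2
    "",
]
print(json.dumps(parse_process_info(sample, {"192.168.1.2"}), sort_keys=True))
```

Comparing the first field exactly, rather than searching for the IP with a regular expression, avoids 192.168.1.2 also matching 192.168.1.20 (and an unescaped "." matching any character).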

Configuration Steps

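  1. LLD discovery script
  2. Data collection script
  3. Add the keys (UserParameter) to the agent
  4. Add a template group on the Zabbix server
  5. Create the discovery rule (item prototypes and alert trigger prototypes)
  6. Add the current-process-count item (Zabbix trapper type; values are pushed from the agent side)
  7. Define the alert content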

Detailed Steps

LLD Discovery Script

LLD discovery reports each service name together with its expected process count (taken from proccessInfo.txt) to the Zabbix server:

```
$ /usr/bin/python services.py services_list
{
    "data": [
        {
            "{#SERVICENAME}": "192.168.1.2-p_q1_server",
            "{#TRIGGER_VALUE}": 3
        },
        {
            "{#SERVICENAME}": "192.168.1.2-p_world_d2_server",
            "{#TRIGGER_VALUE}": 1
        },
        {
            "{#SERVICENAME}": "192.168.1.2-p_gate_server",
            "{#TRIGGER_VALUE}": 2
        },
        {
            "{#SERVICENAME}": "192.168.1.2-p_world_d1_server",
            "{#TRIGGER_VALUE}": 1
        }
    ]
}
```
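Before wiring the script into the agent, it can help to check that its output is well-formed LLD JSON. This sketch assumes the {"data": [...]} wrapper shown above (newer Zabbix versions, 4.2 and later, also accept a bare JSON array) and checks macro names with a deliberately conservative pattern (uppercase letters, digits, underscores). check_lld is a hypothetical helper, not part of Zabbix.

```python
import json
import re

# Conservative pattern for LLD macro names: {#NAME} with A-Z, 0-9 and "_"
MACRO_RE = re.compile(r"^\{#[A-Z0-9_]+\}$")


def check_lld(text, required=("{#SERVICENAME}", "{#TRIGGER_VALUE}")):
    """Return a list of problems found in LLD JSON text (empty list = looks fine)."""
    doc = json.loads(text)
    rows = doc.get("data") if isinstance(doc, dict) else doc  # wrapper or bare array
    if not isinstance(rows, list):
        return ["top level must be {\"data\": [...]} or a JSON array"]
    problems = []
    names = []
    for n, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append("row %d: not a JSON object" % n)
            continue
        for key in row:
            if not MACRO_RE.match(key):
                problems.append("row %d: bad macro name %r" % (n, key))
        for key in required:
            if key not in row:
                problems.append("row %d: missing %s" % (n, key))
        names.append(row.get("{#SERVICENAME}"))
    if len(names) != len(set(names)):
        # proc_num[{#SERVICENAME}] keys would collide
        problems.append("duplicate {#SERVICENAME} values")
    return problems


good = '{"data": [{"{#SERVICENAME}": "192.168.1.2-p_q1_server", "{#TRIGGER_VALUE}": 3}]}'
print(check_lld(good))  # → []
```

In practice the text would come from the script's own output, for example by piping "/usr/bin/python services.py services_list" into a small wrapper around check_lld.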
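Data collection and reporting: /usr/bin/python services.py {HOST.HOST}. Called with services_list (or with no argument), services.py prints the discovery JSON above; called with any other argument, such as the host name, it counts each service's processes and pushes the results to the Zabbix server with zabbix_sender. The full script: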

```python
# -*- coding: utf-8 -*-
# services.py: LLD discovery + process-count collection for Zabbix.
# Works on Python 2.7 and Python 3 (the Python 2-only "commands" module is no longer used).
from __future__ import print_function

import json
import os
import re
import subprocess
import sys


class services_monitor:
    def __init__(self):
        self.zabbix_server_ip = '192.168.1.1'
        self.info_path = '/home/proccessInfo.txt'
        self.data_path = '/tmp/.process_number_monitor.log'

    def ip(self):
        """Return this host's non-loopback IPv4 addresses."""
        output = subprocess.Popen(['ifconfig'], stdout=subprocess.PIPE).communicate()[0]
        # Accept both old ("inet addr:1.2.3.4") and newer ("inet 1.2.3.4") ifconfig output
        pattern = re.compile(r'inet (?:addr:)?((?:[0-9]{1,3}\.){3}[0-9]{1,3})')
        return [ip for ip in pattern.findall(output.decode('utf-8', 'replace'))
                if ip != '127.0.0.1']

    def check_proc(self, proc_name):
        """Count processes whose command line contains proc_name (substring match, like grep)."""
        output = subprocess.Popen(['ps', '-eo', 'args'], stdout=subprocess.PIPE).communicate()[0]
        lines = output.decode('utf-8', 'replace').splitlines()[1:]  # skip the COMMAND header
        return sum(1 for line in lines if proc_name in line)

    def get_info(self, ips):
        """Build the LLD JSON from the "IP service_name process_count" lines for this host."""
        service = []
        with open(self.info_path) as f:
            for line in f:
                fields = line.split()
                # Skip blank/malformed lines and lines for other hosts (exact IP match,
                # so 192.168.1.2 does not also match 192.168.1.20)
                if len(fields) < 3 or fields[0] not in ips:
                    continue
                service.append({"{#SERVICENAME}": fields[0] + "-" + fields[1],
                                "{#TRIGGER_VALUE}": int(fields[2])})
        return json.dumps({'data': service}, sort_keys=True, indent=4)

    def collect_data(self, data):
        """Write one zabbix_sender line per service: <host> <key> <value>."""
        # Mode 'w' truncates the previous run's file; the Zabbix host name is assumed to be the IP
        with open(self.data_path, 'w') as f:
            for i in json.loads(data)['data']:
                ip, proc_name = i['{#SERVICENAME}'].split('-', 1)
                f.write('%s\tproc_num[%s]\t%d\n'
                        % (ip, i['{#SERVICENAME}'], self.check_proc(proc_name)))

    def send_data(self, data_path):
        """Push the collected values to the Zabbix server and print zabbix_sender's exit status."""
        with open(os.devnull, 'w') as devnull:
            status = subprocess.call(['zabbix_sender', '-z', self.zabbix_server_ip, '-i', data_path],
                                     stdout=devnull, stderr=devnull)
        print(status)


if __name__ == '__main__':
    services = services_monitor()
    data = services.get_info(services.ip())
    if len(sys.argv) > 1 and sys.argv[1] != 'services_list':
        # Any other argument (e.g. {HOST.HOST}): count processes and send the values
        services.collect_data(data)
        services.send_data(services.data_path)
    else:
        # "services_list" or no argument: print the LLD JSON for the discovery rule
        print(data)
```
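The file that collect_data writes is zabbix_sender's input file (-i), where each line holds whitespace-separated "<host> <key> <value>". The sketch below builds and re-parses one such line. sender_line and parse_sender_line are hypothetical helpers, and, like the script, it assumes the Zabbix host name is the host's IP address and that neither the host nor the service name contains whitespace.

```python
def sender_line(host, service_name, count):
    """Format one zabbix_sender input line: <host> <key> <value>.

    Hypothetical helper mirroring collect_data(): the key is
    proc_num[<IP>-<service>], which the trapper item prototype must match.
    """
    for part in (host, service_name):
        if not part or any(c.isspace() for c in part):
            raise ValueError("names must be non-empty and contain no whitespace: %r" % part)
    return "%s\tproc_num[%s]\t%d" % (host, service_name, count)


def parse_sender_line(line):
    """Split a sender line on whitespace back into (host, key, value)."""
    host, key, value = line.split(None, 2)
    return host, key, int(value)


line = sender_line("192.168.1.2", "192.168.1.2-p_gate_server", 2)
print(line)
print(parse_sender_line(line))
```

To test one value by hand, the same data can be sent directly: zabbix_sender -z 192.168.1.1 -s 192.168.1.2 -k 'proc_num[192.168.1.2-p_gate_server]' -o 2.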

Adding Keys to the Agent

Edit the agent configuration (vim /usr/local/etc/zabbix_agentd.conf) and add:

```
UserParameter=dzpt.service.process.discovery,/usr/bin/python /home/opt/scripts/services.py services_list
UserParameter=dzpt.service.process.exec[*],/usr/bin/python /home/opt/scripts/services.py $1
```

Restart zabbix_agentd afterwards so the new keys are loaded.
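In the second key, [*] makes it a flexible parameter: $1 is replaced by the first parameter of the item key, so an item keyed dzpt.service.process.exec[{HOST.HOST}] runs services.py with the host name, which (not being services_list) triggers collection and sending. A small hypothetical checker for the config file is sketched below; it assumes UserParameter lines have no spaces around "=" and, for keys like these (a plain name or name[*]), that the key ends at the first comma.

```python
def user_parameters(conf_text):
    """Map each UserParameter key to its command (hypothetical checker)."""
    params = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line.startswith("UserParameter="):
            continue  # comments, blank lines and other settings
        key, _, command = line[len("UserParameter="):].partition(",")
        params[key.strip()] = command.strip()
    return params


conf = """# Zabbix agent config excerpt
UserParameter=dzpt.service.process.discovery,/usr/bin/python /home/opt/scripts/services.py services_list
UserParameter=dzpt.service.process.exec[*],/usr/bin/python /home/opt/scripts/services.py $1
"""
for key, command in sorted(user_parameters(conf).items()):
    print(key, "->", command)
```

After restarting the agent, the discovery key can also be queried from the server side with zabbix_get -s <agent IP> -k dzpt.service.process.discovery to confirm it returns the JSON shown earlier.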

Creating the Discovery Rule

(item prototypes of type Zabbix trapper, plus alert trigger prototypes)
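The rule is normally created in the web UI. As an alternative, here is a sketch of the same rule and its trapper item prototype expressed as Zabbix API requests (discoveryrule.create and itemprototype.create). Assumptions: the template ID, rule ID, token, display names and the 1h rediscovery interval are hypothetical placeholders; the "auth" field is how older API versions take the session token, and newer releases may expect it in an HTTP Authorization header instead, so check the API reference for your version.

```python
import json


def jsonrpc(method, params, auth, req_id):
    """Build (but do not send) a Zabbix API JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "method": method, "params": params,
            "auth": auth, "id": req_id}


TEMPLATE_ID = "10001"   # hypothetical: ID of the template from step 4
RULE_ID = "20001"       # hypothetical: itemids[0] returned by discoveryrule.create
TOKEN = "<api-token>"   # placeholder

rule_request = jsonrpc("discoveryrule.create", {
    "name": "Service process discovery",        # hypothetical display name
    "key_": "dzpt.service.process.discovery",   # agent UserParameter key
    "hostid": TEMPLATE_ID,
    "type": 0,                                  # 0 = Zabbix agent
    "delay": "1h",                              # rediscovery interval (hypothetical)
}, TOKEN, 1)

prototype_request = jsonrpc("itemprototype.create", {
    "name": "Process count of {#SERVICENAME}",  # hypothetical display name
    "key_": "proc_num[{#SERVICENAME}]",         # must match the key services.py sends
    "hostid": TEMPLATE_ID,
    "ruleid": RULE_ID,
    "type": 2,                                  # 2 = Zabbix trapper (pushed by zabbix_sender)
    "value_type": 3,                            # 3 = numeric unsigned
}, TOKEN, 2)

for request in (rule_request, prototype_request):
    print(json.dumps(request, indent=2, sort_keys=True))
```

Each body would be POSTed to the frontend's api_jsonrpc.php endpoint with Content-Type: application/json-rpc.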

Adding the Current Process Count Item
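Something has to run services.py every 30 s so the trapper items receive values. Assuming this step is the agent item that calls the dzpt.service.process.exec key, its item.create parameters might look like the sketch below; the template ID and display name are hypothetical, and text is chosen as the value type because the script only prints a status line.

```python
import json

TEMPLATE_ID = "10001"  # hypothetical template ID

# Parameters for an item.create call. This agent item runs services.py every
# 30 s; the script itself pushes the counts to the trapper items via zabbix_sender.
exec_item = {
    "name": "Collect and send service process counts",  # hypothetical name
    "key_": "dzpt.service.process.exec[{HOST.HOST}]",    # $1 in the UserParameter = host name
    "hostid": TEMPLATE_ID,
    "type": 0,          # 0 = Zabbix agent (passive)
    "value_type": 4,    # 4 = text
    "delay": "30s",     # the 30 s reporting interval from the approach
}
print(json.dumps(exec_item, indent=2, sort_keys=True))
```

The agent's Timeout setting (3 s by default in many versions) caps how long the script may run, so with many services the collection must stay fast or the timeout must be raised.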

Defining the Alert Content

This is defined in an Action (details omitted here).
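The action fires on the trigger prototypes created with the discovery rule. A minimal sketch of a trigger-prototype expression that fires when the reported count differs from {#TRIGGER_VALUE}, in both the pre-5.4 and the 5.4+ expression syntax; the template name and trigger_expression are hypothetical, and "<>" is one choice ("<" would alert only when processes are missing).

```python
def trigger_expression(template, key="proc_num[{#SERVICENAME}]",
                       expected="{#TRIGGER_VALUE}", new_syntax=True):
    """Fire when the last reported count differs from the expected count."""
    if new_syntax:  # Zabbix 5.4 and later
        return "last(/%s/%s)<>%s" % (template, key, expected)
    return "{%s:%s.last()}<>%s" % (template, key, expected)  # before 5.4


print(trigger_expression("Template Process Count"))
print(trigger_expression("Template Process Count", new_syntax=False))
```

If the collection script itself stops running, the trapper item simply stops receiving values and this trigger will not fire; a nodata() trigger on the same item can cover that case.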

Finally, link the finished template to the hosts (or to other templates).

Summary

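With Zabbix LLD you can set how often the items and their thresholds are refreshed, and when the configuration file changes, items and thresholds no longer have to be adjusted by hand.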
