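Preface
Our company bought a batch of Sangfor devices and recently decided to bring them under our own monitoring. I assumed it would be enough to enable SNMP and poll a few OIDs, but Sangfor turned out to be full of pitfalls, so here is a summary. We currently have three kinds of Sangfor devices: AC (access control), VPN (virtual private network), and FW (firewall).
There are two big problems:
1. The SNMP OIDs for common metrics are not unified. Although every device carries the Sangfor brand, even a standard, generic OID such as uptime is not consistent across them.
2. The output character encoding is not unified. For the same Hex-STRING output type, some devices use UTF-8 and others use GBK.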
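One way to cope with the second problem is to try UTF-8 first and fall back to GBK. The following is only a minimal sketch, assuming the value has already been cut out of the snmpwalk output as space-separated hex bytes; the function name decode_hex_string is made up for illustration.

```python
def decode_hex_string(hex_text):
    """Decode the payload of an SNMP Hex-STRING value.

    hex_text holds space-separated hex bytes, e.g. "E5 86 85".
    UTF-8 is tried first because it rejects most GBK byte sequences,
    whereas GBK would happily accept many UTF-8 ones and produce mojibake.
    """
    raw = bytes.fromhex(hex_text)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is GBK.
        return raw.decode("gbk", errors="replace")


def to_hex_text(data):
    """Helper for the demo: render bytes the way snmpwalk prints them."""
    return " ".join("%02X" % b for b in data)


if __name__ == "__main__":
    for encoding in ("utf-8", "gbk"):
        sample = to_hex_text("内存".encode(encoding))
        print(encoding, sample, decode_hex_string(sample))
```

The order of the two attempts matters: swapping them would decode UTF-8 data as GBK without raising any error.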
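The minor problems are more numerous:
The output is arbitrary and illogical. For example, on the same VPN device, one entry is CPU usage, output as a number (14), while the next is free memory, output as a string (110 MB), whereas both AC and FW report memory usage as a number;
Likewise, AC and FW output the connection count as a number (1324), while the VPN outputs it as a string (1174 sessions in all);
The output format is sloppy. For example, in the VPN output below, why do the second and sixth values contain a line break?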
iso.3.6.1.2.1.1.1.0 = STRING: "Sangfor AF"
iso.3.6.1.2.1.1.1.0 = STRING: "Linux sslvpn 3.10.0 #3 SMP Tue Dec 17 14:24:33 CST 2019 x86_64 x86_64 x86_64 GNU/Linux
"
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.35047.2.10
iso.3.6.1.2.1.1.3.0 = Timeticks: (1913141400) 221 days, 10:16:54.00
iso.3.6.1.2.1.1.4.0 = STRING: "support@sangfor.com.cn"
iso.3.6.1.2.1.1.5.0 = STRING: "Linux
"
iso.3.6.1.2.1.1.6.0 = STRING: "China"
iso.3.6.1.2.1.1.7.0 = INTEGER: 72
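Most of these minor problems come down to normalising the text before using it. The sketch below, assuming snmpwalk output shaped like the sample above, glues values that spill over several lines back into one record and pulls the first number out of values such as "110 MB" or "1174 sessions in all". The names join_records and first_number are made up for illustration.

```python
import re

# Assumption: every new record starts with an OID followed by " = ".
RECORD_START = re.compile(r"^\S+ = ")


def join_records(text):
    """Merge continuation lines (values containing newlines) into their record."""
    records = []
    for line in text.splitlines():
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # Continuation of the previous value: append it without the newline.
            records[-1] += line
    return records


def first_number(value):
    """Return the first integer or decimal found in value as a float, or None."""
    found = re.search(r"-?\d+(?:\.\d+)?", value)
    return float(found.group()) if found else None


if __name__ == "__main__":
    sample = 'iso.3.6.1.2.1.1.5.0 = STRING: "Linux\n"\niso.3.6.1.2.1.1.6.0 = STRING: "China"\n'
    for record in join_records(sample):
        print(record)
    print(first_number("110 MB"), first_number("1174 sessions in all"))
```

first_number is meant for the value part only; applied to a whole record it would pick up digits from the OID.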
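Approach
I originally wanted to use check_snmp, which ships with the Nagios plugins, and feed the results into Grafana to generate pretty graphs, but one error after another drove me up the wall. In the end I forced out a script that pains even me to look at, got it to fetch a few values, and called it a day.
Script
The script is as follows: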
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# huky0924@aliyun.com
# Painful programming, courtesy of Sangfor devices
import sys
import getopt
import logging
from subprocess import PIPE, run
import codecs


def returnToIcinga(outStr, status, outPerf):
    # Map the status text to a plugin exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
    out = outStr + ' |' + outPerf
    if status:
        if 'CRITICAL' in status:
            return (out, 2)
        elif 'WARNING' in status:
            return (out, 1)
        elif 'OK' in status:
            return (out, 0)
        else:
            return (out, 3)
    else:
        return (out, 0)


if __name__ == '__main__':
    # Force stdout to UTF-8, because the values fetched below come in several encodings
    sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
    logFile = '/tmp/check_snmp_wrapper.log'
    # Configure logging before the first logging call; otherwise the implicit
    # default configuration takes effect and the log file is never written
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s %(levelname)s %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S',
                        filename=logFile,
                        filemode='a')
    logging.info('Start')
    argv = sys.argv[1:]
    opts, args = getopt.getopt(argv, "H:C:o:", ["hostname=", "community=", "oid="])  # short and long options
    hostname = community = oid = None
    for opt, arg in opts:
        if opt in ['-H', '--hostname']:
            hostname = arg
        elif opt in ['-C', '--community']:
            community = arg
        elif opt in ['-o', '--oid']:
            oid = arg
    if not (hostname and community and oid):
        # Without all three options the snmpwalk command cannot be built
        print('UNKNOWN: -H, -C and -o are all required')
        sys.exit(3)
    CMD = ['/usr/bin/snmpwalk', '-Os', '-v2c', '-t3', '-c', community, hostname, oid]
    outStr = ''
    outPerf = ''
    status = ''
    try:
        logging.info('Checking host: ' + hostname)
        exe = run(CMD, timeout=3, stdout=PIPE, stderr=PIPE)
        if not exe.returncode == 0:
            logging.error(exe.stderr)
            # A failed query is UNKNOWN (3); exit code 1 would be read as WARNING
            print('UNKNOWN: snmpwalk failed')
            sys.exit(3)
        res = exe.stdout.decode('utf-8', errors='replace')
        #logging.info('Check result: ' + res)
        if len(res) > 20:
            # Values may contain line breaks, so join everything into one line
            output = res.replace('\n', ' ')
            logging.info('Joined into one line: ' + output)
            if 'INTEGER' in output:
                #logging.info('Result is an integer: ' + output)
                result = output.split('INTEGER: ')[-1]
                status = 'OK:'
                outPerf = 'alarm=' + result.strip() + ';'
                outStr = 'Value: ' + result
                #logging.info(outStr, outPerf)
            else:
                if 'Hex-STRING' in output:
                    result = output.split('STRING: ')[-1]
                    #logging.warn('Result is a hex string: ' + output)
                    # Some devices encode the text as UTF-8, others as GBK
                    try:
                        logStr = bytes.fromhex(result).decode('utf8')
                    except UnicodeDecodeError as e:
                        logStr = bytes.fromhex(result).decode('gbk')
                    #logging.warn('Decoded as: ' + logStr)
                else:
                    logStr = output.split('STRING: ')[-1].replace('"', '')
                # Splitting the text may produce a list with only one element
                try:
                    outList = logStr.split('|')
                    if len(outList) > 1:
                        outStr = outList[0].strip()
                        outPerf = outList[1].strip()
                        outStrL = outStr.split(':')
                        status = outStrL.pop(0)
                        outStr = ' '.join(outStrL)
                    else:
                        outStr = outList[0].strip()
                except Exception as e:
                    logging.error(e)
        else:
            sys.exit(0)
    except Exception as e:
        logging.error(e)
        # Report failures (timeouts, decode errors) as UNKNOWN rather than an empty OK
        status = 'UNKNOWN'
        outStr = 'check failed: ' + str(e)
    #logging.info(status, status, outPerf)
    #logging.info(type(outPerf))
    (rev, ren) = returnToIcinga(outStr, status, outPerf)
    print(rev)
    sys.exit(ren)
Save the script above as /usr/lib/nagios/plugins/check_snmp_wrapper.py (and make it executable), then create a command for Icinga to call. From then on, just use the command snmp_wrapy.
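For reference, Icinga judges a plugin by its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and treats everything after the "|" in the output line as performance data. A minimal sketch of that convention follows; the helper name plugin_result is hypothetical and is not part of the script above.

```python
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def plugin_result(status, message, perfdata=None):
    """Build the single output line and the exit code for a check plugin.

    perfdata is an optional dict of label -> numeric value; it is rendered
    as space-separated label=value pairs after the "|" separator.
    """
    code = EXIT_CODES.get(status, 3)  # anything unrecognised counts as UNKNOWN
    line = f"{status}: {message}"
    if perfdata:
        pairs = " ".join(f"{label}={value}" for label, value in perfdata.items())
        line += " | " + pairs
    return line, code


if __name__ == "__main__":
    line, code = plugin_result("OK", "memory usage 14", {"alarm": 14})
    print(line)
    raise SystemExit(code)
```

Only values that end up as label=value pairs after the "|" can be graphed later, which is why string-only results such as "110 MB" need to be converted to numbers first.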
Configuration
# snmp wrapper python
object CheckCommand "snmp_wrapy" {
  command = [ PluginDir + "/check_snmp_wrapper.py" ]
  arguments = {
    "-H" = "$address$"
    "-C" = "$snmp_community$"
    "-o" = "$snmpoid$"
  }
}
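The CheckCommand above ends up running the plugin with -H, -C and -o. As a sketch of the receiving side, the same three options could be parsed with argparse, which rejects a missing option up front. This is an alternative offered for illustration, not what the script above does (it uses getopt), and the function names are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse the three options that the CheckCommand passes to the plugin."""
    parser = argparse.ArgumentParser(description="SNMP wrapper check (sketch)")
    parser.add_argument("-H", "--hostname", required=True, help="device address")
    parser.add_argument("-C", "--community", required=True, help="SNMP community")
    parser.add_argument("-o", "--oid", required=True, help="OID to query")
    return parser.parse_args(argv)


def build_snmpwalk_command(args):
    """Assemble the same snmpwalk call that the wrapper script uses."""
    return ["/usr/bin/snmpwalk", "-Os", "-v2c", "-t3",
            "-c", args.community, args.hostname, args.oid]


if __name__ == "__main__":
    demo = parse_args(["-H", "192.168.10.66", "-C", "public", "-o", ".1.3.6.1.2.1.1.3.0"])
    print(build_snmpwalk_command(demo))
```

One caveat: argparse exits with status 2 on a usage error, which Icinga would read as CRITICAL, so a real plugin may prefer to catch that case and exit with 3 (UNKNOWN).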
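Define the hosts
To identify and distinguish the Sangfor AC / VPN / FW devices, I defined a custom host variable vars.manufacturer and set it to "sangfor". The same method can be used to identify Huawei, H3C, Cisco and so on.
To distinguish the device types further, the host names start with AC/VPN/FW (written as AC, vpn and fw in the examples here), so the services created later can match on that prefix.
The vars.client_endpoint in the definitions is there because I set up a satellite server to share the master server's load; it is not required. For example: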
object Host "AC-XXXGS" {
  import "generic-switch"
  display_name = "AC-XXX Company"
  address = "192.168.10.66"
  vars.type = "switch"
  vars.manufacturer = "sangfor"
  vars.client_endpoint = "yyyyy"
  vars.snmp_community = "public"
  vars.snmp_version = "2c"
  icon_image = "img/icons/sangfor.png"
}
object Host "fwXXX" {
  import "generic-switch"
  display_name = "XXX Firewall"
  address = "192.168.10.200"
  vars.type = "switch"
  vars.manufacturer = "sangfor"
  vars.client_endpoint = "yyyyy"
  vars.snmp_community = "public"
  vars.snmp_version = "2c"
  icon_image = "img/icons/sangfor.png"
}
object Host "vpnXXX" {
  import "generic-switch"
  display_name = "XXXvpn"
  address = "192.168.10.100"
  vars.type = "switch"
  vars.manufacturer = "sangfor"
  vars.client_endpoint = "yyyyy"
  vars.snmp_community = "public"
  vars.snmp_version = "2c"
  icon_image = "img/icons/sangfor.png"
}
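Define the services
The most important part is the assign rule. Based on the host definitions above, it ANDs three conditions together (client_endpoint, manufacturer, and the host name prefix), as follows: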
apply Service "memory" {
  display_name = "Memory usage-snmp"
  import "generic-service-sw"
  check_command = "snmp_wrapy"
  vars.check_command = "memory"
  vars.snmpoid = "iso.3.6.1.2.1.1.12"
  assign where (host.vars.client_endpoint == "yyyyy" && host.vars.manufacturer == "sangfor" && match("fw*", host.name))
}
apply Service "memory" {
  display_name = "Free memory-snmp"
  import "generic-service-sw"
  check_command = "snmp_wrapy"
  vars.grafana_graph_disable = 1
  vars.snmpoid = "iso.3.6.1.4.1.35047.1.4.0"
  assign where (host.vars.client_endpoint == "yyyyy" && host.vars.manufacturer == "sangfor" && match("vpn*", host.name))
}
apply Service "users" {
  display_name = "User count-snmp"
  import "generic-service-sw"
  check_command = "snmp_wrapy"
  vars.check_command = "users"
  vars.snmpoid = ".1.3.6.1.4.1.35047.2.1.1.1"
  assign where (host.vars.client_endpoint == "yyyyy" && host.vars.manufacturer == "sangfor" && match("AC*", host.name))
}
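To make the three-way AND concrete, here is a sketch in Python that mimics the assign expressions. It only illustrates the logic and is not how Icinga evaluates rules internally: fnmatch.fnmatchcase stands in for Icinga's match() (case-sensitive matching is an assumption here), and the dictionaries are hypothetical stand-ins for the host objects defined earlier.

```python
from fnmatch import fnmatchcase


def applies(host, name_pattern, endpoint="yyyyy", manufacturer="sangfor"):
    """Return True when all three conditions of the assign rule hold."""
    host_vars = host.get("vars", {})
    return (host_vars.get("client_endpoint") == endpoint
            and host_vars.get("manufacturer") == manufacturer
            and fnmatchcase(host["name"], name_pattern))


# Hypothetical stand-ins for the three host objects.
SANGFOR_VARS = {"client_endpoint": "yyyyy", "manufacturer": "sangfor"}
HOSTS = [
    {"name": "AC-XXXGS", "vars": dict(SANGFOR_VARS)},
    {"name": "fwXXX", "vars": dict(SANGFOR_VARS)},
    {"name": "vpnXXX", "vars": dict(SANGFOR_VARS)},
]

if __name__ == "__main__":
    for pattern in ("fw*", "vpn*", "AC*"):
        matched = [h["name"] for h in HOSTS if applies(h, pattern)]
        print(pattern, "->", matched)
```

Because the three name patterns do not overlap, each host receives only the services written for its device type, even though two of the apply rules share the name "memory".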
Reload Icinga 2
$ sudo /etc/init.d/icinga2 reload
[ ok ] Reloading icinga2 configuration (via systemctl): icinga2.service.
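In the end, a few of the metrics could still be graphed.
Conclusion
By the way, Huawei and H3C devices can be checked directly with centreon-plugins, and so can most foreign brands such as Cisco. Just check whether your device is supported.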