Monitoring a Fortigate Firewall Network Device with Azure Monitor over SNMP

1. Introduction

Chinese New Year is just around the corner, so let me first wish everyone a happy Year of the Rat. Lately I have really grown fond of Azure Monitor: it is powerful, fully managed by the cloud provider, and lets users focus on implementing the monitoring requirements of their own business scenarios. In this post I want to share a real production case. In hybrid-cloud or multi-region public-cloud environments, many users choose to deploy their own firewalls to build more customized network designs, and monitoring those network devices is essential because it directly affects the stability and reliability of the entire infrastructure. This article describes how to monitor a Fortigate firewall over SNMP with Azure Monitor.

 

2. Lab Architecture and Components

The architecture used in this lab is shown below:

A quick explanation of the products and services used:
Fortigate: the Fortinet firewall appliance. On Azure it can be deployed from the Marketplace in either PAYG or BYOL form, which correspond to two license activation models. Because of some subscription restrictions, this article does not deploy it that way and uses a VHD-based deployment instead. The deployment steps are not covered here; see the reference here.
Collectd: Collectd is a daemon that periodically collects system and application performance metrics and provides mechanisms to store these values in different ways. It gathers metrics from a variety of sources, such as the operating system, applications, log files, and external devices, and stores the information or makes it available over the network. The statistics can be used to monitor systems, find performance bottlenecks (performance analysis), and predict future system load (capacity planning). It is plugin-based; the full list of plugins is available here.

3. Monitoring the Fortigate Firewall with Azure Monitor over SNMP

3.1 Create the Fortigate Firewall and Enable SNMP

The detailed steps are not repeated here. Once SNMP is configured, it looks as shown below:

3.2 Deploy CentOS 7.7 and Configure Collectd to Forward Data to the Azure Log Analytics Workspace

First, deploy a CentOS 7.7 instance automatically with Terraform. The tf files are not included here; see the reference here. The Terraform deployment takes roughly 10 minutes. Once the VM is created, log in to the node and check whether snmpwalk can pull metrics:

$ snmpwalk -c public -v 2c 10.11.0.5 |more
SNMPv2-MIB::sysDescr.0 = STRING: fortigate
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.12356.101.1.90081
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (647990) 1:47:59.90
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: FGVM08TM20000228
SNMPv2-MIB::sysLocation.0 = STRING:
SNMPv2-MIB::sysServices.0 = INTEGER: 78
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::sysORIndex.1 = INTEGER: 1
SNMPv2-MIB::sysORID.1 = OID: SNMPv2-SMI::zeroDotZero.0
SNMPv2-MIB::sysORDescr.1 = STRING:
SNMPv2-MIB::sysORUpTime.1 = Timeticks: (0) 0:00:00.00
IF-MIB::ifNumber.0 = INTEGER: 2
IF-MIB::ifIndex.1 = INTEGER: 1
IF-MIB::ifIndex.2 = INTEGER: 2
IF-MIB::ifDescr.1 = STRING:
IF-MIB::ifDescr.2 = STRING:
IF-MIB::ifType.1 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.2 = INTEGER: tunnel(131)
IF-MIB::ifMtu.1 = INTEGER: 1500
IF-MIB::ifMtu.2 = INTEGER: 1500
IF-MIB::ifSpeed.1 = Gauge32: 1000000000
IF-MIB::ifSpeed.2 = Gauge32: 0
...
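
The OIDs that matter for this post come from the Fortinet MIB (for example the session-count OID that is reused as "session_count" in the collectd configuration below). If you want to spot-check a single OID before wiring it into collectd, a small Python sketch around the net-snmp command-line tools is enough. This is only a convenience sketch: it assumes snmpget (net-snmp-utils) is installed on the collector VM and that the address 10.11.0.5 and the community string "public" match your environment:

import subprocess

FORTIGATE_HOST = "10.11.0.5"
COMMUNITY = "public"
# Session-count OID, the same one configured as "session_count" in snmp.conf below
SESSION_COUNT_OID = "1.3.6.1.4.1.12356.101.4.1.8.0"

def snmp_get(oid):
    """Return the value reported by snmpget for a single OID (-Ovq: value only, quick print)."""
    out = subprocess.check_output(
        ["snmpget", "-v", "2c", "-c", COMMUNITY, "-Ovq", FORTIGATE_HOST, oid])
    return out.decode().strip()

if __name__ == "__main__":
    print("current session count: " + snmp_get(SESSION_COUNT_OID))
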

10.11.0.5 is the Fortigate's private (internal) address. The SNMP metrics to collect can be customized from the Fortigate MIB library; see the reference here. Next, install and configure Collectd, using drop-in configuration files under /etc/collectd.d/:

$ yum install epel-release -y
$ yum install collectd collectd-snmp -y
$ vi /etc/collectd.d/snmp.conf
LoadPlugin snmp
<Plugin snmp>
<Data "session_count">
Type "counter"
Table false
Instance "session_count"
Values "1.3.6.1.4.1.12356.101.4.1.8.0"
</Data>
<Data "cpu_usage">
Type "counter"
Table false
Instance "cpu_usage"
Values "1.3.6.1.4.1.12356.101.4.1.3.0"
</Data>
<Data "version">
Type "absolute"
Table false
Instance "version"
Values "1.3.6.1.4.1.12356.101.4.1.1.0"
</Data>
<Data "memory_usage">
Type "counter"
Table false
Instance "memory_usage"
Values "1.3.6.1.4.1.12356.101.4.1.4.0"
</Data>
<Data "disk_usage">
Type "counter"
Table false
Instance "disk_usage"
Values "1.3.6.1.4.1.12356.101.4.1.6.0"
</Data>
<Data "fgSysSesRate1">
# The average session setup rate over the past minute.
Type "counter"
Table false
Instance "fgSysSesRate1"
Values "1.3.6.1.4.1.12356.101.4.1.11.0"
</Data>
<Data "fgSysSesRate10">
# The average session setup rate over the past 10 minutes.
Type "counter"
Table false
Instance "fgSysSesRate10"
Values "1.3.6.1.4.1.12356.101.4.1.12.0"
</Data>
<Data "fgSysSesRate30">
# The average session setup rate over the past 30 minutes.
Type "counter"
Table false
Instance "fgSysSesRate30"
Values "1.3.6.1.4.1.12356.101.4.1.13.0"
</Data>
<Data "fgSysSesRate60">
# The average session setup rate over the past 60 minutes.
Type "counter"
Table false
Instance "fgSysSesRate60"
Values "1.3.6.1.4.1.12356.101.4.1.14.0"
</Data>
<Data "fgHwSensorCount">
# The number of entries in fgHwSensorTable
Type "counter"
Table false
Instance "fgHwSensorCount"
Values "1.3.6.1.4.1.12356.101.4.3.1.0"
</Data>
<Data "leePrdVpnStatus">
# VPN tunnel status for this tunnel entry
# 1.3.6.1.4.1.12356.101.12.2.2.1.3.2
Type "counter"
Table false
Instance "leePrdVpnStatus"
Values "1.3.6.1.4.1.12356.101.12.2.2.1.20.2"
</Data>
<Data "leeIntVpnStatus">
# VPN tunnel status for this tunnel entry
# 1.3.6.1.4.1.12356.101.12.2.2.1.3.3
Type "counter"
Table false
Instance "leeIntVpnStatus"
Values "1.3.6.1.4.1.12356.101.12.2.2.1.20.3"
</Data>
<Data "leePpdVpnStatus">
# VPN tunnel status for this tunnel entry
# 1.3.6.1.4.1.12356.101.12.2.2.1.3.4
Type "counter"
Table false
Instance "leePpdVpnStatus"
Values "1.3.6.1.4.1.12356.101.12.2.2.1.20.4"
</Data>
<Data "AMMPrdVpnStatus">
# 1.3.6.1.4.1.12356.101.12.2.2.1.3.5
Type "counter"
Table false
Instance "AMMPrdVpnStatus"
Values "1.3.6.1.4.1.12356.101.12.2.2.1.20.5"
</Data>
<Data "AMMAdmVpnStatus">
# 1.3.6.1.4.1.12356.101.12.2.2.1.3.6
Type "counter"
Table false
Instance "AMMAdmVpnStatus"
Values "1.3.6.1.4.1.12356.101.12.2.2.1.20.6"
</Data>
<Data "AMMIntVpnStatus">
# 1.3.6.1.4.1.12356.101.12.2.2.1.3.7
Type "counter"
Table false
Instance "AMMIntVpnStatus"
Values "1.3.6.1.4.1.12356.101.12.2.2.1.20.7"
</Data>
<Data "AMMPpdVpnStatus">
# 1.3.6.1.4.1.12356.101.12.2.2.1.3.8
Type "counter"
Table false
Instance "AMMPddVpnStatus"
Values "1.3.6.1.4.1.12356.101.12.2.2.1.20.8"
</Data>
<Data "CSTVpnStatus">
# 1.3.6.1.4.1.12356.101.12.2.2.1.3.4
Type "counter"
Table false
Instance "CSTVpnStatus"
Values "1.3.6.1.4.1.12356.101.12.2.2.1.20.4"
</Data>
<Data "ifInOctetsmgmt1">
Type "counter"
Table false
Instance "ifInOctetsmgmt1"
Values "1.3.6.1.2.1.2.2.1.10.1"
</Data>
<Data "ifInOctetsmgmt2">
Type "counter"
Table false
Instance "ifInOctetsmgmt2"
Values "1.3.6.1.2.1.2.2.1.10.2"
</Data>
<Data "ifInOctetsAMEWAN1">
Type "counter"
Table false
Instance "ifInOctetsAMEWAN1"
Values "1.3.6.1.2.1.2.2.1.10.15"
</Data>
<Data "ifInOctetsAMELAN1">
Type "counter"
Table false
Instance "ifInOctetsAMELAN1"
Values "1.3.6.1.2.1.2.2.1.10.16"
</Data>
<Data "ifInOctetsAMELANPRD1">
Type "counter"
Table false
Instance "ifInOctetsAMELANPRD1"
Values "1.3.6.1.2.1.2.2.1.10.17"
</Data>
<Data "ifInOctetsAMELANPPD1">
Type "counter"
Table false
Instance "ifInOctetsAMELANPPD1"
Values "1.3.6.1.2.1.2.2.1.10.18"
</Data>
<Data "ifInOctetsAMELANINT1">
Type "counter"
Table false
Instance "ifInOctetsAMELANINT1"
Values "1.3.6.1.2.1.2.2.1.10.19"
</Data>
<Data "ifInOctetsAMELANADMIN1">
Type "counter"
Table false
Instance "ifInOctetsAMELANADMIN1"
Values "1.3.6.1.2.1.2.2.1.10.23"
</Data>

<Host "fortigate0001">
Address "10.11.0.5"
Version 2c
Community "public"
Collect "session_count" "version" "cpu_usage" "memory_usage" "disk_usage" "fgSysSesRate1" "fgSysSesRate10" "fgSysSesRate30" "fgSysSesRate60" "fgHwSensorCount" "leePrdVpnStatus" "leeIntVpnStatus" "leePpdVpnStatus" "AMMPrdVpnStatus""AMMAdmVpnStatus" "AMMIntVpnStatus" "AMMPpdVpnStatus" "CSTVpnStatus" "ifInOctetsmgmt1" "ifInOctetsmgmt2" "ifInOctetsAMEWAN1" "ifInOctetsAMELAN1" "ifInOctetsAMELANPRD1" "ifInOctetsAMELANPPD1" "ifInOctetsAMELANINT1" "ifInOctetsAMELANADMIN1"
Interval 300
</Host>
</Plugin>

After saving and exiting, restart collectd so the new configuration takes effect, then configure the Azure Log Analytics agent (which relays the collectd metrics to the workspace), as shown below:

<img src="LA-Installation.jpg" height="800px" width="800px" align="center" />

A green icon means the connection is healthy.

3.3 Deploy a Single-Node MongoDB Server 4.2.0

3.3.1 Configure the MongoDB 4.x Yum Repository on CentOS 7.7

$ cd /etc/yum.repos.d/
$ vi mongodb.repo
[mongodb-org-4.2]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/4.2/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.2.asc

Save and exit; the Yum repository is now configured.

3.3.2 Install Single-Node MongoDB Server 4.2.0

Install MongoDB Server 4.2.0:

$ yum install -y mongodb-org-4.2.0 mongodb-org-server-4.2.0 mongodb-org-shell-4.2.0 mongodb-org-mongos-4.2.0 mongodb-org-tools-4.2.0

Check the installed packages and their versions:

$ rpm -qa |grep mongodb
mongodb-org-tools-4.2.0-1.el7.x86_64
mongodb-org-mongos-4.2.0-1.el7.x86_64
mongodb-org-shell-4.2.0-1.el7.x86_64
mongodb-org-4.2.0-1.el7.x86_64
mongodb-org-server-4.2.0-1.el7.x86_64

Edit the MongoDB Server configuration file:

$ vi /etc/mongod.conf
# mongod.conf

# for documentation of all options, see:
# http://docs.mongodb.org/manual/reference/configuration-options/

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# Where and how to store data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
#  engine:
#  wiredTiger:

# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile
  timeZoneInfo: /usr/share/zoneinfo

# network interfaces
net:
  port: 27017
  bindIp: 0.0.0.0  # Enter 0.0.0.0,:: to bind to all IPv4 and IPv6 addresses or, alternatively, use the net.bindIpAll setting.

#security:
#operationProfiling:

replication:
  oplogSizeMB: "20480"
  replSetName: repconfig

Start the MongoDB Server service:

systemctl start mongod && systemctl enable mongod

Initialize the MongoDB replica set:

$ mongo --port 27017
> repconfig = { _id : "repconfig", members : [ {_id : 0, host : "10.11.0.4:27017" , priority: 1 } ] }
{
    "_id" : "repconfig",
    "members" : [
        {
            "_id" : 0,
            "host" : "10.11.0.4:27017",
            "priority" : 1
        }
    ]
}
> rs.initiate(repconfig);
{
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1579339466, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1579339466, 1)
}
repconfig:SECONDARY>
repconfig:PRIMARY>

After initialization completes, you can inspect the replica set with rs.status() in the Mongo shell.
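
The same check can be scripted with pymongo; a minimal sketch, assuming the node at 10.11.0.4:27017 is reachable without authentication (as configured above):

from pymongo import MongoClient

# Connect to the single-node replica set configured above (no auth enabled).
client = MongoClient("mongodb://10.11.0.4:27017")
status = client.admin.command("replSetGetStatus")
print(status["set"])  # replica set name, e.g. repconfig
for member in status["members"]:
    # e.g. "10.11.0.4:27017 PRIMARY"
    print(member["name"] + " " + member["stateStr"])
client.close()
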

3.4 Monitor MongoDB Server 4.2.0 via the Azure Monitor REST API from Python

3.4.1 Create the Azure Log Analytics Workspace

The detailed steps are not repeated here; simply create the workspace step by step through the Portal or the CLI, as shown below:

<img src="LA-Creation.jpg" height="800px" width="600px" align="center" />

After creation, note the following pieces of information, as shown below:

<img src="LA-Info.jpg" height="800px" width="800px" align="center" />

The "WORKSPACE ID" is the customer_id in the script, and the "PRIMARY KEY" is the shared_key.

3.4.2 Send Data to the Azure Log Analytics Workspace via the Azure Monitor REST API from Python

The Azure Monitor REST API (HTTP Data Collector API) is not covered in detail here; see the reference here. In this experiment we first collect MongoDB metrics with pymongo and then send them to the Azure Log Analytics workspace. Install pymongo first:

$ yum install python-pip -y
$ pip install pymongo requests
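
Before walking through the full script, here is a stripped-down sketch of the Data Collector API call on its own, using the same SharedKey signing scheme as the full script below. The workspace ID, primary key, and test record are placeholders and must be replaced with your own values:

import base64
import datetime
import hashlib
import hmac
import json
import requests

customer_id = '86df0cbc-076c-4483-8a32-c59c6550a771'  # Log Analytics workspace ID
shared_key = '<workspace primary key>'                # placeholder, replace with your key
log_type = 'MongoDBMonitorLog'                        # custom log name (appears as MongoDBMonitorLog_CL)

# A single test record in the same {key, value} shape the full script produces
body = json.dumps([{"key": "mongodb.uptime", "value": 42}])

rfc1123date = datetime.datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
string_to_hash = "POST\n{}\napplication/json\nx-ms-date:{}\n/api/logs".format(len(body), rfc1123date)
signature = base64.b64encode(
    hmac.new(base64.b64decode(shared_key),
             string_to_hash.encode('utf-8'),
             digestmod=hashlib.sha256).digest()).decode()

headers = {
    'content-type': 'application/json',
    'Authorization': "SharedKey {}:{}".format(customer_id, signature),
    'Log-Type': log_type,
    'x-ms-date': rfc1123date,
}
uri = 'https://' + customer_id + '.ods.opinsights.azure.com/api/logs?api-version=2016-04-01'
response = requests.post(uri, data=body, headers=headers)
print(response.status_code)  # 2xx means the record was accepted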

The complete Python script is as follows:

#!/usr/bin/env python
"""
Date: 01/18/2020
Author: Xinsheng Wang
Description: A custom script to get MongoDB metrics and send data to Azure Monitor
Requires: MongoClient in python
"""

from calendar import timegm
from time import gmtime

from pymongo import MongoClient, errors
from sys import exit

import json
import requests
import datetime
import hashlib
import hmac
import base64
import os
from glob import glob

# Update the customer ID to your Log Analytics workspace ID
customer_id = '86df0cbc-076c-4483-8a32-c59c6550a771'

# For the shared key, use either the primary or the secondary Connected Sources client authentication key
shared_key = "b3uLsEOXBFBqTiAHDGp9boTeKR6v86f/9cLPWWsWUvs+LcjBIqjDp9CDJL+7vxlKDDRxqXIf1jjjKcZbdV0H/Q=="

# The log type is the name of the event that is being submitted
log_type = 'MongoDBMonitorLog'


class MongoDB(object):
    """main script class"""
    # pylint: disable=too-many-instance-attributes

    def delete_temporary_files(self):
        """delete temporary files"""
        for file in glob('/tmp/mongometrics000*'):
            os.remove(file)

    def __init__(self):
        self.mongo_host = "10.11.0.4"
        self.mongo_port = 27017
        self.mongo_db = ["admin", ]
        self.mongo_user = None
        self.mongo_password = None
        self.__conn = None
        self.__dbnames = None
        self.__metrics = []

    def connect(self):
        """Connect to MongoDB"""
        if self.__conn is None:
            if self.mongo_user is None:
                try:
                    self.__conn = MongoClient('mongodb://%s:%s' %
                                              (self.mongo_host,
                                               self.mongo_port))
                except errors.PyMongoError as py_mongo_error:
                    print('Error in MongoDB connection: %s' %
                          str(py_mongo_error))
            else:
                try:
                    self.__conn = MongoClient('mongodb://%s:%s@%s:%s' %
                                              (self.mongo_user,
                                               self.mongo_password,
                                               self.mongo_host,
                                               self.mongo_port))
                except errors.PyMongoError as py_mongo_error:
                    print('Error in MongoDB connection: %s' %
                          str(py_mongo_error))

    def add_metrics(self, k, v):
        """add each metric to the metrics list"""
        global body
        dict_metrics = {}
        dict_metrics["key"] = k
        dict_metrics["value"] = v
        self.__metrics.append(dict_metrics)
        dic = json.dumps(dict_metrics, sort_keys=True, indent=4, separators=(',', ':')).replace('}', '},')

        f = open('/tmp/mongometrics0001.txt', 'a')
        f.write(dic)
        f.close()

        os.system("cat /tmp/mongometrics0001.txt |sed '$s/\}\,/\}\]/g;1s/{/[{/' > /tmp/mongometrics0002.txt")

        with open('/tmp/mongometrics0002.txt', 'r') as src:
            body = src.read()
        print(body)

    def get_db_names(self):
        """get a list of DB names"""
        if self.__conn is None:
            self.connect()
        db_handler = self.__conn[self.mongo_db[0]]

        master = db_handler.command('isMaster')['ismaster']
        dict_metrics = {}
        dict_metrics['key'] = 'mongodb.ismaster'
        if master:
            dict_metrics['value'] = 1
            db_names = self.__conn.database_names()
            self.__dbnames = db_names
        else:
            dict_metrics['value'] = 0
        self.__metrics.append(dict_metrics)

    def get_mongo_db_lld(self):
        """print DB list in json format, to be used for mongo db discovery"""
        if self.__dbnames is None:
            db_names = self.get_db_names()
        else:
            db_names = self.__dbnames
        dict_metrics = {}
        db_list = []
        dict_metrics['key'] = 'mongodb.discovery'
        dict_metrics['value'] = {"data": db_list}
        if db_names is not None:
            for db_name in db_names:
                dict_lld_metric = {}
                dict_lld_metric['{#MONGODBNAME}'] = db_name
                db_list.append(dict_lld_metric)
            dict_metrics['value'] = '{"data": ' + json.dumps(db_list) + '}'
        self.__metrics.insert(0, dict_metrics)

    def get_oplog(self):
        """get replica set oplog information"""
        if self.__conn is None:
            self.connect()
        db_handler = self.__conn['local']

        coll = db_handler.oplog.rs

        op_first = (coll.find().sort('$natural', 1).limit(1))
        op_last = (coll.find().sort('$natural', -1).limit(1))

        # if host is not a member of replica set, without this check we will
        # raise StopIteration as guided in
        # http://api.mongodb.com/python/current/api/pymongo/cursor.html

        if op_first.count() > 0 and op_last.count() > 0:
            op_fst = (op_first.next())['ts'].time
            op_last_st = op_last[0]['ts']
            op_lst = (op_last.next())['ts'].time

            status = round(float(op_lst - op_fst), 1)
            self.add_metrics('mongodb.oplog', status)

            current_time = timegm(gmtime())
            oplog = int(((str(op_last_st).split('('))[1].split(','))[0])
            self.add_metrics('mongodb.oplog-sync', (current_time - oplog))

    def get_maintenance(self):
        """get replica set maintenance info"""
        if self.__conn is None:
            self.connect()
        db_handler = self.__conn

        fsync_locked = int(db_handler.is_locked)
        self.add_metrics('mongodb.fsync-locked', fsync_locked)

        try:
            config = db_handler.admin.command("replSetGetConfig", 1)
            connstring = (self.mongo_host + ':' + str(self.mongo_port))
            connstrings = list()

            for i in range(0, len(config['config']['members'])):
                host = config['config']['members'][i]['host']
                connstrings.append(host)

                if connstring in host:
                    priority = config['config']['members'][i]['priority']
                    hidden = int(config['config']['members'][i]['hidden'])

            self.add_metrics('mongodb.priority', priority)
            self.add_metrics('mongodb.hidden', hidden)
        except errors.PyMongoError:
            print('Error while fetching replica set configuration. '
                  'Not a member of replica set?')
        except UnboundLocalError:
            print('Cannot use this mongo host: must be one of ' + ','.join(connstrings))
            exit(1)

    def get_server_status_metrics(self):
        """get server status"""
        if self.__conn is None:
            self.connect()
        db_handler = self.__conn[self.mongo_db[0]]
        ss = db_handler.command('serverStatus')

        # db info
        self.add_metrics('mongodb.version', ss['version'])
        self.add_metrics('mongodb.storageEngine', ss['storageEngine']['name'])
        self.add_metrics('mongodb.uptime', int(ss['uptime']))
        self.add_metrics('mongodb.okstatus', int(ss['ok']))

        # asserts
        for k, v in ss['asserts'].items():
            self.add_metrics('mongodb.asserts.' + k, v)

        # operations
        for k, v in ss['opcounters'].items():
            self.add_metrics('mongodb.operation.' + k, v)

        # connections
        for k, v in ss['connections'].items():
            self.add_metrics('mongodb.connection.' + k, v)

        # extra info
        self.add_metrics('mongodb.page.faults',
                         ss['extra_info']['page_faults'])

        # wired tiger
        if ss['storageEngine']['name'] == 'wiredTiger':
            self.add_metrics('mongodb.used-cache',
                             ss['wiredTiger']['cache']
                             ["bytes currently in the cache"])
            self.add_metrics('mongodb.total-cache',
                             ss['wiredTiger']['cache']
                             ["maximum bytes configured"])
            self.add_metrics('mongodb.dirty-cache',
                             ss['wiredTiger']['cache']
                             ["tracked dirty bytes in the cache"])

        # global lock
        lock_total_time = ss['globalLock']['totalTime']
        self.add_metrics('mongodb.globalLock.totalTime', lock_total_time)
        for k, v in ss['globalLock']['currentQueue'].items():
            self.add_metrics('mongodb.globalLock.currentQueue.' + k, v)
        for k, v in ss['globalLock']['activeClients'].items():
            self.add_metrics('mongodb.globalLock.activeClients.' + k, v)

    def get_db_stats_metrics(self):
        """get DB stats for each DB"""
        if self.__conn is None:
            self.connect()
        if self.__dbnames is None:
            self.get_db_names()
        if self.__dbnames is not None:
            for mongo_db in self.__dbnames:
                db_handler = self.__conn[mongo_db]
                dbs = db_handler.command('dbstats')
                for k, v in dbs.items():
                    if k in ['storageSize', 'ok', 'avgObjSize', 'indexes',
                             'objects', 'collections', 'fileSize',
                             'numExtents', 'dataSize', 'indexSize',
                             'nsSizeMB']:
                        self.add_metrics('mongodb.stats.' + k +
                                         '[' + mongo_db + ']', int(v))

    def close(self):
        """close connection to mongo"""
        if self.__conn is not None:
            self.__conn.close()


# Build the API signature
def build_signature(customer_id, shared_key, date, content_length, method, content_type, resource):
    x_headers = 'x-ms-date:' + date
    string_to_hash = method + "\n" + str(content_length) + "\n" + content_type + "\n" + x_headers + "\n" + resource
    bytes_to_hash = bytes(string_to_hash).encode('utf-8')
    decoded_key = base64.b64decode(shared_key)
    encoded_hash = base64.b64encode(hmac.new(decoded_key, bytes_to_hash, digestmod=hashlib.sha256).digest())
    authorization = "SharedKey {}:{}".format(customer_id, encoded_hash)
    return authorization


# Build and send a request to the POST API
def mongodb_azuremonitor_loganalysis_post_data(customer_id, shared_key, body, log_type):
    method = 'POST'
    content_type = 'application/json'
    resource = '/api/logs'
    rfc1123date = datetime.datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
    content_length = len(body)
    signature = build_signature(customer_id, shared_key, rfc1123date, content_length, method, content_type, resource)
    uri = 'https://' + customer_id + '.ods.opinsights.azure.com' + resource + '?api-version=2016-04-01'

    headers = {
        'content-type': content_type,
        'Authorization': signature,
        'Log-Type': log_type,
        'x-ms-date': rfc1123date
    }

    response = requests.post(uri, data=body, headers=headers)
    if (response.status_code >= 200 and response.status_code <= 299):
        print('Accepted')
    else:
        print("Response code: {}".format(response.status_code))


if __name__ == '__main__':
    mongodb = MongoDB()
    mongodb.delete_temporary_files()
    mongodb.get_db_names()
    mongodb.get_mongo_db_lld()
    mongodb.get_oplog()
    mongodb.get_maintenance()
    mongodb.get_server_status_metrics()
    mongodb.get_db_stats_metrics()
    mongodb.close()
    mongodb_azuremonitor_loganalysis_post_data(customer_id, shared_key, body, log_type)

Note: replace customer_id and shared_key with the values from the screenshot above; log_type is the name under which the records appear in the Log Analytics workspace. Save the script as AzureMonitor_LogAnalysis_MongodbMetrics.py and run it:

python AzureMonitor_LogAnalysis_MongodbMetrics.py

The print statements output some diagnostic information so you can verify that the data has been delivered to the Azure Log Analytics workspace.

3.4.3 Query the MongoDB Custom Logs with Kusto

In the Azure Portal, view the MongoDB data and query it with KQL:

<img src="Azure-Monitor-Restful-API-MongoDB.jpg" height="800px" width="800px" align="center" />

As shown, the data has arrived in the Azure Log Analytics workspace. From here you can write more sophisticated Kusto queries, and combine the script with a scheduler or Linux crontab to pull data periodically and drive alerting.
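
If you would rather pull the query results programmatically than use the portal, the Log Analytics query REST API can run the same Kusto statement. The sketch below is only an outline: it assumes you already have an Azure AD bearer token authorized for https://api.loganalytics.io, and it relies on custom logs submitted with log_type 'MongoDBMonitorLog' surfacing as the table MongoDBMonitorLog_CL:

import json
import requests

customer_id = '86df0cbc-076c-4483-8a32-c59c6550a771'  # workspace ID from section 3.4.1
aad_token = '<Azure AD bearer token for api.loganalytics.io>'  # placeholder

# Custom logs posted with log_type 'MongoDBMonitorLog' show up as MongoDBMonitorLog_CL
query = "MongoDBMonitorLog_CL | where TimeGenerated > ago(1h) | take 10"

uri = "https://api.loganalytics.io/v1/workspaces/{}/query".format(customer_id)
headers = {
    "Authorization": "Bearer " + aad_token,
    "Content-Type": "application/json",
}
response = requests.post(uri, headers=headers, data=json.dumps({"query": query}))
print(response.status_code)
print(json.dumps(response.json(), indent=2))  # tables/rows returned by the query
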

4. Summary

With that, this example of monitoring MongoDB through the Azure Monitor REST API is complete; I hope it gives some reference to those who want to build their own tooling on top of Azure Monitor. There is still room for improvement: the Python script does not implement richer log output or retry logic for failed runs. For production use, please think these details through before going live; those enhancements are left for you to add.