Notification-Manager Source Code Analysis (2): Concepts and Configuring Notification Delivery

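notification-manager resource concepts

Service components

Architecture

  • Operator
    • Manages and maintains the CRDs. Users create resource configurations that follow the CRD schemas it defines.
  • manager
    • Based on the CRD resources, exposes a webhook API that receives alert messages, matches them, and forwards them to the corresponding receivers.

CRD resources

  • Flow diagram: receiving, matching, filtering, and forwarding alerts

Architecture

  • Silence: defines whether matching alert messages are silenced (ignored)
  • Receiver: the recipient; configures the target that alert notifications are sent to
  • Router: groups matching messages and forwards them to one or more receivers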

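Alert notification configuration walkthrough

Configuring a receiver for alert messages

  • Email is used for this demo. Email delivery requires a global Config that holds the SMTP settings of a mail account.

  • Email configuration file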

# cat email.conf.yaml

apiVersion: notification.kubesphere.io/v2beta2
kind: Config
metadata:
  name: default-config
  labels:
    type: default
spec:
  email:
    hello: "hello"
    authIdentify: nil
    authPassword:
      value: "****" # test only: the 126 mailbox authorization code is placed in value for convenience
      # valueFrom: ### reading from a k8s Secret is also supported
      #   secretKeyRef:
      #     key: password
      #     name: default-config-secret
      #     namespace: kubesphere-monitoring-system
    authUsername: xxx # change to match your environment
    from: xxx@126.com # change to match your environment
    requireTLS: false
    smartHost:
      host: smtp.126.com
      port: 465
    tls: {}
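
The commented-out `valueFrom` block in the Config points at a Secret named `default-config-secret` with the key `password` in `kubesphere-monitoring-system`. A minimal sketch of generating such a Secret manifest is shown here. `SMTP_AUTH_CODE` is a hypothetical variable name, `changeme` is a placeholder rather than a real credential, and an authorization code containing quotes would need extra escaping.

```shell
#!/bin/sh
# Sketch: write a Secret manifest matching the commented-out secretKeyRef
# (name, key and namespace come from the comments in the Config).
# SMTP_AUTH_CODE is a hypothetical variable holding the mailbox authorization code.
SMTP_AUTH_CODE="${SMTP_AUTH_CODE:-changeme}"

cat > default-config-secret.yaml <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: default-config-secret
  namespace: kubesphere-monitoring-system
type: Opaque
stringData:
  password: "${SMTP_AUTH_CODE}"
EOF

# Apply it the same way as the other manifests in this post, for example:
#   mk apply -f default-config-secret.yaml
echo "wrote default-config-secret.yaml"
```

With the Secret in place, `value` can be dropped from `authPassword` and the `valueFrom` lines uncommented.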


  • Configure a receiver to accept alert messages
# cat receiver.yaml
apiVersion: notification.kubesphere.io/v2beta2
kind: Receiver
metadata:
  name: test-receiver
  labels:
    type: global
spec:
  email:
    alertSelector:
      matchExpressions: ## selects alerts: only alerts that match are sent to this receiver
        - key: namespace
          operator: Exists
    enabled: true
    template: nm.default.html
    subjectTemplate: nm.default.subject
    tmplType: html
    tmplText:
      name: notification-manager-template
      namespace: kubesphere-monitoring-system
    to:
      - xxxx@126.com
  • Apply the configuration files
mk apply -f email.conf.yaml
mk apply -f receiver.yaml
  • Send a test alert
```
$ curl -XPOST http://localhost:19093/api/v2/alerts -d @./docs/api/alert.json
{"Status":200,"Message":"Notification request accepted"}

```

image-20231105165652038

Configuring a router

# cat router.yaml

apiVersion: notification.kubesphere.io/v2beta2
kind: Router
metadata:
  name: test-router
spec:
  alertSelector:
    matchExpressions:
      - key: namespace
        operator: In
        values:
          - pp1
          - pp2
  receivers:
    name:
      - test-receiver ### the receiver configured just now
    # regexName: "user1.*?" ### a regex that matches multiple receivers is also supported
    # selector: []
    type: email

  • Test again; the email arrives as expected
$ curl -XPOST http://localhost:19093/api/v2/alerts  -d @./docs/api/alert.json
{"Status":200,"Message":"Notification request accepted"}

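  • So what is the logic here? The email arrives without the router, and it arrives with the router too. Let's change the data and test again.

    • The configuration so far
      • router: namespace in pp1,pp2
      • receiver: namespace Exists
    • What happens if "namespace": "pp1" under alerts in the JSON data is changed to pp3? Make the change and test once more.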
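
One way to make that change without touching the original file is to copy the payload and rewrite the label with `sed`. The sketch below uses a trimmed, hypothetical stand-in for `./docs/api/alert.json`. Only the `alerts` key and the `"namespace": "pp1"` label come from the text; the `labels` nesting, `alertname` value, and file names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical, trimmed stand-in for ./docs/api/alert.json.
cat > alert-pp1.json <<'EOF'
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "TestAlert",
        "namespace": "pp1"
      }
    }
  ]
}
EOF

# Write a copy whose namespace label is pp3 instead of pp1.
sed 's/"namespace": "pp1"/"namespace": "pp3"/' alert-pp1.json > alert-pp3.json

# Show the rewritten label.
grep '"namespace"' alert-pp3.json

# Then post the modified file as before, for example:
#   curl -XPOST http://localhost:19093/api/v2/alerts -d @./alert-pp3.json
```

Keeping both files around makes it easy to switch between the pp1 and pp3 cases in the tests that follow.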

image-20231105171403636

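  • The message arrived all the same. So router -> receiver is simply the order of the pipeline: both selectors are evaluated, the router first and the receiver after it.

  • For now, a router can be understood as an aggregation rule that matches alerts and forwards them to multiple receivers. A receiver can still select alerts through its own label selector, so an alert that the router does not match can still reach the receiver.

  • Conversely: if the router matches but the receiver does not, is the message sent?

    • Change the configuration to:
      • router: namespace in pp1,pp2
      • receiver: namespace in pp3
    • Also change the JSON data file to "namespace": "pp1".
    • After the change, confirm that it has taken effect in k8s:
    mk get router test-router -o yaml
    mk get receiver test-receiver -o yaml
    • No email arrived. So a receiver only gets an alert that matches its own selector; the receiver is what ultimately decides.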

    • Compare the logs

    ##### Log when the receiver did not send an email


    level=debug ts=2023-11-05T09:25:00.078785773Z caller=dispatcher.go:65 msg="Dispatcher: Begins to process alerts..." alerts=1
    level=debug ts=2023-11-05T09:25:00.080080787Z caller=dispatcher.go:99 msg="Dispatcher: Acquired worker queue lock..."
    level=debug ts=2023-11-05T09:25:00.080296884Z caller=silence.go:31 msg="Start silence stage" seq=7 alert=1
    level=debug ts=2023-11-05T09:25:00.082627899Z caller=router.go:46 msg="RouteStage: start" seq=7 alert=1
    level=debug ts=2023-11-05T09:25:00.084062255Z caller=filter.go:31 msg="Start filter stage" seq=7
    level=debug ts=2023-11-05T09:25:00.084163168Z caller=aggregation.go:34 msg="Start aggregation stage" seq=7 groupby=alertname,namespace
    level=debug ts=2023-11-05T09:25:00.084228503Z caller=notify.go:73 msg="Start notify stage" seq=7
    level=debug ts=2023-11-05T09:25:00.095293871Z caller=dispatcher.go:84 msg="Dispatcher: Processor exit after 15.188461ms"


    ##### Log when the receiver sent an email

    level=debug ts=2023-11-05T09:28:00.081466898Z caller=dispatcher.go:65 msg="Dispatcher: Begins to process alerts..." alerts=1
    level=debug ts=2023-11-05T09:28:00.081529966Z caller=dispatcher.go:99 msg="Dispatcher: Acquired worker queue lock..."
    level=debug ts=2023-11-05T09:28:00.081639193Z caller=silence.go:31 msg="Start silence stage" seq=8 alert=1
    level=debug ts=2023-11-05T09:28:00.08182137Z caller=router.go:46 msg="RouteStage: start" seq=8 alert=1
    level=debug ts=2023-11-05T09:28:00.082100699Z caller=filter.go:31 msg="Start filter stage" seq=8
    level=debug ts=2023-11-05T09:28:00.082344088Z caller=aggregation.go:34 msg="Start aggregation stage" seq=8 groupby=alertname,namespace
    level=debug ts=2023-11-05T09:28:00.083333246Z caller=notify.go:73 msg="Start notify stage" seq=8
    ############################ The two lines below are extra: the concrete notifier implementation is called to push the message
    level=debug ts=2023-11-05T09:28:00.980041376Z caller=email.go:150 msg="EmailNotifier: send message" from=xxxx@126.com to=xxxx@126.com
    level=debug ts=2023-11-05T09:28:00.980282337Z caller=email.go:141 msg="EmailNotifier: send message" used=878.210407ms
    level=debug ts=2023-11-05T09:28:00.980373131Z caller=dispatcher.go:84 msg="Dispatcher: Processor exit after 898.78962ms"


    • The code-level implementation will be examined later.
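
Since the only difference between the two runs is the pair of `email.go` lines, a quick check for whether a batch was actually delivered is to look for that caller in the log. A small sketch follows, using two lines copied from the logs above. The file name `nm-sample.log` is an assumption; in practice the input would be the saved pod log.

```shell
#!/bin/sh
# Sample input: one stage line and one notifier line copied from the logs above.
cat > nm-sample.log <<'EOF'
level=debug ts=2023-11-05T09:28:00.083333246Z caller=notify.go:73 msg="Start notify stage" seq=8
level=debug ts=2023-11-05T09:28:00.980041376Z caller=email.go:150 msg="EmailNotifier: send message" from=xxxx@126.com to=xxxx@126.com
EOF

# Count lines emitted by the email notifier; zero means nothing was pushed.
sent=$(grep -c 'caller=email.go' nm-sample.log)

if [ "$sent" -gt 0 ]; then
  echo "email notifier invoked ($sent line(s))"
else
  echo "no email sent"
fi
```

This only covers the email channel seen in this post; other notifier types would presumably log under their own caller names.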

Configuring Silence

  • Starting from the environment in which email is delivered normally, create a Silence CRD resource
# cat silence.yaml

apiVersion: notification.kubesphere.io/v2beta2
kind: Silence
metadata:
  name: test-silence
  labels:
    type: global
spec:
  matcher:
    matchExpressions:
      - key: namespace
        operator: In
        values:
          - pp1
          - pp3
  # startsAt: "2022-02-29T00:00:00Z" # start time
  # duration: 24h # how long the silence lasts
mk apply -f silence.yaml
  • Send a test with the "namespace": "pp3" JSON data: no email arrives anymore
  • Delete the silence, and alert emails are received again

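That completes the configuration tests of the basic CRD features.

  • Other related parameters:
    • For parameters and configuration, see the documents in the docs directory
    • CRD type definitions in the code:
      • Silence: pkg/apis/v2beta2/silence_types.go
      • Router: pkg/apis/v2beta2/router_types.go
      • Receiver: pkg/apis/v2beta2/receiver_types.go

Enabling debug logging in notification-manager

  • To make debugging easier, turn on debug logging for notification-manager. The manager then logs each stage it goes through after receiving a request, up to sending the data.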
# vi config/samples/notification_manager.yaml


apiVersion: notification.kubesphere.io/v2beta2
kind: NotificationManager
metadata:
  name: notification-manager
spec: ### add the two lines below
  args:
    - --log.level=debug
  replicas: 1


# apply
mk apply -f config/samples/notification_manager.yaml


# afterwards, the pod is running with the updated configuration

mk get po
NAME                                              READY   STATUS    RESTARTS   AGE
notification-manager-deployment-f8fff9b84-g7rvs   1/1     Running   0          2m16s
notification-manager-operator-5b8b7cd49f-rkst9    2/2     Running   0          3h12m


# view the pod log
mk logs notification-manager-deployment-f8fff9b84-g7rvs -f --tail 50
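
With debug logging on, the stage lines of each batch carry a `seq` value, so one request can be followed through the pipeline by filtering on it. A sketch follows, using lines copied from the log excerpts earlier in this post. The file names `nm-debug.log` and `seq7-stages.txt` are assumptions; the first stands for the output of the `mk logs` command saved to disk.

```shell
#!/bin/sh
# Sample input copied from the log excerpts earlier in this post.
cat > nm-debug.log <<'EOF'
level=debug ts=2023-11-05T09:25:00.080296884Z caller=silence.go:31 msg="Start silence stage" seq=7 alert=1
level=debug ts=2023-11-05T09:25:00.082627899Z caller=router.go:46 msg="RouteStage: start" seq=7 alert=1
level=debug ts=2023-11-05T09:25:00.084062255Z caller=filter.go:31 msg="Start filter stage" seq=7
level=debug ts=2023-11-05T09:28:00.081639193Z caller=silence.go:31 msg="Start silence stage" seq=8 alert=1
EOF

# Keep only lines for seq=7 (followed by a space or end of line, so seq=70
# would not match), then reduce each line to "<caller> <message>".
grep -E 'seq=7( |$)' nm-debug.log |
  sed -E 's/.*caller=([^ ]+) msg="([^"]*)".*/\1 \2/' > seq7-stages.txt

cat seq7-stages.txt
```

The resulting list shows which stages a batch passed through, which makes it easier to see where a run with and without delivery diverge.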