Kube-Prometheus Cluster Installation Tutorial

Author: dufu
Last modified: 2024-07-26 11:33:52
Included in the column: k8s相关实操
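1. Version requirements

| Kubernetes cluster version | kube-prometheus version | Deployment mode |
| --- | --- | --- |
| v1.18 | <= v0.6.0 | Single-node, centralized deployment |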

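2. Minimal installation notes

| Service | Keep in deployment | Replicas | Workload type |
| --- | --- | --- | --- |
| alertmanager-main | | 1 | statefulset |
| kube-state-metrics | | 1 | deployment |
| node-exporter | | 1 | daemonset |
| prometheus-adapter | | 1 | deployment |
| prometheus-operator | | 1 | deployment |
| grafana | | 1 | deployment |
| prometheus-k8s | | 1 | statefulset |
| blackbox-exporter | | | deployment |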
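
One way to apply the replica counts in the table is to edit the custom resources that the operator reconciles. The following is a minimal sketch, assuming the stock kube-prometheus manifest layout, where the Prometheus resource is named `k8s` and the Alertmanager resource is named `main`. The file names in the comment are assumptions, and only the fields relevant to replica counts are shown, so merge the values into the existing files rather than applying this fragment on its own.

```yaml
# Sketch only: merge these values into the existing manifests
# (assumed to be prometheus-prometheus.yaml and alertmanager-alertmanager.yaml).
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  replicas: 1        # yields a single-replica prometheus-k8s StatefulSet
---
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
  namespace: monitoring
spec:
  replicas: 1        # yields a single-replica alertmanager-main StatefulSet
```

Deployments such as grafana or kube-state-metrics carry an ordinary `spec.replicas` field in their own manifests, so they can be adjusted the same way.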

3. Alerting module configuration (alertmanager-secret.yaml)

apiVersion: v1
data: {}
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    "global":
      "resolve_timeout": "5m"
    "inhibit_rules":
    - "equal":
      - "namespace"
      - "alertname"
      "source_match":
        "severity": "critical"
      "target_match_re":
        "severity": "warning|info"
    - "equal":
      - "namespace"
      - "alertname"
      "source_match":
        "severity": "warning"
      "target_match_re":
        "severity": "info"
    "receivers":
    - "name": "simplecloud"
      "webhook_configs":
      - "url": "http://xxx:8554/notifications"
        "http_config":
          "bearer_token": "xxx"
    - "name": "Watchdog"
    - "name": "Critical"
    "route":
      "group_by":
      - "namespace"
      "group_interval": "5m"
      "group_wait": "30s"
      # Default receiver; must match a name listed under "receivers"
      "receiver": "xxx"
      "repeat_interval": "12h"
      "routes":
      - "match":
          "alertname": "Watchdog"
        "receiver": "Watchdog"
      - "match":
          "severity": "critical"
        "receiver": "Critical"
        # repeat_interval is a route-level setting, not a label matcher,
        # so it sits next to "match" rather than inside it
        "repeat_interval": "1h"
      - "match":
          "severity": "warning"
        "repeat_interval": "1d"
      - "match":
          "severity": "info"
        "repeat_interval": "7d"
type: Opaque

4. Alerting rule configuration (prometheus-rules.yaml)

  # Group: pod in an abnormal phase
  - name: Pod状态异常
    rules:
    - alert: Pod状态异常
      annotations:
        description: The pod {{ $labels.pod }} in namespace {{ $labels.namespace }}
          was unavailable.
        summary: Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is unavailable.
      # Fires when the pod stayed in the Pending, Unknown, or Failed phase
      # at every 1m sample over the last 5 minutes
      expr: min_over_time(sum by (namespace, pod, phase) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[5m:1m])
        > 0
      for: 2m
      labels:
        severity: critical
  # Group: Deployment available replicas differ from the desired count
  - name: Deployment可用副本状态异常
    rules:
    - alert: 工作负载可用副本数异常
      annotations:
        description: The pods of {{ $labels.deployment }} are unavailable.
        summary: The status of {{ $labels.deployment }} pods is abnormal.
      expr: kube_deployment_spec_replicas{} != kube_deployment_status_replicas_available{}
      for: 2m
      labels:
        severity: critical
  # Group: pod fails to start (more than 3 restarts within 5 minutes)
  - name: Pod启动失败
    rules:
    - alert: 5分钟内Pod重启累计3次以上
      annotations:
        description: The Pod {{ $labels.namespace }}/{{ $labels.pod }} has failed
          to start.
        summary: Pod {{ $labels.namespace }}/{{ $labels.pod }} failed to start
      # Sums the per-minute restart increases over the last 5 minutes
      expr: sum_over_time(increase(kube_pod_container_status_restarts_total{}[1m])[5m:1m])
        > 3
      for: 5m
      labels:
        severity: critical
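
The rule groups in this section are a fragment: the Prometheus Operator only loads rules that are wrapped in a PrometheusRule resource. The following is a minimal sketch of that wrapper. It assumes the labels `prometheus: k8s` and `role: alert-rules` match the `ruleSelector` of the Prometheus resource, as in the stock kube-prometheus manifests; the resource name `custom-workload-rules` is hypothetical and chosen only for illustration.

```yaml
# Sketch only: wrapper resource for the rule groups in this section.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s          # assumed to match the Prometheus ruleSelector
    role: alert-rules
  name: custom-workload-rules  # hypothetical name
  namespace: monitoring
spec:
  groups:
  - name: Deployment可用副本状态异常
    rules:
    - alert: 工作负载可用副本数异常
      annotations:
        description: The pods of {{ $labels.deployment }} are unavailable.
        summary: The status of {{ $labels.deployment }} pods is abnormal.
      expr: kube_deployment_spec_replicas{} != kube_deployment_status_replicas_available{}
      for: 2m
      labels:
        severity: critical
```

Further groups go under `spec.groups` at the same indentation as the first one.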

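For more customized alerting rules, see Alibaba Cloud's alerting rule configuration. Links to other vendors are blocked on this platform, so message me below the article if you need the link.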

5. Custom labels for common k8s metrics

Add the following configuration snippet to every xxx-serviceMonitor.yaml in the original manifests:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus
  name: prometheus
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: web
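    metricRelabelings:
    # With an empty sourceLabels list and the default "replace" action,
    # each rule sets targetLabel to the fixed replacement value on every series.
    - sourceLabels: []
      targetLabel: env
      replacement: '测试'          # "test"
    - sourceLabels: []
      targetLabel: cluster
      replacement: '华南1b测试'    # "South China 1b test"
    - sourceLabels: []
      targetLabel: type
      replacement: k8s-test
    - sourceLabels: []
      targetLabel: from
      replacement: huanan1b-sc-test
    - sourceLabels: []
      targetLabel: prometheus_replica
      replacement: prometheus-k8s-0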
  selector:
    matchLabels:
      prometheus: k8s

6. Custom labels for cadvisor metrics

remote_write:
  - url: "http://remote-write-service:9090/api/v1/write"
    write_relabel_configs:
      - source_labels: ["__name__"]
        regex: "my_metric|another_metric|yet_another_metric"
        action: keep

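Original content notice: this article is published in the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com to request removal.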
