
This is a hands-on Grafana tutorial: integrating Grafana with an internal monitoring system. It is meant to give engineers with similar needs a workable approach and concrete technical guidance. I hope you find it helpful.

Goal: provide a unified visualization and alerting solution for in-house monitoring systems.
```text
[Internal monitoring system] --> [Data adaptation layer] --> [Grafana data source]
        │                                │
        ├─ [API access]                  ├─ [ETL tools]
        └─ [Direct DB connection]        └─ [Message queue]
```

| Internal system type | Recommended integration | Typical scenario |
|---|---|---|
| Relational database | Grafana SQL data source + scheduled queries | Metrics stored in MySQL/Oracle |
| HTTP API service | Grafana Infinity plugin + custom scripts | REST APIs of home-grown monitoring systems |
| Log files | Filebeat + InfluxDB | Traditional log-based monitoring data |
| Binary protocols | Custom Telegraf plugin development | Industrial/proprietary protocols |
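Whatever the source, the adaptation layer boils down to normalizing heterogeneous inputs into one common metric record. A minimal sketch, assuming field names that mirror the `metrics` table used later in this tutorial (the SQL-row and API-item shapes are illustrative):

```python
from dataclasses import dataclass

# Minimal sketch of the data-adaptation layer: normalize rows from a SQL
# source and items from a JSON API into one common metric record.
@dataclass
class Metric:
    service_name: str
    metric_name: str
    value: float
    timestamp: str

def from_sql_row(row: tuple) -> Metric:
    # Row shape assumed: (id, service_name, metric_name, value, timestamp)
    _, service, name, value, ts = row
    return Metric(service, name, float(value), ts)

def from_api_item(item: dict) -> Metric:
    # JSON item shape is an assumption for illustration
    return Metric(item["service"], item["metric"], float(item["value"]), item["ts"])

m1 = from_sql_row((1, "order-svc", "success_rate", 0.99, "2024-01-01 10:00:00"))
m2 = from_api_item({"service": "pay-svc", "metric": "latency", "value": 12, "ts": "2024-01-01 10:00:05"})
```

Once everything is a `Metric`, the downstream path (SQL sink, Grafana query) is identical regardless of where the data came from.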
Internal system A stores monitoring metrics in MySQL (table structure below):

```sql
CREATE TABLE metrics (
    id INT PRIMARY KEY,
    service_name VARCHAR(50),
    metric_name VARCHAR(50),
    value FLOAT,
    timestamp DATETIME
);
```

Internal system B exposes a REST API that returns monitoring data as JSON.
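Before wiring Grafana up, the schema and the aggregation the panels will rely on can be exercised in isolation. A minimal sketch using SQLite as a stand-in for MySQL (the sample rows are made up):

```python
import sqlite3

# Stand-in for the MySQL `metrics` table; SQLite keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metrics (
        id INTEGER PRIMARY KEY,
        service_name TEXT,
        metric_name TEXT,
        value REAL,
        timestamp TEXT
    )
""")
rows = [
    (1, "order-svc", "success_rate", 0.99, "2024-01-01 10:00:00"),
    (2, "order-svc", "success_rate", 0.97, "2024-01-01 10:01:00"),
    (3, "pay-svc",   "success_rate", 1.00, "2024-01-01 10:00:30"),
]
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)", rows)

# The per-service average that the success-rate panel computes
cur = conn.execute("""
    SELECT service_name, ROUND(AVG(value), 2)
    FROM metrics
    WHERE metric_name = 'success_rate'
    GROUP BY service_name
    ORDER BY service_name
""")
result = cur.fetchall()
print(result)
```

The same `GROUP BY service_name` aggregation appears in the Grafana panel SQL later on.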
Step 1: Configure the MySQL data source

Note that the `[database]` section of `grafana.ini` configures Grafana's own backend database, not a data source. A MySQL data source is added through the UI or through a provisioning file:

```yaml
# provisioning/datasources/mysql.yaml
apiVersion: 1
datasources:
  - name: Internal-MySQL
    type: mysql
    url: 192.168.1.100:3306
    user: grafana
    jsonData:
      database: monitor_db
    secureJsonData:
      password: xxxxxx
```

Step 2: Write the cross-service query SQL
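The query below groups rows into 5-minute windows by integer division on the Unix timestamp. The bucketing arithmetic, sketched in Python:

```python
# 5-minute bucketing used by the panel SQL:
# UNIX_TIMESTAMP(timestamp) DIV 300 * 300
def bucket_300s(unix_ts: int) -> int:
    # Integer-divide by 300 seconds, multiply back: floors to the bucket start
    return unix_ts // 300 * 300

buckets = [bucket_300s(t) for t in (1000, 1104, 1500)]
```

Timestamps 1000 and 1104 fall into the same bucket (900), while 1500 starts a new one, so averaging within each bucket yields one point per 5-minute window.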
```sql
-- Service success-rate panel
SELECT
    UNIX_TIMESTAMP(timestamp) DIV 300 * 300 AS time_sec,  -- 5-minute buckets
    service_name,
    AVG(CASE WHEN metric_name = 'success_rate' THEN value END) AS success_rate
FROM metrics
WHERE timestamp >= $__timeFrom()
  AND timestamp <= $__timeTo()
GROUP BY service_name, time_sec
```

Step 3: Configure query caching
```yaml
# Data source advanced configuration (jsonData)
jsonData:
  cacheDuration: "5m"
  timeInterval: "2m"
```

Using the Infinity plugin
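As context for the Infinity configuration that follows: its `root_selector` picks the array of items out of the API response, and `columns` picks fields out of each item. A Python sketch of the same extraction (the response shape with `$.data.items[*]` and timestamp/service/latency fields is an assumption matching the configured selectors):

```python
# Hypothetical response from http://internal-monitor/api/v1/metrics
sample_response = {
    "data": {
        "items": [
            {"timestamp": 1704103200, "service": "order-svc", "latency": 12.5},
            {"timestamp": 1704103260, "service": "pay-svc", "latency": 30.1},
        ]
    }
}

def extract_rows(resp: dict) -> list:
    # root_selector "$.data.items[*]" -> iterate resp["data"]["items"]
    rows = []
    for item in resp["data"]["items"]:
        # columns: pick the fields named by the selectors
        rows.append((item["timestamp"], item["service"], item["latency"]))
    return rows

rows = extract_rows(sample_response)
```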
Data source configuration example:

```json
{
  "type": "json",
  "url": "http://internal-monitor/api/v1/metrics",
  "root_selector": "$.data.items[*]",
  "columns": [
    {"selector": "$.timestamp", "type": "timestamp"},
    {"selector": "$.service", "type": "string"},
    {"selector": "$.latency", "type": "number"}
  ]
}
```

Authenticated API requests
```yaml
jsonData:
  httpHeaderName1: "Authorization"
secureJsonData:
  httpHeaderValue1: "Bearer ${API_TOKEN}"
```

Option: Kafka + stream processing
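The transformation the Flink job below performs (raw JSON string → typed record) can be sketched first in plain Python; the field paths mirror the job's `JSON_VALUE` selectors, and the sample message is made up:

```python
import json

# Plain-Python sketch of the Flink transform:
# raw JSON string -> (timestamp, service, value) record
def transform(raw: str) -> tuple:
    doc = json.loads(raw)
    # Mirrors JSON_VALUE(raw_data, '$.timestamp' / '$.service' / '$.value')
    return (doc["timestamp"], doc["service"], float(doc["value"]))

rec = transform('{"timestamp": "2024-01-01 10:00:00", "service": "order-svc", "value": "3"}')
```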
```python
# Example data-transformation job (PyFlink)
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)

# Read raw data from Kafka
t_env.execute_sql("""
    CREATE TABLE input_metrics (
        raw_data STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'internal_metrics',
        'properties.bootstrap.servers' = 'kafka:9092',
        'properties.group.id' = 'metrics_etl',
        'format' = 'raw'
    )
""")

# Sink table (each DDL/DML statement must be submitted separately)
t_env.execute_sql("""
    CREATE TABLE output_metrics (
        ts TIMESTAMP(3),
        service STRING,
        `value` DOUBLE
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:mysql://mysql:3306/monitor_db',
        'table-name' = 'metrics'
    )
""")

# Transformation logic
t_env.execute_sql("""
    INSERT INTO output_metrics
    SELECT
        TO_TIMESTAMP(JSON_VALUE(raw_data, '$.timestamp')),
        JSON_VALUE(raw_data, '$.service'),
        CAST(JSON_VALUE(raw_data, '$.value') AS DOUBLE)
    FROM input_metrics
""").wait()
```

Core panel types:
Template variable configuration:

```sql
-- Service-name variable
SELECT DISTINCT service_name FROM metrics ORDER BY service_name
```

Business-metric alerting:
```sql
SELECT
    service_name,
    COUNT(*) AS error_count
FROM metrics
WHERE metric_name = 'http_errors'
  AND timestamp >= NOW() - INTERVAL 5 MINUTE
GROUP BY service_name
HAVING error_count > 10
```

Notification channel integration:
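As context for the notification-channel configuration below: the webhook ultimately POSTs a WeCom markdown payload built from the alert's fields. A sketch of that payload construction (the `status`/`alert_name` inputs are illustrative stand-ins for Grafana's template variables):

```python
import json

# Sketch: build the WeCom ("企业微信") markdown payload the webhook POSTs
def build_wecom_payload(status: str, alert_name: str) -> str:
    body = {
        "msgtype": "markdown",
        "markdown": {
            "content": f"**Grafana Alert**\n>Status: {status}\n>Name: {alert_name}",
        },
    }
    return json.dumps(body, ensure_ascii=False)

payload = build_wecom_payload("firing", "http_errors_high")
```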
```yaml
# WeCom (企业微信) bot notification channel (legacy provisioning format)
notifiers:
  - name: wecom-alert
    type: webhook
    settings:
      url: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx"
      httpMethod: "POST"
      contentType: "application/json"
      message: |-
        {
          "msgtype": "markdown",
          "markdown": {
            "content": "**Grafana Alert**\n>Status: ${STATUS}\n>Name: ${ALERT_NAME}"
          }
        }
```

Materialized-view precomputation:
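The materialized view below relies on ClickHouse's partial-aggregate states (`sumState` on write, `sumMerge` on read). The idea, sketched in Python with made-up numbers: each (service, 5-minute bucket) keeps a running partial sum, and queries merge the partials instead of rescanning raw rows:

```python
from collections import defaultdict

def bucket_5min(unix_ts: int) -> int:
    # toStartOfFiveMinutes equivalent on Unix timestamps
    return unix_ts // 300 * 300

# (service, bucket) -> partial sum; plays the role of sumState
state = defaultdict(float)

def ingest(service: str, unix_ts: int, value: float) -> None:
    state[(service, bucket_5min(unix_ts))] += value

for svc, ts, v in [("order-svc", 1000, 2.0), ("order-svc", 1100, 3.0), ("order-svc", 1600, 1.0)]:
    ingest(svc, ts, v)

# Reading back merged totals per bucket plays the role of sumMerge
totals = dict(sorted(state.items()))
```

Because the heavy aggregation happens at ingest time, the Grafana query only touches one pre-summed row per bucket.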
```sql
-- ClickHouse materialized view
CREATE MATERIALIZED VIEW service_summary
ENGINE = AggregatingMergeTree()
ORDER BY (service, timestamp)
AS SELECT
    service_name AS service,
    toStartOfFiveMinutes(timestamp) AS timestamp,
    sumState(value) AS total_errors
FROM metrics
WHERE metric_name = 'errors'
GROUP BY service, timestamp
```

Grafana query rewrite:
```sql
SELECT
    timestamp,
    service,
    sumMerge(total_errors) AS errors
FROM service_summary
GROUP BY service, timestamp
```

Cache hierarchy:

```mermaid
graph LR
  A[Browser cache] --> B[Grafana result cache]
  B --> C[Database query cache]
  C --> D[Materialized view]
```

LDAP configuration example:
```ini
# grafana.ini
[auth.ldap]
enabled = true
config_file = /etc/grafana/ldap.toml
```

```toml
# ldap.toml
[[servers]]
host = "ldap.corp.com"
port = 636
use_ssl = true
bind_dn = "cn=grafana,ou=system,dc=corp,dc=com"
bind_password = "****"
search_filter = "(sAMAccountName=%s)"
search_base_dns = ["ou=users,dc=corp,dc=com"]
```

Row-level security policy:
Row-level filtering is best enforced in the database itself; the snippet below is a sketch in PostgreSQL row-level-security syntax (the `app.current_team` setting is illustrative). Grafana Enterprise's data source permissions can complement this on the dashboard side.

```sql
ALTER TABLE metrics ENABLE ROW LEVEL SECURITY;
CREATE POLICY filter_team ON metrics
    USING (team_id = current_setting('app.current_team')::INT);
```

| Symptom | Diagnosis steps | Tool / command |
|---|---|---|
| Data delay exceeds threshold | 1. Check ETL logs 2. Check Kafka consumer backlog | `kafka-consumer-groups` |
| Panel shows "No data" | 1. Check data source connectivity 2. Verify the SQL time range | `curl -X POST datasource/query` |
| Alert notifications not firing | 1. Check alert-rule evaluation results 2. Verify webhook reachability | `grafana-cli alerts test-rule` |
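The backlog check from the first row of the table can be sketched as: given per-partition log-end offsets and committed offsets (the numbers `kafka-consumer-groups --describe` reports), consumer lag is their difference. The offsets below are hypothetical:

```python
# Sketch of the consumer-backlog check:
# lag per partition = log end offset - committed offset
def total_lag(end_offsets: dict, committed: dict) -> int:
    return sum(end_offsets[p] - committed.get(p, 0) for p in end_offsets)

# Hypothetical offsets for partitions 0-2 of the internal_metrics topic
end = {0: 1500, 1: 900, 2: 1200}
done = {0: 1480, 1: 900, 2: 950}
lag = total_lag(end, done)  # 20 + 0 + 250
```

A steadily growing total is the signal that the ETL consumer cannot keep up and panels will show stale data.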
Displaying asset information on the dashboard:

```json
{
  "datasource": "CMDB_API",
  "query": "GET /assets?env=${ENV}",
  "display": "table"
}
```
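The `${ENV}` placeholder in the CMDB query above is a dashboard variable; Grafana expands it before issuing the request. The expansion amounts to simple template substitution:

```python
from string import Template

# Sketch of dashboard-variable expansion for the CMDB query;
# Grafana performs this substitution itself, this just illustrates it.
query = Template("GET /assets?env=${ENV}")
expanded = query.substitute(ENV="production")
```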
**Handling steps**:
1. Check the service logs: `kubectl logs ${POD}`
2. Verify the status of dependent services

Project deliverables checklist:
With this approach, a financial-sector company consolidated five independent monitoring systems, improved query performance roughly six-fold, and brought alert response times under 30 seconds. For production, a gradual blue-green migration is recommended.

That's all for this post. Thanks for reading; if it helped you, don't forget to like, bookmark, and follow.