《Prometheus监控实战》 (Prometheus Monitoring in Practice), Chapter 9: Log Monitoring

yeedomliu

Published 2019-12-19 16:36:23


Included in the column: yeedomliu

Chapter 9 Log Monitoring

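Although our hosts, services, and applications generate critical metrics and events, they also generate logs that can tell us useful things about their state.
This is especially true for legacy applications that have no monitoring in place or are hard to instrument. Sometimes the cost of rewriting, patching, or refactoring such an application to expose its internal state is simply not a worthwhile engineering investment, or there may be technical limits on monitoring it. But you still need to understand what is happening inside the application, and one of the easiest ways to do that is to adapt its log output.
Tip: Another approach is to use the Process exporter (https://github.com/ncabatoff/process-exporter) to inspect the contents of the /proc subsystem.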

docker run -d --rm -p 9256:9256 --privileged -v /proc:/host/proc -v `pwd`:/config ncabatoff/process-exporter --procfs /host/proc -config.path /config/filename.yml

9.1 Log Processing

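To extract data from log entries, we'll use a log processing tool. There are several options, including the Grok Exporter (https://github.com/fstab/grok_exporter) and a Google utility called mtail (https://github.com/google/mtail). We chose mtail because it is a little more lightweight and somewhat more popular.
Tip: Do you have Logstash or the ELK stack installed? They can't currently output directly to Prometheus, but you can use Logstash's metrics filter to create metrics and emit them directly to Alertmanager (https://github.com/wtliuNA/logstash-output-prometheus).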

9.2 Introducing mtail

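The mtail log processor was written by SREs at Google. It is licensed under Apache 2.0 and written in Go, and it is designed specifically to extract metrics from application logs for export into a time series database.
mtail works by running "programs" that define log-matching patterns and specify the metrics to create and update when a pattern matches. It works well with Prometheus, exposing any metrics it creates for scraping, and it can also be configured to send metrics to tools such as collectd, StatsD, or Graphite.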

9.2.1 Installing mtail

Listing: Downloading and installing the mtail binary

wget https://github.com/google/mtail/releases/download/v3.0.0-rc33/mtail_v3.0.0-rc33_linux_amd64 -O mtail
chmod 0755 mtail
sudo cp mtail /usr/local/bin

Listing: Running the mtail binary

mtail --version

9.2.2 Using mtail

sudo mkdir /etc/mtail

Listing: Creating the line_count.mtail program

sudo touch /etc/mtail/line_count.mtail

Listing: Editing the line_count.mtail program

counter line_count

/$/ {
  line_count++
}

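We define a counter named line_count. Counters are declared with the counter keyword (gauges, naturally, with the gauge keyword). mtail exports these counters and gauges to whatever destination you define.
Next we define the body of the mtail program: the condition to match and the action to take. The condition comes first, followed by the action enclosed in {}. Here the condition /$/ matches the end of every line, so line_count is incremented once per log line.
You can specify multiple sets of conditions and actions in a program, and you can extend them with conditional logic in the form of an else clause (https://github.com/google/mtail/blob/master/docs/Language.md).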

/foo/ {
  ACTION1
} else {
  ACTION2
}
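The condition/action model can be mimicked in a few lines of Go to see what line_count and the else example actually do. This is a minimal sketch of the semantics only, not how mtail is implemented; the function names countLines and fooOrElse are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// countLines mimics line_count.mtail: the condition /$/ matches the end of
// every line, so the action (line_count++) runs once per line.
func countLines(log string) int {
	cond := regexp.MustCompile(`$`) // same pattern as the mtail condition
	lineCount := 0                  // stands in for `counter line_count`
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if cond.MatchString(sc.Text()) { // condition
			lineCount++ // action
		}
	}
	return lineCount
}

// fooOrElse mirrors `/foo/ { ACTION1 } else { ACTION2 }`: exactly one
// branch runs for each line, so the two counts add up to the line count.
func fooOrElse(lines []string) (action1, action2 int) {
	foo := regexp.MustCompile(`foo`)
	for _, l := range lines {
		if foo.MatchString(l) {
			action1++
		} else {
			action2++
		}
	}
	return action1, action2
}

func main() {
	fmt.Println(countLines("first\nsecond\n\nfourth\n")) // 4: the empty line matches /$/ too
	fmt.Println(fooOrElse([]string{"foo bar", "baz", "seafood"}))
}
```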

9.2.3 Running mtail

Listing: Running mtail

sudo mtail --progs /etc/mtail --logs '/var/log/*.log'

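The first flag, --progs, tells mtail where to find our programs; the second, --logs, tells mtail where to find the log files to parse. We use a glob pattern (https://godoc.org/path/filepath#Match) to match all log files in the /var/log directory. You can also specify a comma-separated list of files, or pass the --logs flag multiple times.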

func Match
func Match(pattern, name string) (matched bool, err error)
Match reports whether name matches the shell file name pattern. The pattern syntax is:

pattern:
    { term }
term:
    '*'         matches any sequence of non-Separator characters
    '?'         matches any single non-Separator character
    '[' [ '^' ] { character-range } ']'
                character class (must be non-empty)
    c           matches character c (c != '*', '?', '\\', '[')
    '\\' c      matches character c

character-range:
    c           matches character c (c != '\\', '-', ']')
    '\\' c      matches character c
    lo '-' hi   matches character c for lo <= c <= hi
Match requires pattern to match all of name, not just a substring. The only possible returned error is ErrBadPattern, when pattern is malformed.

On Windows, escaping is disabled. Instead, '\\' is treated as path separator.
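One consequence of these rules is worth checking before starting mtail: `*` never crosses a `/`, so '/var/log/*.log' does not reach logs in subdirectories such as /var/log/apache. The quick Go check below calls filepath.Match directly, assuming mtail applies these same rules to --logs (which is what the godoc link points to); matchesGlob is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchesGlob reports whether a log path is selected by a --logs style
// glob under Go's filepath.Match rules. It panics on a malformed pattern
// (filepath.ErrBadPattern), the only error Match can return.
func matchesGlob(pattern, name string) bool {
	ok, err := filepath.Match(pattern, name)
	if err != nil {
		panic(err)
	}
	return ok
}

func main() {
	const pattern = "/var/log/*.log"
	for _, name := range []string{
		"/var/log/syslog.log",        // selected
		"/var/log/apache/access.log", // not selected: '*' does not cross '/'
		"/var/log/messages",          // not selected: no .log suffix
	} {
		fmt.Printf("%-28s %v\n", name, matchesGlob(pattern, name))
	}
}
```

To pick up subdirectory logs as well, add another pattern such as '/var/log/apache/*.log' with a second --logs flag or in the comma-separated list.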

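Note: The user running mtail needs read permission on the log files being parsed; otherwise mtail cannot read them. When a file can't be read, you'll see read errors in mtail's log output, which you can get with the --logtostderr flag.
mtail starts a web server on port 3903 (use the --address and --port flags to set the IP address and port). Browse to this web server: the root path shows some diagnostic information, and the /metrics path exposes the metrics in Prometheus format.

Tip: You can also send metrics to tools such as StatsD and Graphite.
Listing: The mtail /metrics path

Each metric exported to Prometheus carries a prog label naming the mtail program that produced it; you can omit this label by setting the --emit_prog_label flag to false.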

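9.3 Processing Web Server Access Logs

Let's use mtail to extract some metrics from Apache access logs, specifically logs that use the combined log format.
Listing: The apache_combined program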

# Parser for the common apache "NCSA extended/combined" log format
# LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"
counter apache_http_requests_total by request_method, http_version, request_status
counter apache_http_bytes_total by request_method, http_version, request_status

/^/ +
/(?P<hostname>[0-9A-Za-z\.:-]+) / + # %h
/(?P<remote_logname>[0-9A-Za-z-]+) / + # %l
/(?P<remote_username>[0-9A-Za-z-]+) / + # %u
/\[(?P<timestamp>\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} (\+|-)\d{4})\] / + # %t
/"(?P<request_method>[A-Z]+) (?P<URI>\S+) (?P<http_version>HTTP\/[0-9\.]+)" / + # \"%r\"
/(?P<request_status>\d{3}) / + # %>s
/((?P<response_size>\d+)|-) / + # %b
/"(?P<referer>\S+)" / + # \"%{Referer}i\"
/"(?P<user_agent>[[:print:]]+)"/ + # \"%{User-agent}i\"
/$/ {
  strptime($timestamp, "02/Jan/2006:15:04:05 -0700") # for tests

  apache_http_requests_total[$request_method][$http_version][$request_status]++
  $response_size > 0 {
      apache_http_bytes_total[$request_method][$http_version][$request_status] += $response_size
  }
}
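To see what the concatenated pattern captures from a real line, here is a Go RE2 rendering of it, built by joining fragments the same way the mtail program joins them with +. It is a sketch for experimenting, not mtail's own code: the name parseCombined is hypothetical, and (\+|-) and ((?P<response_size>\d+)|-) are rewritten as [+-] and a non-capturing group, which match the same text.

```go
package main

import (
	"fmt"
	"regexp"
)

// combined mirrors the apache_combined pattern fragment by fragment.
var combined = regexp.MustCompile(`^` +
	`(?P<hostname>[0-9A-Za-z.:-]+) ` + // %h
	`(?P<remote_logname>[0-9A-Za-z-]+) ` + // %l
	`(?P<remote_username>[0-9A-Za-z-]+) ` + // %u
	`\[(?P<timestamp>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\] ` + // %t
	`"(?P<request_method>[A-Z]+) (?P<URI>\S+) (?P<http_version>HTTP/[0-9.]+)" ` + // "%r"
	`(?P<request_status>\d{3}) ` + // %>s
	`(?:(?P<response_size>\d+)|-) ` + // %b
	`"(?P<referer>\S+)" ` + // "%{Referer}i"
	`"(?P<user_agent>[[:print:]]+)"` + // "%{User-agent}i"
	`$`)

// parseCombined returns the named captures of a matching line, or nil.
// response_size is "" when the %b field is "-".
func parseCombined(line string) map[string]string {
	m := combined.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	fields := make(map[string]string)
	for i, name := range combined.SubexpNames() {
		if name != "" {
			fields[name] = m[i]
		}
	}
	return fields
}

func main() {
	// An illustrative combined-format line.
	line := `127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"`
	f := parseCombined(line)
	fmt.Println(f["request_method"], f["http_version"], f["request_status"], f["response_size"])
}
```

A line whose %b field is "-" still matches but leaves response_size empty, which is why the program only adds to apache_http_bytes_total behind its $response_size > 0 check.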

Many more example programs are available at https://github.com/google/mtail/tree/master/examples
The program defines two counters:

counter apache_http_requests_total by request_method, http_version, request_status
counter apache_http_bytes_total by request_method, http_version, request_status

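The by operator specifies additional dimensions to add to a metric. For the first counter, apache_http_requests_total, we add the dimensions request_method, http_version, and request_status, which become labels on the resulting counter; the second counter, apache_http_bytes_total, uses the same three dimensions.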
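One way to picture what by does: each distinct combination of label values becomes its own time series. The Go sketch below models a counter declared by request_method, http_version, request_status as a map keyed by the label tuple and renders it roughly in Prometheus text format. The types and method names are hypothetical, and Go's %q quoting only matches Prometheus label escaping for plain ASCII values like these.

```go
package main

import (
	"fmt"
	"sort"
)

// seriesKey holds one value per dimension named after `by`.
type seriesKey struct {
	method, version, status string
}

// labeledCounter stands in for a counter declared
// `by request_method, http_version, request_status`.
type labeledCounter map[seriesKey]float64

func (c labeledCounter) inc(k seriesKey)            { c[k]++ }    // like counter[...][...][...]++
func (c labeledCounter) add(k seriesKey, v float64) { c[k] += v } // like counter[...][...][...] += v

// lines renders each series as a text-format line, sorted for stable output.
func (c labeledCounter) lines(name string) []string {
	var out []string
	for k, v := range c {
		out = append(out, fmt.Sprintf(`%s{request_method=%q,http_version=%q,request_status=%q} %g`,
			name, k.method, k.version, k.status, v))
	}
	sort.Strings(out)
	return out
}

func main() {
	requests, bytes := labeledCounter{}, labeledCounter{}
	ok := seriesKey{"GET", "HTTP/1.1", "200"}
	requests.inc(ok)
	requests.inc(ok)
	requests.inc(seriesKey{"POST", "HTTP/1.1", "404"})
	bytes.add(ok, 512)
	bytes.add(ok, 1024)
	for _, l := range requests.lines("apache_http_requests_total") {
		fmt.Println(l)
	}
	for _, l := range bytes.lines("apache_http_bytes_total") {
		fmt.Println(l)
	}
}
```

Adding a dimension multiplies the number of possible series, so dimensions with a small, bounded set of values (methods, versions, status codes) are the safe choice; the URI capture, for example, is not used as a label.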
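Tip: When parsing complex log lines, these regular expressions can become very complex too, so mtail also lets you reuse regular expressions by defining them as constants.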

Constant pattern fragments
To re-use parts of regular expressions, you can assign them to a const identifier:

const PREFIX /^\w+\W+\d+ /

PREFIX {
  ACTION1
}

PREFIX + /foo/ {
  ACTION2
}
In this example, ACTION1 is done for every line that starts with the prefix regex, and ACTION2 is done for the subset of those lines that also contain 'foo'.

Pattern fragments like this don't need to be prefixes, they can be anywhere in the expression.

counter maybe_ipv4

const IPv4 /(?P<ip>\d+\.\d+\.\d+\.\d+)/

/something with an / + IPv4 + / address/ {
  maybe_ipv4++
}
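The same reuse trick carries over to Go: keep fragments as plain strings and concatenate them before compiling. The sketch below assumes mtail's + is plain concatenation (the apache_combined program relies on that), so in this version foo has to follow the prefix directly, a stricter reading than "also contain 'foo'". The names prefixRe, prefixFoo, ipv4Re, and extractIP are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Reusable fragments, like mtail's const declarations.
const (
	prefix = `^\w+\W+\d+ `                 // const PREFIX /^\w+\W+\d+ /
	ipv4   = `(?P<ip>\d+\.\d+\.\d+\.\d+)` // const IPv4
)

var (
	prefixRe  = regexp.MustCompile(prefix)                                    // PREFIX { ACTION1 }
	prefixFoo = regexp.MustCompile(prefix + `foo`)                            // PREFIX + /foo/ { ACTION2 }
	ipv4Re    = regexp.MustCompile(`something with an ` + ipv4 + ` address`) // / + IPv4 + /
)

// extractIP returns the ip capture from the composed pattern, or "".
func extractIP(line string) string {
	m := ipv4Re.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[ipv4Re.SubexpIndex("ip")]
}

func main() {
	line := "abc: 42 foo bar"
	fmt.Println(prefixRe.MatchString(line), prefixFoo.MatchString(line))
	fmt.Println(extractIP("something with an 10.0.0.1 address"))
}
```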

In these regular expressions you can see a series of captures, such as this one:

(?P<request_status>\d{3})

These are named capture groups. In this example we capture a value named request_status, which we can then use in the actions.

Capture Groups
Regular expressions in patterns can contain capture groups -- subexpressions wrapped in parentheses. These can be referred to in the action block to extract data from the line being matched.

For example, part of a program that can extract from rsyncd logs may want to break down transfers by operation and module.

counter transfers_total by operation, module

/(?P<operation>\S+) (\S+) \[\S+\] (\S+) \(\S*\) \S+ (?P<bytes>\d+)/ {
  transfers_total[$operation][$3]++
}
Or, the value of the counter can be increased by the value of a capture group:

counter bytes_total by operation, module

/(?P<operation>\S+) (\S+) \[\S+\] (\S+) \(\S*\) \S+ (?P<bytes>\d+)/ {
  bytes_total[$operation][$3] += $bytes
}
Numeric capture groups address subexpressions in the match result as you might expect from regular expression groups in other languages, like awk and perl -- e.g. the expression $3 refers to the third capture group in the regular expression.

Named capture groups can be referred to by their name as indicated in the regular expression using the ?P<name> notation, as popularised by the Python regular expression library -- e.g. $bytes refers to (?P<bytes>\d+) in the examples above.

Capture groups can be used in the same expression that defines them, for example in this expression that matches and produces $x, then compares against that value.

/(?P<x>\d+)/ && $x > 1 {
  nonzero_positives++
}
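Go's regexp package numbers groups the same way described above (by opening parenthesis, named groups included), so the rsyncd example translates almost line for line. A minimal sketch with a hypothetical tally helper; the log lines are made up to fit the pattern and are not real rsyncd output.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Same pattern as the rsyncd example: m[3] is the module, just like $3.
var rsyncRe = regexp.MustCompile(`(?P<operation>\S+) (\S+) \[\S+\] (\S+) \(\S*\) \S+ (?P<bytes>\d+)`)

type opModule struct{ operation, module string }

// tally runs both example actions on every matching line:
// transfers_total[$operation][$3]++ and bytes_total[$operation][$3] += $bytes.
func tally(lines []string) (transfers map[opModule]int, bytesTotal map[opModule]int64) {
	transfers = map[opModule]int{}
	bytesTotal = map[opModule]int64{}
	opIdx, bytesIdx := rsyncRe.SubexpIndex("operation"), rsyncRe.SubexpIndex("bytes")
	for _, line := range lines {
		m := rsyncRe.FindStringSubmatch(line)
		if m == nil {
			continue // condition failed, so no action runs
		}
		k := opModule{m[opIdx], m[3]} // named capture plus numeric capture
		transfers[k]++
		n, err := strconv.ParseInt(m[bytesIdx], 10, 64)
		if err != nil {
			continue // only possible on overflow, since \d+ matched
		}
		bytesTotal[k] += n
	}
	return transfers, bytesTotal
}

func main() {
	lines := []string{
		"send 10.0.0.5 [10.0.0.5] backup (alice) docs/a.pdf 2048",
		"send 10.0.0.6 [10.0.0.6] backup (bob) docs/b.pdf 1024",
		"recv 10.0.0.5 [10.0.0.5] media () img/c.png 4096",
		"unrelated line",
	}
	transfers, bytesTotal := tally(lines)
	fmt.Println(transfers[opModule{"send", "backup"}], bytesTotal[opModule{"send", "backup"}])
}
```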

Listing: The combined access log actions

{
  apache_http_requests_total[$request_method][$http_version][$request_status]++
  $response_size > 0 {
      apache_http_bytes_total[$request_method][$http_version][$request_status] += $response_size
  }
}

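The action increments the first counter, apache_http_requests_total, using some of the captures (prefixed with $) as the counter's dimensions. Each dimension is wrapped in [] square brackets.
The second counter uses addition: the += operator adds each new response size (in bytes) to the counter. The $response_size > 0 check in the program means only requests with a positive response size are added.
If we run mtail again, this time loading some Apache access logs (or logs from any other web server using the combined log format), we'll see these newly generated metrics.
Listing: Running mtail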

sudo mtail --progs /etc/mtail --logs '/var/log/apache/*.access'

Then browse to the /metrics path.
Listing: Apache combined metrics

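We can see a new set of counters, with one series for each combination of request method, HTTP version, and HTTP response code. We can also do more complex things, such as building histograms.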

9.4 Parsing Rails Logs into a Histogram

Listing: The rails program (https://github.com/google/mtail/blob/master/examples/rails.mtail)

counter rails_requests_started_total
counter rails_requests_started by verb

counter rails_requests_completed_total
counter rails_requests_completed by status

histogram rails_requests_completed_seconds by status buckets 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 15.0

/^Started (?P<verb>[A-Z]+) .*/ {
  ###
  # Started HTTP requests by verb (GET, POST, etc.)
  #
  rails_requests_started_total++
  rails_requests_started[$verb]++
}

/^Completed (?P<status>\d{3}) .+ in (?P<request_seconds>\d+)ms .*$/ {
  ###
  # Total number of completed requests by status
  #
  rails_requests_completed_total++
  rails_requests_completed[$status]++

  ###
  # Completed requests by status with histogram buckets
  #
  # These statements "fall through", so the histogram is cumulative.  The
  # collecting system can compute the percentile bands by taking the ratio of
  # each bucket value over the final bucket.

  rails_requests_completed_seconds[$status] = $request_seconds / 1000.0
}

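First we define counters for started and completed requests, along with a histogram of request duration by status. Then comes a condition and action that count started requests, both in total and by HTTP verb. Next we count completed requests: we capture the status code and the request time (logged in milliseconds, so the action divides it by 1000.0 to get seconds) and use them to record request durations in the histogram by status, which also tracks the sum of request times and the request count.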
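The "fall through" comment in the program means the buckets are cumulative: an observation is counted in every bucket whose upper bound is at or above it. A minimal Go sketch of that bookkeeping, using the program's bucket bounds and a millisecond-to-second conversion like the mtail action. The histogram type and the sample (status, milliseconds) pairs are hypothetical; on the Prometheus side, histogram_quantile() estimates percentiles from such buckets.

```go
package main

import "fmt"

// histogram is a minimal Prometheus-style cumulative histogram: counts[i]
// is the number of observations <= bounds[i]; count is the implicit +Inf
// bucket and sum the running total of observed values.
type histogram struct {
	bounds []float64
	counts []uint64
	count  uint64
	sum    float64
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe adds v to every bucket whose upper bound is >= v.
func (h *histogram) observe(v float64) {
	for i, le := range h.bounds {
		if v <= le {
			h.counts[i]++
		}
	}
	h.count++
	h.sum += v
}

func main() {
	// Same bounds as rails_requests_completed_seconds.
	bounds := []float64{0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 15.0}
	byStatus := map[string]*histogram{}
	for _, r := range []struct {
		status string
		ms     int
	}{{"200", 3}, {"200", 120}, {"500", 3000}} {
		h, ok := byStatus[r.status]
		if !ok {
			h = newHistogram(bounds...)
			byStatus[r.status] = h
		}
		h.observe(float64(r.ms) / 1000.0) // milliseconds -> seconds
	}
	h := byStatus["200"]
	for i, le := range h.bounds {
		fmt.Printf("le=%g count=%d\n", le, h.counts[i])
	}
	fmt.Printf("le=+Inf count=%d sum=%g\n", h.count, h.sum)
}
```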
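Listing: Rails mtail metric output

We can see counters for each request method and for the total, plus counts of completed requests, both in total and by status code.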

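9.5 Deploying mtail

We now have two mtail programs, and there are several ways to deploy them. We recommend running one mtail instance per application, deployed alongside the application as a dependency through configuration management. This pattern is often called the sidecar pattern, and it suits containerized applications well.
You can also run multiple programs in a single mtail instance, with one caveat: mtail runs every program against every log file passed to it, which can have a performance impact on the host.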

9.6 Scraping the mtail Endpoint

Now that we have exposed some metrics, let's create a Prometheus job to scrape them.
Listing: The mtail job

scrape_configs:
- job_name: 'mtail'
  file_sd_configs:
    - files:
      - targets/mtail/*.json
      refresh_interval: 5m

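The job uses file-based service discovery to define a couple of targets, a web server and a Rails server, both scraped on port 3903.
Listing: File-based discovery targets for the mtail job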

[{
  "targets": [
    "web:3903",
    "rails:3903"
  ]
}]
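Because the job depends entirely on this file, it can help to check its shape (a JSON array of objects, each with a targets list and optional labels) before Prometheus loads it. A minimal Go sketch; targetGroup and parseTargets are hypothetical helpers, not part of Prometheus, and the file name under targets/mtail/ is up to you.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// targetGroup mirrors one entry of a file_sd JSON file: host:port targets
// plus optional labels attached to all of them.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// parseTargets decodes a file_sd file and flattens every group's targets.
// A top-level object instead of an array is reported as an error.
func parseTargets(data []byte) ([]string, error) {
	var groups []targetGroup
	if err := json.Unmarshal(data, &groups); err != nil {
		return nil, err
	}
	var all []string
	for _, g := range groups {
		all = append(all, g.Targets...)
	}
	return all, nil
}

func main() {
	data := []byte(`[{
  "targets": [
    "web:3903",
    "rails:3903"
  ]
}]`)
	targets, err := parseTargets(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(targets)
}
```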

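This article is part of the Tencent Cloud self-media syndication program and is shared from a WeChat public account.

Originally published: 2019-12-15. For infringement concerns, contact cloudcommunity@tencent.com for removal.

Shared from the yeedomliu WeChat public account.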
