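As Kubernetes adoption grows, keeping clusters stable has become a central concern for development and operations teams. When deploying applications to a cluster, something as simple as forgetting to configure resource requests or limits can break autoscaling or even let a workload exhaust its resources. Configuration problems like these frequently cause production outages, and Polaris is a tool for preventing them. Polaris is an open-source Kubernetes cluster health-check component developed by Fairwinds. It analyzes the deployment configuration in a cluster to find and prevent configuration problems that affect stability, reliability, scalability, and security.
Polaris aims to do more than discover problems: it also provides ways to avoid them, so that the cluster stays healthy. Polaris consists of three components, each with a different function: the Dashboard, the Webhook, and the CLI.
The Dashboard is the visualization tool that Polaris provides. It gives an overview of the state of your Kubernetes workloads along with suggested improvements. Results can also be viewed by category, namespace, and workload.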
# kubectl apply -f https://github.com/fairwindsops/polaris/releases/latest/download/dashboard.yaml
# kubectl port-forward --namespace polaris svc/polaris-dashboard 8080:80
Viewing check results by category
Viewing check results by namespace
polaris dashboard --port 8080 --audit-path=/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx
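Polaris can run as an admission controller, acting as a validating webhook. It accepts the same configuration as the dashboard and can run the same validations. The webhook rejects any workload that triggers a validation error. This reflects Polaris's broader goal: not just encouraging better configuration through the visibility of the dashboard, but actually enforcing it through the webhook. Polaris does not fix workloads; it only blocks them.
Polaris can also be used on the command line to audit local files or a running cluster. This is particularly helpful for running Polaris against infrastructure code in a CI/CD pipeline. Command-line flags can make the CI/CD run fail if the audit score falls below a given threshold or if any error is found.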
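The pass/fail decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Polaris's own implementation: the function names `iter_check_results` and `ci_exit_code` are hypothetical, and the field names (`Score`, `Success`, `Severity`) are assumed from the sample audit output shown later in this article.

```python
from typing import Any, Iterator


def iter_check_results(node: Any) -> Iterator[dict]:
    """Yield every dict in the report that looks like a single check result."""
    if isinstance(node, dict):
        if "Success" in node and "Severity" in node:
            yield node
        else:
            for value in node.values():
                yield from iter_check_results(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_check_results(value)


def ci_exit_code(report: dict, min_score: int) -> int:
    """Return 1 when the audit should fail the pipeline, otherwise 0."""
    danger_failures = [
        result for result in iter_check_results(report)
        if not result["Success"] and result["Severity"] == "danger"
    ]
    if report.get("Score", 0) < min_score or danger_failures:
        return 1
    return 0


# A trimmed-down report with one failed danger-level check (assumed structure).
sample = {
    "Score": 0,
    "Results": [{"PodResult": {"ContainerResults": [{"Results": {
        "cpuLimitsMissing": {"Success": False, "Severity": "danger"},
    }}]}}],
}
print(ci_exit_code(sample, min_score=80))  # prints 1
```

In a pipeline, the returned value would be passed to `sys.exit` after loading the report that the audit wrote to disk.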
Polaris supports three installation methods: kubectl, Helm, and a local binary. This article picks the simplest method for each of the three components.
Helm
Add the Helm charts repository
helm repo add reactiveops-stable https://charts.reactiveops.com/stable
Update the charts repository and install the Dashboard component
helm upgrade --install polaris reactiveops-stable/polaris --namespace polaris
To view the Dashboard locally, forward a local port with the following command
kubectl port-forward --namespace polaris svc/polaris-dashboard 8080:80
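Once the Webhook component is installed in the cluster, it blocks applications that do not meet the standards from being deployed.
Helm
Add the Helm charts repository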
helm repo add reactiveops-stable https://charts.reactiveops.com/stable
Update the charts repository and install the Webhook component
helm upgrade --install polaris reactiveops-stable/polaris --namespace polaris \
--set webhook.enable=true --set dashboard.enable=false
To test Polaris locally, you can download the binary from the releases page, or install it with Homebrew:
brew tap reactiveops/tap
brew install reactiveops/tap/polaris
polaris --version
Use the CLI to check local configuration files
polaris --audit --audit-path ./deploy/
The scan results can be saved to a YAML file
polaris --audit --output-format yaml > report.yaml
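The sections above briefly covered installing Polaris and its basic usage. To fit Polaris to a real project, however, the default configuration is usually not enough, so we also need to know how to write the configuration file that defines Polaris's check rules.
Before customizing the configuration, it helps to understand the severity levels and the check types that Polaris supports. Each check has one of three severity levels: danger, warning, and ignore. Polaris does not run checks that are set to ignore. The supported check types are Health Checks, Images, Networking, Resources, and Security, each described below: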
Polaris can verify that Pods have readiness and liveness probes configured
key | default | description |
---|---|---|
readinessProbeMissing | warning | Fails when no readiness probe is configured for the Pod. |
livenessProbeMissing | warning | Fails when no liveness probe is configured for the Pod. |
tagNotSpecified | danger | Fails when no image tag is specified or the tag is latest. |
pullPolicyNotAlways | warning | Fails when the image pull policy is not Always. |
priorityClassNotSet | ignore | Fails when no priorityClassName is configured for the Pod. |
multipleReplicasForDeployment | ignore | Fails when a Deployment has only one replica. |
missingPodDisruptionBudget | ignore |
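As a rough illustration of the tagNotSpecified rule in the table above, the following Python sketch decides whether an image reference lacks a tag or uses latest. The function name `tag_not_specified` is hypothetical, and the handling of registry ports and digests is an assumption of this sketch; Polaris's own check may treat those cases differently.

```python
def tag_not_specified(image: str) -> bool:
    """Return True when the image has no tag or uses the 'latest' tag."""
    # Assumption: a digest reference (name@sha256:...) pins the image,
    # so this sketch treats it as "specified".
    if "@" in image:
        return False
    # Only the last path segment can carry a tag; a colon earlier in the
    # string belongs to a registry port such as localhost:5000.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True
    tag = last_segment.rsplit(":", 1)[1]
    return tag in ("", "latest")


for image in ("nginx", "nginx:latest", "nginx:1.25", "localhost:5000/nginx"):
    print(image, tag_not_specified(image))
```

Only `nginx:1.25` passes here; the other three references would be flagged.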
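Polaris can verify that memory and CPU requests and limits are configured
key | default | description |
---|---|---|
cpuRequestsMissing | warning | Fails when resources.requests.cpu is not configured. |
memoryRequestsMissing | warning | Fails when resources.requests.memory is not configured. |
cpuLimitsMissing | warning | Fails when resources.limits.cpu is not configured. |
memoryLimitsMissing | warning | Fails when resources.limits.memory is not configured. |
For resources such as memory and CPU, you can also configure a range check. The check passes only when the configured value falls within the specified interval.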
limits:
  type: object
  required:
  - memory
  - cpu
  properties:
    memory:
      type: string
      resourceMinimum: 100M
      resourceMaximum: 6G
    cpu:
      type: string
      resourceMinimum: 100m
      resourceMaximum: "2"
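To make the range semantics concrete, here is a minimal Python sketch that compares Kubernetes-style quantities such as 100M, 6G, 100m, and "2". The helper names `parse_quantity` and `within_range` are hypothetical, only a subset of suffixes is handled, and treating both bounds as inclusive is an assumption rather than documented Polaris behavior.

```python
from decimal import Decimal

# Multipliers for a subset of Kubernetes quantity suffixes:
# "m" is milli, k/M/G/T are powers of 1000, Ki/Mi/Gi/Ti are powers of 1024.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '100M' or '500m' to a plain number."""
    # Try two-character suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def within_range(value: str, minimum: str, maximum: str) -> bool:
    """Return True when minimum <= value <= maximum (inclusive by assumption)."""
    return parse_quantity(minimum) <= parse_quantity(value) <= parse_quantity(maximum)


print(within_range("512Mi", "100M", "6G"))  # memory limit inside the range
print(within_range("4", "100m", "2"))       # CPU limit above the maximum
```

With the bounds from the configuration above, a memory limit of 512Mi passes, while a CPU limit of 4 fails because it exceeds the maximum of 2.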
key | default | description |
---|---|---|
hostIPCSet | danger | Fails when hostIPC attribute is configured. |
hostPIDSet | danger | Fails when hostPID attribute is configured. |
notReadOnlyRootFilesystem | warning | Fails when securityContext.readOnlyRootFilesystem is not true. |
privilegeEscalationAllowed | danger | Fails when securityContext.allowPrivilegeEscalation is true. |
runAsRootAllowed | warning | Fails when securityContext.runAsNonRoot is not true. |
runAsPrivileged | danger | Fails when securityContext.privileged is true. |
insecureCapabilities | warning | Fails when securityContext.capabilities includes one of the capabilities on the insecure list in the Polaris documentation. |
dangerousCapabilities | danger | Fails when securityContext.capabilities includes one of the capabilities on the dangerous list in the Polaris documentation. |
hostNetworkSet | warning | Fails when hostNetwork attribute is configured. |
hostPortSet | warning | Fails when hostPort attribute is configured. |
tlsSettingsMissing | warning | Fails when an Ingress lacks TLS settings. |
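With the information above, we can already define our own scan configuration to fit a project. If the checks that Polaris provides are not enough, we can also define custom checks. For example, a custom rule can check where an image comes from and raise a warning when the image comes from quay.io: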
checks:
  imageRegistry: warning
customChecks:
  imageRegistry:
    successMessage: Image comes from allowed registries
    failureMessage: Image should not be from disallowed registry
    category: Images
    target: Container # target can be "Container" or "Pod"
    schema:
      '$schema': http://json-schema.org/draft-07/schema
      type: object
      properties:
        image:
          type: string
          not:
            pattern: ^quay.io
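The effect of the `not` / `pattern` pair in this schema can be reproduced with Python's `re` module. This is a sketch of the matching rule only, not of how Polaris evaluates schemas; the function name `image_registry_check_passes` is hypothetical.

```python
import re

# Same regular expression as the custom check above.
DISALLOWED = re.compile(r"^quay.io")


def image_registry_check_passes(image: str) -> bool:
    """Return True when the image does NOT match the disallowed pattern."""
    # JSON Schema "pattern" is an unanchored search, so the "^" in the
    # expression is what ties the match to the start of the image name.
    # The surrounding "not" inverts the result.
    return DISALLOWED.search(image) is None


print(image_registry_check_passes("quay.io/coreos/etcd:v3.5.0"))      # False
print(image_registry_check_passes("docker.io/library/nginx:1.25"))    # True
print(image_registry_check_passes("quayXio/example:1.0"))             # False
```

The last case fails because the unescaped `.` matches any character. Writing the pattern as `^quay\.io` would restrict the rule to the literal registry name.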
You can also restrict the configuration to specific checks
checks:
  cpuRequestsMissing: danger
  memoryRequestsMissing: danger
  cpuLimitsMissing: danger
  memoryLimitsMissing: danger
polaris audit -c check_config.yaml --.......
{
  "PolarisOutputVersion": "1.0",
  "AuditTime": "2021-07-01T15:07:00+08:00",
  "SourceType": "Path",
  "SourceName": "/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx",
  "DisplayName": "/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx",
  "ClusterInfo": {
    "Version": "unknown",
    "Nodes": 0,
    "Pods": 0,
    "Namespaces": 0,
    "Controllers": 1
  },
  "Results": [
    {
      "Name": "nginx-config",
      "Namespace": "",
      "Kind": "ConfigMap",
      "Results": {},
      "PodResult": null,
      "CreatedTime": "0001-01-01T00:00:00Z"
    },
    {
      "Name": "nginx-deployment",
      "Namespace": "",
      "Kind": "Deployment",
      "Results": {},
      "PodResult": {
        "Name": "",
        "Results": {},
        "ContainerResults": [
          {
            "Name": "nginx",
            "Results": {
              "cpuLimitsMissing": {
                "ID": "cpuLimitsMissing",
                "Message": "CPU limits should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              },
              "cpuRequestsMissing": {
                "ID": "cpuRequestsMissing",
                "Message": "CPU requests should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              },
              "memoryLimitsMissing": {
                "ID": "memoryLimitsMissing",
                "Message": "Memory limits should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              },
              "memoryRequestsMissing": {
                "ID": "memoryRequestsMissing",
                "Message": "Memory requests should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              }
            }
          }
        ]
      },
      "CreatedTime": "0001-01-01T00:00:00Z"
    }
  ],
  "Score": 0
}
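Running a Polaris check against a cluster returns JSON, which is not very readable. We use Python to process the results and write them to an Excel spreadsheet for easier review.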
import os
import subprocess

import xlsxwriter
import yaml

# config
fileNamePath = os.path.split(os.path.realpath(__file__))[0]
config = os.path.join(fileNamePath, 'check_config.yaml')
cluster_config = os.path.join(fileNamePath, 'cluster_list.yaml')

# variable
scan_controller_type = ["Deployment", "DaemonSet", "StatefulSet"]


def read_cluster():
    # Load the list of clusters to scan; safe_load is enough for plain data.
    with open(cluster_config, 'r', encoding='utf-8') as f:
        return yaml.safe_load(f)


def generate_report(cluster_id: str):
    # Run the Polaris audit and write the report to result/<cluster_id>.yaml.
    # Every cluster is scanned through ~/.kube/config, as in the original script.
    result_dir = os.path.join(fileNamePath, 'result')
    os.makedirs(result_dir, exist_ok=True)
    scan_command = [
        "polaris", "audit", "-c", config,
        "--kubeconfig", os.path.expanduser("~/.kube/config"),
        "--only-show-failed-tests", "true",
        "--output-file", os.path.join(result_dir, f"{cluster_id}.yaml"),
    ]
    try:
        subprocess.run(scan_command, check=True)
    except (OSError, subprocess.CalledProcessError) as e:
        print(e)


def format_data(cluster):
    # Flatten one cluster report into rows, one row per container with failures.
    cluster_report = os.path.join(fileNamePath, 'result', '{}.yaml'.format(cluster))
    if not os.path.exists(cluster_report):
        return []
    with open(cluster_report, 'r', encoding='utf-8') as f:
        report = yaml.safe_load(f)
    data_list = []
    for item in report["Results"]:
        pod_result = item.get("PodResult")
        # Skip other kinds (for example ConfigMap) and items without Pod results.
        if item["Kind"] not in scan_controller_type or not pod_result:
            continue
        for container in pod_result["ContainerResults"]:
            failed_checks = list(container["Results"])
            if not failed_checks:
                continue
            row = [cluster, item["Kind"], item["Namespace"], item["Name"],
                   container["Name"], str(failed_checks)]
            data_list.append(row)
    return data_list


def excel_config(workbook):
    # Column headers and cell formats; the PodName column holds the container name.
    column_name = ['ClusterID', 'Kind', 'NameSpace', 'Name', 'PodName', 'Scan Result']
    merge_format = workbook.add_format({
        'font_size': 22,
        'bold': True,
        'font_color': '#FFFFFF',
        'border': 1,
        'font_name': '苹方-简',
        'align': 'center',
        'valign': 'vcenter',
        'fg_color': '#0174DF'
    })
    Title_format = workbook.add_format({
        'font_size': 18,
        'border': 1,
        'bold': True,
        'align': 'center',
        'font_name': '苹方-简',
        'valign': 'vcenter',
    })
    data_format = workbook.add_format({
        'font_size': 16,
        'border': 1,
        'align': 'center',
        'font_name': '苹方-简',
        'valign': 'vcenter',
    })
    return column_name, merge_format, Title_format, data_format


def generate_excel():
    workbook = xlsxwriter.Workbook("scan_result.xlsx")
    column_name, merge_format, Title_format, data_format = excel_config(workbook)
    for cluster in read_cluster()["clusters"]:
        print(f"Scan cluster start: {cluster}")
        generate_report(cluster)
        worksheet = workbook.add_worksheet(cluster)
        worksheet.merge_range('A1:F1', f'Cluster {cluster} Requests/Limits scan results', merge_format)
        worksheet.set_column('A:F', 35)
        worksheet.set_column('F:F', 130)
        worksheet.set_row(0, 50)
        scan_result = format_data(cluster)
        if scan_result:
            worksheet.write_row('A2', column_name, Title_format)
            # Data rows start at row 3, below the title and the header row.
            for row_number, item in enumerate(scan_result, start=3):
                worksheet.write_row(f'A{row_number}', item, data_format)
        else:
            # No failing workloads were found for this cluster.
            worksheet.merge_range('A3:F3', 'NOT Found INFO', data_format)
    workbook.close()


if __name__ == '__main__':
    generate_excel()
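Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, contact cloudcommunity@tencent.com for removal.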