graph TD
    A[Control node] -->|SSH/API| B[Managed server]
    B --> C[Ansible]
    B --> D[Docker]
    B --> E[Prometheus]
    B --> F[ELK Stack]

Core task: build a basic operations framework

# Install the core tools
sudo apt-get install -y ansible python3-pip git
pip3 install docker-compose

# Create the Ansible inventory file
mkdir -p /opt/automation/inventories
echo "[single_server]
192.168.1.100 ansible_user=root" > /opt/automation/inventories/hosts

# docker-compose-monitor.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
  node_exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    pid: "host"

# app.py
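from flask import Flask, abort
import re
import subprocess

app = Flask(__name__)

@app.route('/deploy/<service>')
def deploy_service(service):
    # Accept only simple names so the URL cannot smuggle in shell syntax or paths
    if not re.fullmatch(r'[A-Za-z0-9_-]+', service):
        abort(400)
    # Pass the command as an argument list (no shell=True) to avoid command injection
    result = subprocess.run(
        ["ansible-playbook", "-i", "inventories/hosts", f"deploy_{service}.yml"],
        capture_output=True,
        text=True
    )
    return f"Deployment output:\n{result.stdout}"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

# security_hardening.yml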
- hosts: single_server
  tasks:
    - name: Apply security updates
      apt:
        upgrade: dist
        update_cache: yes

    - name: Configure firewall
      ufw:
        rule: allow
        port: "{{ item }}"
      loop:
        - 22
        - 80
        - 443

    # The inventory above connects as root; switch ansible_user to a
    # sudo-capable account before applying this task.
    - name: Disable root SSH login
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^PermitRootLogin'
        line: 'PermitRootLogin no'
      notify: restart sshd

  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted

# prometheus/rules.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

graph LR
    A[Single-server automation] --> B[Container orchestration]
    B --> C[Hybrid cloud management]
    C --> D[AIOps]
    style A fill:#f9f,stroke:#333
    style D fill:#ccf,stroke:#f66

Key management scheme:
# Encrypt sensitive data with Ansible Vault
ansible-vault encrypt_string 'super_secret' --name 'db_password'

Access control matrix:
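# RBAC example
def check_permission(user, action):
    permissions = {
        'admin': ['deploy', 'restart', 'config'],
        'developer': ['deploy', 'logs'],
        'guest': ['view']
    }
    return action in permissions.get(user.role, [])

We recommend starting with Ansible and Docker, then gradually adding monitoring and logging. A single-server environment is an excellent test bed for learning automated operations; the key is to establish a standardized operations workflow. When you later expand to a cluster, focus on standardizing configuration management and refactoring services to be stateless.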
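
As a rough illustration of how the access control matrix above could be exercised, here is a minimal sketch. The `User` dataclass and the `require` helper are hypothetical names introduced only for this example; the role table is copied from the RBAC example, and nothing here is tied to a particular authentication library.

```python
from dataclasses import dataclass

# Role-to-action matrix copied from the RBAC example above
PERMISSIONS = {
    'admin': ['deploy', 'restart', 'config'],
    'developer': ['deploy', 'logs'],
    'guest': ['view'],
}


@dataclass
class User:
    """Hypothetical user record; only the role matters for the check."""
    name: str
    role: str


def check_permission(user, action):
    # Unknown roles fall back to an empty list, so they are denied everything
    return action in PERMISSIONS.get(user.role, [])


def require(user, action):
    """Hypothetical guard that raises instead of returning False."""
    if not check_permission(user, action):
        raise PermissionError(f"{user.name} ({user.role}) may not {action}")


if __name__ == '__main__':
    dev = User('alice', 'developer')
    print(check_permission(dev, 'deploy'))   # True
    print(check_permission(dev, 'restart'))  # False
```

One possible use is to call `require(current_user, 'deploy')` at the top of the deployment endpoint, where `current_user` would come from whatever authentication layer is in place (an assumption, since the original does not specify one).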