
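Automation has become essential to efficient, consistent, and reliable IT infrastructure management. Ansible, an open-source automation tool, holds an important place in this field thanks to its agentless architecture and simple syntax. This article takes an in-depth look at using Ansible on Ubuntu Server, from basic concepts to advanced techniques, as practical guidance for system administrators and DevOps engineers.
Ansible is written in Python and is designed to be lightweight, agentless, and easy to learn. It supports batch command execution, configuration management, application deployment, and more. Compared with tools such as SaltStack and Puppet, Ansible's core advantages include:
Ansible's working mechanism is built on the following core components:
A typical Ansible run proceeds as follows: the control node identifies the managed nodes from the Inventory, pushes Modules to the target hosts over SSH, and executes the tasks. Results are returned over SSH, and once a task finishes the module files are removed automatically, leaving no background process behind.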
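The push, execute, return, and clean-up cycle described above can be imitated on a single machine. The following is only a minimal sketch under stated assumptions: it runs the "module" locally with the current Python interpreter instead of over SSH, and `MODULE_SOURCE` and `run_module` are hypothetical names invented for this illustration, not part of Ansible.

```python
import json
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for an Ansible module: a script that prints a JSON result.
MODULE_SOURCE = (
    "import json\n"
    "print(json.dumps({'changed': False, 'ping': 'pong'}))\n"
)


def run_module(module_source: str) -> dict:
    """Stage a module in a temporary directory, run it, and clean up afterwards."""
    workdir = Path(tempfile.mkdtemp(prefix="module-sketch-"))
    try:
        module_path = workdir / "module.py"
        # Step 1: "push" the module to the target (here: a local temp directory).
        module_path.write_text(module_source, encoding="utf-8")
        # Step 2: execute it on the target and capture its output.
        completed = subprocess.run(
            [sys.executable, str(module_path)],
            capture_output=True,
            text=True,
            check=True,
        )
        # Step 3: the result travels back as JSON.
        result = json.loads(completed.stdout)
    finally:
        # Step 4: remove the module files so nothing is left behind.
        shutil.rmtree(workdir)
    result["workdir_removed"] = not workdir.exists()
    return result


if __name__ == "__main__":
    print(run_module(MODULE_SOURCE))
```

The sketch keeps the clean-up in a `finally` block so the temporary files disappear even when the module fails, which mirrors the "no residue" property the paragraph describes.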
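Ubuntu Server is a popular Linux distribution, and pairing it with Ansible brings notable advantages:
There are several ways to install Ansible on Ubuntu Server. The following are the most common: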
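
# Refresh the package index
sudo apt update
# Install software-properties-common (needed to add a PPA)
sudo apt install -y software-properties-common
# Add the official Ansible PPA
sudo apt-add-repository -y ppa:ansible/ansible
# Install Ansible
sudo apt install -y ansible

# Install Python and pip
sudo apt install -y python3 python3-pip
# Install Ansible through pip
# (Ubuntu 23.04 and later reject system-wide pip installs under PEP 668; use a virtual environment or pipx there)
pip3 install ansible

# Check the Ansible version
ansible --version
# Test the local connection
ansible localhost -m ping

Ansible communicates over SSH, so passwordless (key-based) login is a prerequisite for using it efficiently: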
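
# Generate an SSH key pair (if you do not already have one)
ssh-keygen -t rsa -b 4096 -C "ansible@example.com"
# Copy the public key to the target host
ssh-copy-id username@target_host
# For multiple hosts, use a loop
for host in host1 host2 host3; do
    ssh-copy-id "username@$host"
done

The Ansible Inventory file defines the managed hosts and their groups. Its default location is /etc/ansible/hosts: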

# Simple host definitions
[webservers]
web1.example.com
web2.example.com ansible_port=2222  # non-standard SSH port

[dbservers]
db1.example.com
db2.example.com

# Nested groups
[production:children]
webservers
dbservers

# Variable definitions
[all:vars]
ansible_user=admin
ansible_ssh_private_key_file=~/.ssh/ansible_key

Editing ansible.cfg lets you tune Ansible's performance and behavior:
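
[defaults]
# Raise the number of parallel processes
forks = 50
# Disable SSH host key checking (convenient in a lab, but it removes protection against man-in-the-middle attacks)
host_key_checking = False
# Connection timeout in seconds
timeout = 30
# Use the YAML callback plugin for more readable output
stdout_callback = yaml

[ssh_connection]
# Reuse SSH connections (ssh_args belongs in this section, not in [defaults])
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

[privilege_escalation]
# Privilege escalation settings
become = True
become_method = sudo
become_user = root
become_ask_pass = False

The command module is the default module for ad hoc commands. It runs simple commands on remote hosts: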
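
- name: Check system load
  command: uptime

- name: List a directory (changing into it first)
  command: ls -l
  args:
    chdir: /opt

- name: Check whether a file exists
  command: ls /path/to/file
  register: file_exists
  ignore_errors: yes

The shell module supports shell features such as pipes and redirection, which makes it suitable for more complex operations: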
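
- name: Change a user's password
  # passwd --stdin is not available on Ubuntu; chpasswd reads user:password pairs from stdin
  shell: echo "username:newpassword" | chpasswd

- name: Count NGINX processes
  shell: ps aux | grep nginx | grep -v grep | wc -l
  register: nginx_process_count

- name: Extract the IPv4 address of eth0
  # ifconfig needs the net-tools package, which recent Ubuntu releases do not install by default;
  # the ip command prints the address with its prefix length (for example 10.0.0.5/24)
  shell: ip -4 addr show eth0 | awk '/inet / {print $2}'
  register: ip_address

The raw module runs commands on target hosts that do not have Python installed: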

- name: Install Python (for initial bootstrap)
  raw: apt-get install -y python3

The copy module copies files from the control node to remote hosts:
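
- name: Copy a configuration file
  copy:
    src: files/nginx.conf
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    backup: yes  # back up the destination file if it already exists

- name: Define the content with variables
  copy:
    content: |
      ServerName {{ server_name }}
      Listen {{ http_port }}
    # Apache on Ubuntu reads extra configuration from conf-available/ rather than conf.d/
    dest: /etc/apache2/conf-available/custom.conf

The template module renders a file from a Jinja2 template and copies it to the remote host: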

- name: Generate a dynamic configuration file
  template:
    src: templates/database.conf.j2
    dest: /etc/app/database.conf
    mode: '0644'

The corresponding template file, templates/database.conf.j2:

# Database configuration
host: {{ db_host }}
port: {{ db_port | default(5432) }}
username: {{ db_user }}
password: {{ db_password }}
# Connection pool settings
max_connections: {{ max_connections | default(100) }}
{% if use_ssl | default(false) %}
ssl: true
ssl_ca: /etc/ssl/certs/ca-certificates.crt
{% endif %}

The file module manages the attributes of files, directories, and symbolic links:
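
- name: Create a directory
  file:
    path: /opt/app/logs
    state: directory
    owner: appuser
    group: appgroup
    mode: '0755'

- name: Create a symbolic link
  file:
    src: /opt/app/releases/v1.0  # the target the link points to
    dest: /opt/app/current       # the path of the link itself
    state: link

- name: Change file permissions
  file:
    path: /opt/app/script.sh
    mode: '0755'
    owner: root

The apt module manages software packages on Debian/Ubuntu systems: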

- name: Update the package cache
  apt:
    update_cache: yes
    cache_valid_time: 3600  # cache validity in seconds

- name: Install NGINX
  apt:
    name: nginx
    state: present  # use latest to upgrade to the newest version

- name: Install several packages
  apt:
    name:
      - nginx
      - postgresql
      - python3-pip
    state: present

- name: Remove a package
  apt:
    name: apache2
    state: absent

- name: Install a specific package version
  apt:
    name: docker.io=19.03.8-0ubuntu1
    state: present

The snap module manages Snap packages on Ubuntu:

- name: Install a Snap package
  snap:
    name: code
    classic: yes  # for snaps that require classic confinement

The service module manages the state of system services:
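
- name: Start and enable the NGINX service
  service:
    name: nginx
    state: started
    enabled: yes

- name: Restart a service
  service:
    name: apache2
    state: restarted

- name: Reload a service's configuration
  service:
    name: nginx
    state: reloaded

# The service module requires state or enabled, so status checks use service_facts instead
- name: Gather service states
  service_facts:

- name: Show the NGINX service state
  debug:
    var: ansible_facts.services['nginx.service'].state

The user module manages system user accounts: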
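
- name: Create a regular user
  user:
    name: appuser
    comment: "Application User"
    shell: /bin/bash
    groups: users,sudo  # current Ubuntu releases use the sudo group; the old admin group no longer exists by default
    append: yes  # add to these supplementary groups without removing existing ones

- name: Create a system user
  user:
    name: mysql
    system: yes
    home: /var/lib/mysql
    shell: /bin/false

- name: Manage a user's authorized SSH key
  # the user module has no ssh_key option; the authorized_key module installs public keys
  authorized_key:
    user: deployuser
    key: "{{ lookup('file', 'keys/deploy.pub') }}"
    state: present

The group module manages system groups: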

- name: Create a group
  group:
    name: developers
    state: present
    gid: 2001

- name: Create a system group
  group:
    name: servicegroup
    system: yes

A well-organized Playbook structure is the foundation for maintaining large automation projects:

project/
├── inventories/         # per-environment inventories
│   ├── production/
│   ├── staging/
│   └── development/
├── group_vars/          # group variables
│   ├── all/
│   ├── webservers/
│   └── dbservers/
├── host_vars/           # host variables
├── roles/               # Ansible roles
│   ├── common/
│   ├── nginx/
│   └── database/
├── library/             # custom modules
├── filter_plugins/      # custom filters
├── site.yml             # main Playbook
├── webservers.yml       # Playbook for a specific function
└── requirements.yml     # role dependencies

Ansible variables follow a specific order of precedence. Understanding it is essential to avoid unexpected behavior:
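To make the idea of precedence concrete, here is a small sketch that layers variable sources from lowest to highest priority. It is an illustration under assumptions, not Ansible's implementation: it covers only a simplified subset of the levels in Ansible's documented ordering (role defaults, group variables, host variables, play variables, role variables, extra variables), and the names `PRECEDENCE`, `resolve_vars`, `winning_source`, and `EXAMPLE_SOURCES` are hypothetical.

```python
# Simplified subset of Ansible's precedence levels, ordered lowest to highest.
PRECEDENCE = [
    "role_defaults",  # roles/<role>/defaults/main.yml
    "group_vars",     # inventory group variables
    "host_vars",      # inventory host variables
    "play_vars",      # vars: in a play
    "role_vars",      # roles/<role>/vars/main.yml
    "extra_vars",     # -e / --extra-vars on the command line
]


def resolve_vars(sources: dict) -> dict:
    """Merge variable sources so that higher-precedence levels overwrite lower ones."""
    merged = {}
    for level in PRECEDENCE:
        # A later (higher) level replaces any value set by an earlier one.
        merged.update(sources.get(level, {}))
    return merged


def winning_source(sources: dict, name: str):
    """Return the level that supplies the final value of a variable, or None."""
    for level in reversed(PRECEDENCE):
        if name in sources.get(level, {}):
            return level
    return None


# Hypothetical values chosen only for this illustration.
EXAMPLE_SOURCES = {
    "role_defaults": {"http_port": 80, "worker_connections": 1024},
    "group_vars": {"http_port": 8080},
    "host_vars": {"http_port": 8081},
    "extra_vars": {"worker_connections": 4096},
}

if __name__ == "__main__":
    print(resolve_vars(EXAMPLE_SOURCES))
    print(winning_source(EXAMPLE_SOURCES, "http_port"))
```

In this example the host-level value of `http_port` overrides both the group value and the role default, while `worker_connections` is decided by the command line. This is why role defaults are the natural place for values that users are expected to override.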
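For sensitive data such as passwords and API keys, use Ansible Vault to encrypt it:

# Create an encrypted file
ansible-vault create secrets.yml
# Edit an encrypted file
ansible-vault edit secrets.yml
# Use encrypted variables in a Playbook
ansible-playbook site.yml --ask-vault-pass --extra-vars="@secrets.yml"

Example content of an encrypted variable file: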

---
db_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  36636563386532343264376336383865333236343761643839373431623164623163363164613861
  6431623063303539373736626136623038623662663166650a306465646161633266653533323834
  63366530646434666262396130343532343531613531316638376335386534316535303435376335

- name: Try to start a service that may fail
  service:
    name: unstable-service
    state: started
  ignore_errors: yes  # continue even if this task fails
  register: service_result

- name: Report the service status
  debug:
    msg: "The service failed to start and needs manual intervention"
  when: service_result is failed

- name: Wait for the service to become ready (up to 5 retries)
  uri:
    url: http://localhost:8080/health
    method: GET
  register: response
  until: response.status == 200
  retries: 5
  delay: 10  # wait 10 seconds between retries

- name: Install the monitoring agent only in production
  apt:
    name: monitoring-agent
    state: present
  when: inventory_hostname in groups['production']

- name: Run different tasks depending on the operating system
  block:
    - name: Update Ubuntu systems
      apt:
        upgrade: dist
      when: ansible_os_family == "Debian"

    - name: Update CentOS systems
      yum:
        name: '*'
        state: latest
      when: ansible_os_family == "RedHat"

- name: Configure the database connection
  block:
    - name: Check database connectivity
      postgresql_ping:
        login_host: "{{ db_host }}"
        login_user: "{{ db_user }}"
        login_password: "{{ db_password }}"

    - name: Create the database
      postgresql_db:
        name: "{{ app_db }}"
  rescue:
    - name: Handle the database connection failure
      debug:
        msg: "Database connection failed; check the configuration"

    - name: Send an alert notification
      mail:
        subject: "Database configuration failed"
        body: "Database configuration failed on host {{ inventory_hostname }}"
        to: "admin@example.com"
  always:
    - name: Record the execution log
      debug:
        msg: "Database configuration workflow finished"

An Ansible role provides a standard way to group related tasks, variables, and files. A complete role typically has the following directory structure:

roles/
└── nginx/               # role name
    ├── defaults/        # default variables (lowest precedence)
    │   └── main.yml
    ├── vars/            # role variables (high precedence)
    │   └── main.yml
    ├── tasks/           # task definitions
    │   └── main.yml
    ├── handlers/        # handlers
    │   └── main.yml
    ├── templates/       # Jinja2 templates
    │   └── nginx.conf.j2
    ├── files/           # static files
    │   └── custom.conf
    ├── meta/            # role dependencies
    │   └── main.yml
    └── README.md        # role documentation

The role's main tasks are defined in roles/nginx/tasks/main.yml:

---
- name: Install NGINX
  apt:
    name: nginx
    state: present
  tags: install

- name: Configure NGINX
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    backup: yes
  notify: restart nginx
  tags: config

- name: Install the site configuration
  copy:
    src: custom.conf
    dest: /etc/nginx/sites-available/custom
  tags: config

- name: Activate the site
  file:
    src: /etc/nginx/sites-available/custom
    dest: /etc/nginx/sites-enabled/custom
    state: link
  tags: config

- name: Ensure NGINX is running
  service:
    name: nginx
    state: started
    enabled: yes
  tags: service

The role's handlers are defined in roles/nginx/handlers/main.yml:

---
- name: restart nginx
  service:
    name: nginx
    state: restarted

- name: reload nginx
  service:
    name: nginx
    state: reloaded

The role's default variables are defined in roles/nginx/defaults/main.yml:

---
# Number of NGINX worker processes
nginx_worker_processes: "{{ ansible_processor_vcpus }}"
# NGINX connection limit
nginx_worker_connections: 1024
# Virtual host configuration
nginx_servers:
  - name: default
    port: 80
    document_root: /var/www/html

The role's dependencies are defined in roles/nginx/meta/main.yml:

---
dependencies:
  - role: common
    vars:
      setup_firewall: true
  - role: ssl
    when: enable_ssl | default(false)

Ansible Galaxy is a platform for sharing and obtaining Ansible roles, which saves you from writing common roles from scratch:
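
# Search for related roles
ansible-galaxy search nginx
# Install a role
ansible-galaxy install geerlingguy.nginx
# Manage role dependencies with a requirements.yml file
cat requirements.yml
# Install every role listed in it
ansible-galaxy install -r requirements.yml

# requirements.yml example
- src: geerlingguy.nginx
  version: 3.0.0
- src: geerlingguy.mysql
  version: 3.3.0
- src: https://github.com/example/custom-role.git
  version: main
  name: custom-role

# ansible.cfg performance settings
[defaults]
# Increase the number of parallel processes
forks = 100
# Skip fact gathering unless a play requests it
# (gather_facts is a play keyword; the ansible.cfg setting is gathering)
gathering = explicit
# Disable host key checking
host_key_checking = False

[ssh_connection]
# Enable SSH pipelining for better performance
pipelining = True
# Keep SSH control connections alive for reuse
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

For long-running tasks, use asynchronous mode to avoid timeouts: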
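
- name: Start a long-running task
  command: /opt/app/long-running-script.sh
  async: 1800  # maximum run time in seconds
  poll: 0      # do not wait here; the next task polls for completion
  register: async_result

- name: Check the status of the asynchronous task
  async_status:
    jid: "{{ async_result.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 60  # 60 checks at 30-second intervals cover the 1800-second limit
  delay: 30

For cloud environments or dynamic infrastructure, use a dynamic Inventory to discover hosts automatically: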

# inventory/aws_ec2.yml
plugin: aws_ec2
regions:
  - us-east-1
  - us-west-2
filters:
  tag:Environment: production
  instance-state-name: running
compose:
  # compose values are Jinja2 expressions, so a literal string needs inner quotes
  ansible_user: "'ec2-user'"
keyed_groups:
  # the default separator "_" is inserted after the prefix, giving groups such as role_web
  - key: tags.Role
    prefix: role
  - key: tags.Environment
    prefix: env

#!/usr/bin/env python3
"""
Custom dynamic Inventory script
"""
import json

import requests


def main():
    # Fetch the host list from the CMDB API
    response = requests.get('https://cmdb.example.com/api/hosts', timeout=10)
    response.raise_for_status()
    hosts = response.json()

    inventory = {
        'web': {
            'hosts': [],
            'vars': {
                'ansible_user': 'webadmin'
            }
        },
        'db': {
            'hosts': [],
            'vars': {
                'ansible_user': 'dbadmin'
            }
        },
        '_meta': {
            'hostvars': {}
        }
    }

    for host in hosts:
        group = host['type']  # web, db, etc.
        hostname = host['hostname']
        # Create the group on first use so an unexpected type does not raise KeyError
        inventory.setdefault(group, {'hosts': []})['hosts'].append(hostname)
        inventory['_meta']['hostvars'][hostname] = {
            'ansible_host': host['ip_address'],
            # "environment" is a reserved name in Ansible, so the value is stored under another key
            'deploy_environment': host['environment']
        }

    print(json.dumps(inventory))


if __name__ == '__main__':
    main()

# Use the free strategy to increase parallelism
- name: Large-scale parallel update
  hosts: webservers
  strategy: free
  serial: "20%"  # update 20% of the hosts per batch
  tasks:
    - name: Deploy the new version
      copy:
        src: /opt/releases/app-v2.0.war
        dest: /opt/tomcat/webapps/app.war

    - name: Restart the application
      service:
        name: tomcat
        state: restarted

# site.yml
- import_playbook: pre-deployment-checks.yml
- import_playbook: deploy-to-canary.yml
- import_playbook: health-checks.yml
- import_playbook: deploy-to-production.yml

# deploy-to-canary.yml
- name: Canary deployment
  hosts: canary_servers
  serial: 1  # deploy to one host at a time
  tasks:
    - name: Deploy to the canary node
      include_role:
        name: app_deploy
      vars:
        app_version: "2.0.0"

    - name: Wait for the service health check
      uri:
        url: "http://{{ inventory_hostname }}/health"
        status_code: 200
      register: health_check
      until: health_check.status == 200
      retries: 10
      delay: 10

Ubuntu's Snap package management integrates smoothly with Ansible:

- name: Manage Snap packages
  hosts: ubuntu_servers
  tasks:
    - name: Install snapd
      apt:
        name: snapd
        state: present

    - name: Install a Snap that requires classic confinement
      snap:
        name: certbot
        state: present
        classic: yes

    - name: Install a strictly confined Snap
      snap:
        name: lxd
        state: present

    - name: Ensure the snapd service is running
      systemd:
        name: snapd
        state: started
        enabled: yes

- name: Manage a PPA repository
  apt_repository:
    repo: "ppa:ansible/ansible"
    state: present
    filename: ansible-ppa  # name of the sources list file

- name: Configure automatic updates
  copy:
    content: |
      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";
      APT::Periodic::AutocleanInterval "7";
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    mode: '0644'

- name: Enable unattended upgrades
  debconf:
    name: unattended-upgrades
    question: unattended-upgrades/enable_auto_updates
    value: 'true'
    vtype: boolean

Combine Ansible with Ubuntu's security features to harden the system:
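
- name: System security hardening
  hosts: all
  become: yes
  tasks:
    # Allow SSH first so that enabling the firewall cannot cut off the Ansible connection
    - name: Allow OpenSSH through the firewall
      ufw:
        rule: allow
        name: OpenSSH

    - name: Enable the firewall with a default deny policy
      ufw:
        state: enabled
        policy: deny

    - name: Configure automatic security updates
      copy:
        src: files/50unattended-upgrades
        dest: /etc/apt/apt.conf.d/50unattended-upgrades

    - name: Install security updates
      apt:
        upgrade: safe
        update_cache: yes
        cache_valid_time: 3600
      tags: security

    - name: Configure audit rules
      copy:
        src: files/audit.rules
        dest: /etc/audit/rules.d/ansible.rules
      notify: restart auditd

    - name: Enable the audit service
      systemd:
        name: auditd
        state: started
        enabled: yes

  handlers:
    - name: restart auditd
      service:
        name: auditd
        state: restarted

On Ubuntu 24.04 and later you may run into privilege escalation problems that need explicit configuration: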
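
- name: Handle privilege escalation on Ubuntu 24.04
  hosts: ubuntu_servers
  become: yes
  become_method: sudo
  become_user: root  # name the escalation user explicitly
  vars:
    ansible_become: yes
    ansible_become_method: sudo
    ansible_become_user: root
  tasks:
    - name: Configure sudo privileges
      lineinfile:
        path: /etc/sudoers.d/ansible
        line: "{{ ansible_user }} ALL=(ALL) NOPASSWD:ALL"
        state: present
        create: yes    # the drop-in file may not exist yet
        mode: '0440'
        validate: 'visudo -cf %s'

    - name: Ensure the SSH configuration is correct
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PermitRootLogin'
        line: "PermitRootLogin prohibit-password"  # without-password is the deprecated spelling
        state: present
        validate: 'sshd -t -f %s'
      notify: restart ssh

  handlers:
    - name: restart ssh
      service:
        name: ssh
        state: restarted

Molecule is a dedicated tool for testing Ansible roles: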

# molecule/default/molecule.yml
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu-2004
    image: ubuntu:20.04
  - name: ubuntu-2204
    image: ubuntu:22.04
provisioner:
  name: ansible
verifier:
  name: ansible

# molecule/default/verify.yml (the playbook run by the ansible verifier)
- name: Verify the NGINX installation
  hosts: all
  tasks:
    # package and service need a state argument, so the checks read gathered facts instead
    - name: Gather installed packages
      package_facts:

    - name: Check that NGINX is installed
      assert:
        that: "'nginx' in ansible_facts.packages"

    - name: Gather service states
      service_facts:

    - name: Verify the NGINX service state
      assert:
        that: ansible_facts.services['nginx.service'].state == 'running'

    - name: Verify that NGINX is listening on its port
      wait_for:
        port: 80
        host: localhost
        timeout: 10

# tests/test_nginx.py
def test_nginx_installed(host):
    nginx = host.package("nginx")
    assert nginx.is_installed


def test_nginx_service(host):
    nginx = host.service("nginx")
    assert nginx.is_running
    assert nginx.is_enabled


def test_nginx_listening(host):
    socket = host.socket("tcp://0.0.0.0:80")
    assert socket.is_listening

Integrate Ansible tests into a CI/CD pipeline:
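
# .gitlab-ci.yml
stages:
  - test
  - deploy

ansible_test:
  stage: test
  image: python:3.11  # current Ansible releases need a newer control-node Python than 3.8
  before_script:
    # recent Molecule releases ship the Docker driver in the molecule-plugins package
    - pip install ansible molecule "molecule-plugins[docker]" docker
  script:
    - molecule test

production_deploy:
  stage: deploy
  only:
    - main
  script:
    - ansible-playbook -i inventories/production site.yml

The following real-world example of an enterprise web application deployment combines multiple roles with more complex logic: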
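
# site.yml
- name: Deploy the enterprise web application
  hosts: web_servers
  serial: "25%"
  vars_files:
    - secrets/credentials.yml
  tasks:
    - name: Include the base system configuration
      include_role:
        name: base_setup

    - name: Deploy the web application
      include_role:
        name: web_application
      vars:
        app_version: "{{ web_app_version }}"
        # "environment" is a reserved name in Ansible, so the role is assumed to read app_environment
        app_environment: "{{ deployment_env }}"

    - name: Run database migrations
      include_role:
        name: database_migration
      when: deployment_env == "production"

    - name: Run health checks
      include_role:
        name: health_check

Mizuho Financial Group achieved a significant efficiency gain with the Ansible Automation Platform. By automating all hardware resource provisioning with Ansible, it cut the time needed to build virtual servers by 78%, from 77 work hours per ten virtual servers to just 17.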
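As a quick sanity check on the figures quoted above, the reduction can be recomputed from the two hour counts. This is only arithmetic on the numbers stated in the text; the variable names are chosen for this illustration.

```python
# Work hours per ten virtual servers, as quoted in the case study.
hours_before = 77
hours_after = 17

# Relative reduction: (before - after) / before.
reduction = (hours_before - hours_after) / hours_before

if __name__ == "__main__":
    print(f"Reduction: {reduction:.1%}")
```

The result is roughly 77.9%, which rounds to the 78% stated in the case study.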
Key success factors:
Use Ansible to apply CIS (Center for Internet Security) benchmark hardening automatically:
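
- name: CIS Ubuntu benchmark hardening
  hosts: all
  become: yes
  tasks:
    - name: Install the audit tooling
      apt:
        name: auditd
        state: present

    - name: Configure the password policy
      template:
        src: templates/common-password.j2
        dest: /etc/pam.d/common-password
        backup: yes

    - name: Configure SSH security settings
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "{{ item.regexp }}"  # each pattern below already starts with ^
        line: "{{ item.line }}"
        validate: 'sshd -t -f %s'
      loop:
        - regexp: "^#?PermitRootLogin"
          line: "PermitRootLogin no"
        - regexp: "^#?PasswordAuthentication"
          line: "PasswordAuthentication no"
        # Protocol is obsolete on OpenSSH 7.4 and later, which only speak protocol 2; kept for older servers
        - regexp: "^#?Protocol"
          line: "Protocol 2"
      notify: restart ssh

  handlers:
    - name: restart ssh
      service:
        name: ssh
        state: restarted

On Ubuntu 24.04 you may encounter privilege escalation timeouts:
Solution: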

# Specify become_user explicitly
- name: Run privileged tasks on Ubuntu 24.04
  hosts: ubuntu_servers
  become: yes
  become_user: root  # set explicitly
  vars:
    ansible_become: yes
    ansible_become_method: sudo
    ansible_become_user: root
  tasks:
    - name: Run a task that needs elevated privileges
      command: whoami
      register: current_user

    - name: Show the current user
      debug:
        msg: "Current user: {{ current_user.stdout }}"

To resolve compatibility problems between Python versions:

- name: Set the Python interpreter
  hosts: all
  gather_facts: false  # Python may not be detected automatically on first connection
  tasks:
    - name: Set the Python interpreter
      set_fact:
        ansible_python_interpreter: /usr/bin/python3
      when: ansible_python is not defined

- name: Debug variables and facts
  hosts: all
  tasks:
    - name: Show all facts
      debug:
        var: ansible_facts
      when: debug_mode | default(false)

    - name: Show a specific variable
      debug:
        var: my_variable

    - name: Print a formatted debug message
      debug:
        msg: |
          Hostname: {{ inventory_hostname }}
          IP address: {{ ansible_default_ipv4.address }}
          Operating system: {{ ansible_distribution }} {{ ansible_distribution_version }}

Use the -v, -vv, and -vvv flags to get detailed execution output:
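
# Basic verbose output
ansible-playbook playbook.yml -v
# More detailed output (including SSH details)
ansible-playbook playbook.yml -vvv
# Verbose run limited to a specific tag
ansible-playbook playbook.yml --tags=config -vv

- name: Verify prerequisites
  hosts: all
  tasks:
    - name: Check available disk space
      assert:
        that:
          - ansible_mounts | selectattr('mount', 'equalto', '/') | map(attribute='size_available') | first > 1073741824
        fail_msg: "Less than 1 GB of free space on the root partition"
        success_msg: "Disk space check passed"

    - name: Check memory size
      assert:
        that:
          - ansible_memtotal_mb > 2048
        fail_msg: "The system has less than 2 GB of memory"

Using Ansible in depth on Ubuntu Server shows how capable and flexible automated operations can be. This article has covered the ground from basic installation and configuration to advanced techniques, including: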
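Ansible's simplicity and power make it a strong choice for automating Ubuntu Server management. As the technology evolves, the combination of Ansible and Ubuntu will continue to give system administrators and DevOps engineers more efficient and reliable automation.
Original content notice: this article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, contact cloudcommunity@tencent.com for removal.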