
In today's digital era, highly available, high-performance web services have become the foundation on which businesses survive and grow. Load balancing, a core building block of large-scale service architectures, intelligently distributes network traffic across multiple backend servers, improving response times and throughput while greatly strengthening system resilience and reliability. This article takes a deep, practice-oriented look at load balancing on Ubuntu Server, focusing on the key technologies, architecture designs, and operational experience that matter in production.
Load balancing is essentially a traffic-distribution mechanism: a front-end scheduler (the load balancer) dispatches client requests to multiple backend servers according to a specific algorithm, avoiding single points of failure and resource overload. In real production environments its direct benefits include: application high availability, with traffic automatically routed to healthy nodes when a server fails; horizontal scalability, with processing capacity growing as backend servers are added; session persistence, giving users a consistent connection experience; and stronger security, since the backend topology is hidden behind the architecture and a unified security entry point is provided.
From a technical-evolution standpoint, load balancing falls into three main layers: layer-4 load balancing (transport layer, based on IP and port), layer-7 load balancing (application layer, based on protocols such as HTTP/HTTPS), and hybrid approaches. Layer-4 load balancing handles TCP/UDP traffic with very high efficiency and suits latency-sensitive applications; layer-7 load balancing can parse application-layer protocols and offers finer-grained routing control and content optimization.
Ubuntu Server has notable advantages for load-balancing scenarios. Its long-term support (LTS) releases receive five years of security updates and maintenance, keeping production environments stable. The kernel livepatch feature allows critical security fixes to be applied without a reboot, which matters greatly for load balancers that must stay highly available. In addition, Ubuntu's APT package system offers a rich, well-maintained selection of load-balancing software such as Nginx, HAProxy, and Keepalived that is easy to install and configure.
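As a quick sanity check, whether Livepatch is actually active can be confirmed from the command line; this is a minimal sketch that assumes the machine has been attached through Ubuntu Pro and has the livepatch client installed:
# Show Ubuntu Pro entitlements, including Livepatch
sudo pro status
# Show detailed Livepatch client status (requires the canonical-livepatch snap)
sudo canonical-livepatch status --verbose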
Ubuntu Server also brings several optimizations relevant to load balancing, such as tunable kernel parameters for better network performance, integrated monitoring tools that ease operations, and an active community that helps problems get resolved quickly. Together, these traits make Ubuntu Server an excellent platform for deploying a load-balancing architecture.
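The packages used throughout this article come straight from the standard archive; a quick way to see which versions a given release ships (the exact version strings vary by release) and to install them is:
sudo apt-get update
# Check which versions the archive provides
apt-cache policy nginx haproxy keepalived ipvsadm
# Install everything used in the examples below
sudo apt-get install -y nginx haproxy keepalived ipvsadm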
Linux Virtual Server (LVS) is a mature layer-4 load-balancing solution that performs exceptionally well in production, particularly for high-concurrency, low-latency traffic at large scale. LVS is built into the Linux kernel and achieves high-performance request distribution through IP load-balancing techniques and a range of scheduling algorithms.
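Because IPVS lives in the kernel, it is worth confirming that the module is available before configuring anything; a minimal check looks like this:
# Load the IPVS module and confirm it registered
sudo modprobe ip_vs
lsmod | grep ip_vs
# The (currently empty) virtual-server table is exposed here
sudo cat /proc/net/ip_vs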
Direct Routing (DR) mode delivers the highest LVS performance because response packets bypass the director and go straight back to the client. When configuring DR mode in production, special attention must be paid to ARP handling. The concrete steps are as follows:
Director configuration:
# Install ipvsadm
sudo apt-get update
sudo apt-get install ipvsadm -y
# Configure the virtual IP (ifconfig/route require the net-tools package; ip addr/ip route are the modern equivalents)
sudo ifconfig eth0:0 192.168.226.150 netmask 255.255.255.0 broadcast 192.168.226.150
sudo route add -host 192.168.226.150 dev eth0:0
# Enable IP forwarding
echo "1" | sudo tee /proc/sys/net/ipv4/ip_forward
# Configure the LVS rules (round-robin scheduling)
sudo ipvsadm -A -t 192.168.226.150:80 -s rr
sudo ipvsadm -a -t 192.168.226.150:80 -r 192.168.226.145 -g
sudo ipvsadm -a -t 192.168.226.150:80 -r 192.168.226.148 -g
Real server configuration:
# Configure the VIP on the loopback interface
sudo ifconfig lo:0 192.168.226.150 netmask 255.255.255.255 broadcast 192.168.226.150 up
# Add a host route for the VIP
sudo route add -host 192.168.226.150 dev lo:0
# Suppress ARP responses for the VIP (arp_ignore/arp_announce)
echo '1' | sudo tee /proc/sys/net/ipv4/conf/lo/arp_ignore
echo '2' | sudo tee /proc/sys/net/ipv4/conf/lo/arp_announce
echo '1' | sudo tee /proc/sys/net/ipv4/conf/all/arp_ignore
echo '2' | sudo tee /proc/sys/net/ipv4/conf/all/arp_announce
The main challenge DR mode faces in production is the ARP problem, which the parameter adjustments above address. In addition, the real servers' network topology must allow them to reply to clients directly, which may require special routing when clients sit on a different network segment.
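Note that the ifconfig and echo commands above do not survive a reboot. One way to make the ARP settings persistent on the real servers is a drop-in sysctl file (the file name here is chosen purely for illustration):
# /etc/sysctl.d/90-lvs-dr-arp.conf -- persist the DR-mode ARP settings
net.ipv4.conf.lo.arp_ignore = 1
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
Apply it with sudo sysctl --system.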
Network Address Translation (NAT) mode is the simplest LVS mode, but its performance is lower because both inbound and outbound traffic pass through the director. NAT mode suits complex network topologies, or cases where the real servers and the clients are not on the same segment.
Director configuration:
# Enable IP forwarding
echo "1" | sudo tee /proc/sys/net/ipv4/ip_forward
# Configure the LVS rules (NAT mode, weighted least-connection)
sudo ipvsadm -A -t 192.168.226.150:80 -s wlc
sudo ipvsadm -a -t 192.168.226.150:80 -r 192.168.1.145 -m -w 1
sudo ipvsadm -a -t 192.168.226.150:80 -r 192.168.1.148 -m -w 1
# Masquerade outbound traffic from the real-server subnet (return traffic for balanced connections already flows back through the director because it is the real servers' default gateway)
sudo iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j MASQUERADE
Real server configuration:
In NAT mode the real servers only need their default gateway pointed at the director's internal IP; no other special configuration is required, which is one of NAT mode's biggest advantages.
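For example, assuming the director's internal address is 192.168.1.1 (an illustrative value), the gateway on each real server can be set like this:
# Point the default route at the director's internal IP (illustrative address)
sudo ip route replace default via 192.168.1.1 dev eth0
# Verify
ip route show default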
IP tunnel (TUN) mode uses IPIP encapsulation to schedule traffic across network segments and suits deployments where the real servers span multiple data centers. TUN mode is more complex to configure and requires kernel support for IPIP tunnels.
Director configuration:
# Load the IPIP module
sudo modprobe ipip
# Configure the LVS rules (TUN mode, least-connection)
sudo ipvsadm -A -t 192.168.226.150:80 -s lc
sudo ipvsadm -a -t 192.168.226.150:80 -r 192.168.2.145 -i
sudo ipvsadm -a -t 192.168.226.150:80 -r 192.168.3.148 -i
Real server configuration:
# Load the IPIP module
sudo modprobe ipip
# Configure the tunl0 interface with the VIP
sudo ifconfig tunl0 192.168.226.150 netmask 255.255.255.255 up
# Suppress ARP for the tunnel interface
echo '1' | sudo tee /proc/sys/net/ipv4/conf/tunl0/arp_ignore
echo '2' | sudo tee /proc/sys/net/ipv4/conf/tunl0/arp_announce
echo '1' | sudo tee /proc/sys/net/ipv4/conf/all/arp_ignore
echo '2' | sudo tee /proc/sys/net/ipv4/conf/all/arp_announce
Under high concurrency, LVS needs the Linux kernel parameters tuned to reach its best performance. A recommended production configuration follows.
Create /etc/sysctl.d/lvs-optimization.conf:
# Connection-tracking table size
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_tcp_timeout_established = 7200
# Network memory allocation tuning
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.optmem_max = 65536
net.core.netdev_max_backlog = 250000
# IPVS connection handling
net.ipv4.vs.conn_reuse_mode = 1
net.ipv4.vs.expire_nodest_conn = 1
net.ipv4.vs.expire_quiescent_template = 1
# TCP socket buffer sizes
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mem = 786432 2097152 3145728
# TCP behaviour tuning
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
# (tcp_tw_recycle was removed in kernel 4.12; harmless at 0, drop the line on newer kernels)
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_slow_start_after_idle = 0
Apply the configuration: sudo sysctl -p /etc/sysctl.d/lvs-optimization.conf
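One caveat: the net.netfilter.* and net.ipv4.vs.* keys only exist once the nf_conntrack and ip_vs modules are loaded, so sysctl -p will report errors on a freshly booted director until they are. A minimal way to pre-load them and spot-check the result:
# Make sure the relevant modules are loaded before applying the file
sudo modprobe nf_conntrack
sudo modprobe ip_vs
sudo sysctl -p /etc/sysctl.d/lvs-optimization.conf
# Spot-check a few of the values
sudo sysctl net.netfilter.nf_conntrack_max net.ipv4.vs.conn_reuse_mode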
A single LVS director is a single point of failure, so production deployments must pair it with Keepalived for high availability. The following is an active/standby (hot-standby) configuration.
Primary director Keepalived configuration (/etc/keepalived/keepalived.conf):
global_defs {
router_id LVS_MASTER
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 120
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.226.150
}
}
virtual_server 192.168.226.150 80 {
delay_loop 6
lb_algo wrr
lb_kind DR
persistence_timeout 50
protocol TCP
real_server 192.168.226.145 80 {
weight 1
TCP_CHECK {
connect_timeout 8
retry 3
delay_before_retry 3
connect_port 80
}
}
real_server 192.168.226.148 80 {
weight 1
TCP_CHECK {
connect_timeout 8
retry 3
delay_before_retry 3
connect_port 80
}
}
}
For the standby director, the Keepalived configuration only needs state changed to BACKUP and priority lowered to 100; everything else stays the same.
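Concretely, the standby node's vrrp_instance block differs only in these two lines:
vrrp_instance VI_1 {
state BACKUP
priority 100
# ... interface, virtual_router_id, authentication and virtual_ipaddress identical to the MASTER ...
}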
Common LVS production issues and how to troubleshoot them:
Uneven connection distribution:
sudo ipvsadm -L -n
sudo ipvsadm -L -n --stats
Real servers unreachable:
sudo ipvsadm -L -n -c
Investigate further with ping and tcpdump
Performance bottleneck diagnosis:
top, vmstat, netstat
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/net/ip_vs_stats
Nginx, as a high-performance layer-7 load balancer, not only distributes HTTP traffic but also supports TCP/UDP, and it adds content caching, SSL termination, security filtering, and many other features, making it a core component of modern web architectures.
Nginx defines backend server groups with the upstream module and supports several load-balancing algorithms:
upstream backend_servers {
# Round-robin (the default)
server 192.168.1.12:80 weight=1 max_fails=3 fail_timeout=30s;
server 192.168.1.13:80 weight=2 max_fails=3 fail_timeout=30s;
# Backup server, used only when all primary servers are unavailable
server 192.168.1.14:80 backup;
}
upstream ip_hash_backend {
# IP hash, for session affinity
ip_hash;
server 192.168.1.12:80;
server 192.168.1.13:80;
}
upstream least_conn_backend {
# Least connections
least_conn;
server 192.168.1.12:80;
server 192.168.1.13:80;
}
upstream hash_backend {
# Custom (consistent) hash on the request URI
hash $request_uri consistent;
server 192.168.1.12:80;
server 192.168.1.13:80;
}
server {
listen 80;
server_name example.com;
# Access log capturing the real client IP (note: log_format must be defined at the http level)
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
location / {
proxy_pass http://backend_servers;
# Key proxy headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
# Error handling / failover to the next upstream
proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
proxy_next_upstream_tries 3;
proxy_next_upstream_timeout 30s;
# Buffering
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 16k;
}
# Status endpoint
location /nginx_status {
stub_status on;
access_log off;
allow 192.168.1.0/24;
deny all;
}
}
Since version 1.9.0, Nginx also supports layer-4 load balancing through the stream module:
stream {
upstream tcp_backend {
server 192.168.1.12:3306 weight=1 max_fails=3 fail_timeout=30s;
server 192.168.1.13:3306 weight=2 max_fails=3 fail_timeout=30s;
}
upstream udp_backend {
server 192.168.1.12:53 weight=1;
server 192.168.1.13:53 weight=2;
}
server {
listen 3306;
proxy_pass tcp_backend;
proxy_timeout 30s;
proxy_connect_timeout 5s;
}
server {
listen 53 udp;
proxy_pass udp_backend;
proxy_timeout 30s;
proxy_responses 1;
error_log /var/log/nginx/dns.log;
}
}
Nginx Plus, or open-source Nginx combined with third-party modules, can implement dynamic load balancing:
# DNS-based service discovery
resolver 8.8.8.8 valid=30s;
upstream dynamic_backend {
zone backend 64k;
# Note: "resolve" on an upstream server is an Nginx Plus feature; open-source Nginx needs
# a resolver plus a variable in proxy_pass (or a third-party module) for DNS re-resolution
server backend.example.com resolve;
}
# Active health checks (an Nginx Plus feature; the directive goes inside the location that proxies to the upstream)
health_check interval=5s fails=3 passes=2 uri=/health;
upstream qos_backend {
server 192.168.1.12:80 max_conns=100;
server 192.168.1.13:80 max_conns=200;
# Slow start (Nginx Plus feature)
server 192.168.1.14:80 slow_start=30s;
}
# Rate limiting
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
server {
location /api/ {
limit_req zone=api burst=20 nodelay;
proxy_pass http://qos_backend;
}
}
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=backend_cache:10m
max_size=10g inactive=60m use_temp_path=off;
server {
location / {
proxy_pass http://backend_servers;
proxy_cache backend_cache;
proxy_cache_key "$scheme$request_method$host$request_uri";
proxy_cache_valid 200 302 10m;
proxy_cache_valid 404 1m;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
add_header X-Cache-Status $upstream_cache_status;
}
}
Create /etc/security/limits.d/nginx.conf:
nginx soft nofile 65536
nginx hard nofile 65536
Adjust the kernel parameters by creating /etc/sysctl.d/nginx-optimization.conf:
# Network stack tuning
net.core.somaxconn = 65536
net.core.netdev_max_backlog = 32768
net.ipv4.tcp_max_syn_backlog = 65536
# TCP memory tuning
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mem = 786432 2097152 3145728
# TCP connection reuse
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
# Shorter TCP keepalive timers
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
# /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
worker_cpu_affinity auto;
# Error log
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
# Worker process limits
worker_rlimit_nofile 65536;
events {
worker_connections 20480;
use epoll;
multi_accept on;
}
http {
# Basic settings
include /etc/nginx/mime.types;
default_type application/octet-stream;
# Performance-related settings
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
keepalive_requests 1000;
types_hash_max_size 2048;
server_tokens off;
# Buffer tuning
client_body_buffer_size 16k;
client_max_body_size 100m;
client_header_buffer_size 1k;
large_client_header_buffers 4 8k;
# Connection limits
limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_conn addr 100;
# Gzip compression
gzip on;
gzip_vary on;
gzip_min_length 1024;
gzip_proxied any;
gzip_comp_level 6;
gzip_types
text/plain
text/css
text/xml
text/javascript
application/json
application/javascript
application/xml+rss
application/atom+xml;
# Include additional configuration
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}
server {
listen 8080;
server_name 127.0.0.1;
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
allow 192.168.1.0/24;
deny all;
}
# Nginx Plus status page (newer Plus releases expose this via the "api" directive and dashboard instead)
location /status {
status;
access_log off;
allow 127.0.0.1;
allow 192.168.1.0/24;
deny all;
}
}
# Custom log format
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'upstream_addr=$upstream_addr '
'upstream_status=$upstream_status '
'request_time=$request_time '
'upstream_response_time=$upstream_response_time '
'upstream_connect_time=$upstream_connect_time';
# Conditional logging (skip 2xx/3xx responses)
map $status $loggable {
~^[23] 0;
default 1;
}
access_log /var/log/nginx/access.log main if=$loggable;
HAProxy is a high-performance TCP/HTTP load balancer whose excellent performance, rich feature set, and stability have made it a staple of enterprise environments. It is particularly well suited to scenarios that need fine-grained traffic control and complex routing logic.
# Install HAProxy from the Ubuntu archive
sudo apt-get update
sudo apt-get install haproxy
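# Optional: the archive version tracks the Ubuntu release and lags upstream; if a newer
# branch is needed, the widely used community HAProxy PPA can be added instead
# (sketch assuming the 2.8 branch -- adjust to the branch you need)
sudo add-apt-repository -y ppa:vbernat/haproxy-2.8
sudo apt-get update
sudo apt-get install -y haproxy=2.8.\*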
# Enable and start the service
sudo systemctl enable haproxy
sudo systemctl start haproxy
# Check the version
haproxy -v
Create the configuration file /etc/haproxy/haproxy.cfg:
global
# Global settings
daemon
user haproxy
group haproxy
log /dev/log local0 info
maxconn 100000
nbthread 4
cpu-map 1 0
cpu-map 2 1
cpu-map 3 2
cpu-map 4 3
stats socket /var/run/haproxy/admin.sock mode 660 level admin
tune.ssl.default-dh-param 2048
defaults
# Defaults
log global
mode http
option httplog
option dontlognull
option http-keep-alive
option forwardfor
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 50000
# Statistics/monitoring frontend
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
stats admin if LOCALHOST
# Main service frontend
frontend web_frontend
bind *:80
bind *:443 ssl crt /etc/ssl/private/example.com.pem
mode http
option httpclose
# ACL definitions
acl static_path path_beg /static/ /images/ /css/ /js/
acl api_path path_beg /api/
acl health_check path /health
# Traffic routing
use_backend static_servers if static_path
use_backend api_servers if api_path
use_backend monitor_servers if health_check
default_backend web_servers
# Backend definitions
backend web_servers
mode http
balance roundrobin
option redispatch
option httpchk GET /health
cookie SERVERID insert indirect nocache
server web1 192.168.1.12:80 check inter 2s fall 3 rise 2 weight 1 cookie web1
server web2 192.168.1.13:80 check inter 2s fall 3 rise 2 weight 2 cookie web2
server web3 192.168.1.14:80 check inter 2s fall 3 rise 2 weight 1 cookie web3 backup
backend static_servers
mode http
balance source
server static1 192.168.1.15:80 check inter 2s
server static2 192.168.1.16:80 check inter 2s
backend api_servers
mode http
balance leastconn
option tcp-check
server api1 192.168.1.17:8080 check inter 2s
server api2 192.168.1.18:8080 check inter 2s
backend monitor_servers
mode http
server monitor 127.0.0.1:8080 check
# Generate an SSL certificate (self-signed example)
sudo mkdir -p /etc/ssl/private
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/ssl/private/example.com.key \
-out /etc/ssl/private/example.com.crt
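# HAProxy's "crt" parameter expects one file holding the certificate, its key, and any chain,
# so the two files above still need to be combined into the example.com.pem referenced below
sudo sh -c 'cat /etc/ssl/private/example.com.crt /etc/ssl/private/example.com.key > /etc/ssl/private/example.com.pem'
sudo chmod 600 /etc/ssl/private/example.com.pem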
# HAProxy SSL configuration
frontend https_frontend
bind *:443 ssl crt /etc/ssl/private/example.com.pem alpn h2,http/1.1
mode http
option forwardfor
# HSTS security header
http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
# SSL tuning (the ssl-default-bind-* directives belong in the global section, not in a frontend)
ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384
ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384
ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11
backend advanced_health_check
mode http
balance roundrobin
# HTTP health check
option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
http-check expect status 200
# TCP health check (httpchk above and tcp-check below are alternatives; use one per backend)
option tcp-check
tcp-check connect
tcp-check send "PING\r\n"
tcp-check expect string "PONG"
# Per-server health-check tuning
server app1 192.168.1.20:8080 check inter 5s fastinter 1s downinter 3s \
rise 2 fall 3 weight 100 \
check-ssl verify none \
on-marked-down shutdown-sessions \
on-marked-up shutdown-backup-sessions
frontend traffic_control
bind *:80
mode http
# Connection limiting via a stick table
stick-table type ip size 1m expire 1h store conn_cur,conn_rate(10s)
tcp-request connection track-sc0 src
tcp-request connection reject if { sc0_conn_cur gt 100 }
# Connection-rate limiting
acl too_fast sc0_conn_rate gt 50
tcp-request connection reject if too_fast
# Path-based rate limiting
# Note: a proxy may declare only one stick-table, so this second table should live in a
# separate dummy backend and be referenced via "track-sc1 ... table <name>"
acl api_path path_beg /api/
acl api_abuse sc1_http_req_rate gt 100
stick-table type binary len 32 size 1m expire 1m store http_req_rate(10s)
tcp-request content track-sc1 base32+src if api_path
http-request deny if api_path api_abuse
# Configure rsyslog to receive HAProxy logs
# /etc/rsyslog.d/haproxy.conf
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
local0.* -/var/log/haproxy/haproxy.log
& ~
# Create the log directory and restart rsyslog
sudo mkdir -p /var/log/haproxy
sudo touch /var/log/haproxy/haproxy.log
sudo systemctl restart rsyslog
# Customize the HAProxy log format
global
log 127.0.0.1:514 local0 info
defaults
log global
option httplog
capture request header Host len 40
capture request header User-Agent len 256
capture request header X-Forwarded-For len 40
capture request header Referer len 200
capture response header Content-Type len 40
capture response header Set-Cookie len 200
# Enable the statistics page
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
stats show-legends
stats show-node
stats auth admin:securepassword
stats admin if TRUE
# Additional statistics details
stats realm HAPROXY\ Statistics
stats hide-version
stats show-desc Load Balancer Statistics
stats show-modules
Keepalived configuration on the primary HAProxy node:
# /etc/keepalived/keepalived.conf
global_defs {
router_id haproxy_primary
}
vrrp_script chk_haproxy {
script "killall -0 haproxy"
interval 2
weight 2
fall 3
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass securepassword
}
virtual_ipaddress {
192.168.1.100/24 dev eth0
}
track_script {
chk_haproxy
}
notify_master "/etc/keepalived/scripts/notify_master.sh"
notify_backup "/etc/keepalived/scripts/notify_backup.sh"
notify_fault "/etc/keepalived/scripts/notify_fault.sh"
}
# Active-active (multi-site) example
frontend multi_active
bind 192.168.1.100:80
mode http
# Geography-based traffic routing (illustrated here with private address ranges)
acl from_asia src 10.0.0.0/8 172.16.0.0/12
acl from_europe src 192.168.0.0/16
use_backend asia_servers if from_asia
use_backend europe_servers if from_europe
default_backend default_servers
backend asia_servers
mode http
balance roundrobin
server asia1 10.1.1.10:80 check
server asia2 10.1.1.11:80 check
backend europe_servers
mode http
balance roundrobin
server europe1 192.168.2.10:80 check
server europe2 192.168.2.11:80 check
In production, the load balancer itself must be highly available so that it does not become a single point of failure. This chapter explores the design and implementation of highly available load-balancing architectures on Ubuntu Server.
Keepalived implements failover on top of the VRRP protocol, keeping the load balancer highly available.
VRRP (Virtual Router Redundancy Protocol) communicates via the multicast address 224.0.0.18 using IP protocol number 112. The master periodically sends ADVERTISEMENT packets and the backup nodes listen for them; if a backup receives no advertisement within roughly three advertisement intervals (plus a priority-derived skew time), it initiates a new master election.
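The advertisements are easy to observe on the wire, which helps when debugging split-brain or priority issues; for example:
# Watch VRRP advertisements (IP protocol 112) on the VRRP interface
sudo tcpdump -i eth0 -nn 'ip proto 112'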
Election mechanism: the node with the highest priority becomes master; if priorities are equal, the node with the higher primary IP address wins; the address owner (priority 255) always wins; and when preemption is enabled, a higher-priority node that recovers takes the master role back.
# /etc/keepalived/keepalived.conf
global_defs {
enable_script_security
script_user root
router_id LVS_DEVEL_01
}
# A more elaborate VRRP instance configuration
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
advert_int 1
nopreempt
preempt_delay 300
authentication {
auth_type PASS
auth_pass secure123
}
virtual_ipaddress {
192.168.1.100/24 dev eth0 label eth0:vi_1
}
virtual_ipaddress_excluded {
# Secondary VIPs, excluded from the VRRP advertisements
192.168.1.101/32
}
track_interface {
eth0 weight 10
eth1 weight 10
}
track_script {
chk_nginx_service
chk_haproxy_service
}
notify "/etc/keepalived/scripts/notify.sh"
notify_master "/etc/keepalived/scripts/notify_master.sh"
notify_backup "/etc/keepalived/scripts/notify_backup.sh"
notify_fault "/etc/keepalived/scripts/notify_fault.sh"
}
vrrp_instance VI_2 {
# A second instance, separating traffic across interfaces
state BACKUP
interface eth1
virtual_router_id 52
priority 98
advert_int 1
virtual_ipaddress {
192.168.2.100/24 dev eth1
}
}
#!/bin/bash
# /etc/keepalived/scripts/chk_nginx_service.sh
# Nginx health-check script
A=$(systemctl is-active nginx)
B=$(ps -C nginx --no-header | wc -l)
C=$(netstat -tlnp | grep ':80 ' | wc -l)
# Several checks combined for accuracy
if [ "$A" = "active" ] && [ $B -gt 0 ] && [ $C -gt 0 ]; then
# Also verify that Nginx actually responds
if curl -I http://localhost/health-check -m 2 >/dev/null 2>&1; then
exit 0
else
# Try restarting Nginx
systemctl restart nginx
sleep 3
if curl -I http://localhost/health-check -m 2 >/dev/null 2>&1; then
exit 0
else
exit 1
fi
fi
else
# Try to bring the service back
systemctl start nginx
sleep 5
if [ "$(systemctl is-active nginx)" = "active" ]; then
exit 0
else
exit 1
fi
fi
#!/bin/bash
# /etc/keepalived/scripts/notify.sh
# State-transition notification script
TYPE=$1
NAME=$2
STATE=$3
case $STATE in
"MASTER")
# Actions when becoming MASTER
echo "$(date): $NAME transitioned to MASTER state" >> /var/log/keepalived-notifications.log
systemctl start nginx
systemctl start haproxy
# Send a notification
/usr/local/bin/send_alert.sh "Keepalived MASTER" "$NAME is now MASTER"
;;
"BACKUP")
# Actions when becoming BACKUP
echo "$(date): $NAME transitioned to BACKUP state" >> /var/log/keepalived-notifications.log
systemctl stop nginx
systemctl stop haproxy
/usr/local/bin/send_alert.sh "Keepalived BACKUP" "$NAME is now BACKUP"
;;
"FAULT")
# Actions when entering the FAULT state
echo "$(date): $NAME transitioned to FAULT state" >> /var/log/keepalived-notifications.log
systemctl stop nginx
systemctl stop haproxy
/usr/local/bin/send_alert.sh "Keepalived FAULT" "$NAME is in FAULT state"
;;
*)
echo "$(date): Unknown state $STATE for $NAME" >> /var/log/keepalived-notifications.log
;;
esac
For more complex clustering requirements, Heartbeat and Pacemaker can be used to build a high-availability cluster.
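On Ubuntu these components come from the standard archive, although package availability varies by release (classic Heartbeat has largely been superseded by Corosync as the messaging layer); a typical starting point looks like this:
# Install the cluster stack
sudo apt-get update
sudo apt-get install -y pacemaker corosync crmsh
# Classic Heartbeat, if your release still ships it (newer releases use Corosync instead)
sudo apt-get install -y heartbeat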
# /etc/heartbeat/ha.cf
logfile /var/log/ha-log
logfacility local0
keepalive 500ms
deadtime 5
warntime 3
initdead 30
udpport 694
bcast eth0
mcast eth0 225.0.0.1 694 1 0
ucast eth0 192.168.1.101
auto_failback off
node lb-primary
node lb-secondary
crm respawn
# Configure the Pacemaker cluster
sudo crm configure property stonith-enabled=false
sudo crm configure property no-quorum-policy=ignore
# Virtual IP resource
sudo crm configure primitive vip ocf:heartbeat:IPaddr2 \
params ip=192.168.1.100 cidr_netmask=24 nic=eth0 \
op monitor interval=30s
# Nginx service resource
sudo crm configure primitive nginx-service systemd:nginx \
op start timeout=60s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=30s timeout=30s
# HAProxy service resource
sudo crm configure primitive haproxy-service systemd:haproxy \
op start timeout=60s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=30s timeout=30s
# Resource group
sudo crm configure group loadbalancer-group vip nginx-service haproxy-service
# Resource constraints (optional; note that a colocation constraint needs at least two resources, so treat the line below as a skeleton to complete)
sudo crm configure colocation loadbalancer-infrastructure -inf: vip
As containerization becomes widespread, load balancing also has to adapt to cloud-native environments.
# docker-compose.yml
version: '3.8'
services:
haproxy:
image: haproxy:2.6
container_name: haproxy-lb
network_mode: host
volumes:
- ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
- /etc/ssl/private:/etc/ssl/private:ro
restart: unless-stopped
cap_add:
- NET_ADMIN
healthcheck:
test: ["CMD", "haproxy", "-c", "-f", "/usr/local/etc/haproxy/haproxy.cfg"]
interval: 30s
timeout: 10s
retries: 3
nginx:
image: nginx:1.23
container_name: nginx-lb
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
- ./conf.d:/etc/nginx/conf.d:ro
- /etc/letsencrypt:/etc/letsencrypt:ro
restart: unless-stopped
depends_on:
- haproxy
# nginx-ingress-controller.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-ingress-controller
namespace: ingress-nginx
spec:
replicas: 3
selector:
matchLabels:
app: nginx-ingress
template:
metadata:
labels:
app: nginx-ingress
spec:
containers:
- name: nginx-ingress-controller
image: registry.k8s.io/ingress-nginx/controller:v1.5.1
args:
- /nginx-ingress-controller
- --election-id=ingress-controller-leader
- --ingress-class=nginx
- --configmap=$(POD_NAMESPACE)/nginx-configuration
- --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
- --udp-services-configmap=$(POD_NAMESPACE)/udp-services
- --validating-webhook=:8443
- --validating-webhook-certificate=/usr/local/certificates/cert
- --validating-webhook-key=/usr/local/certificates/key
ports:
- name: http
containerPort: 80
- name: https
containerPort: 443
- name: webhook
containerPort: 8443
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
readinessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
livenessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
---
apiVersion: v1
kind: Service
metadata:
name: ingress-nginx
namespace: ingress-nginx
labels:
app.kubernetes.io/name: ingress-nginx
spec:
type: LoadBalancer
externalTrafficPolicy: Local
ports:
- name: http
port: 80
targetPort: http
- name: https
port: 443
targetPort: https
selector:
app: nginx-ingress
The performance and stability of the load-balancing layer directly affect the availability of the entire application. This chapter digs into performance optimization and fault-diagnosis techniques for load balancing on Ubuntu Server.
Create /etc/sysctl.d/99-loadbalancer-optimization.conf:
# Core network parameters
net.core.netdev_max_backlog = 100000
net.core.somaxconn = 65535
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.optmem_max = 134217728
# IPv4 tuning
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_max_tw_buckets = 1440000
# TCP connection reuse
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
# Shorter TCP keepalive timers
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
# Connection-tracking tuning
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
# Memory overcommit and writeback settings
vm.overcommit_memory = 1
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
Apply the configuration: sudo sysctl -p /etc/sysctl.d/99-loadbalancer-optimization.conf
# Raise the file-descriptor limits
echo "* soft nofile 1000000" >> /etc/security/limits.conf
echo "* hard nofile 1000000" >> /etc/security/limits.conf
echo "root soft nofile 1000000" >> /etc/security/limits.conf
echo "root hard nofile 1000000" >> /etc/security/limits.conf
# Configure systemd-wide service limits
mkdir -p /etc/systemd/system.conf.d/
cat > /etc/systemd/system.conf.d/limits.conf << EOF
[Manager]
DefaultLimitNOFILE=1000000
DefaultLimitNPROC=1000000
EOF
# Reload systemd
systemctl daemon-reload
# Disable kernel hung-task timeout warnings
echo "kernel.hung_task_timeout_secs = 0" >> /etc/sysctl.conf
# /etc/nginx/nginx.conf
events {
worker_connections 50000;
worker_aio_requests 128;
use epoll;
multi_accept on;
}
http {
# Open-file cache
open_file_cache max=200000 inactive=20s;
open_file_cache_valid 30s;
open_file_cache_min_uses 2;
open_file_cache_errors on;
# Buffer tuning
client_body_buffer_size 128k;
client_max_body_size 50m;
client_header_buffer_size 3m;
large_client_header_buffers 4 256k;
# Timeouts
client_body_timeout 10;
client_header_timeout 10;
reset_timedout_connection on;
send_timeout 2;
# Compression tuning
gzip on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types
text/plain
text/css
text/xml
text/javascript
application/x-javascript
application/xml
application/javascript
application/json;
# Static-asset caching (this location block belongs inside a server block, not directly under http)
location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
expires 365d;
add_header Cache-Control "public, immutable";
access_log off;
}
# Upstream connection pool
# (keepalive requires proxy_http_version 1.1 and proxy_set_header Connection "" in the proxying location)
upstream backend {
server 192.168.1.12:80 max_conns=300;
server 192.168.1.13:80 max_conns=300;
keepalive 100;
}
}
# /etc/haproxy/haproxy.cfg
global
maxconn 100000
maxcompcpu 8
maxcomprate 100
spread-checks 4
tune.bufsize 16384
tune.http.cookielen 4096
tune.http.maxhdr 1024
tune.idletimer 1000
tune.ssl.cachesize 1000000
tune.ssl.lifetime 300
tune.ssl.maxrecord 1430
tune.zlib.memlevel 8
tune.zlib.windowsize 16
defaults
maxconn 50000
timeout http-keep-alive 1s
timeout http-request 5s
timeout queue 30s
timeout connect 5s
timeout client 50s
timeout server 50s
timeout tunnel 1h
backend servers
balance leastconn
option tcp-check
default-server check inter 2s fall 3 rise 2 maxconn 300 maxqueue 100
server web1 192.168.1.12:80 maxconn 300
server web2 192.168.1.13:80 maxconn 300
#!/bin/bash
# monitoring-setup.sh
# Install monitoring tools (package names vary by Ubuntu release; the Nginx and HAProxy
# exporters may need to be installed manually from their upstream projects)
apt-get update
apt-get install -y prometheus-node-exporter nginx-prometheus-exporter haproxy-exporter
# Configure the node exporter
cat > /etc/default/prometheus-node-exporter << EOF
ARGS="--collector.systemd --collector.tcpstat --collector.processes"
EOF
# Nginx status endpoint for scraping (this location block belongs in an Nginx server block, not in this shell script)
location /nginx-status {
stub_status on;
access_log off;
allow 127.0.0.1;
deny all;
}
# HAProxy statistics endpoint for scraping (this listen block belongs in haproxy.cfg)
listen stats
bind :9101
mode http
stats enable
stats uri /metrics
stats show-legends
Create the monitoring script /usr/local/bin/loadbalancer-metrics.sh:
#!/bin/bash
# System metrics
echo "=== SYSTEM METRICS ==="
echo "Load: $(cat /proc/loadavg)"
echo "Memory: $(free -m | awk 'NR==2{printf "%.2f%%", $3*100/$2}')"
echo "Disk IO: $(iostat -x 1 1 | awk 'NR==4{print $14}')% util"
# Network metrics
echo -e "\n=== NETWORK METRICS ==="
echo "TCP Connections: $(netstat -tun | wc -l)"
echo "SYN Queue: $(netstat -tun | grep SYN_RECV | wc -l)"
# Nginx metrics
if systemctl is-active nginx >/dev/null; then
echo -e "\n=== NGINX METRICS ==="
curl -s http://localhost/nginx-status | awk '
/Active connections/ {print "Active Connections: "$3}
/server accepts handled/ {print "Accepted: "$3" Handled: "$4" Requests: "$5}
/Reading/ {print "Reading: "$2" Writing: "$4" Waiting: "$6}'
fi
# HAProxy metrics
if systemctl is-active haproxy >/dev/null; then
echo -e "\n=== HAPROXY METRICS ==="
echo "show info" | socat /var/run/haproxy/admin.sock - | grep -E "(Maxconn|Maxsock|Uptime|Memmax)"
fi
# Connection tracking
echo -e "\n=== CONNTRACK METRICS ==="
echo "Tracked Connections: $(cat /proc/sys/net/netfilter/nf_conntrack_count 2>/dev/null || echo "N/A")"
#!/bin/bash
# troubleshooting.sh
# Check service status
check_service_status() {
local service=$1
if systemctl is-active $service >/dev/null; then
echo "✓ $service is running"
return 0
else
echo "✗ $service is not running"
systemctl status $service --no-pager -l
return 1
fi
}
# Check that a port is listening
check_port_listening() {
local port=$1
if netstat -tln | grep ":$port " >/dev/null; then
echo "✓ Port $port is listening"
return 0
else
echo "✗ Port $port is not listening"
return 1
fi
}
# Check that the VIP is configured
check_vip_configuration() {
local vip=$1
if ip addr show | grep $vip >/dev/null; then
echo "✓ VIP $vip is configured"
return 0
else
echo "✗ VIP $vip is not configured"
return 1
fi
}
# Health-check a backend server
check_backend_health() {
local backend=$1
if curl -s -o /dev/null -w "%{http_code}" http://$backend/health-check | grep -q "200"; then
echo "✓ Backend $backend is healthy"
return 0
else
echo "✗ Backend $backend is unhealthy"
return 1
fi
}
# Run the diagnostics
echo "Starting load balancer diagnostics..."
check_service_status nginx
check_service_status haproxy
check_service_status keepalived
check_port_listening 80
check_port_listening 443
check_vip_configuration "192.168.1.100"
check_backend_health "192.168.1.12"
check_backend_health "192.168.1.13"
#!/bin/bash
# auto-recovery.sh
LOG_FILE="/var/log/loadbalancer-recovery.log"
ALERT_EMAIL="admin@example.com"
log() {
echo "$(date): $1" >> $LOG_FILE
}
send_alert() {
local subject=$1
local message=$2
echo "$message" | mail -s "$subject" $ALERT_EMAIL
log "Alert sent: $subject"
}
recover_nginx() {
log "Attempting to recover Nginx..."
# Graceful stop
if ! nginx -s quit 2>/dev/null; then
sleep 5
# Force stop
pkill -9 nginx
fi
# Free the listening ports
fuser -k 80/tcp
fuser -k 443/tcp
# Start again
systemctl start nginx
sleep 3
if systemctl is-active nginx >/dev/null; then
log "Nginx recovery successful"
return 0
else
log "Nginx recovery failed"
send_alert "Nginx Recovery Failed" "Manual intervention required"
return 1
fi
}
recover_haproxy() {
log "Attempting to recover HAProxy..."
systemctl stop haproxy
sleep 2
# Remove the stale admin socket
rm -f /var/run/haproxy/admin.sock
# Start again
systemctl start haproxy
sleep 3
if systemctl is-active haproxy >/dev/null; then
log "HAProxy recovery successful"
return 0
else
log "HAProxy recovery failed"
send_alert "HAProxy Recovery Failed" "Manual intervention required"
return 1
fi
}
# Main recovery logic
main() {
log "Starting automatic recovery process..."
# Check and recover Nginx
if ! systemctl is-active nginx >/dev/null; then
log "Nginx is down, starting recovery..."
if ! recover_nginx; then
return 1
fi
fi
# Check and recover HAProxy
if ! systemctl is-active haproxy >/dev/null; then
log "HAProxy is down, starting recovery..."
if ! recover_haproxy; then
return 1
fi
fi
log "Recovery process completed successfully"
return 0
}
# Run the main function
main "$@"
#!/bin/bash
# benchmark-loadbalancer.sh
# Install benchmarking tools
apt-get update
apt-get install -y wrk apache2-utils siege
# wrk benchmark
run_wrk_test() {
local url=$1
local threads=$2
local connections=$3
local duration=$4
echo "Running WRK test: $url"
echo "Threads: $threads, Connections: $connections, Duration: ${duration}s"
wrk -t$threads -c$connections -d${duration}s $url
echo "----------------------------------------"
}
# Siege stress test
run_siege_test() {
local url=$1
local concurrent=$2
local time=$3
echo "Running Siege test: $url"
echo "Concurrent: $concurrent, Time: ${time}s"
siege -c$concurrent -t${time}s $url
echo "----------------------------------------"
}
# Exercise different load-balancer paths
echo "Starting load balancer benchmark..."
# Static content
run_wrk_test "http://192.168.1.100/static/test.html" 4 100 30
# Dynamic content
run_wrk_test "http://192.168.1.100/api/health" 2 50 30
# SSL performance
run_wrk_test "https://192.168.1.100/" 4 100 30
# Sustained-load test (the helper appends the "s" suffix itself, so pass a bare number of seconds)
run_siege_test "http://192.168.1.100/" 100 60
echo "Benchmark completed"
With the optimization techniques and fault-diagnosis methods covered in this chapter, you can significantly improve the performance and reliability of your load-balancing layer and keep production running smoothly.
Load balancing on Ubuntu Server is a practice that keeps evolving; as technology and business needs change, our knowledge and skills have to be refreshed continuously. This article has covered the topic from basic concepts through advanced practice, and will hopefully serve as a useful reference for designing and operating your own load-balancing architecture.
In production practice, the following aspects deserve particular attention:
Redundancy in the architecture: the load balancer itself must not become a single point of failure; active/standby, dual-active, or multi-active designs keep it highly available, and tools such as Keepalived and Heartbeat provide mature solutions.
Balancing performance and scalability: pick load-balancing technologies and algorithms that match the workload, and find the right trade-off between performance, features, and complexity. LVS, Nginx, and HAProxy each have their strengths and can be combined.
Monitoring and observability: build a complete monitoring stack so that the state of the system is visible in real time and potential problems are caught early; tools such as Prometheus and Grafana play an important role here.
Security as a whole: apply defence in depth from the network layer up to the application layer, including DDoS protection, SSL/TLS hardening, and access control.
With the development of cloud-native technology and artificial intelligence, new trends are also emerging in load balancing:
The rise of service meshes: technologies such as Istio and Linkerd push load balancing, service discovery, and security controls down into the infrastructure layer, giving microservice architectures finer-grained traffic management.
AI-driven load balancing: machine-learning algorithms analyse traffic patterns in real time and adjust balancing strategies dynamically, enabling smarter resource allocation and failure prediction.
Integration with edge computing: as edge computing grows, load balancing has to adapt to distributed, low-latency scenarios and schedule traffic intelligently between edge nodes and the central cloud.
Deeper integration of security capabilities: load balancers will absorb more security functions such as WAF, DDoS protection, and API security, becoming comprehensive secure access gateways.
By continuing to learn and apply these technologies, we can build load-balancing architectures that are more efficient, reliable, and secure, laying a solid technical foundation for business growth.
Original statement: this article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
For infringement concerns, please contact cloudcommunity@tencent.com for removal.