An Analysis of Docker's Bridge Network

Author: 敲得码黛 · Published 2024-06-12

Preface

The previous article, "Virtual LAN (VLAN)", described what virtual NICs and virtual bridges do and used iptables to get a VLAN connected to the Internet. Having learned that much, it is natural to think of today's mainstream container technology, Docker, so next I want to examine how Docker's bridge network resembles and differs from that setup.

Hypotheses

As is well known, Docker has three network modes: host, bridge and none; here we only analyze bridge mode. With the previous article as background, the bridge concept should already be familiar: a bridge is a virtual switch that forwards frames at the data link layer based on MAC addresses. So we can now make a bold guess: Docker builds its internal container networking on exactly this mechanism.

  • Hypothesis 1: when creating a container, the Docker engine automatically creates a veth pair for it and assigns it a private IP, attaching one end to the docker0 bridge and leaving the other end inside the container's network namespace (a hand-built sketch of this wiring follows below).
  • Hypothesis 2: Docker likewise relies on iptables NAT to forward container traffic to the Internet.
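
As a warm-up, hypothesis 1 can be imitated by hand with plain iproute2 commands. The sketch below is not what Docker actually executes; the namespace name, veth names and the 172.17.0.100 address are made up for illustration, and it assumes the docker0 bridge already exists.

```shell
# Hand-built imitation of hypothesis 1 (all names and addresses are illustrative)
ip netns add demo                                     # stands in for a container's network namespace
ip link add veth-host type veth peer name veth-cont   # create a veth pair
ip link set veth-cont netns demo                      # move one end "into the container"
ip link set veth-host master docker0                  # plug the other end into the bridge
ip link set veth-host up
ip netns exec demo ip addr add 172.17.0.100/16 dev veth-cont   # assign a private IP
ip netns exec demo ip link set veth-cont up
ip netns exec demo ip route add default via 172.17.0.1         # bridge IP as gateway
```

If hypothesis 1 is right, Docker should be doing something equivalent to this for every container it starts.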

Verification

Checking the host's NIC list

Check the running Docker containers and the host's network interfaces, and see whether the Docker bridge and veth devices exist.

```shell
# List the Docker containers currently running on this host (mysql, redis, halo, debian)
[root@VM-8-10-centos ~]# docker ps 
CONTAINER ID   IMAGE                COMMAND                  CREATED         STATUS         PORTS                                                  NAMES
56ffaf39316a   debian               "bash"                   23 hours ago    Up 7 minutes                                                          debian
c8a273ce122e   halohub/halo:1.5.3   "/bin/sh -c 'java -X…"   5 months ago    Up 47 hours    0.0.0.0:8090->8090/tcp, :::8090->8090/tcp              halo
d09fcfa7de0f   redis                "docker-entrypoint.s…"   12 months ago   Up 5 weeks     0.0.0.0:8805->6379/tcp, :::8805->6379/tcp              redis
87a2192f6db4   mysql:5.7            "docker-entrypoint.s…"   2 years ago     Up 5 weeks     0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp   mysql
# Check the host's NIC list (confirm that docker0 and veth devices exist)
[root@VM-12-15-centos ~]# ip link 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:b3:6f:20 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
3: br-67cf5bfe7a5c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:c5:07:22:c7 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:38:d6:1b:ea brd ff:ff:ff:ff:ff:ff
5: br-9fd151a807e7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:35:7f:ed:76 brd ff:ff:ff:ff:ff:ff
315: vethf2afb37@if314: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
    link/ether 3a:06:f0:8d:06:f6 brd ff:ff:ff:ff:ff:ff link-netnsid 12
317: veth1ec30f9@if316: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-9fd151a807e7 state UP mode DEFAULT group default 
    link/ether 4a:ad:1a:b0:5a:5f brd ff:ff:ff:ff:ff:ff link-netnsid 0
319: vethc408286@if318: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
    link/ether 26:b0:3c:f4:c5:5b brd ff:ff:ff:ff:ff:ff link-netnsid 1
321: veth68fb8c6@if320: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
    link/ether 96:ca:a9:42:f8:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 9
323: veth6dba394@if322: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
    link/ether 92:1c:5e:9c:a2:b3 brd ff:ff:ff:ff:ff:ff link-netnsid 4
325: veth1509ed0@if324: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
    link/ether fa:22:33:da:12:e0 brd ff:ff:ff:ff:ff:ff link-netnsid 11
329: vethef1dbac@if328: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
    link/ether aa:db:d2:10:36:60 brd ff:ff:ff:ff:ff:ff link-netnsid 3
331: veth69d3e7d@if330: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
    link/ether 86:45:d0:0e:6b:a7 brd ff:ff:ff:ff:ff:ff link-netnsid 5
335: veth98588ae@if334: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
    link/ether 86:59:55:39:17:ad brd ff:ff:ff:ff:ff:ff link-netnsid 7
349: vetha84d717@if348: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
    link/ether ee:7f:d2:27:15:83 brd ff:ff:ff:ff:ff:ff link-netnsid 6
354: veth1@if355: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-mybridge state UP mode DEFAULT group default qlen 1000
    link/ether 72:c8:9e:24:a6:a3 brd ff:ff:ff:ff:ff:ff link-netns n1
356: br-mybridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 72:c8:9e:24:a6:a3 brd ff:ff:ff:ff:ff:ff
```

Listing the host's interfaces with ip link shows a virtual bridge named docker0 on the host, together with veth interfaces; four of the veth pairs correspond to the four Docker containers debian, halo, redis and mysql, one pair per container.
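
The listing alone does not say which veth belongs to which container. Below is a small mapping sketch, not from the original article, which assumes each container's first interface is named eth0: read eth0's iflink (the ifindex of its host-side peer) inside the container and look that index up on the host.

```shell
# Map each running container to its host-side veth (sketch; assumes eth0 inside the container)
for c in $(docker ps -q); do
  name=$(docker inspect -f '{{.Name}}' "$c")
  idx=$(docker exec "$c" cat /sys/class/net/eth0/iflink 2>/dev/null)   # peer ifindex on the host
  [ -n "$idx" ] && echo "$name -> $(ip -o link | awk -F': ' -v i="$idx" '$1 == i {print $2}')"
done
```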

Checking the bridge IP and network communication between Docker containers

```shell
# The default docker0 bridge has IP address 172.17.0.1/16
[root@VM-8-10-centos ~]# ip addr show docker0
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:6f:d7:19:7e brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:6fff:fed7:197e/64 scope link 
       valid_lft forever preferred_lft forever
# Check the IP addresses of the containers on the bridge network (172.17.0.2/16, 172.17.0.3/16, 172.17.0.4/16 and 172.17.0.5/16 respectively)
[root@VM-8-10-centos ~]# docker network inspect bridge
[
    {
        "Name": "bridge",
        "Id": "2dc75e446719be8cad37e1ea9ae7d1385fcc728b8177646a3c62929c2b289e94",
        "Created": "2024-04-24T09:46:14.399901891+08:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "56ffaf39316ac9f776c6b3e2a8a79e9f42dfab42aa1f7de7525bd26c686defaa": {
                "Name": "debian",
                "EndpointID": "47dd9441d4a4c8b09afea3bca23652b80ba35e6baa13d44ec21ec89522e722a6",
                "MacAddress": "02:42:ac:11:00:05",
                "IPv4Address": "172.17.0.5/16",
                "IPv6Address": ""
            },
            "87a2192f6db48c9bf2996bf25c79d4c18c3ae2975cac9d55e7fdfdcec03f896b": {
                "Name": "mysql",
                "EndpointID": "00b93de23c5abf2ed1349bac1c2ec93bf7ed516370dabf23348b980f19cfaa9c",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16",
                "IPv6Address": ""
            },
            "c8a273ce122ef5479583908f40898141a90933a3c41c8028dc7966b9af4c465d": {
                "Name": "halo",
                "EndpointID": "ba8ef83c80f3edb6e7987c95ae6d56816a1fc00d07e8bb2bfbb0f19ef543badf",
                "MacAddress": "02:42:ac:11:00:04",
                "IPv4Address": "172.17.0.4/16",
                "IPv6Address": ""
            },
            "d09fcfa7de0f2a7b3ef7927a7e53a8a53fb93021b119b1376fe4616381c5a57c": {
                "Name": "redis",
                "EndpointID": "afbc9128f7d27becfbf64e843a92d36ce23800cd42c131e550abea7afb6a131e",
                "MacAddress": "02:42:ac:11:00:03",
                "IPv4Address": "172.17.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
] 
# Enter the debian container and test internal and Internet connectivity
[root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
root@56ffaf39316a:/# ping 172.17.0.1
PING 172.17.0.1 (172.17.0.1) 56(84) bytes of data.
64 bytes from 172.17.0.1: icmp_seq=1 ttl=64 time=0.071 ms
64 bytes from 172.17.0.1: icmp_seq=2 ttl=64 time=0.036 ms
--- 172.17.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.036/0.053/0.071/0.017 ms
root@56ffaf39316a:/# ping 172.17.0.3
PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 172.17.0.3: icmp_seq=2 ttl=64 time=0.047 ms
--- 172.17.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.047/0.057/0.067/0.010 ms
root@56ffaf39316a:/# ping baidu.com
PING baidu.com (39.156.66.10) 56(84) bytes of data.
64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=1 ttl=247 time=59.0 ms
64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=2 ttl=247 time=55.4 ms
--- baidu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 55.400/57.221/59.043/1.821 ms
```

Summary

From the shell output: the docker0 bridge has IP 172.17.0.1/16, the containers on the docker0 subnet can reach each other, and pinging baidu.com confirms that Internet connectivity also works. We can therefore conclude that Docker's bridge mode works the same way as the VLAN setup in the previous article: both use a virtual bridge to provide internal network communication.
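
To double-check which veth interfaces are actually enslaved to docker0 (as opposed to the user-defined br-* bridges that also appear in the listing above), iproute2 can filter by master; a quick sketch:

```shell
# Show only the interfaces attached to the docker0 bridge
ip link show master docker0
# Equivalent view via the bridge tool
bridge link show | grep docker0
```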

Diagram of Docker's internal communication paths
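
In text form, the communication paths described above (using the addresses observed earlier) look roughly like this:

```
                 Internet
                    |
                  eth0   (host's public interface)
                    |    <- SNAT/MASQUERADE via iptables
                 docker0  172.17.0.1/16
          __________|___________________
         |          |          |        |
       veth       veth       veth     veth    (host side)
         |          |          |        |
       eth0       eth0       eth0     eth0    (container side)
       mysql      redis      halo    debian
    172.17.0.2  172.17.0.3 172.17.0.4 172.17.0.5
```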

Communication between Docker containers and the Internet

In the previous article I accidentally left a loose end. Because firewalld ships a lot of built-in iptables rules, analysing traffic with it running is unpleasant, so I simply shut firewalld down; I then noticed a side effect: when firewalld stops, the iptables rules are flushed as well. At the time that seemed harmless, but thinking it over, a large part of why the VLAN could reach the Internet was the NAT capability of iptables. With iptables flushed, NAT is effectively gone, so anything relying on it loses Internet connectivity. The shell commands below reproduce and analyse this.

```shell
# Stop firewalld
[root@VM-8-10-centos ~]# systemctl stop firewalld
# Check iptables
[root@VM-8-10-centos ~]# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
[root@VM-8-10-centos ~]# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination 
# Check Internet connectivity from the debian container
[root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
root@56ffaf39316a:/# ping baidu.com
PING baidu.com (110.242.68.66) 56(84) bytes of data.
--- baidu.com ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4000ms
# Check internal network connectivity
root@56ffaf39316a:/# ping 172.17.0.1
PING 172.17.0.1 (172.17.0.1) 56(84) bytes of data.
64 bytes from 172.17.0.1: icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from 172.17.0.1: icmp_seq=2 ttl=64 time=0.046 ms
--- 172.17.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.041/0.043/0.046/0.002 ms
root@56ffaf39316a:/# ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.081 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.055 ms
--- 172.17.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.055/0.068/0.081/0.013 ms
```

With iptables flushed, the Docker containers did indeed lose Internet connectivity, while communication inside the bridge network was unaffected.

Manually adding a NAT rule to restore Internet connectivity for the containers
```shell
# Add an SNAT (masquerade) rule
[root@VM-8-10-centos ~]# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# Check Internet connectivity from the debian container
[root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
root@56ffaf39316a:/# ping baidu.com
PING baidu.com (39.156.66.10) 56(84) bytes of data.
64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=1 ttl=247 time=55.8 ms
64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=2 ttl=247 time=55.4 ms
--- baidu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 55.386/55.610/55.834/0.224 ms
```

Personal takeaway: Docker containers really do depend on iptables to talk to the Internet, and the behaviour is almost identical to the VLAN setup, so I see Docker's bridge networking essentially as an advanced application of "virtual bridge + iptables".
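
For comparison, the masquerade rule that Docker itself maintains for the default bridge is scoped to the bridge subnet rather than to all traffic leaving eth0, and restarting the Docker daemon re-creates its iptables chains and rules. A sketch, assuming the default 172.17.0.0/16 subnet:

```shell
# The rule Docker normally installs for the default bridge network:
# masquerade traffic from 172.17.0.0/16 unless it is going back out via docker0
iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
# Alternatively, restarting the daemon rebuilds all Docker-managed iptables rules
systemctl restart docker
```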

Further thoughts

Does communication between Docker containers also happen at layer 2?

From what we learned about VLANs, a bridge is a virtual switch that works at the data link layer and switches frames by MAC address. Working at layer 2 means it has no notion of IP while switching; it simply forwards frames by MAC. If so, container-to-container traffic should keep working even if we delete the bridge's IP address and the associated routing entries.

```shell
[root@VM-8-10-centos ~]# ip addr del 172.17.0.1/16 dev docker0
[root@VM-8-10-centos ~]# ip addr show docker0
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:6f:d7:19:7e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::42:6fff:fed7:197e/64 scope link 
       valid_lft forever preferred_lft forever
[root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
root@56ffaf39316a:/# ping 172.17.0.3 
PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.066 ms
64 bytes from 172.17.0.3: icmp_seq=2 ttl=64 time=0.049 ms
--- 172.17.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.049/0.057/0.066/0.008 ms
```
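
To undo this experiment, the address can simply be added back (deleting it also removed the host's connected route to 172.17.0.0/16, so traffic from the host itself to the containers has no route until it is restored):

```shell
# Restore docker0's address (and with it the 172.17.0.0/16 connected route)
ip addr add 172.17.0.1/16 dev docker0
```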

How do iptables and the routing table relate and differ? Which of them decides a packet's egress interface?

A question that already came up while studying VLANs: when a virtual bridge talks to the Internet, what decides that traffic entering the bridge gets forwarded out of the egress NIC? When setting up NAT for the VLAN, FORWARD and NAT rules had to be configured in iptables, so it is tempting to conclude that iptables does the forwarding. But then what is the routing table for? So which is it: does iptables forward the traffic, or does the routing table (ip route)? More concretely: what moves traffic from the docker0 interface to eth0?

A full answer requires digging into how iptables works, which I will not repeat here; below is my personal conclusion, for reference only.

Personal conclusion: the routing table does not modify traffic at all; it only decides which interface a packet leaves through. iptables can filter, modify and redirect IP packets, but the egress interface is ultimately still chosen by the routing table.
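
A quick way to observe this division of labour on the host (a sketch; the addresses assume the default bridge subnet seen above):

```shell
# Docker adds a connected route for the bridge subnet when docker0 gets its address
ip route show | grep docker0        # e.g. "172.17.0.0/16 dev docker0 ..."
# Ask the routing table which interface it would choose for a given destination
ip route get 172.17.0.3             # expected to resolve via dev docker0
ip route get 39.156.66.10           # expected to go via the default gateway on eth0
# Forwarding between docker0 and eth0 additionally requires IP forwarding
sysctl net.ipv4.ip_forward          # Docker enables this (= 1) on startup
```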

Even without SNAT, shouldn't packets still be able to reach the remote network?

Every IP packet on the Internet carries a source address and a destination address: the destination determines how the packet is delivered to the remote host, and the source lets other hosts tell who sent it. SNAT works purely by rewriting the source address, so in principle, even without SNAT the packet should still reach the remote network; the remote side simply would not be able to reply.

```shell
# Assume two cloud servers with public IPv4 addresses, xxx.xxx.xxx.xx1 and xxx.xxx.xxx.xx2.
# xx1's LAN also contains another host, x10.

# On host xx1:
# Use SNAT to rewrite the source IP from xx1 to x10
[root@VM-8-10-centos ~]# iptables -t nat -A POSTROUTING -s xx1 -o eth0 -j SNAT --to-source xxx.xxx.xxx.x10
# Capture ICMP packets on eth0
[root@VM-8-10-centos ~]# tcpdump -i eth0 -p icmp -nv | grep x10

# On host xx2:
# Capture ICMP packets on eth0
[root@VM-8-10-centos ~]# tcpdump -i eth0 -p icmp -nv
```

Judging from the tcpdump capture, xx1 did send packets with source address x10, but the capture on xx2 shows no packets arriving from xx1 or from x10. Perhaps the packets were dropped somewhere along the route, or perhaps I have misunderstood something. (One plausible explanation is that cloud providers and ISPs commonly apply source-address validation, i.e. anti-spoofing filtering, which drops packets whose source address does not belong to the sending host.)

About me

A brick-mover in the Internet industry who loves technology, life and sharing.

If you find these articles interesting, feel free to follow me.
