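The earlier article 《learning:vpp实现dot1q终结功能配置》 (configuring dot1q termination in VPP) introduced basic L2 vSwitch concepts such as BD (Bridge Domain) and BDI (Bridge Domain Interface). This article studies the Layer 2 forwarding flow. An earlier article also described how to set up a DPDK & VPP learning environment on a Tencent Cloud host; here we build an L2 vSwitch environment on that same host. The setup is as follows:
First, create two network namespaces, pc1 and pc2, in the Linux kernel.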
ip netns add pc1
ip netns add pc2
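Then use the VPP command line to create two tap interfaces and a BD, and add both tap interfaces to the BD. The configuration is shown below. Copy the following to generate an l2_conf file automatically: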
cat << EOF > /root/l2_conf
# Create Layer 2 bridge domain 1
create bridge-domain 1
# Create tap interface tap1: kernel name tap1, kernel netns pc1, kernel interface address 192.168.1.1/24 (tap2 likewise, in pc2)
create tap id 1 host-ns pc1 host-ip4-addr 192.168.1.1/24 host-if-name tap1
create tap id 2 host-ns pc2 host-ip4-addr 192.168.1.2/24 host-if-name tap2
set interface state tap1 up
set interface state tap2 up
# Add the tap interfaces to bd 1
set interface l2 bridge tap1 1
set interface l2 bridge tap2 1
EOF
Next, enter the VPP command-line view and run exec /root/l2_conf to create the corresponding interfaces:
dpdk-vpp源码分析: exec /root/l2_conf
dpdk-vpp源码分析:
dpdk-vpp源码分析:
dpdk-vpp源码分析: show interface addr
local0 (dn):
tap1 (up):
L2 bridge bd-id 1 idx 1 shg 0
tap2 (up):
L2 bridge bd-id 1 idx 1 shg 0
dpdk-vpp源码分析: show bridge-domain 1 detail
BD-ID Index BSN Age(min) Learning U-Forwrd UU-Flood Flooding ARP-Term arp-ufwd Learn-co Learn-li BVI-Intf
1 1 0 off on on flood on off off 2 16777216 N/A
span-l2-input l2-input-classify l2-input-feat-arc l2-policer-classify l2-input-acl vpath-input-l2 l2-ip-qos-record l2-input-vtr l2-learn l2-rw l2-fwd l2-flood l2-flood l2-output
Interface If-idx ISN SHG BVI TxFlood VLAN-Tag-Rewrite
tap1 1 1 0 - * none
tap2 2 1 0 - * none
dpdk-vpp源码分析: show l2fib all
Mac-Address BD-Idx If-Idx BSN-ISN Age(min) static filter bvi Interface-Name
02:fe:25:07:17:52 1 2 0/1 - - - - tap2
02:fe:7b:06:08:36 1 1 0/1 - - - - tap1
L2FIB total/learned entries: 2/2 Last scan time: 0.0000e0sec Learn limit: 16777216
dpdk-vpp源码分析: clear l2fib
dpdk-vpp源码分析: show l2fib all
no l2fib entries
dpdk-vpp源码分析:
To capture the trace of the ARP request and reply, we run clear l2fib to empty the MAC table, then run trace add virtio-input 1 to capture the ARP request and reply. In the kernel, run ip netns exec pc1 ping 192.168.1.2.
Packet 1
00:04:05:084044: virtio-input
virtio: hw_if_index 1 next-index 4 vring 0 len 42
hdr: flags 0x00 gso_type 0x00 hdr_len 0 gso_size 0 csum_start 0 csum_offset 0 num_buffers 1
00:04:05:084057: ethernet-input
frame: flags 0x1, hw-if-index 1, sw-if-index 1
ARP: 02:fe:81:57:ec:8e -> ff:ff:ff:ff:ff:ff
00:04:05:084069: l2-input
l2-input: sw_if_index 1 dst ff:ff:ff:ff:ff:ff src 02:fe:81:57:ec:8e [l2-learn l2-flood ]
00:04:05:084073: l2-learn
l2-learn: sw_if_index 1 dst ff:ff:ff:ff:ff:ff src 02:fe:81:57:ec:8e bd_index 1
00:04:05:084081: l2-flood
l2-flood: sw_if_index 1 dst ff:ff:ff:ff:ff:ff src 02:fe:81:57:ec:8e bd_index 1
00:04:05:084085: l2-output
l2-output: sw_if_index 2 dst ff:ff:ff:ff:ff:ff src 02:fe:81:57:ec:8e data 08 06 00 01 08 00 06 04 00 01 02 fe
00:04:05:084089: tap2-output
tap2 flags 0x00180005
ARP: 02:fe:81:57:ec:8e -> ff:ff:ff:ff:ff:ff
request, type ethernet/IP4, address size 6/4
02:fe:81:57:ec:8e/192.168.1.1 -> 00:00:00:00:00:00/192.168.1.2
00:04:05:084095: tap2-tx
buffer 0x9f7d3: current data 0, length 42, buffer-pool 0, ref-count 1, trace handle 0x0
l2-hdr-offset 0 l3-hdr-offset 14
hdr-sz 0 l2-hdr-offset 0 l3-hdr-offset 14 l4-hdr-offset 0 l4-hdr-sz 0
ARP: 02:fe:81:57:ec:8e -> ff:ff:ff:ff:ff:ff
request, type ethernet/IP4, address size 6/4
02:fe:81:57:ec:8e/192.168.1.1 -> 00:00:00:00:00:00/192.168.1.2
Packet 2
00:04:05:084154: virtio-input
virtio: hw_if_index 2 next-index 4 vring 0 len 42
hdr: flags 0x00 gso_type 0x00 hdr_len 0 gso_size 0 csum_start 0 csum_offset 0 num_buffers 1
00:04:05:084156: ethernet-input
frame: flags 0x1, hw-if-index 2, sw-if-index 2
ARP: 02:fe:cf:7a:3e:a8 -> 02:fe:81:57:ec:8e
00:04:05:084159: l2-input
l2-input: sw_if_index 2 dst 02:fe:81:57:ec:8e src 02:fe:cf:7a:3e:a8 [l2-learn l2-fwd l2-flood l2-flood ]
00:04:05:084160: l2-learn
l2-learn: sw_if_index 2 dst 02:fe:81:57:ec:8e src 02:fe:cf:7a:3e:a8 bd_index 1
00:04:05:084168: l2-fwd
l2-fwd: sw_if_index 2 dst 02:fe:81:57:ec:8e src 02:fe:cf:7a:3e:a8 bd_index 1 result [0x1040000000001, 1] none
00:04:05:084171: l2-output
l2-output: sw_if_index 1 dst 02:fe:81:57:ec:8e src 02:fe:cf:7a:3e:a8 data 08 06 00 01 08 00 06 04 00 02 02 fe
00:04:05:084173: tap1-output
tap1 flags 0x00180005
ARP: 02:fe:cf:7a:3e:a8 -> 02:fe:81:57:ec:8e
reply, type ethernet/IP4, address size 6/4
02:fe:cf:7a:3e:a8/192.168.1.2 -> 02:fe:81:57:ec:8e/192.168.1.1
00:04:05:084175: tap1-tx
buffer 0x9d0d3: current data 0, length 42, buffer-pool 0, ref-count 1, trace handle 0x1
l2-hdr-offset 0 l3-hdr-offset 14
hdr-sz 0 l2-hdr-offset 0 l3-hdr-offset 14 l4-hdr-offset 0 l4-hdr-sz 0
ARP: 02:fe:cf:7a:3e:a8 -> 02:fe:81:57:ec:8e
reply, type ethernet/IP4, address size 6/4
02:fe:cf:7a:3e:a8/192.168.1.2 -> 02:fe:81:57:ec:8e/192.168.1.1
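The two packets above differ mainly in which graph nodes they traverse, and that is easier to see with the detail lines stripped away. The sketch below is one rough way to do this. `node_paths` is a hypothetical helper name, not a VPP command, and it assumes node lines have the `timestamp: node-name` form seen in the output above.

```shell
#!/bin/sh
# node_paths: hypothetical helper (not part of VPP). Reads `show trace` text on
# stdin and prints one line per packet listing the graph nodes it traversed.
node_paths() {
  awk '
    # A "Packet N" header starts a new packet: print the previous path, if any.
    /^Packet [0-9]+/ { if (path != "") print path; path = ""; next }
    # Node lines look like "00:04:05:084044: virtio-input".
    /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9]+: / {
      path = (path == "" ? $2 : path " -> " $2)
    }
    END { if (path != "") print path }
  '
}

# Demo on the node lines of the two ARP packets above (detail lines omitted).
node_paths <<'EOF'
Packet 1
00:04:05:084044: virtio-input
00:04:05:084057: ethernet-input
00:04:05:084069: l2-input
00:04:05:084073: l2-learn
00:04:05:084081: l2-flood
00:04:05:084085: l2-output
00:04:05:084089: tap2-output
00:04:05:084095: tap2-tx
Packet 2
00:04:05:084154: virtio-input
00:04:05:084156: ethernet-input
00:04:05:084159: l2-input
00:04:05:084160: l2-learn
00:04:05:084168: l2-fwd
00:04:05:084171: l2-output
00:04:05:084173: tap1-output
00:04:05:084175: tap1-tx
EOF
# Prints:
# virtio-input -> ethernet-input -> l2-input -> l2-learn -> l2-flood -> l2-output -> tap2-output -> tap2-tx
# virtio-input -> ethernet-input -> l2-input -> l2-learn -> l2-fwd -> l2-output -> tap1-output -> tap1-tx
```

On a live system the same function could be fed with `vppctl show trace | node_paths`, assuming `vppctl` is installed and can reach the running VPP instance.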
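In the traces above, the ARP request and the ARP reply take somewhat different node paths. The ARP request carries the broadcast MAC as its destination: l2-learn triggers MAC learning and records the MAC of the kernel tap1 interface together with the interface it arrived on, and l2-flood floods the packet within the Layer 2 BD so that it is sent out of tap2. The ARP reply is unicast: the l2-learn node again triggers MAC learning and records the MAC of the kernel tap2 interface and its interface. Because the tap1 MAC was already learned during the request phase, the reply hits the l2fib table; the l2-fwd node looks up the output interface and the packet is forwarded out of tap1. Below we query the MAC addresses of the kernel tap1 and tap2 interfaces and the VPP l2fib table: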
dpdk-vpp源码分析: show l2fib bd_id 1
Mac-Address BD-Idx If-Idx BSN-ISN Age(min) static filter bvi Interface-Name
02:fe:25:07:17:52 1 2 0/1 - - - - tap2
02:fe:7b:06:08:36 1 1 0/1 - - - - tap1
L2FIB total/learned entries: 2/2 Last scan time: 0.0000e0sec Learn limit: 16777216
dpdk-vpp源码分析: quit
root@learning-vpp:~# ip netns exec pc1 ifconfig
tap1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::fe:7bff:fe06:836 prefixlen 64 scopeid 0x20<link>
ether 02:fe:7b:06:08:36 txqueuelen 1000 (Ethernet)
RX packets 17 bytes 1286 (1.2 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 17 bytes 1286 (1.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@learning-vpp:~# ip netns exec pc2 ifconfig
tap2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.2 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::fe:25ff:fe07:1752 prefixlen 64 scopeid 0x20<link>
ether 02:fe:25:07:17:52 txqueuelen 1000 (Ethernet)
RX packets 17 bytes 1286 (1.2 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 17 bytes 1286 (1.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
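The comparison between the l2fib entries and the kernel MAC addresses above was done by eye. A minimal sketch of automating it follows. `l2fib_macs` and `l2fib_has` are hypothetical helper names, and the parsing assumes the column layout of the `show l2fib` output shown above (MAC address first, interface name last).

```shell
#!/bin/sh
# l2fib_macs: hypothetical helper. Reads `show l2fib all` text on stdin and
# prints "MAC interface-name" for every entry row.
l2fib_macs() {
  awk '
    # Entry rows start with a MAC address: six colon-separated hex octets.
    split($1, octet, ":") == 6 && $1 ~ /^[0-9a-f:]+$/ { print $1, $NF }
  '
}

# l2fib_has MAC IFNAME: succeed if stdin has an entry mapping MAC to IFNAME.
l2fib_has() {
  l2fib_macs | grep -qxF "$1 $2"
}

# Demo with the l2fib output and the kernel MAC addresses shown above.
l2fib='Mac-Address BD-Idx If-Idx BSN-ISN Age(min) static filter bvi Interface-Name
02:fe:25:07:17:52 1 2 0/1 - - - - tap2
02:fe:7b:06:08:36 1 1 0/1 - - - - tap1
L2FIB total/learned entries: 2/2 Last scan time: 0.0000e0sec Learn limit: 16777216'

printf '%s\n' "$l2fib" | l2fib_has 02:fe:7b:06:08:36 tap1 && echo "pc1 MAC learned on tap1"
printf '%s\n' "$l2fib" | l2fib_has 02:fe:25:07:17:52 tap2 && echo "pc2 MAC learned on tap2"
```

On a live system the MAC could be read from the namespace instead of being typed in, for example `vppctl show l2fib all | l2fib_has "$(ip netns exec pc1 cat /sys/class/net/tap1/address)" tap1`, again assuming `vppctl` is available.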
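At this point we have confirmed that VPP has learned the MAC entries for the kernel tap1 and tap2 interfaces. Next, run ip netns exec pc1 ping 192.168.1.2 on Linux and look at the forwarding trace of the subsequent packets: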
Packet 1
00:30:46:742722: virtio-input
virtio: hw_if_index 1 next-index 4 vring 0 len 98
hdr: flags 0x00 gso_type 0x00 hdr_len 0 gso_size 0 csum_start 0 csum_offset 0 num_buffers 1
00:30:46:742739: ethernet-input
frame: flags 0x1, hw-if-index 1, sw-if-index 1
IP4: 02:fe:7b:06:08:36 -> 02:fe:25:07:17:52
00:30:46:742751: l2-input
l2-input: sw_if_index 1 dst 02:fe:25:07:17:52 src 02:fe:7b:06:08:36 [l2-learn l2-fwd l2-flood l2-flood ]
00:30:46:742755: l2-learn
l2-learn: sw_if_index 1 dst 02:fe:25:07:17:52 src 02:fe:7b:06:08:36 bd_index 1
00:30:46:742761: l2-fwd
l2-fwd: sw_if_index 1 dst 02:fe:25:07:17:52 src 02:fe:7b:06:08:36 bd_index 1 result [0x1010000000002, 2] none
00:30:46:742766: l2-output
l2-output: sw_if_index 2 dst 02:fe:25:07:17:52 src 02:fe:7b:06:08:36 data 08 00 45 00 00 54 c3 da 40 00 40 01
00:30:46:742770: tap2-output
tap2 flags 0x00180005
IP4: 02:fe:7b:06:08:36 -> 02:fe:25:07:17:52
ICMP: 192.168.1.1 -> 192.168.1.2
tos 0x00, ttl 64, length 84, checksum 0xf37a dscp CS0 ecn NON_ECN
fragment id 0xc3da, flags DONT_FRAGMENT
ICMP echo_request checksum 0x6521 id 24956
00:30:46:742779: tap2-tx
buffer 0x9fc65: current data 0, length 98, buffer-pool 0, ref-count 1, trace handle 0x0
l2-hdr-offset 0 l3-hdr-offset 14
hdr-sz 0 l2-hdr-offset 0 l3-hdr-offset 14 l4-hdr-offset 0 l4-hdr-sz 0
IP4: 02:fe:7b:06:08:36 -> 02:fe:25:07:17:52
ICMP: 192.168.1.1 -> 192.168.1.2
tos 0x00, ttl 64, length 84, checksum 0xf37a dscp CS0 ecn NON_ECN
fragment id 0xc3da, flags DONT_FRAGMENT
ICMP echo_request checksum 0x6521 id 24956
Packet 2
00:30:46:742829: virtio-input
virtio: hw_if_index 2 next-index 4 vring 0 len 98
hdr: flags 0x00 gso_type 0x00 hdr_len 0 gso_size 0 csum_start 0 csum_offset 0 num_buffers 1
00:30:46:742831: ethernet-input
frame: flags 0x1, hw-if-index 2, sw-if-index 2
IP4: 02:fe:25:07:17:52 -> 02:fe:7b:06:08:36
00:30:46:742833: l2-input
l2-input: sw_if_index 2 dst 02:fe:7b:06:08:36 src 02:fe:25:07:17:52 [l2-learn l2-fwd l2-flood l2-flood ]
00:30:46:742835: l2-learn
l2-learn: sw_if_index 2 dst 02:fe:7b:06:08:36 src 02:fe:25:07:17:52 bd_index 1
00:30:46:742836: l2-fwd
l2-fwd: sw_if_index 2 dst 02:fe:7b:06:08:36 src 02:fe:25:07:17:52 bd_index 1 result [0x1010000000001, 1] none
00:30:46:742838: l2-output
l2-output: sw_if_index 1 dst 02:fe:7b:06:08:36 src 02:fe:25:07:17:52 data 08 00 45 00 00 54 06 8b 00 00 40 01
00:30:46:742840: tap1-output
tap1 flags 0x00180005
IP4: 02:fe:25:07:17:52 -> 02:fe:7b:06:08:36
ICMP: 192.168.1.2 -> 192.168.1.1
tos 0x00, ttl 64, length 84, checksum 0xf0ca dscp CS0 ecn NON_ECN
fragment id 0x068b
ICMP echo_reply checksum 0x6d21 id 24956
00:30:46:742842: tap1-tx
buffer 0x9d58c: current data 0, length 98, buffer-pool 0, ref-count 1, trace handle 0x1
l2-hdr-offset 0 l3-hdr-offset 14
hdr-sz 0 l2-hdr-offset 0 l3-hdr-offset 14 l4-hdr-offset 0 l4-hdr-sz 0
IP4: 02:fe:25:07:17:52 -> 02:fe:7b:06:08:36
ICMP: 192.168.1.2 -> 192.168.1.1
tos 0x00, ttl 64, length 84, checksum 0xf0ca dscp CS0 ecn NON_ECN
fragment id 0x068b
ICMP echo_reply checksum 0x6d21 id 24956
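As the traces above show, both the ping request and the ping reply hit the MAC table and follow the known-unicast forwarding path. This completes our look at the forwarding flow for Layer 2 broadcast and unicast packets. The code will be analyzed in detail in a later article.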
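The learn, forward, and flood decisions observed in the traces can be summarized as a toy model. The sketch below is not VPP code and ignores aging, split-horizon groups, filters, and BVI handling. `l2_sim` is a hypothetical name, and the bridge-domain member list is passed in as an assumption matching this setup.

```shell
#!/bin/sh
# l2_sim: toy model of l2-learn / l2-fwd / l2-flood for one bridge domain.
# Reads frames on stdin as "rx-interface src-mac dst-mac", one per line,
# and prints the forwarding decision for each frame.
l2_sim() {
  awk -v ports="tap1 tap2" -v bcast="ff:ff:ff:ff:ff:ff" '
    BEGIN { n = split(ports, port, " ") }
    {
      rx = $1; src = $2; dst = $3
      fib[src] = rx                      # learn: source MAC is reachable via rx
      if (dst != bcast && (dst in fib)) {
        print "fwd", fib[dst]            # known unicast: one output interface
        next
      }
      out = ""                           # broadcast or unknown unicast:
      for (i = 1; i <= n; i++)           # flood to every member except ingress
        if (port[i] != rx) out = out " " port[i]
      print "flood" out
    }
  '
}

# Demo: the ARP request and reply from the first trace, then a unicast frame.
l2_sim <<'EOF'
tap1 02:fe:81:57:ec:8e ff:ff:ff:ff:ff:ff
tap2 02:fe:cf:7a:3e:a8 02:fe:81:57:ec:8e
tap1 02:fe:81:57:ec:8e 02:fe:cf:7a:3e:a8
EOF
# Prints:
# flood tap2
# fwd tap1
# fwd tap2
```

The first frame floods because its destination is the broadcast address. The second is forwarded because the first frame already taught the table where 02:fe:81:57:ec:8e lives, which mirrors the l2-flood and l2-fwd paths in the traces.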
This article was originally shared on the DPDK VPP源码分析 WeChat official account.