This blog post explains how computers running the Linux kernel send packets, as well as how to monitor and tune each component of the networking stack as packets flow from user programs to network hardware.
It is impossible to tune or monitor the Linux networking stack without reading the source code of the kernel and having a deep understanding of what exactly is happening.
This blog post will hopefully serve as a reference to anyone looking to do this.
General advice on monitoring and tuning the Linux networking stack
As mentioned in our previous article, the Linux network stack is complex and there is no one-size-fits-all solution for monitoring or tuning. If you truly want to tune the network stack, you will have no choice but to invest a considerable amount of time, effort, and money into understanding how the various parts of the networking system interact.
Many of the example settings provided in this blog post are used solely for illustrative purposes and are not a recommendation for or against a certain configuration or default setting. Before adjusting any setting, you should develop a frame of reference around what you need to be monitoring to notice a meaningful change.
Adjusting networking settings while connected to the machine over a network is dangerous; you could very easily lock yourself out or completely take out your networking. Do not adjust these settings on production machines; instead, make adjustments on new machines and rotate them into production, if possible.
For reference, you may want to have a copy of the device data sheet handy. This post will examine the Intel I350 Ethernet controller, controlled by the igb device driver. You can find that data sheet (warning: LARGE PDF) here for your reference.
The high-level path network data takes from a user program to a network device is as follows:
1. Data is written using a system call (like sendto, sendmsg, et. al.).
2. Data passes through the socket subsystem on to the socket's protocol family's system (in our case, AF_INET).
3. The protocol family passes data through the protocol layers which (in many cases) arrange the data into packets.
4. The data passes through the routing layer, populating the destination and neighbour caches along the way (if they are cold). This can generate ARP traffic if an ethernet address needs to be looked up.
5. After passing through the protocol layers, packets reach the device-agnostic layer.
6. The output queue is chosen.
7. The device driver's transmit function is called.
8. The data is then passed on to the queue discipline (qdisc) attached to the output device.
9. The qdisc will either transmit the data directly if it can, or queue it up to be sent during the NET_TX softirq.
10. Eventually the data is handed down to the driver from the qdisc.
11. The driver creates the needed DMA mappings so the device can read the data from RAM.
12. The driver signals the device that the data is ready to be transmitted.
13. The device fetches the data from RAM and transmits it.
14. Once transmission is complete, the device raises an interrupt to signal transmission completion.
15. The driver's registered IRQ handler for transmit completion runs. For many devices, this handler triggers the NAPI poll loop to start running via the NET_RX softirq.
16. The poll function runs via a softirq and calls down into the driver to unmap DMA regions and free packet data.

This entire flow will be examined in detail in the following sections.
The protocol layers examined below are the IP and UDP protocol layers. Much of the information presented will serve as a reference for other protocol layers, as well.
This blog post will be examining the Linux kernel version 3.13.0 with links to code on GitHub and code snippets throughout this post, much like the companion post.
Let’s begin by examining how protocol families are registered in the kernel and used by the socket subsystem, then we can move on to how data is sent.
What happens when you run a piece of code like this in a user program to create a UDP socket?
sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
In short, the Linux kernel looks up a set of functions exported by the UDP protocol stack that deal with many things, including sending and receiving network data. To understand exactly how this works, we have to look into the AF_INET address family code.
The Linux kernel executes the inet_init
function early during kernel initialization. This function registers the AF_INET
protocol family, the individual protocol stacks within that family (TCP, UDP, ICMP, and RAW), and calls initialization routines to get protocol stacks ready to process network data. You can find the code for inet_init
in ./net/ipv4/af_inet.c.
https://github.com/torvalds/linux/blob/v3.13/net/ipv4/af_inet.c#L1678-L1804
static int __init inet_init(void)
{
    struct inet_protosw *q;
    struct list_head *r;
    int rc = -EINVAL;

    BUILD_BUG_ON(sizeof(struct inet_skb_parm) > FIELD_SIZEOF(struct sk_buff, cb));

    sysctl_local_reserved_ports = kzalloc(65536 / 8, GFP_KERNEL);
    if (!sysctl_local_reserved_ports)
        goto out;

    rc = proto_register(&tcp_prot, 1);
    if (rc)
        goto out_free_reserved_ports;

    rc = proto_register(&udp_prot, 1);
    if (rc)
        goto out_unregister_tcp_proto;

    rc = proto_register(&raw_prot, 1);
    if (rc)
        goto out_unregister_udp_proto;

    rc = proto_register(&ping_prot, 1);
    if (rc)
        goto out_unregister_raw_proto;

    /*
     * Tell SOCKET that we are alive...
     */
    (void)sock_register(&inet_family_ops);

#ifdef CONFIG_SYSCTL
    ip_static_sysctl_init();
#endif

    /*
     * Add all the base protocols.
     */
    if (inet_add_protocol(&icmp_protocol, IPPROTO_ICMP) < 0)
        pr_crit("%s: Cannot add ICMP protocol\n", __func__);
    if (inet_add_protocol(&udp_protocol, IPPROTO_UDP) < 0)
        pr_crit("%s: Cannot add UDP protocol\n", __func__);
    if (inet_add_protocol(&tcp_protocol, IPPROTO_TCP) < 0)
        pr_crit("%s: Cannot add TCP protocol\n", __func__);
#ifdef CONFIG_IP_MULTICAST
    if (inet_add_protocol(&igmp_protocol, IPPROTO_IGMP) < 0)
        pr_crit("%s: Cannot add IGMP protocol\n", __func__);
#endif

    /* Register the socket-side information for inet_create. */
    for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
        INIT_LIST_HEAD(r);

    for (q = inetsw_array; q < &inetsw_array[INETSW_ARRAY_LEN]; ++q)
        inet_register_protosw(q);

    /*
     * Set the ARP module up
     */
    arp_init();

    /*
     * Set the IP module up
     */
    ip_init();

    tcp_v4_init();

    /* Setup TCP slab cache for open requests. */
    tcp_init();

    /* Setup UDP memory threshold */
    udp_init();

    /* Add UDP-Lite (RFC 3828) */
    udplite4_register();

    ping_init();

    /*
     * Set the ICMP layer up
     */
    if (icmp_init() < 0)
        panic("Failed to create the ICMP control socket.\n");

    /*
     * Initialise the multicast router
     */
#if defined(CONFIG_IP_MROUTE)
    if (ip_mr_init())
        pr_crit("%s: Cannot init ipv4 mroute\n", __func__);
#endif
    /*
     * Initialise per-cpu ipv4 mibs
     */
    if (init_ipv4_mibs())
        pr_crit("%s: Cannot init ipv4 mibs\n", __func__);

    ipv4_proc_init();

    ipfrag_init();

    dev_add_pack(&ip_packet_type);

    rc = 0;
out:
    return rc;
out_unregister_raw_proto:
    proto_unregister(&raw_prot);
out_unregister_udp_proto:
    proto_unregister(&udp_prot);
out_unregister_tcp_proto:
    proto_unregister(&tcp_prot);
out_free_reserved_ports:
    kfree(sysctl_local_reserved_ports);
    goto out;
}
fs_initcall(inet_init);
The AF_INET
protocol family exports a structure that has a create
function. This function is called by the kernel when a socket is created from a user program:
https://github.com/torvalds/linux/blob/d8ec26d7f8287f5788a494f56e8814210f0e64be/net/ipv4/af_inet.c#L992
static const struct net_proto_family inet_family_ops = {
    .family = PF_INET,
    .create = inet_create,
    .owner = THIS_MODULE,
};
The inet_create
function takes the arguments passed to the socket system call and searches the registered protocols to find a set of operations to link to the socket. Take a look:
https://github.com/torvalds/linux/blob/v3.13/net/ipv4/af_inet.c#L267
    /* Look for the requested type/protocol pair. */
lookup_protocol:
    err = -ESOCKTNOSUPPORT;
    rcu_read_lock();
    list_for_each_entry_rcu(answer, &inetsw[sock->type], list) {
        err = 0;
        /* Check the non-wild match. */
        if (protocol == answer->protocol) {
            if (protocol != IPPROTO_IP)
                break;
        } else {
            /* Check for the two wild cases. */
            if (IPPROTO_IP == protocol) {
                protocol = answer->protocol;
                break;
            }
            if (IPPROTO_IP == answer->protocol)
                break;
        }
        err = -EPROTONOSUPPORT;
    }
Later, answer, which holds a reference to a particular protocol stack, has its ops field copied into the socket structure:
https://github.com/torvalds/linux/blob/d8ec26d7f8287f5788a494f56e8814210f0e64be/net/ipv4/af_inet.c#L316
sock->ops = answer->ops;
You can find the structure definitions for all of the protocol stacks in af_inet.c. Let’s take a look at the TCP and UDP protocol structures:
https://github.com/torvalds/linux/blob/v3.13/net/ipv4/af_inet.c#L998-L1020
/* Upon startup we insert all the elements in inetsw_array[] into
* the linked list inetsw.
*/
static struct inet_protosw inetsw_array[] =
{
    {
        .type = SOCK_STREAM,
        .protocol = IPPROTO_TCP,
        .prot = &tcp_prot,
        .ops = &inet_stream_ops,
        .no_check = 0,
        .flags = INET_PROTOSW_PERMANENT |
                 INET_PROTOSW_ICSK,
    },
    {
        .type = SOCK_DGRAM,
        .protocol = IPPROTO_UDP,
        .prot = &udp_prot,
        .ops = &inet_dgram_ops,
        .no_check = UDP_CSUM_DEFAULT,
        .flags = INET_PROTOSW_PERMANENT,
    },
    {
        .type = SOCK_DGRAM,
        .protocol = IPPROTO_ICMP,
        .prot = &ping_prot,
        .ops = &inet_dgram_ops,
        .no_check = UDP_CSUM_DEFAULT,
        .flags = INET_PROTOSW_REUSE,
    },
    {
        .type = SOCK_RAW,
        .protocol = IPPROTO_IP, /* wild card */
        .prot = &raw_prot,
        .ops = &inet_sockraw_ops,
        .no_check = UDP_CSUM_DEFAULT,
        .flags = INET_PROTOSW_REUSE,
    }
};
In the case of IPPROTO_UDP
, an ops
structure is linked into place which contains functions for various things, including sending and receiving data:
https://github.com/torvalds/linux/blob/v3.13/net/ipv4/af_inet.c#L935-L960
const struct proto_ops inet_dgram_ops = {
    .family = PF_INET,
    .owner = THIS_MODULE,
    .release = inet_release,
    .bind = inet_bind,
    .connect = inet_dgram_connect,
    .socketpair = sock_no_socketpair,
    .accept = sock_no_accept,
    .getname = inet_getname,
    .poll = udp_poll,
    .ioctl = inet_ioctl,
    .listen = sock_no_listen,
    .shutdown = inet_shutdown,
    .setsockopt = sock_common_setsockopt,
    .getsockopt = sock_common_getsockopt,
    .sendmsg = inet_sendmsg,
    .recvmsg = inet_recvmsg,
    .mmap = sock_no_mmap,
    .sendpage = inet_sendpage,
#ifdef CONFIG_COMPAT
    .compat_setsockopt = compat_sock_common_setsockopt,
    .compat_getsockopt = compat_sock_common_getsockopt,
    .compat_ioctl = inet_compat_ioctl,
#endif
};
EXPORT_SYMBOL(inet_dgram_ops);
and a protocol-specific structure, prot, which contains function pointers to all of the internal UDP protocol stack functions. For the UDP protocol, this structure is called udp_prot and is exported by ./net/ipv4/udp.c:
https://github.com/torvalds/linux/blob/v3.13/net/ipv4/udp.c#L2171-L2203
struct proto udp_prot = {
    .name = "UDP",
    .owner = THIS_MODULE,
    .close = udp_lib_close,
    .connect = ip4_datagram_connect,
    .disconnect = udp_disconnect,
    .ioctl = udp_ioctl,
    .destroy = udp_destroy_sock,
    .setsockopt = udp_setsockopt,
    .getsockopt = udp_getsockopt,
    .sendmsg = udp_sendmsg,
    .recvmsg = udp_recvmsg,
    .sendpage = udp_sendpage,
    .backlog_rcv = __udp_queue_rcv_skb,
    .release_cb = ip4_datagram_release_cb,
    .hash = udp_lib_hash,
    .unhash = udp_lib_unhash,
    .rehash = udp_v4_rehash,
    .get_port = udp_v4_get_port,
    .memory_allocated = &udp_memory_allocated,
    .sysctl_mem = sysctl_udp_mem,
    .sysctl_wmem = &sysctl_udp_wmem_min,
    .sysctl_rmem = &sysctl_udp_rmem_min,
    .obj_size = sizeof(struct udp_sock),
    .slab_flags = SLAB_DESTROY_BY_RCU,
    .h.udp_table = &udp_table,
#ifdef CONFIG_COMPAT
    .compat_setsockopt = compat_udp_setsockopt,
    .compat_getsockopt = compat_udp_getsockopt,
#endif
    .clear_sk = sk_prot_clear_portaddr_nulls,
};
EXPORT_SYMBOL(udp_prot);
Now, let’s turn to a user program that sends UDP data to see how udp_sendmsg
is called in the kernel!
Sending network data via a socket
A user program wants to send UDP network data and so it uses the sendto
system call, maybe like this:
ret = sendto(socket, buffer, buflen, 0, &dest, sizeof(dest));
This system call passes through the Linux system call layer and lands in this function in ./net/socket.c
:
https://github.com/torvalds/linux/blob/v3.13/net/socket.c#L1756-L1803
/*
 * Send a datagram to a given address. We move the address into kernel
 * space and check the user space data area is readable before invoking
 * the protocol.
 */
SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len,
                unsigned int, flags, struct sockaddr __user *, addr,
                int, addr_len)
{
    struct socket *sock;
    struct sockaddr_storage address;
    int err;
    struct msghdr msg;
    struct iovec iov;
    int fput_needed;

    if (len > INT_MAX)
        len = INT_MAX;
    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    if (!sock)
        goto out;

    iov.iov_base = buff;
    iov.iov_len = len;
    msg.msg_name = NULL;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = NULL;
    msg.msg_controllen = 0;
    msg.msg_namelen = 0;
    if (addr) {
        err = move_addr_to_kernel(addr, addr_len, &address);
        if (err < 0)
            goto out_put;
        msg.msg_name = (struct sockaddr *)&address;
        msg.msg_namelen = addr_len;
    }
    if (sock->file->f_flags & O_NONBLOCK)
        flags |= MSG_DONTWAIT;
    msg.msg_flags = flags;
    err = sock_sendmsg(sock, &msg, len);

out_put:
    fput_light(sock->file, fput_needed);
out:
    return err;
}
The SYSCALL_DEFINE6 macro unfolds into a pile of macros, which, in turn, set up the infrastructure needed to create a system call with 6 arguments (hence DEFINE6). One of the results of this is that inside the kernel, system call function names have sys_ prepended to them.
The system call code for sendto calls sock_sendmsg after arranging the data in a way that the lower layers will be able to handle. In particular, it takes the destination address passed into sendto and arranges it into a structure. Let's take a look:
iov.iov_base = buff;
iov.iov_len = len;
msg.msg_name = NULL;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_namelen = 0;
if (addr) {
    err = move_addr_to_kernel(addr, addr_len, &address);
    if (err < 0)
        goto out_put;
    msg.msg_name = (struct sockaddr *)&address;
    msg.msg_namelen = addr_len;
}
This code copies addr, passed in from the user program, into the kernel data structure address, which is then embedded into a struct msghdr structure as msg_name. This is similar to what a userland program would do if it were calling sendmsg instead of sendto. The kernel provides this conversion because both sendto and sendmsg ultimately call down to sock_sendmsg.
sock_sendmsg, __sock_sendmsg, and __sock_sendmsg_nosec

sock_sendmsg performs some error checking before calling __sock_sendmsg, which does its own error checking before calling __sock_sendmsg_nosec. __sock_sendmsg_nosec passes the data deeper into the socket subsystem:
https://github.com/torvalds/linux/blob/d8ec26d7f8287f5788a494f56e8814210f0e64be/net/socket.c#L622
static inline int __sock_sendmsg_nosec(struct kiocb *iocb, struct socket *sock,
                                       struct msghdr *msg, size_t size)
{
    struct sock_iocb *si = kiocb_to_siocb(iocb);

    si->sock = sock;
    si->scm = NULL;
    si->msg = msg;
    si->size = size;

    return sock->ops->sendmsg(iocb, sock, msg, size);
}
As seen in the previous section explaining socket creation, the sendmsg
function registered to this socket ops structure is inet_sendmsg
.
inet_sendmsg
As you may have guessed from the name, this is a generic function provided by the AF_INET
protocol family. This function starts by calling sock_rps_record_flow
to record the last CPU that the flow was processed on; this is used by Receive Packet Steering. Next, this function looks up the sendmsg
function on the socket’s internal protocol operations structure and calls it:
https://github.com/torvalds/linux/blob/v3.13/net/ipv4/af_inet.c#L935-L960
int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
                 size_t size)
{
    struct sock *sk = sock->sk;

    sock_rps_record_flow(sk);

    /* We may need to bind the socket. */
    if (!inet_sk(sk)->inet_num && !sk->sk_prot->no_autobind &&
        inet_autobind(sk))
        return -EAGAIN;

    return sk->sk_prot->sendmsg(iocb, sk, msg, size);
}
EXPORT_SYMBOL(inet_sendmsg);
When dealing with UDP, sk->sk_prot->sendmsg
above is udp_sendmsg
as exported by the UDP protocol layer, via the udp_prot
structure we saw earlier. This function call transitions from the generic AF_INET
protocol family on to the UDP protocol stack.
udp_sendmsg
The udp_sendmsg function can be found in ./net/ipv4/udp.c. The entire function is quite long, so we'll examine pieces of it below. Follow the previous link if you'd like to read it in its entirety.
https://github.com/torvalds/linux/blob/v3.13/net/ipv4/udp.c#L845-L1088
int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
                size_t len)
{
    struct inet_sock *inet = inet_sk(sk);
    struct udp_sock *up = udp_sk(sk);
    struct flowi4 fl4_stack;
    struct flowi4 *fl4;
    int ulen = len;
    struct ipcm_cookie ipc;
    struct rtable *rt = NULL;
    int free = 0;
    int connected = 0;
    __be32 daddr, faddr, saddr;
    __be16 dport;
    u8 tos;
    int err, is_udplite = IS_UDPLITE(sk);
    int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
    int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
    struct sk_buff *skb;
    struct ip_options_data opt_copy;

    /* ... the rest of the function follows ... */
After variable declarations and some basic error checking, one of the first things udp_sendmsg does is check whether the socket is "corked". UDP corking is a feature that allows a user program to request that the kernel accumulate data from multiple calls to send into a single datagram before sending. There are two ways to enable this option in your user program:
1. Use the setsockopt system call and pass UDP_CORK as the socket option.
2. Pass MSG_MORE as one of the flags when calling send, sendto, or sendmsg from your program.

The code from udp_sendmsg
checks up->pending
to determine if the socket is currently corked, and if so, it proceeds directly to appending data. We’ll see how data is appended later.
int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
                size_t len)
{
    /* variables and error checking ... */

    fl4 = &inet->cork.fl.u.ip4;
    if (up->pending) {
        /*
         * There are pending frames.
         * The socket lock must be held while it's corked.
         */
        lock_sock(sk);
        if (likely(up->pending)) {
            if (unlikely(up->pending != AF_INET)) {
                release_sock(sk);
                return -EINVAL;
            }
            goto do_append_data;
        }
        release_sock(sk);
    }
Next, the destination address and port are determined from one of two possible sources:
1. The socket itself, which has the destination address stored because the socket was connected at some point.
2. An auxiliary structure containing the destination address, as we saw with sendto.

Here's how the kernel deals with this:
/*
 * Get and verify the address.
 */
if (msg->msg_name) {
    struct sockaddr_in *usin = (struct sockaddr_in *)msg->msg_name;
    if (msg->msg_namelen < sizeof(*usin))
        return -EINVAL;
    if (usin->sin_family != AF_INET) {
        if (usin->sin_family != AF_UNSPEC)
            return -EAFNOSUPPORT;
    }

    daddr = usin->sin_addr.s_addr;
    dport = usin->sin_port;
    if (dport == 0)
        return -EINVAL;
} else {
    if (sk->sk_state != TCP_ESTABLISHED)
        return -EDESTADDRREQ;
    daddr = inet->inet_daddr;
    dport = inet->inet_dport;
    /* Open fast path for connected socket.
       Route will not be used, if at least one option is set.
     */
    connected = 1;
}
Yes, that is a TCP_ESTABLISHED in the UDP protocol layer! The socket states, for better or worse, use the TCP state descriptions.
Recall earlier that we saw how the kernel arranges a struct msghdr
structure on behalf of the user when the user program calls sendto
. The code above shows how the kernel parses that data back out in order to set daddr
and dport
.
If the udp_sendmsg function was reached by a kernel function that did not arrange a struct msghdr structure, the destination address and port are retrieved from the socket itself, and the socket is marked as "connected."
In either case, daddr and dport will be set to the destination address and port.
Next, the source address, device index, and any timestamping options that were set on the socket (like SOCK_TIMESTAMPING_TX_HARDWARE, SOCK_TIMESTAMPING_TX_SOFTWARE, SOCK_WIFI_STATUS) are retrieved and stored:
ipc.addr = inet->inet_saddr;
ipc.oif = sk->sk_bound_dev_if;
sock_tx_timestamp(sk, &ipc.tx_flags);
https://blog.packagecloud.io/monitoring-tuning-linux-networking-stack-sending-data/