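Why can a synchronous path access the current pointer, while an asynchronous path should not?
A synchronous path is always triggered by the current process: the kernel is doing work on behalf of that process (servicing a system call, for example), so it can safely access the current pointer. An asynchronous path is triggered by an external interrupt, such as the receive interrupt for a network packet. When the packet arrives, the process that happens to be running is not necessarily the one the packet is destined for; the kernel has to process the packet and then deliver it to the right process. For that reason an asynchronous path should not access the current pointer.
On x86_64, the CPU has an interrupt descriptor table register (IDTR) that points to the interrupt descriptor table (IDT) in memory, an array of 256 interrupt descriptors. When an interrupt occurs, the hardware uses IDTR to locate the IDT, indexes it with the interrupt vector number to find the matching descriptor, and reads the address of the interrupt handler from it. The hardware then automatically pushes a few essential registers onto the stack and jumps to the handler address, where the CPU starts executing the handler. The interrupt descriptor is defined as follows: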
(gdb) ptype idt_table[0]
type = struct gate_struct {
u16 offset_low;
u16 segment;
struct idt_bits bits;
u16 offset_middle;
u32 offset_high;
u32 reserved;
}
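Here offset_high, offset_middle, and offset_low combine to form the address of the interrupt handler: offset_high supplies bits 63..32, offset_middle bits 31..16, and offset_low bits 15..0.
Boot the Linux kernel under qemu, attach gdb to qemu, and let the kernel run. Once the kernel has finished booting, enter p &idt_table to see where the interrupt descriptor table lives in memory: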
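To make the address assembly concrete, here is a minimal C sketch. The struct name gate_desc_sketch and the helper gate_handler_address are hypothetical names chosen for illustration; the layout mirrors the gdb ptype output, except that the idt_bits bitfield is collapsed into a single 16-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified mirror of the kernel's struct gate_struct (x86_64).
 * Assumption: the idt_bits bitfield is collapsed into one 16-bit word. */
struct gate_desc_sketch {
    uint16_t offset_low;     /* handler address bits 15..0  */
    uint16_t segment;        /* code segment selector       */
    uint16_t bits;           /* ist, type, dpl, p packed    */
    uint16_t offset_middle;  /* handler address bits 31..16 */
    uint32_t offset_high;    /* handler address bits 63..32 */
    uint32_t reserved;
};

/* Each IDT entry occupies 16 bytes on x86_64. */
static_assert(sizeof(struct gate_desc_sketch) == 16,
              "IDT entries are 16 bytes");

/* Reassemble the 64-bit handler address from its three pieces. */
static uint64_t gate_handler_address(const struct gate_desc_sketch *g)
{
    return ((uint64_t)g->offset_high << 32) |
           ((uint64_t)g->offset_middle << 16) |
           (uint64_t)g->offset_low;
}
```

Fed the field values of idt_table[0] (offset_low 0x1030, offset_middle 0x8100, offset_high 0xffffffff), the helper yields 0xffffffff81001030, the address of asm_exc_divide_error.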
(gdb) p sizeof(struct gate_struct)
$1 = 16
(gdb) p &idt_table
$2 = (gate_desc (*)[256]) 0xffffffff8352d000 <idt_table>
(gdb) p /x idt_table[0]
$3 = {offset_low = 0x1030, segment = 0x10, bits = {ist = 0x0, zero = 0x0, type = 0xe, dpl = 0x0, p = 0x1}, offset_middle = 0x8100, offset_high = 0xffffffff, reserved = 0x0}
(gdb) info symbol 0xffffffff81001030
asm_exc_divide_error in section .text
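The output above shows the address and name of the handler in the first descriptor (vector 0), which handles the divide error (divide-by-zero) exception. Next, we define a gdb command to walk the interrupt descriptor table and see which handlers are installed: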
(gdb) define irq_handler
Type commands for definition of "irq_handler".
End with a line saying just "end".
>set $tmp=$arg0
>set $addr=((long long)idt_table[$tmp].offset_high<<32)|((long long)idt_table[$tmp].offset_middle<<16)|((long long)idt_table[$tmp].offset_low)
>printf "%d item in idt table, irq handler addr[0x%x]\n", $tmp, $addr
>info symbol $addr
>end
(gdb) irq_handler 0
0 item in idt table, irq handler addr[0x81001030]
asm_exc_divide_error in section .text
(gdb) set $i=0
(gdb) irq_handler $i++
0 item in idt table, irq handler addr[0x81001030]
asm_exc_divide_error in section .text
(gdb)
1 item in idt table, irq handler addr[0x81001310]
asm_exc_debug in section .text
(gdb)
2 item in idt table, irq handler addr[0x81001a70]
asm_exc_nmi in section .text
(gdb)
3 item in idt table, irq handler addr[0x81001240]
asm_exc_int3 in section .text
(gdb)
4 item in idt table, irq handler addr[0x81001050]
asm_exc_overflow in section .text
(gdb)
5 item in idt table, irq handler addr[0x81001070]
asm_exc_bounds in section .text
(gdb)
6 item in idt table, irq handler addr[0x81001220]
asm_exc_invalid_op in section .text
(gdb)
7 item in idt table, irq handler addr[0x81001090]
asm_exc_device_not_available in section .text
(gdb)
8 item in idt table, irq handler addr[0x81001350]
asm_exc_double_fault in section .text
(gdb)
9 item in idt table, irq handler addr[0x810010b0]
asm_exc_coproc_segment_overrun in section .text
(gdb)
10 item in idt table, irq handler addr[0x81001130]
asm_exc_invalid_tss in section .text
(gdb)
11 item in idt table, irq handler addr[0x81001160]
asm_exc_segment_not_present in section .text
(gdb)
12 item in idt table, irq handler addr[0x81001190]
asm_exc_stack_segment in section .text
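With the macro defined, the method shown above reveals the address and name of every interrupt handler. Note that the %x format in the macro prints only the low 32 bits of the address; the full handler address has 0xffffffff in its upper 32 bits (for vector 0 it is 0xffffffff81001030).
The asm_sysvec_apic_timer_interrupt handler fires frequently while the system is running, and the macro written earlier shows where it lives. Next we set a breakpoint on this function to analyze what the hardware and software do when an interrupt occurs.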
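Indexing idt_table[$tmp] in the macro performs the same arithmetic the hardware does when it looks up a vector: table base plus vector number times the descriptor size. A small C sketch of that lookup follows; the function name idt_entry_address is a hypothetical helper, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define IDT_ENTRIES    256u  /* number of descriptors in the table   */
#define IDT_ENTRY_SIZE 16u   /* sizeof(struct gate_struct) on x86_64 */

/* Address of the descriptor for `vector`, given the table base. */
static uint64_t idt_entry_address(uint64_t idt_base, unsigned int vector)
{
    assert(vector < IDT_ENTRIES);  /* vectors run from 0 to 255 */
    return idt_base + (uint64_t)vector * IDT_ENTRY_SIZE;
}
```

With the base address printed by p &idt_table (0xffffffff8352d000), vector 236 resolves to the descriptor at 0xffffffff8352dec0.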
(gdb) irq_handler 236
236 item in idt table, irq handler addr[0x81001470]
asm_sysvec_apic_timer_interrupt in section .text
(gdb) b asm_sysvec_apic_timer_interrupt
Breakpoint 1 at 0xffffffff81001470: file ./arch/x86/include/asm/idtentry.h, line 702.
(gdb) c
Continuing.
Once execution stops at the entry of the interrupt handler, let's first look at the function's disassembly and the stack memory:
(gdb) disassemble
Dump of assembler code for function asm_sysvec_apic_timer_interrupt:
=> 0xffffffff81001470 <+0>: endbr64
0xffffffff81001474 <+4>: nopl (%rax)
0xffffffff81001477 <+7>: cld
0xffffffff81001478 <+8>: pushq $0xffffffffffffffff
0xffffffff8100147a <+10>: callq 0xffffffff81001910 <error_entry>
0xffffffff8100147f <+15>: mov %rax,%rsp
0xffffffff81001482 <+18>: mov %rsp,%rdi
0xffffffff81001485 <+21>: callq 0xffffffff8215cc20 <sysvec_apic_timer_interrupt>
0xffffffff8100148a <+26>: jmpq 0xffffffff81001a50 <error_return>
End of assembler dump.
(gdb) x /50a $rsp
0xffffffff82a03e58 <init_thread_union+15960>: 0xffffffff8215d04f <pv_native_safe_halt+15> 0x10
0xffffffff82a03e68 <init_thread_union+15976>: 0x206 0xffffffff82a03e80 <init_thread_union+16000>
0xffffffff82a03e78 <init_thread_union+15992>: 0x18 0xffffffff8215e6c9 <default_idle+9>
0xffffffff82a03e88 <init_thread_union+16008>: 0xffffffff8215e93b <default_idle_call+43> 0xffffffff812e4d85 <do_idle+437>
0xffffffff82a03e98 <init_thread_union+16024>: 0xffff888007a289c0 0x3a34513d6cf2cb00
0xffffffff82a03ea8 <init_thread_union+16040>: 0xffff888007ed0008 0xed
0xffffffff82a03eb8 <init_thread_union+16056>: 0x0 0xffff888007ed0000
0xffffffff82a03ec8 <init_thread_union+16072>: 0xffffffff82a0e030 <envp_init+16> 0xffffffff812e4fb4 <cpu_startup_entry+36>
0xffffffff82a03ed8 <init_thread_union+16088>: 0x2 0xffffffff8215f02c
0xffffffff82a03ee8 <init_thread_union+16104>: 0x9 0xffffffff8325ee16 <start_kernel+1318>
0xffffffff82a03ef8 <init_thread_union+16120>: 0xffffffff83332020 <command_line> 0x0
0xffffffff82a03f08 <init_thread_union+16136>: 0x0 0x14970
0xffffffff82a03f18 <init_thread_union+16152>: 0xb0 0x0
0xffffffff82a03f28 <init_thread_union+16168>: 0x0 0xffffffff8326aac8 <x86_64_start_reservations+24>
0xffffffff82a03f38 <init_thread_union+16184>: 0xffffffff8326ac0b <x86_64_start_kernel+203> 0x0
0xffffffff82a03f48 <init_thread_union+16200>: 0x0 0xffffffff8122faa6 <secondary_startup_64+342>
0xffffffff82a03f58 <__top_init_kernel_stack>: 0x1f0f2e6600000000 0x2e66000000000084
0xffffffff82a03f68 <__top_init_kernel_stack+16>: 0x841f0f 0x841f0f2e66
0xffffffff82a03f78 <__top_init_kernel_stack+32>: 0x841f0f2e660000 0x1f0f2e6600000000
0xffffffff82a03f88 <__top_init_kernel_stack+48>: 0x2e66000000000084 0x841f0f
0xffffffff82a03f98 <__top_init_kernel_stack+64>: 0x841f0f2e66 0x841f0f2e660000
0xffffffff82a03fa8 <__top_init_kernel_stack+80>: 0x1f0f2e6600000000 0x2e66000000000084
0xffffffff82a03fb8 <__top_init_kernel_stack+96>: 0x841f0f 0x841f0f2e66
0xffffffff82a03fc8 <__top_init_kernel_stack+112>: 0x841f0f2e660000 0x1f0f2e6600000000
0xffffffff82a03fd8 <__top_init_kernel_stack+128>: 0x2e66000000000084 0x841f0f
The disassembly shows that the real interrupt handling happens in sysvec_apic_timer_interrupt, so we set a breakpoint on that function and continue:
(gdb) b sysvec_apic_timer_interrupt
Breakpoint 2 at 0xffffffff8215cc20: file arch/x86/kernel/apic/apic.c, line 1050.
(gdb) c
Continuing.
Breakpoint 2, sysvec_apic_timer_interrupt (regs=0xffffffff82a03dd8 <init_thread_union+15832>) at arch/x86/kernel/apic/apic.c:1050
1050 DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt)
(gdb)
The function takes a single argument, regs. Let's look at its type, size, and contents:
(gdb) ptype regs
type = struct pt_regs {
unsigned long r15;
unsigned long r14;
unsigned long r13;
unsigned long r12;
unsigned long bp;
unsigned long bx;
unsigned long r11;
unsigned long r10;
unsigned long r9;
unsigned long r8;
unsigned long ax;
unsigned long cx;
unsigned long dx;
unsigned long si;
unsigned long di;
unsigned long orig_ax;
unsigned long ip;
union {
u16 cs;
u64 csx;
struct fred_cs fred_cs;
};
unsigned long flags;
unsigned long sp;
union {
u16 ss;
u64 ssx;
struct fred_ss fred_ss;
};
} *
(gdb) p /x *regs
$4 = {r15 = 0x14970, r14 = 0xffffffff82a0e030, r13 = 0x0, r12 = 0x0, bp = 0x0, bx = 0xffffffff82a0e900, r11 = 0x0, r10 = 0xffffffff82a0e900,
r9 = 0xe86f235840, r8 = 0xf7a4, ax = 0xffff88808450d000, cx = 0xffffffff, dx = 0x3f, si = 0xe851282c80, di = 0xf7a4, orig_ax = 0xffffffffffffffff,
ip = 0xffffffff8215d04f, {cs = 0x10, csx = 0x10, fred_cs = {cs = 0x10, sl = 0x0, wfe = 0x0}}, flags = 0x206, sp = 0xffffffff82a03e80, {ss = 0x18,
ssx = 0x18, fred_ss = {ss = 0x18, sti = 0x0, swevent = 0x0, nmi = 0x0, vector = 0x0, type = 0x0, enclave = 0x0, lm = 0x0, nested = 0x0,
insnlen = 0x0}}}
(gdb) p sizeof(*regs)
$5 = 168
(gdb) p /x sizeof(*regs)
$6 = 0xa8
So regs is a pointer to struct pt_regs, the key structure Linux uses to save the context that was interrupted. Next, let's look at how the stack is laid out:
(gdb) x /50a $rsp
0xffffffff82a03dd0 <init_thread_union+15824>: 0xffffffff8100148a <asm_sysvec_apic_timer_interrupt+26> 0x14970
0xffffffff82a03de0 <init_thread_union+15840>: 0xffffffff82a0e030 <envp_init+16> 0x0
0xffffffff82a03df0 <init_thread_union+15856>: 0x0 0x0
0xffffffff82a03e00 <init_thread_union+15872>: 0xffffffff82a0e900 <init_task> 0x0
0xffffffff82a03e10 <init_thread_union+15888>: 0xffffffff82a0e900 <init_task> 0xe86f235840
0xffffffff82a03e20 <init_thread_union+15904>: 0xf7a4 0xffff88808450d000
0xffffffff82a03e30 <init_thread_union+15920>: 0xffffffff 0x3f
0xffffffff82a03e40 <init_thread_union+15936>: 0xe851282c80 0xf7a4
0xffffffff82a03e50 <init_thread_union+15952>: 0xffffffffffffffff 0xffffffff8215d04f <pv_native_safe_halt+15>
0xffffffff82a03e60 <init_thread_union+15968>: 0x10 0x206
0xffffffff82a03e70 <init_thread_union+15984>: 0xffffffff82a03e80 <init_thread_union+16000> 0x18
0xffffffff82a03e80 <init_thread_union+16000>: 0xffffffff8215e6c9 <default_idle+9> 0xffffffff8215e93b <default_idle_call+43>
0xffffffff82a03e90 <init_thread_union+16016>: 0xffffffff812e4d85 <do_idle+437> 0xffff888007a289c0
0xffffffff82a03ea0 <init_thread_union+16032>: 0x3a34513d6cf2cb00 0xffff888007ed0008
0xffffffff82a03eb0 <init_thread_union+16048>: 0xed 0x0
0xffffffff82a03ec0 <init_thread_union+16064>: 0xffff888007ed0000 0xffffffff82a0e030 <envp_init+16>
0xffffffff82a03ed0 <init_thread_union+16080>: 0xffffffff812e4fb4 <cpu_startup_entry+36> 0x2
0xffffffff82a03ee0 <init_thread_union+16096>: 0xffffffff8215f02c 0x9
0xffffffff82a03ef0 <init_thread_union+16112>: 0xffffffff8325ee16 <start_kernel+1318> 0xffffffff83332020 <command_line>
0xffffffff82a03f00 <init_thread_union+16128>: 0x0 0x0
0xffffffff82a03f10 <init_thread_union+16144>: 0x14970 0xb0
0xffffffff82a03f20 <init_thread_union+16160>: 0x0 0x0
0xffffffff82a03f30 <init_thread_union+16176>: 0xffffffff8326aac8 <x86_64_start_reservations+24> 0xffffffff8326ac0b <x86_64_start_kernel+203>
0xffffffff82a03f40 <init_thread_union+16192>: 0x0 0x0
0xffffffff82a03f50 <init_thread_union+16208>: 0xffffffff8122faa6 <secondary_startup_64+342> 0x1f0f2e6600000000
(gdb) p (long long)regs+sizeof(*regs)
$7 = -2103427456
(gdb) p /x (long long)regs+sizeof(*regs)
$8 = 0xffffffff82a03e80
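Let's analyze what is stored on the stack. The top of the stack holds a code address: the address of the instruction that asm_sysvec_apic_timer_interrupt executes after sysvec_apic_timer_interrupt returns (you can verify this against the disassembly of asm_sysvec_apic_timer_interrupt earlier in this article). The callq instruction pushed it automatically. The next slot is where the regs pointer points (remember that the stack grows from high addresses toward low addresses). Skipping over the struct pt_regs on the stack with p /x (long long)regs+sizeof(*regs) gives the address 0xffffffff82a03e80, and the value stored there points to an instruction inside default_idle. Note that default_idle is not part of the interrupt handler's call chain. In other words, everything from 0xffffffff82a03e80 upward is what was on the stack while the CPU was running normally, before the interrupt occurred.
Now go back to the stack contents at the moment we first entered asm_sysvec_apic_timer_interrupt. The stack does not yet hold a complete struct pt_regs, only part of one. From the analysis above, the interrupt context is saved at the addresses just below 0xffffffff82a03e80: at entry to asm_sysvec_apic_timer_interrupt, the five words at 0xffffffff82a03e58 through 0xffffffff82a03e78 were pushed automatically by the hardware. Matching them against the definition of struct pt_regs, the hardware pushes, in order, the ss, sp, flags, cs, and ip registers (the ssx, sp, flags, csx, and ip fields) when an interrupt occurs. So how does the rest of struct pt_regs get onto the stack?
Back in the disassembly of asm_sysvec_apic_timer_interrupt, we can see that before calling sysvec_apic_timer_interrupt the interrupt path first calls a function named error_entry. Let's disassemble it to see what it does: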
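The address arithmetic in this step can be checked with a short C sketch. The struct pt_regs_sketch below is an illustrative flattening of struct pt_regs (the cs and ss unions are replaced by their 64-bit members), and pt_regs_end is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flattened mirror of x86_64 struct pt_regs. Assumption: the cs and ss
 * unions are represented by their 64-bit members csx and ssx. */
struct pt_regs_sketch {
    uint64_t r15, r14, r13, r12, bp, bx;
    uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;
    uint64_t orig_ax;
    uint64_t ip, csx, flags, sp, ssx;
};

/* 21 eight-byte words, the same size gdb reports for *regs. */
static_assert(sizeof(struct pt_regs_sketch) == 168,
              "pt_regs is 168 (0xa8) bytes");
/* 16 software-saved words precede the hardware-pushed part. */
static_assert(offsetof(struct pt_regs_sketch, ip) == 0x80,
              "ip sits 0x80 bytes into pt_regs");

/* First address past the saved frame. For an interrupt taken in kernel
 * mode, this is where the interrupted code's stack contents begin. */
static uint64_t pt_regs_end(uint64_t regs_addr)
{
    return regs_addr + sizeof(struct pt_regs_sketch);
}
```

For regs = 0xffffffff82a03dd8 the helper returns 0xffffffff82a03e80, matching the gdb result. The ip offset of 0x80 also explains why the stack pointer at handler entry, 0xffffffff82a03e58, is exactly regs + 0x80.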
(gdb) disassemble error_entry
Dump of assembler code for function error_entry:
0xffffffff81001910 <+0>: push %rsi
0xffffffff81001911 <+1>: mov 0x8(%rsp),%rsi
0xffffffff81001916 <+6>: mov %rdi,0x8(%rsp)
0xffffffff8100191b <+11>: push %rdx
0xffffffff8100191c <+12>: push %rcx
0xffffffff8100191d <+13>: push %rax
0xffffffff8100191e <+14>: push %r8
0xffffffff81001920 <+16>: push %r9
0xffffffff81001922 <+18>: push %r10
0xffffffff81001924 <+20>: push %r11
0xffffffff81001926 <+22>: push %rbx
0xffffffff81001927 <+23>: push %rbp
0xffffffff81001928 <+24>: push %r12
0xffffffff8100192a <+26>: push %r13
0xffffffff8100192c <+28>: push %r14
0xffffffff8100192e <+30>: push %r15
0xffffffff81001930 <+32>: push %rsi
0xffffffff81001931 <+33>: xor %esi,%esi
0xffffffff81001933 <+35>: xor %edx,%edx
0xffffffff81001935 <+37>: xor %ecx,%ecx
0xffffffff81001937 <+39>: xor %r8d,%r8d
0xffffffff8100193a <+42>: xor %r9d,%r9d
0xffffffff8100193d <+45>: xor %r10d,%r10d
0xffffffff81001940 <+48>: xor %r11d,%r11d
0xffffffff81001943 <+51>: xor %ebx,%ebx
0xffffffff81001945 <+53>: xor %ebp,%ebp
0xffffffff81001947 <+55>: xor %r12d,%r12d
0xffffffff8100194a <+58>: xor %r13d,%r13d
0xffffffff8100194d <+61>: xor %r14d,%r14d
0xffffffff81001950 <+64>: xor %r15d,%r15d
0xffffffff81001953 <+67>: testb $0x3,0x90(%rsp)
0xffffffff8100195b <+75>: je 0xffffffff810019ab <error_entry+155>
0xffffffff8100195d <+77>: swapgs
0xffffffff81001960 <+80>: nopl (%rax)
0xffffffff81001963 <+83>: jmp 0xffffffff81001976 <error_entry+102>
0xffffffff81001965 <+85>: mov %cr3,%rax
0xffffffff81001968 <+88>: nopl 0x0(%rax,%rax,1)
0xffffffff8100196d <+93>: and $0xffffffffffffe7ff,%rax
0xffffffff81001973 <+99>: mov %rax,%cr3
0xffffffff81001976 <+102>: jmp 0xffffffff8100198d <error_entry+125>
0xffffffff81001978 <+104>: mov $0x48,%ecx
0xffffffff8100197d <+109>: mov %gs:0x25085ab(%rip),%rdx # 0xffffffff83509f30 <x86_spec_ctrl_current>
0xffffffff81001985 <+117>: mov %edx,%eax
0xffffffff81001987 <+119>: shr $0x20,%rdx
0xffffffff8100198b <+123>: wrmsr
0xffffffff8100198d <+125>: nop
0xffffffff8100198e <+126>: nopl 0x0(%rax,%rax,1)
0xffffffff81001993 <+131>: jmp 0xffffffff810019a1 <error_entry+145>
0xffffffff81001995 <+133>: int3
0xffffffff81001996 <+134>: int3
0xffffffff81001997 <+135>: int3
0xffffffff81001998 <+136>: int3
0xffffffff81001999 <+137>: int3
0xffffffff8100199a <+138>: int3
0xffffffff8100199b <+139>: int3
0xffffffff8100199c <+140>: int3
0xffffffff8100199d <+141>: int3
0xffffffff8100199e <+142>: int3
0xffffffff8100199f <+143>: int3
0xffffffff810019a0 <+144>: int3
0xffffffff810019a1 <+145>: lea 0x8(%rsp),%rdi
0xffffffff810019a6 <+150>: jmpq 0xffffffff8215a3f0 <sync_regs>
0xffffffff810019ab <+155>: lea -0x2e8(%rip),%rcx # 0xffffffff810016ca <common_interrupt_return+250>
0xffffffff810019b2 <+162>: cmp %rcx,0x88(%rsp)
0xffffffff810019ba <+170>: je 0xffffffff810019f8 <error_entry+232>
0xffffffff810019bc <+172>: mov %ecx,%eax
0xffffffff810019be <+174>: cmp %rax,0x88(%rsp)
0xffffffff810019c6 <+182>: je 0xffffffff810019f0 <error_entry+224>
0xffffffff810019c8 <+184>: cmpq $0xffffffff810017a3,0x88(%rsp)
0xffffffff810019d4 <+196>: jne 0xffffffff810019d9 <error_entry+201>
0xffffffff810019d6 <+198>: swapgs
0xffffffff810019d9 <+201>: lfence
0xffffffff810019dc <+204>: nopl %cs:0x0(%rax,%rax,1)
0xffffffff810019e5 <+213>: lea 0x8(%rsp),%rax
0xffffffff810019ea <+218>: nop
0xffffffff810019eb <+219>: retq
0xffffffff810019ec <+220>: int3
0xffffffff810019ed <+221>: int3
0xffffffff810019ee <+222>: int3
0xffffffff810019ef <+223>: int3
0xffffffff810019f0 <+224>: mov %rcx,0x88(%rsp)
0xffffffff810019f8 <+232>: swapgs
0xffffffff810019fb <+235>: nopl (%rax)
0xffffffff810019fe <+238>: jmp 0xffffffff81001a11 <error_entry+257>
0xffffffff81001a00 <+240>: mov %cr3,%rax
0xffffffff81001a03 <+243>: nopl 0x0(%rax,%rax,1)
0xffffffff81001a08 <+248>: and $0xffffffffffffe7ff,%rax
0xffffffff81001a0e <+254>: mov %rax,%cr3
0xffffffff81001a11 <+257>: jmp 0xffffffff81001a28 <error_entry+280>
0xffffffff81001a13 <+259>: mov $0x48,%ecx
0xffffffff81001a18 <+264>: mov %gs:0x2508510(%rip),%rdx # 0xffffffff83509f30 <x86_spec_ctrl_current>
0xffffffff81001a20 <+272>: mov %edx,%eax
0xffffffff81001a22 <+274>: shr $0x20,%rdx
0xffffffff81001a26 <+278>: wrmsr
0xffffffff81001a28 <+280>: nop
0xffffffff81001a29 <+281>: nopl 0x0(%rax,%rax,1)
0xffffffff81001a2e <+286>: jmp 0xffffffff81001a3c <error_entry+300>
0xffffffff81001a30 <+288>: int3
0xffffffff81001a31 <+289>: int3
0xffffffff81001a32 <+290>: int3
0xffffffff81001a33 <+291>: int3
0xffffffff81001a34 <+292>: int3
0xffffffff81001a35 <+293>: int3
0xffffffff81001a36 <+294>: int3
0xffffffff81001a37 <+295>: int3
0xffffffff81001a38 <+296>: int3
0xffffffff81001a39 <+297>: int3
0xffffffff81001a3a <+298>: int3
0xffffffff81001a3b <+299>: int3
0xffffffff81001a3c <+300>: lea 0x8(%rsp),%rdi
0xffffffff81001a41 <+305>: callq 0xffffffff8215a430 <fixup_bad_iret>
0xffffffff81001a46 <+310>: mov %rax,%rdi
0xffffffff81001a49 <+313>: jmpq 0xffffffff8215a3f0 <sync_regs>
End of assembler dump.
The beginning of this function does little more than push registers onto the stack, especially this part:
0xffffffff8100191b <+11>: push %rdx
0xffffffff8100191c <+12>: push %rcx
0xffffffff8100191d <+13>: push %rax
0xffffffff8100191e <+14>: push %r8
0xffffffff81001920 <+16>: push %r9
0xffffffff81001922 <+18>: push %r10
0xffffffff81001924 <+20>: push %r11
0xffffffff81001926 <+22>: push %rbx
0xffffffff81001927 <+23>: push %rbp
0xffffffff81001928 <+24>: push %r12
0xffffffff8100192a <+26>: push %r13
0xffffffff8100192c <+28>: push %r14
0xffffffff8100192e <+30>: push %r15
Now look at the definition of struct pt_regs again:
type = struct pt_regs {
unsigned long r15;
unsigned long r14;
unsigned long r13;
unsigned long r12;
unsigned long bp;
unsigned long bx;
unsigned long r11;
unsigned long r10;
unsigned long r9;
unsigned long r8;
unsigned long ax;
unsigned long cx;
unsigned long dx;
unsigned long si;
unsigned long di;
unsigned long orig_ax;
unsigned long ip;
union {
u16 cs;
u64 csx;
struct fred_cs fred_cs;
};
unsigned long flags;
unsigned long sp;
union {
u16 ss;
u64 ssx;
struct fred_ss fred_ss;
};
}
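The push order is exactly the reverse of the order in which the structure's fields are declared. To summarize, the struct pt_regs that regs points to is built in two steps: the hardware pushes ss, sp, flags, cs, and ip automatically, and software instructions push the rest (pushq $0xffffffffffffffff in asm_sysvec_apic_timer_interrupt fills orig_ax, and error_entry saves the general-purpose registers).
Now let's look at the context saved in the pt_regs structure: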
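The reversal can be sketched in C by listing the values in the order they reach the stack and reading them back from the lowest address. The names push_order and slot_of are hypothetical. One simplification is labeled in the comments: error_entry does not literally push rdi; it stores rdi into the slot that held its own return address, which has the same effect on the final layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Order in which values land on the stack, first push first. */
static const char *const push_order[] = {
    /* pushed by the CPU on interrupt delivery */
    "ss", "sp", "flags", "cs", "ip",
    /* pushq $-1 in asm_sysvec_apic_timer_interrupt */
    "orig_ax",
    /* saved by error_entry (rdi is stored over the return-address
     * slot rather than pushed; modeled here as a push) */
    "di", "si", "dx", "cx", "ax", "r8", "r9", "r10", "r11",
    "bx", "bp", "r12", "r13", "r14", "r15",
};

#define PUSH_COUNT (sizeof(push_order) / sizeof(push_order[0]))

/* 21 eight-byte words make up the 168-byte struct pt_regs. */
static_assert(PUSH_COUNT == 21, "pt_regs holds 21 words");

/* The stack grows downward, so the last value pushed sits at the lowest
 * address. Returns the slot index (0 = lowest address) that `name`
 * occupies after all pushes, or -1 if the name is unknown. */
static int slot_of(const char *name)
{
    for (size_t i = 0; i < PUSH_COUNT; i++) {
        if (strcmp(push_order[i], name) == 0)
            return (int)(PUSH_COUNT - 1 - i);
    }
    return -1;
}
```

Multiplying a slot index by 8 gives the field's byte offset in struct pt_regs: r15 lands in slot 0, orig_ax in slot 15, and ip in slot 16 (offset 0x80).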
(gdb) p /x regs->sp
$10 = 0xffffffff82a03e80
(gdb) disassemble regs->ip
Dump of assembler code for function pv_native_safe_halt:
0xffffffff8215d040 <+0>: endbr64
0xffffffff8215d044 <+4>: jmp 0xffffffff8215d04d <pv_native_safe_halt+13>
0xffffffff8215d046 <+6>: verw 0xaf425(%rip) # 0xffffffff8220c472 <ds.3829>
0xffffffff8215d04d <+13>: sti
0xffffffff8215d04e <+14>: hlt
0xffffffff8215d04f <+15>: retq
0xffffffff8215d050 <+16>: int3
0xffffffff8215d051 <+17>: int3
0xffffffff8215d052 <+18>: int3
0xffffffff8215d053 <+19>: int3
End of assembler dump.
(gdb) p /x regs->ip
$11 = 0xffffffff8215d04f
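p /x regs->sp shows that the stack pointer before the interrupt was 0xffffffff82a03e80, which matches the address just past the pt_regs structure that we examined earlier. This also shows that when an interrupt arrives while the system is in kernel mode, the context is saved immediately on the current kernel thread's stack, and the interrupt handler then runs on that same kernel stack.
disassemble regs->ip and p /x regs->ip show that the saved ip, 0xffffffff8215d04f, is the retq that immediately follows the hlt instruction in pv_native_safe_halt. In other words, the CPU was halted on hlt when the interrupt arrived, and execution will resume at the next instruction. hlt puts the CPU to sleep until the next interrupt and is what the kernel executes when it is idle.
Next, let's trace the call chain that was active before the interrupt. The top of the stack at regs->sp holds an instruction address, so let's see what is at that address: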
(gdb) disassemble *regs->sp
Dump of assembler code for function default_idle:
0xffffffff8215e6c0 <+0>: endbr64
0xffffffff8215e6c4 <+4>: callq 0xffffffff8215d040 <pv_native_safe_halt>
0xffffffff8215e6c9 <+9>: nop
0xffffffff8215e6ca <+10>: cli
0xffffffff8215e6cb <+11>: retq
0xffffffff8215e6cc <+12>: int3
0xffffffff8215e6cd <+13>: int3
0xffffffff8215e6ce <+14>: int3
0xffffffff8215e6cf <+15>: int3
End of assembler dump.
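default_idle called pv_native_safe_halt, and that call is what pushed the return address 0xffffffff8215e6c9 onto the stack. pv_native_safe_halt itself pushes nothing, so when the interrupt arrives during the hlt instruction, 0xffffffff8215e6c9 is the value of *regs->sp (note the dereference).
Let's keep following the pre-interrupt call chain, because the value at stack address 0xffffffff82a03e88 also points to an instruction: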
(gdb) disassemble 0xffffffff8215e93b
Dump of assembler code for function default_idle_call:
0xffffffff8215e910 <+0>: nopw (%rax)
0xffffffff8215e914 <+4>: nop
0xffffffff8215e915 <+5>: mov %gs:0x13a86fb(%rip),%rax # 0xffffffff83507018 <current_task>
0xffffffff8215e91d <+13>: andb $0xdf,%ds:0x2(%rax)
0xffffffff8215e922 <+18>: mov (%rax),%rax
0xffffffff8215e925 <+21>: test $0x8,%al
0xffffffff8215e927 <+23>: jne 0xffffffff8215e945 <default_idle_call+53>
0xffffffff8215e929 <+25>: nopl 0x0(%rax,%rax,1)
0xffffffff8215e92e <+30>: xchg %ax,%ax
0xffffffff8215e930 <+32>: nop
0xffffffff8215e931 <+33>: callq 0xffffffff8215d990 <ct_idle_enter>
0xffffffff8215e936 <+38>: callq 0xffffffff8215e740 <arch_cpu_idle>
0xffffffff8215e93b <+43>: callq 0xffffffff8215da90 <ct_idle_exit>
0xffffffff8215e940 <+48>: nop
0xffffffff8215e941 <+49>: xchg %ax,%ax
0xffffffff8215e943 <+51>: xchg %ax,%ax
0xffffffff8215e945 <+53>: sti
0xffffffff8215e946 <+54>: nop
0xffffffff8215e947 <+55>: retq
0xffffffff8215e948 <+56>: int3
0xffffffff8215e949 <+57>: int3
0xffffffff8215e94a <+58>: int3
0xffffffff8215e94b <+59>: int3
0xffffffff8215e94c <+60>: xor %edi,%edi
0xffffffff8215e94e <+62>: callq 0xffffffff81362bb0 <tick_broadcast_oneshot_control>
0xffffffff8215e953 <+67>: jmp 0xffffffff8215e945 <default_idle_call+53>
0xffffffff8215e955 <+69>: mov %gs:0x13a86d8(%rip),%edx # 0xffffffff83507034 <cpu_number>
0xffffffff8215e95c <+76>: mov %gs:0x13a86d1(%rip),%eax # 0xffffffff83507034 <cpu_number>
0xffffffff8215e963 <+83>: bt %rax,0xb697e5(%rip) # 0xffffffff82cc8150 <__cpu_online_mask>
0xffffffff8215e96b <+91>: jae 0xffffffff8215e943 <default_idle_call+51>
0xffffffff8215e96d <+93>: incl %gs:0x13a86bc(%rip) # 0xffffffff83507030 <__preempt_count>
0xffffffff8215e974 <+100>: mov 0xb4563d(%rip),%rax # 0xffffffff82ca3fb8 <__tracepoint_cpu_idle+56>
0xffffffff8215e97b <+107>: test %rax,%rax
0xffffffff8215e97e <+110>: je 0xffffffff8215e98e <default_idle_call+126>
0xffffffff8215e980 <+112>: mov 0x8(%rax),%rdi
0xffffffff8215e984 <+116>: mov $0xffffffff,%esi
0xffffffff8215e989 <+121>: callq 0xffffffff813d7960 <__traceiter_cpu_idle>
0xffffffff8215e98e <+126>: decl %gs:0x13a869b(%rip) # 0xffffffff83507030 <__preempt_count>
0xffffffff8215e995 <+133>: jne 0xffffffff8215e943 <default_idle_call+51>
0xffffffff8215e997 <+135>: nopl 0x0(%rax,%rax,1)
0xffffffff8215e99c <+140>: jmp 0xffffffff8215e943 <default_idle_call+51>
0xffffffff8215e99e <+142>: mov %gs:0x13a868f(%rip),%edx # 0xffffffff83507034 <cpu_number>
0xffffffff8215e9a5 <+149>: mov %gs:0x13a8688(%rip),%eax # 0xffffffff83507034 <cpu_number>
0xffffffff8215e9ac <+156>: bt %rax,0xb6979c(%rip) # 0xffffffff82cc8150 <__cpu_online_mask>
0xffffffff8215e9b4 <+164>: jae 0xffffffff8215e930 <default_idle_call+32>
0xffffffff8215e9ba <+170>: incl %gs:0x13a866f(%rip) # 0xffffffff83507030 <__preempt_count>
0xffffffff8215e9c1 <+177>: mov 0xb455f0(%rip),%rax # 0xffffffff82ca3fb8 <__tracepoint_cpu_idle+56>
0xffffffff8215e9c8 <+184>: test %rax,%rax
0xffffffff8215e9cb <+187>: je 0xffffffff8215e9db <default_idle_call+203>
0xffffffff8215e9cd <+189>: mov 0x8(%rax),%rdi
0xffffffff8215e9d1 <+193>: mov $0x1,%esi
0xffffffff8215e9d6 <+198>: callq 0xffffffff813d7960 <__traceiter_cpu_idle>
0xffffffff8215e9db <+203>: decl %gs:0x13a864e(%rip) # 0xffffffff83507030 <__preempt_count>
0xffffffff8215e9e2 <+210>: jne 0xffffffff8215e930 <default_idle_call+32>
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
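This looks a little odd: address 0xffffffff82a03e88 holds 0xffffffff8215e93b, yet the disassembly shows that the instruction just before that return address calls arch_cpu_idle rather than default_idle. Let's disassemble further: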
(gdb) disassemble 0xffffffff8215e740
Dump of assembler code for function arch_cpu_idle:
0xffffffff8215e740 <+0>: endbr64
0xffffffff8215e744 <+4>: jmpq 0xffffffff8215e6c0 <default_idle>
End of assembler dump.
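Now everything is clear: arch_cpu_idle reaches default_idle with a jmpq instruction rather than the usual callq. A jmpq does not push the address of the next instruction, so no stack frame for arch_cpu_idle appears on the stack.
After the debugging above, the program has taken the interrupt, entered asm_sysvec_apic_timer_interrupt, executed the callq, and stopped at the entry of sysvec_apic_timer_interrupt. Running the bt command at this point produces the following: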
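The difference between callq and jmpq can be replayed on a toy stack. This is only an illustrative model: toy_stack, toy_call, toy_jmp, and replay_idle_path are hypothetical names, and the return addresses are the ones seen in the disassembly above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy downward-growing stack: sp is the index of the current top. */
struct toy_stack {
    uint64_t mem[STACK_WORDS];
    size_t sp;  /* STACK_WORDS means the stack is empty */
};

static void toy_init(struct toy_stack *s)
{
    s->sp = STACK_WORDS;
}

/* callq: push the address of the next instruction, then transfer control. */
static void toy_call(struct toy_stack *s, uint64_t return_addr)
{
    assert(s->sp > 0);
    s->mem[--s->sp] = return_addr;
}

/* jmpq: transfer control only; the stack is left untouched. */
static void toy_jmp(struct toy_stack *s)
{
    (void)s;
}

/* Replay the idle path: default_idle_call -> arch_cpu_idle ->
 * default_idle -> pv_native_safe_halt. */
static void replay_idle_path(struct toy_stack *s)
{
    toy_init(s);
    toy_call(s, 0xffffffff8215e93bULL); /* default_idle_call: callq arch_cpu_idle  */
    toy_jmp(s);                         /* arch_cpu_idle: jmpq default_idle        */
    toy_call(s, 0xffffffff8215e6c9ULL); /* default_idle: callq pv_native_safe_halt */
}
```

After the replay the toy stack holds exactly two return addresses, 0xffffffff8215e6c9 on top and 0xffffffff8215e93b beneath it, the same pair found at 0xffffffff82a03e80 and 0xffffffff82a03e88. Nothing on the stack records the pass through arch_cpu_idle.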
(gdb) bt
#0 sysvec_apic_timer_interrupt (regs=0xffffffff82a03dd8 <init_thread_union+15832>) at arch/x86/kernel/apic/apic.c:1050
#1 0xffffffff8100148a in asm_sysvec_apic_timer_interrupt () at ./arch/x86/include/asm/idtentry.h:702
#2 0x0000000000014970 in ?? ()
#3 0xffffffff82a0e030 in envp_init () at lib/dump_stack.c:88
#4 0x0000000000000000 in ?? ()
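This is rather strange: the kernel I built includes a symbol table, so frames shown as ?? () should not normally appear. Let's look at the stack layout again: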
(gdb) x /50a $rsp
0xffffffff82a03dd0 <init_thread_union+15824>: 0xffffffff8100148a <asm_sysvec_apic_timer_interrupt+26> 0x14970
0xffffffff82a03de0 <init_thread_union+15840>: 0xffffffff82a0e030 <envp_init+16> 0x0
0xffffffff82a03df0 <init_thread_union+15856>: 0x0 0x0
0xffffffff82a03e00 <init_thread_union+15872>: 0xffffffff82a0e900 <init_task> 0x0
0xffffffff82a03e10 <init_thread_union+15888>: 0xffffffff82a0e900 <init_task> 0xe86f235840
0xffffffff82a03e20 <init_thread_union+15904>: 0xf7a4 0xffff88808450d000
0xffffffff82a03e30 <init_thread_union+15920>: 0xffffffff 0x3f
0xffffffff82a03e40 <init_thread_union+15936>: 0xe851282c80 0xf7a4
0xffffffff82a03e50 <init_thread_union+15952>: 0xffffffffffffffff 0xffffffff8215d04f <pv_native_safe_halt+15>
0xffffffff82a03e60 <init_thread_union+15968>: 0x10 0x206
0xffffffff82a03e70 <init_thread_union+15984>: 0xffffffff82a03e80 <init_thread_union+16000> 0x18
0xffffffff82a03e80 <init_thread_union+16000>: 0xffffffff8215e6c9 <default_idle+9> 0xffffffff8215e93b <default_idle_call+43>
0xffffffff82a03e90 <init_thread_union+16016>: 0xffffffff812e4d85 <do_idle+437> 0xffff888007a289c0
0xffffffff82a03ea0 <init_thread_union+16032>: 0x3a34513d6cf2cb00 0xffff888007ed0008
0xffffffff82a03eb0 <init_thread_union+16048>: 0xed 0x0
0xffffffff82a03ec0 <init_thread_union+16064>: 0xffff888007ed0000 0xffffffff82a0e030 <envp_init+16>
0xffffffff82a03ed0 <init_thread_union+16080>: 0xffffffff812e4fb4 <cpu_startup_entry+36> 0x2
0xffffffff82a03ee0 <init_thread_union+16096>: 0xffffffff8215f02c 0x9
0xffffffff82a03ef0 <init_thread_union+16112>: 0xffffffff8325ee16 <start_kernel+1318> 0xffffffff83332020 <command_line>
0xffffffff82a03f00 <init_thread_union+16128>: 0x0 0x0
0xffffffff82a03f10 <init_thread_union+16144>: 0x14970 0xb0
0xffffffff82a03f20 <init_thread_union+16160>: 0x0 0x0
0xffffffff82a03f30 <init_thread_union+16176>: 0xffffffff8326aac8 <x86_64_start_reservations+24> 0xffffffff8326ac0b <x86_64_start_kernel+203>
0xffffffff82a03f40 <init_thread_union+16192>: 0x0 0x0
0xffffffff82a03f50 <init_thread_union+16208>: 0xffffffff8122faa6 <secondary_startup_64+342> 0x1f0f2e6600000000
(gdb) p /x regs->r15
$12 = 0x14970
(gdb) disassemble asm_sysvec_apic_timer_interrupt
Dump of assembler code for function asm_sysvec_apic_timer_interrupt:
0xffffffff81001470 <+0>: endbr64
0xffffffff81001474 <+4>: nopl (%rax)
0xffffffff81001477 <+7>: cld
0xffffffff81001478 <+8>: pushq $0xffffffffffffffff
0xffffffff8100147a <+10>: callq 0xffffffff81001910 <error_entry>
0xffffffff8100147f <+15>: mov %rax,%rsp
0xffffffff81001482 <+18>: mov %rsp,%rdi
0xffffffff81001485 <+21>: callq 0xffffffff8215cc20 <sysvec_apic_timer_interrupt>
0xffffffff8100148a <+26>: jmpq 0xffffffff81001a50 <error_return>
End of assembler dump.
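Let's analyze the stack once more. The top of the stack is the return address into asm_sysvec_apic_timer_interrupt, and the slots that follow are the fields of the pt_regs structure. The disassembly of asm_sysvec_apic_timer_interrupt shows that nothing else is pushed after error_entry finishes building pt_regs, so regs->r15 is the second word on the stack, which the gdb output confirms. Now look at the address in frame #2 of the bt output: it is exactly the value of regs->r15. As a code address it is certainly meaningless. gdb mistakes 0x14970 for the return address that a caller pushed with callq when invoking asm_sysvec_apic_timer_interrupt, and that is why the backtrace goes wrong.
This analysis shows that the stack at interrupt entry cannot be analyzed the way an ordinary function call stack is. When you use bt inside an interrupt, the output is meaningful only up to the function installed in the interrupt descriptor; everything printed beyond that is meaningless. To continue the analysis you have to reconstruct the pre-interrupt context from pt_regs.
To keep the CPU in user mode most of the time, we write an infinite loop that keeps the CPU busy: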
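int main(void)
{
    /* volatile keeps the compiler from optimizing the loop away */
    volatile int i = 0;

    /* spin forever so the CPU stays in user mode */
    while (1)
    {
        i++;
    }
    return 0; /* never reached */
}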
Compile and run this user program, set a breakpoint at the entry of sysvec_apic_timer_interrupt, and once the interrupt fires look at the stack memory:
Breakpoint 2, sysvec_apic_timer_interrupt (regs=0xffffc900001cff58) at arch/x86/kernel/apic/apic.c:1050
1050 DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt)
(gdb) x /50a $rsp
0xffffc900001cff50: 0xffffffff8100148a <asm_sysvec_apic_timer_interrupt+26> 0x0
0xffffc900001cff60: 0x4c0018 0x0
0xffffc900001cff70: 0x402d50 0x7ffeaf3dfdb0
0xffffc900001cff80: 0x400518 0x206
0xffffc900001cff90: 0x2f40 0x6
0xffffc900001cffa0: 0x0 0x45a2734
0xffffc900001cffb0: 0x44b9f0 0x7ffeaf3dfef8
0xffffc900001cffc0: 0x7ffeaf3dfee8 0x1
0xffffc900001cffd0: 0xffffffffffffffff 0x401cc4
0xffffc900001cffe0: 0x33 0x202
0xffffc900001cfff0: 0x7ffeaf3dfdb0 0x2b
0xffffc900001d0000: Cannot access memory at address 0xffffc900001d0000
(gdb) p /x (long long)regs+sizeof(struct pt_regs)
$13 = 0xffffc900001d0000
The stack holds only one function return address and one pt_regs structure, nothing else. Now let's see what the CPU was doing before the interrupt:
(gdb) disassemble regs->ip
No function contains specified address.
(gdb) disassemble regs->ip-0x10,regs->ip+5
Dump of assembler code from 0x401cb4 to 0x401cc9:
0x0000000000401cb4: push %rbx
0x0000000000401cb6: nop %edx
0x0000000000401cb9: push %rbp
0x0000000000401cba: mov %rsp,%rbp
0x0000000000401cbd: movl $0x0,-0x4(%rbp)
0x0000000000401cc4: mov -0x4(%rbp),%eax
0x0000000000401cc7: add $0x1,%eax
End of assembler dump.
(gdb) disassemble regs->ip-0x10,regs->ip+0x10
Dump of assembler code from 0x401cb4 to 0x401cd4:
0x0000000000401cb4: push %rbx
0x0000000000401cb6: nop %edx
0x0000000000401cb9: push %rbp
0x0000000000401cba: mov %rsp,%rbp
0x0000000000401cbd: movl $0x0,-0x4(%rbp)
0x0000000000401cc4: mov -0x4(%rbp),%eax
0x0000000000401cc7: add $0x1,%eax
0x0000000000401cca: mov %eax,-0x4(%rbp)
0x0000000000401ccd: jmp 0x401cc4
0x0000000000401ccf: nop
0x0000000000401cd0: push %rbx
0x0000000000401cd1: sub $0x88,%rsp
End of assembler dump.
(gdb) p /x regs->ip
$14 = 0x401cc4
(gdb) p /x regs->sp
$15 = 0x7ffeaf3dfdb0
(gdb) p /x $rsp
$16 = 0xffffc900001cff50
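The disassembly confirms that the CPU was running the infinite loop in user mode before the interrupt, and comparing regs->sp with $rsp shows that the stack switched from the user-mode stack to the kernel stack when the interrupt occurred.
We can therefore conclude that when an interrupt arrives while the CPU is executing in user mode, the CPU switches to kernel mode immediately and runs the interrupt handler on the current thread's kernel stack. Moreover, while a thread is executing in user mode, the top and the bottom of its kernel stack coincide; that is, its kernel stack is completely empty.
In this case, running bt inside the interrupt handler should no longer show any baffling functions: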
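The saved cs values in the two stack dumps offer an independent check of which mode was interrupted, since the low two bits of a segment selector carry the privilege level. A minimal C sketch follows; SELECTOR_RPL and interrupted_user_mode are hypothetical names for illustration, not the kernel's own helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The low two bits of a segment selector are its privilege level. */
#define SELECTOR_RPL(sel) ((sel) & 0x3u)

/* Saved cs values taken from the two stack dumps in this article. */
static_assert(SELECTOR_RPL(0x10u) == 0, "idle case: cs 0x10 is ring 0");
static_assert(SELECTOR_RPL(0x33u) == 3, "busy-loop case: cs 0x33 is ring 3");

/* Ring 3 is user mode on Linux; ring 0 is the kernel. */
static bool interrupted_user_mode(uint64_t saved_cs)
{
    return SELECTOR_RPL(saved_cs) == 3;
}
```

The same two bits are what the testb $0x3,0x90(%rsp) instruction in error_entry examines: at that point the saved cs sits 0x90 bytes above the stack pointer.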
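The claim that the kernel stack is empty can be checked with a little arithmetic. The sketch below uses a hypothetical helper, kernel_stack_bytes_in_use, and assumes that the high end of this thread's kernel stack is 0xffffc900001d0000, the first address gdb could not read in the dump above.

```c
#include <assert.h>
#include <stdint.h>

#define PT_REGS_SIZE 168u  /* sizeof(struct pt_regs) as reported by gdb */

/* Bytes of kernel stack that were already in use when the interrupt
 * arrived: everything between the end of the saved pt_regs and the
 * high end of the stack. Assumption: stack_high_end is the address
 * one past the last usable byte of the kernel stack. */
static uint64_t kernel_stack_bytes_in_use(uint64_t stack_high_end,
                                          uint64_t regs_addr)
{
    uint64_t frame_end = regs_addr + PT_REGS_SIZE;

    assert(frame_end <= stack_high_end);
    return stack_high_end - frame_end;
}
```

With regs = 0xffffc900001cff58 the result is 0: the pt_regs frame sits flush against the high end of the stack, so nothing was on the kernel stack before the interrupt.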
(gdb) bt
#0 sysvec_apic_timer_interrupt (regs=0xffffc900001cff58) at arch/x86/kernel/apic/apic.c:1050
#1 0xffffffff8100148a in asm_sysvec_apic_timer_interrupt () at ./arch/x86/include/asm/idtentry.h:702
#2 0x0000000000000000 in ?? ()
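Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, contact cloudcommunity@tencent.com for removal.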