These are my notes from working through the MIT 6.828 labs on my own. They include partial translations of the lab handout and code comments, plus the pitfalls I ran into and how I solved them.
The complete code lives in my repository, together with other notes: https://github.com/yunwei37/6.828-2018-labs
New files in this part of the lab (translated from the handout's table):
kern/cpu.h: kernel-private definitions for multiprocessor support
kern/mpconfig.c: code to read the multiprocessor configuration
kern/lapic.c: kernel code driving the local APIC unit in each processor
kern/mpentry.S: assembly-language entry code for non-boot CPUs
kern/spinlock.h: kernel-private definitions for spin locks, including the big kernel lock
kern/spinlock.c: kernel code implementing spin locks
kern/sched.c: code skeleton of the scheduler you are going to implement
We will make JOS support "symmetric multiprocessing" (SMP), a multiprocessor model in which all CPUs have equivalent access to system resources such as memory and I/O buses. Although all CPUs are functionally identical in SMP, during the boot process they fall into two types: the bootstrap processor (BSP) is responsible for initializing the system and booting the operating system, while the application processors (APs) are activated by the BSP only after the operating system is up and running. Which processor is the BSP is determined by the hardware and the BIOS. Up to this point, all of your existing JOS code has been running on the BSP.
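To make the BSP/AP hand-off concrete, here is roughly how the lab-provided boot_aps() in kern/init.c wakes up the APs. This is a sketch from memory, so details may differ slightly from the actual lab sources; the point is the sequence: copy kern/mpentry.S below 640KB, point the AP at a fresh kernel stack, send a STARTUP IPI, and wait for the AP to report in.
static void
boot_aps(void)
{
	extern unsigned char mpentry_start[], mpentry_end[];
	void *code;
	struct CpuInfo *c;

	// Copy the AP entry code (kern/mpentry.S) to MPENTRY_PADDR so the
	// 16-bit real-mode AP can find it in low memory.
	code = KADDR(MPENTRY_PADDR);
	memmove(code, mpentry_start, mpentry_end - mpentry_start);

	// Boot the APs one at a time.
	for (c = cpus; c < cpus + ncpu; c++) {
		if (c == cpus + cpunum())	// skip ourselves (the BSP)
			continue;
		// Tell mpentry.S which stack to use.
		mpentry_kstack = percpu_kstacks[c - cpus] + KSTKSIZE;
		// Send a STARTUP IPI so the AP begins executing at MPENTRY_PADDR.
		lapic_startap(c->cpu_id, PADDR(code));
		// Wait until the AP announces it is up in mp_main().
		while (c->cpu_status != CPU_STARTED)
			;
	}
}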
In an SMP system, each CPU has an accompanying local APIC (LAPIC) unit. The LAPIC units are responsible for delivering interrupts throughout the system, and the LAPIC also provides its connected CPU with a unique identifier. In this lab we use the following basic functionality of the LAPIC unit (in kern/lapic.c): reading the APIC ID to tell which CPU our code is currently running on (cpunum()); sending the STARTUP interprocessor interrupt (IPI) from the BSP to the APs to bring up other CPUs (lapic_startap()); and, in part C, programming the LAPIC's built-in timer to trigger clock interrupts for preemptive multitasking (apic_init()).
A processor accesses its LAPIC using memory-mapped I/O (MMIO). My mmio_map_region() in kern/pmap.c:
void *
mmio_map_region(physaddr_t pa, size_t size)
{
// Where to start the next region. Initially, this is the
// beginning of the MMIO region. Because this is static, its
// value will be preserved between calls to mmio_map_region
// (just like nextfree in boot_alloc).
static uintptr_t base = MMIOBASE;
// Reserve size bytes of virtual memory starting at base and
// map physical pages [pa,pa+size) to virtual addresses
// [base,base+size). Since this is device memory and not
// regular DRAM, you'll have to tell the CPU that it isn't
// safe to cache access to this memory. Luckily, the page
// tables provide bits for this purpose; simply create the
// mapping with PTE_PCD|PTE_PWT (cache-disable and
// write-through) in addition to PTE_W. (If you're interested
// in more details on this, see section 10.5 of IA32 volume
// 3A.)
//
// Be sure to round size up to a multiple of PGSIZE and to
// handle if this reservation would overflow MMIOLIM (it's
// okay to simply panic if this happens).
//
// Hint: The staff solution uses boot_map_region.
//
// Your code here:
	uintptr_t start = base;
	size_t map_size = ROUNDUP(size, PGSIZE);
	// Check for overflow before touching the page tables.
	if (start + map_size > MMIOLIM || start + map_size < start)
		panic("mmio_map_region: reservation overflows MMIOLIM");
	base += map_size;
	boot_map_region(kern_pgdir, start, map_size, pa, PTE_PCD | PTE_PWT | PTE_W);
	return (void *)start;
}
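For reference, the first caller of mmio_map_region() is lapic_init() in kern/lapic.c, which (as I remember the lab sources) maps the LAPIC's 4KB MMIO region like this:
	// kern/lapic.c, lapic_init(): lapicaddr is the physical address of
	// the LAPIC's MMIO region (discovered by mp_init()); map it so the
	// kernel can access the LAPIC registers through the "lapic" pointer.
	lapic = mmio_map_region(lapicaddr, 4096);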
Before booting the APs, the BSP should first collect information about the multiprocessor system, such as the total number of CPUs, their APIC IDs, and the MMIO address of the LAPIC unit. Since boot_aps() copies the AP entry code (kern/mpentry.S) to the physical address MPENTRY_PADDR, page_init() must keep that page off the free list:
...
cprintf("Init pages alloc...\n");
size_t i;
for (i = 1; i < npages_basemem; i++) {
if (i * PGSIZE != MPENTRY_PADDR){
pages[i].pp_ref = 0;
pages[i].pp_link = page_free_list;
page_free_list = &pages[i];
}
}
...
When writing a multiprocessor operating system, it is important to distinguish between per-CPU state, which is private to each processor, and global state, which the whole system shares.
The per-CPU state to be aware of: the per-CPU kernel stack (percpu_kstacks[NCPU][KSTKSIZE]), the per-CPU TSS and TSS descriptor, the per-CPU current environment pointer (curenv), and the per-CPU system registers (all registers, including system registers, are private to a CPU, so instructions such as lcr3(), ltr(), lgdt(), and lidt() must be executed on each CPU).
static void
mem_init_mp(void)
{
// Map per-CPU stacks starting at KSTACKTOP, for up to 'NCPU' CPUs.
//
// For CPU i, use the physical memory that 'percpu_kstacks[i]' refers
// to as its kernel stack. CPU i's kernel stack grows down from virtual
// address kstacktop_i = KSTACKTOP - i * (KSTKSIZE + KSTKGAP), and is
// divided into two pieces, just like the single stack you set up in
// mem_init:
// * [kstacktop_i - KSTKSIZE, kstacktop_i)
// -- backed by physical memory
// * [kstacktop_i - (KSTKSIZE + KSTKGAP), kstacktop_i - KSTKSIZE)
// -- not backed; so if the kernel overflows its stack,
// it will fault rather than overwrite another CPU's stack.
// Known as a "guard page".
// Permissions: kernel RW, user NONE
//
// LAB 4: Your code here:
for (int i = 0; i < NCPU; ++i) {
uintptr_t kstacktop = KSTACKTOP - i * (KSTKSIZE + KSTKGAP);
boot_map_region(kern_pgdir, kstacktop - KSTKSIZE, KSTKSIZE, PADDR(percpu_kstacks[i]), PTE_W);
}
}
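About the per-CPU current environment: in lab 4, curenv is no longer a single global. As far as I recall, the lab headers (kern/cpu.h and kern/env.h) turn it into a macro over the cpus[] array, so all existing code that reads curenv automatically refers to the environment running on this CPU:
// Paraphrased from the lab headers (the names are real, comments are mine):
#define thiscpu (&cpus[cpunum()])	// this CPU's struct CpuInfo
#define curenv  (thiscpu->cpu_env)	// environment running on this CPU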
// Initialize and load the per-CPU TSS and IDT
void
trap_init_percpu(void)
{
// The example code here sets up the Task State Segment (TSS) and
// the TSS descriptor for CPU 0. But it is incorrect if we are
// running on other CPUs because each CPU has its own kernel stack.
// Fix the code so that it works for all CPUs.
//
// Hints:
// - The macro "thiscpu" always refers to the current CPU's
// struct CpuInfo;
// - The ID of the current CPU is given by cpunum() or
// thiscpu->cpu_id;
// - Use "thiscpu->cpu_ts" as the TSS for the current CPU,
// rather than the global "ts" variable;
// - Use gdt[(GD_TSS0 >> 3) + i] for CPU i's TSS descriptor;
// - You mapped the per-CPU kernel stacks in mem_init_mp()
// - Initialize cpu_ts.ts_iomb to prevent unauthorized environments
// from doing IO (0 is not the correct value!)
//
// ltr sets a 'busy' flag in the TSS selector, so if you
// accidentally load the same TSS on more than one CPU, you'll
// get a triple fault. If you set up an individual CPU's TSS
// wrong, you may not get a fault until you try to return from
// user space on that CPU.
//
// LAB 4: Your code here:
// Setup a TSS so that we get the right stack
// when we trap to the kernel.
	// ts_esp0 must point at the *top* of this CPU's kernel stack
	// (stacks grow down); use the KSTACKTOP-region mapping set up in
	// mem_init_mp().
	thiscpu->cpu_ts.ts_esp0 = KSTACKTOP - thiscpu->cpu_id * (KSTKSIZE + KSTKGAP);
thiscpu->cpu_ts.ts_ss0 = GD_KD;
thiscpu->cpu_ts.ts_iomb = sizeof(struct Taskstate);
// Initialize the TSS slot of the gdt.
gdt[(GD_TSS0 >> 3) + thiscpu->cpu_id] = SEG16(STS_T32A, (uint32_t) (&thiscpu->cpu_ts),
sizeof(struct Taskstate) - 1, 0);
gdt[(GD_TSS0 >> 3) + thiscpu->cpu_id].sd_s = 0;
// Load the TSS selector (like other segment selectors, the
// bottom three bits are special; we leave them 0)
ltr(GD_TSS0 + (thiscpu->cpu_id << 3));
// Load the IDT
lidt(&idt_pd);
}
Before letting the APs go any further, we first need to address race conditions when multiple CPUs run kernel code simultaneously. The simplest way to do this is with a big kernel lock: a single global lock that is held whenever an environment enters kernel mode and released when the environment returns to user mode. In this model, environments in user mode can run concurrently on any available CPU, but at most one environment can run in kernel mode at a time; any other environment trying to enter kernel mode is forced to wait.
kern/spinlock.h declares the big kernel lock, kernel_lock, and provides lock_kernel() and unlock_kernel() for acquiring and releasing it. You need to apply the big kernel lock at four places: in i386_init() before the BSP wakes up the other CPUs, in mp_main() after initializing the AP, in trap() when trapping from user mode, and in env_run() right before switching back to user mode (see the sketch below).
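A minimal sketch of two of those four places, assuming the lab's lock_kernel()/unlock_kernel() from kern/spinlock.h (the surrounding code is the existing lab skeleton, trimmed down here):
// kern/trap.c, trap(): take the lock only when we entered from user mode
// (the low two bits of tf_cs are the privilege level).
if ((tf->tf_cs & 3) == 3) {
	lock_kernel();		// serialize all execution inside the kernel
	assert(curenv);
}

// kern/env.c, env_run(): drop the lock right before returning to user mode.
lcr3(PADDR(e->env_pgdir));
unlock_kernel();
env_pop_tf(&e->env_tf);		// never returns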
Your next task in this lab is to change the JOS kernel so that it can alternate among multiple environments in a "round-robin" fashion.
// Choose a user environment to run and run it.
void
sched_yield(void)
{
struct Env *cur_env = curenv;
//cprintf("sched_yield cur_env: %08x\n",cur_env? cur_env->env_id:0);
// Implement simple round-robin scheduling.
//
// Search through 'envs' for an ENV_RUNNABLE environment in
// circular fashion starting just after the env this CPU was
// last running. Switch to the first such environment found.
//
// If no envs are runnable, but the environment previously
// running on this CPU is still ENV_RUNNING, it's okay to
// choose that environment.
//
// Never choose an environment that's currently running on
// another CPU (env_status == ENV_RUNNING). If there are
// no runnable environments, simply drop through to the code
// below to halt the cpu.
// LAB 4: Your code here.
if (cur_env) {
for (int i = ENVX(cur_env->env_id) + 1; i < NENV; i++) {
if (envs[i].env_status == ENV_RUNNABLE) {
env_run(&envs[i]);
}
}
for (int i = 0; i < ENVX(cur_env->env_id); i++) {
if (envs[i].env_status == ENV_RUNNABLE) {
env_run(&envs[i]);
}
}
if (cur_env->env_status == ENV_RUNNING) {
env_run(cur_env);
}
} else {
for (int i = 0; i < NENV; i++) {
if (envs[i].env_status == ENV_RUNNABLE) {
env_run(&envs[i]);
}
}
}
// sched_halt never returns
sched_halt();
}
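For the scheduler to be reachable from user space, syscall() in kern/syscall.c also has to dispatch the already provided sys_yield(). A minimal sketch of that case, assuming the usual switch (syscallno) dispatcher:
case SYS_yield:
	sys_yield();		// wraps sched_yield(), which does not return
	return 0;		// unreachable, keeps the compiler happy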
Although your kernel can now run and switch among multiple user-level environments, it is still limited to the environments that the kernel itself set up initially. You will now implement the JOS system calls needed to let a user environment create and start other new user environments.
// Allocate a new environment.
// Returns envid of new environment, or < 0 on error. Errors are:
// -E_NO_FREE_ENV if no free environment is available.
// -E_NO_MEM on memory exhaustion.
static envid_t
sys_exofork(void)
{
// Create the new environment with env_alloc(), from kern/env.c.
// It should be left as env_alloc created it, except that
// status is set to ENV_NOT_RUNNABLE, and the register set is copied
// from the current environment -- but tweaked so sys_exofork
// will appear to return 0.
// LAB 4: Your code here.
struct Env* new_env;
int result;
struct Env* cur_env = curenv;
if ((result = env_alloc(&new_env, curenv->env_id)) < 0) {
return result;
}
new_env->env_status = ENV_NOT_RUNNABLE;
new_env->env_tf = cur_env->env_tf;
new_env->env_tf.tf_regs.reg_eax = 0;
// cprintf("new_env: %d\n", new_env->env_id);
return new_env->env_id;
}
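On the user side, sys_exofork() "returns twice": once in the parent with the child's envid, and once in the child with 0. The usage pattern, as I remember it from user/dumbfork.c in the lab sources, looks roughly like this:
envid_t envid = sys_exofork();
if (envid < 0)
	panic("sys_exofork: %e", envid);
if (envid == 0) {
	// Child: "thisenv" still points at the parent's slot, fix it up.
	thisenv = &envs[ENVX(sys_getenvid())];
	return 0;
}
// Parent: copy pages into the child's address space, then mark it
// runnable with sys_env_set_status(envid, ENV_RUNNABLE).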
// Set envid's env_status to status, which must be ENV_RUNNABLE
// or ENV_NOT_RUNNABLE.
//
// Returns 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist,
// or the caller doesn't have permission to change envid.
// -E_INVAL if status is not a valid status for an environment.
static int
sys_env_set_status(envid_t envid, int status)
{
// Hint: Use the 'envid2env' function from kern/env.c to translate an
// envid to a struct Env.
// You should set envid2env's third argument to 1, which will
// check whether the current environment has permission to set
// envid's status.
// LAB 4: Your code here.
struct Env* new_env;
int result;
if ((result = envid2env(envid, &new_env, 1)) < 0){
return result;
}
if (status != ENV_RUNNABLE && status != ENV_NOT_RUNNABLE) {
return -E_INVAL;
}
new_env->env_status = status;
return 0;
}
// Allocate a page of memory and map it at 'va' with permission
// 'perm' in the address space of 'envid'.
// The page's contents are set to 0.
// If a page is already mapped at 'va', that page is unmapped as a
// side effect.
//
// perm -- PTE_U | PTE_P must be set, PTE_AVAIL | PTE_W may or may not be set,
// but no other bits may be set. See PTE_SYSCALL in inc/mmu.h.
//
// Return 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist,
// or the caller doesn't have permission to change envid.
// -E_INVAL if va >= UTOP, or va is not page-aligned.
// -E_INVAL if perm is inappropriate (see above).
// -E_NO_MEM if there's no memory to allocate the new page,
// or to allocate any necessary page tables.
static int
sys_page_alloc(envid_t envid, void *va, int perm)
{
// Hint: This function is a wrapper around page_alloc() and
// page_insert() from kern/pmap.c.
// Most of the new code you write should be to check the
// parameters for correctness.
// If page_insert() fails, remember to free the page you
// allocated!
// LAB 4: Your code here.
struct Env* new_env;
int result;
struct PageInfo *p = NULL;
if ((uintptr_t)va >= UTOP || (uintptr_t)va % PGSIZE ) {
return -E_INVAL;
}
	// PTE_U and PTE_P must be set; no bits outside PTE_SYSCALL may be set.
	if ((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P) || (perm & ~PTE_SYSCALL)) {
		return -E_INVAL;
	}
if ((result = envid2env(envid, &new_env, 1)) < 0) {
return result;
}
if (!(p = page_alloc(ALLOC_ZERO))) {
return -E_NO_MEM;
}
if ((result = page_insert(new_env->env_pgdir, p, va, perm | PTE_U)) < 0){
page_free(p);
return result;
}
return 0;
}
// Map the page of memory at 'srcva' in srcenvid's address space
// at 'dstva' in dstenvid's address space with permission 'perm'.
// Perm has the same restrictions as in sys_page_alloc, except
// that it also must not grant write access to a read-only
// page.
//
// Return 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if srcenvid and/or dstenvid doesn't currently exist,
// or the caller doesn't have permission to change one of them.
// -E_INVAL if srcva >= UTOP or srcva is not page-aligned,
// or dstva >= UTOP or dstva is not page-aligned.
// -E_INVAL if srcva is not mapped in srcenvid's address space.
// -E_INVAL if perm is inappropriate (see sys_page_alloc).
// -E_INVAL if (perm & PTE_W), but srcva is read-only in srcenvid's
// address space.
// -E_NO_MEM if there's no memory to allocate any necessary page tables.
static int
sys_page_map(envid_t srcenvid, void *srcva,
envid_t dstenvid, void *dstva, int perm)
{
// Hint: This function is a wrapper around page_lookup() and
// page_insert() from kern/pmap.c.
// Again, most of the new code you write should be to check the
// parameters for correctness.
// Use the third argument to page_lookup() to
// check the current permissions on the page.
// LAB 4: Your code here.
struct Env* src_env;
struct Env* dst_env;
int result;
struct PageInfo *p = NULL;
pte_t *pte;
	if ((uintptr_t)srcva >= UTOP || (uintptr_t)srcva % PGSIZE ||
	    (uintptr_t)dstva >= UTOP || (uintptr_t)dstva % PGSIZE) {
return -E_INVAL;
}
	// Same perm restrictions as sys_page_alloc.
	if ((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P) || (perm & ~PTE_SYSCALL)) {
		return -E_INVAL;
	}
if ((result = envid2env(srcenvid, &src_env, 1)) < 0){
return result;
}
if ((result = envid2env(dstenvid, &dst_env, 1)) < 0){
return result;
}
if (!(p = page_lookup(src_env->env_pgdir, srcva, &pte))) {
return -E_INVAL;
}
if (!(*pte & PTE_W) && (perm & PTE_W)) {
return -E_INVAL;
}
if ((result = page_insert(dst_env->env_pgdir, p, dstva, perm | PTE_U)) < 0){
return result;
}
return 0;
}
// Unmap the page of memory at 'va' in the address space of 'envid'.
// If no page is mapped, the function silently succeeds.
//
// Return 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist,
// or the caller doesn't have permission to change envid.
// -E_INVAL if va >= UTOP, or va is not page-aligned.
static int
sys_page_unmap(envid_t envid, void *va)
{
// Hint: This function is a wrapper around page_remove().
// LAB 4: Your code here.
struct Env* new_env;
int result;
if ((uintptr_t)va >= UTOP || (uintptr_t)va % PGSIZE ) {
return -E_INVAL;
}
if ((result = envid2env(envid, &new_env, 1)) < 0){
return result;
}
page_remove(new_env->env_pgdir, va);
return 0;
}
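None of these take effect until syscall() in kern/syscall.c dispatches them. A sketch of the new cases, assuming the standard switch (syscallno) dispatcher with uint32_t arguments a1..a5 (the casts follow the prototypes above):
case SYS_exofork:
	return sys_exofork();
case SYS_env_set_status:
	return sys_env_set_status((envid_t)a1, (int)a2);
case SYS_page_alloc:
	return sys_page_alloc((envid_t)a1, (void *)a2, (int)a3);
case SYS_page_map:
	return sys_page_map((envid_t)a1, (void *)a2,
	                    (envid_t)a3, (void *)a4, (int)a5);
case SYS_page_unmap:
	return sys_page_unmap((envid_t)a1, (void *)a2);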