Understanding Linux Kernel Stack


Kernel stack pages

  • In kernel 2.4.0, the kernel-mode stack sits right above the task_struct, in the same two-page region. In do_fork, line 669 allocates the two pages: the low end holds the task_struct, the high end holds pt_regs, and the kernel-mode stack starts right below pt_regs and grows down toward the task_struct.
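The layout described above can be sketched in user-space C. This is only an illustration under stated assumptions: the 8 KiB region size matches two 4 KiB pages, while fake_task_struct, fake_pt_regs, task_from_sp, regs_from_task, and layout_demo are hypothetical names that do not exist in the kernel. The sketch shows why the layout is convenient: masking any stack address yields the start of the region, which is where the task_struct lives.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed region size: two 4 KiB pages. */
#define STACK_AREA_SIZE ((uintptr_t)8192)

/* Hypothetical stand-ins; the real structures are much larger. */
struct fake_task_struct { int pid; };
struct fake_pt_regs { unsigned long regs[15]; };

/* Round any address inside the region down to its start (the task_struct). */
struct fake_task_struct *task_from_sp(uintptr_t sp)
{
    return (struct fake_task_struct *)(sp & ~(STACK_AREA_SIZE - 1));
}

/* pt_regs occupies the very top of the region; the stack starts just below it. */
struct fake_pt_regs *regs_from_task(struct fake_task_struct *t)
{
    return (struct fake_pt_regs *)((uintptr_t)t + STACK_AREA_SIZE) - 1;
}

/* Self-check on an aligned region carved out of a static pool; returns 1 on success. */
int layout_demo(void)
{
    static unsigned char pool[2 * 8192];
    uintptr_t base = ((uintptr_t)pool + STACK_AREA_SIZE - 1) & ~(STACK_AREA_SIZE - 1);
    struct fake_task_struct *t = (struct fake_task_struct *)base;
    struct fake_pt_regs *regs = regs_from_task(t);
    uintptr_t sp = (uintptr_t)regs - 64;  /* some address inside the stack */

    return task_from_sp(sp) == t &&
           (uintptr_t)(regs + 1) == base + STACK_AREA_SIZE;
}
```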

kernel stack v2.4

Figure credit to Linux内核源代码情景分析

kernel stack v2.4

Figure credit to Understanding the Linux Kernel

Kernel stack usages

For AArch64 kernel 3.10.

  • The kernel stack of a process is defined in union thread_union. The thread_info struct is at the beginning (the lowest address), and the remaining part is the kernel stack.
  • struct thread_info is different from struct thread_struct, which is embedded in task_struct. thread_struct contains cpu_context, which holds the callee-saved registers (x19, …, x28, x29, x30, sp). In the struct, x29 is stored as fp, and x30 (the link register) is saved in the pc field.
  • The high end of the kernel stack holds pt_regs, which contains the general-purpose registers plus sp, pc, and pstate. The middle part of the kernel stack works like a user stack: the frame pointer (x29) points to the bottom of the current frame, and that location holds the frame pointer of the previous frame.
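The layout in the list above can be sketched as follows. This is a simplified user-space model, not the kernel's definitions: the `_sketch` types, usable_stack_bytes, and the values of THREAD_SIZE and THREAD_START_SP are assumptions chosen for illustration (the real stack size depends on the kernel configuration). It shows thread_info at offset 0 of the union and pt_regs placed just below the top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THREAD_SIZE     16384UL             /* assumed; depends on kernel config */
#define THREAD_START_SP (THREAD_SIZE - 16)  /* assumed: small gap kept at the top */

/* Simplified stand-ins for the kernel structures. */
struct thread_info_sketch { unsigned long flags; void *task; int cpu; };
struct pt_regs_sketch {
    uint64_t regs[31];
    uint64_t sp;
    uint64_t pc;
    uint64_t pstate;
    uint64_t orig_x0;
    uint64_t syscallno;
};

/* thread_info and the stack share the same memory; thread_info is at offset 0. */
union thread_union_sketch {
    struct thread_info_sketch thread_info;
    unsigned long stack[THREAD_SIZE / sizeof(unsigned long)];
};

/* pt_regs ends at THREAD_START_SP, so it starts one struct below that point. */
struct pt_regs_sketch *task_pt_regs_sketch(union thread_union_sketch *u)
{
    return (struct pt_regs_sketch *)((uintptr_t)u + THREAD_START_SP) - 1;
}

/* Bytes left for stack frames between thread_info and pt_regs. */
size_t usable_stack_bytes(union thread_union_sketch *u)
{
    return (size_t)((uintptr_t)task_pt_regs_sketch(u) -
                    ((uintptr_t)u + sizeof(struct thread_info_sketch)));
}
```

Because the stack grows downward from pt_regs toward thread_info, a deep enough call chain would overwrite thread_info, which is one reason the usable size matters.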

kernel stack v2.4

kernel stack v2.4

  • The figure above shows that kernel stack frames are linked through the frame pointer. From unwind_frame, we know that the return address is stored at frame pointer + 8, that is, at the higher address right next to the saved frame pointer.
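The frame-pointer chain can be walked with a few lines of C. The sketch below is a user-space approximation and not the kernel's unwind_frame: walk_frames and its parameters are hypothetical, and the bounds and alignment checks are simplified assumptions. It reads the previous frame pointer at fp and the return address at fp + 8, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a chain of frame records inside [low, high).
 * Each record is two 64-bit words: the previous frame pointer at fp,
 * and the return address at fp + 8.
 * Stores up to max return addresses in out[] and returns how many were found.
 */
size_t walk_frames(uint64_t fp, uint64_t low, uint64_t high,
                   uint64_t *out, size_t max)
{
    size_t n = 0;

    while (n < max && fp >= low && fp + 16 <= high && (fp & 0xf) == 0) {
        const uint64_t *rec = (const uint64_t *)(uintptr_t)fp;
        uint64_t prev_fp = rec[0];   /* saved x29 of the caller */
        uint64_t ret     = rec[1];   /* saved x30 at fp + 8 */

        out[n++] = ret;
        if (prev_fp <= fp)           /* callers live at higher addresses */
            break;
        fp = prev_fp;
    }
    return n;
}
```

The walk stops when the chain ends, leaves the stack bounds, or fails to move toward higher addresses, which guards against loops in a corrupted chain.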

cpu_context and pt_regs

struct cpu_context {
	unsigned long x19;
	unsigned long x20;
	unsigned long x21;
	unsigned long x22;
	unsigned long x23;
	unsigned long x24;
	unsigned long x25;
	unsigned long x26;
	unsigned long x27;
	unsigned long x28;
	unsigned long fp;
	unsigned long sp;
	unsigned long pc;
};

struct pt_regs {
	union {
		struct user_pt_regs user_regs;
		struct {
			u64 regs[31];
			u64 sp;
			u64 pc;
			u64 pstate;
		};
	};
	u64 orig_x0;
	u64 syscallno;
	u64 orig_addr_limit;
	u64 unused;	// maintain 16 byte alignment
};

Both of them contain a stack pointer (sp) and a pc.

  • pt_regs is at the high end of the kernel stack and is mainly used to save the user registers when switching between user mode and kernel mode. Therefore, after returning to user space, the first instruction executed is the one at pt_regs->pc.

  • cpu_context lives in the thread_struct embedded in task_struct and is mainly used to save registers across a context switch. So right after a context switch to a process, execution resumes at its cpu_context->pc.

User-kernel (syscall) mode switch

  • The switch from user mode to kernel mode happens in kernel_entry in entry.S. Note that the stack pointer grows from high addresses to low addresses, while in the memory layout of pt_regs the low-numbered registers are at low addresses and the high-numbered registers are at high addresses. The high-numbered registers therefore need to be pushed first.

  • Correspondingly, the switch from kernel mode back to user mode pops all these registers in the reverse order, as shown in kernel_exit.
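The push order can be checked with a small simulation. This is a sketch under assumptions, not the code in entry.S: regs_sketch, push_u64, and save_regs are hypothetical names, and registers are pushed one at a time here, whereas the assembly stores them in pairs. It shows that pushing from the highest field down to x0 on a descending stack produces exactly the in-memory order of the struct.

```c
#include <assert.h>
#include <stdint.h>

/* Same field order as the register part of pt_regs shown earlier. */
struct regs_sketch {
    uint64_t regs[31];
    uint64_t sp;
    uint64_t pc;
    uint64_t pstate;
};

/* Push on a descending stack: move the pointer down first, then store. */
uint64_t *push_u64(uint64_t *sp, uint64_t value)
{
    *--sp = value;
    return sp;
}

/*
 * Save the register state below `top` (one past the highest slot).
 * Fields at high addresses are pushed first, x0 is pushed last,
 * so the final stack pointer is the start of the saved frame.
 */
struct regs_sketch *save_regs(uint64_t *top, const uint64_t x[31],
                              uint64_t user_sp, uint64_t pc, uint64_t pstate)
{
    uint64_t *sp = top;

    sp = push_u64(sp, pstate);
    sp = push_u64(sp, pc);
    sp = push_u64(sp, user_sp);
    for (int i = 30; i >= 0; i--)
        sp = push_u64(sp, x[i]);

    return (struct regs_sketch *)sp;
}
```

Restoring on the way back to user mode is the mirror image: values are popped starting from x0 at the lowest address and ending with the fields at the top.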

Context switch

References

  1. Understanding the Linux Kernel
  2. Linux内核源代码情景分析