A Walkthrough of Modifying the Linux Kernel

pedrogao · os, linux · about 87 minutes

Usage

Run the container:

$ git clone https://github.com/pedrogao/qemu-linux
$ cd qemu-linux && docker build . -t qemu-linux
$ docker run -it qemu-linux /bin/bash

Once inside the container, run the script to start QEMU:

# ./run.sh

Result: [image]

Environment setup

Dependencies:

  • Docker. The Ubuntu 20.04 environment is built with Docker, using the following Dockerfile:
FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive

RUN echo "deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse \n \
deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse \n \
deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse \n \
deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse \n \
deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse \n \
deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse \n \
deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse \n  \
deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse \n \
deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse \n \
deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse">/etc/apt/sources.list

RUN cat /etc/apt/sources.list
RUN apt update 
RUN apt-get install -y wget \
    gcc \
    gcc-multilib \
    git \
    make \
    bc \
    texinfo \
    gdb \
    cgdb \
    qemu-system-x86-64 \
    libncurses5-dev \
    vim \
    cpio

Then build it into a local image:

docker build -t qemu-20 .

This gives you an Ubuntu 20.04 image with QEMU installed:

docker run -it qemu-20 /bin/bash

Inside the container, download the Linux source and configure it:

$ cd /home/
$ wget http://ftp.sjtu.edu.cn/sites/ftp.kernel.org/pub/linux/kernel/v4.x/linux-4.9.301.tar.gz
$ tar -xzf ./linux-4.9.301.tar.gz
$ cd linux-4.9.301

$ make menuconfig

Enable debug info at the following menu path:

Kernel hacking --->
  Compile-time checks and compiler options --->
    [ ] Compile the kernel with debug info

Press Space to select it and leave everything else at its default. Once the .config file has been generated, build the kernel:

make -j8
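
The menu selection ends up as a single line in .config, so it can be checked from the shell before spending time on a full build. A minimal sketch, assuming the option is stored as CONFIG_DEBUG_INFO (the symbol behind "Compile the kernel with debug info" in 4.9); the helper name config_enabled and the sample file are made up for illustration:

```shell
#!/bin/sh
# config_enabled FILE SYMBOL: succeed if SYMBOL is set to y in FILE.
# Hypothetical helper, not part of the kernel tree.
config_enabled() {
    grep -q "^$2=y\$" "$1"
}

# Demo on a throwaway file that mimics two .config lines.
sample=$(mktemp)
printf '%s\n' 'CONFIG_DEBUG_INFO=y' '# CONFIG_DEBUG_INFO_REDUCED is not set' > "$sample"

if config_enabled "$sample" CONFIG_DEBUG_INFO; then
    echo "debug info: enabled"
else
    echo "debug info: disabled"
fi
```

Against the real tree the call would be config_enabled /home/linux-4.9.301/.config CONFIG_DEBUG_INFO.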

Check that the kernel image exists:

# ls ./arch/x86_64/boot/        
bzImage

Download BusyBox:

# cd ..
# wget https://busybox.net/downloads/busybox-1.32.1.tar.bz2
# tar -xf busybox-1.32.1.tar.bz2
# cd busybox-1.32.1

Configure BusyBox:

# make menuconfig

Select Settings -> Build Options -> Build static binary, save and exit, then build:

# make -j8 && make install
# ls ./_install/

bin  linuxrc  sbin  usr

Build the initramfs root filesystem:

# cd .. 
# mkdir initramfs
# cd initramfs/

# cp -rf ../busybox-1.32.1/_install/* ./ 
# mkdir dev proc sys
# ln -sf /dev/null /dev/tty1
# ln -sf /dev/null /dev/tty2
# ln -sf /dev/null /dev/tty3
# ln -sf /dev/null /dev/tty4
# cp -a /dev/{null,console,tty,tty1,tty2,tty3,tty4} dev/
# rm -f linuxrc

Create an init file with the following content:

#!/bin/busybox sh
echo "{==DBG==} INIT SCRIPT"
mount -t proc none /proc
mount -t sysfs none /sys
echo -e "{==DBG==} Boot took $(cut -d' ' -f1 /proc/uptime) seconds"
exec /sbin/init

Make it executable and check the directory layout:

# chmod a+x init

# ls
bin  dev  init  proc  sbin  sys  usr 

Pack the initramfs:

# find . -print0 | cpio --null -ov --format=newc | gzip -9 > ../initramfs.cpio.gz
# cd ..
# ls | grep 'initramfs'
initramfs
initramfs.cpio.gz

Run the kernel:

# qemu-system-x86_64 -kernel ./linux-4.9.301/arch/x86/boot/bzImage -initrd ./initramfs.cpio.gz -append "nokaslr console=ttyS0" -nographic

Or:

# qemu-system-x86_64 -nographic -kernel ./linux-4.9.301/arch/x86/boot/bzImage -initrd ./initramfs.cpio.gz -append "noapic console=ttyS0 norandmaps"

Screenshot of a successful boot: [image]

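Or debug the kernel. With -s QEMU's gdb stub listens on port 1234, and -S keeps the CPU paused until a debugger tells it to continue: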

# qemu-system-x86_64 -kernel ./linux-4.9.301/arch/x86/boot/bzImage -initrd ./initramfs.cpio.gz -append "nokaslr console=ttyS0" -s -S -nographic
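
The gdb stub opened by -s can also be driven with the plain gdb that the Dockerfile installs, which is a quick way to confirm the setup before configuring an editor. A minimal sketch: the script location and file name are arbitrary choices, and start_kernel is picked as the breakpoint only because it is the kernel's C entry point in init/main.c:

```shell
#!/bin/sh
# Write a small gdb command file; the location is an arbitrary choice.
GDB_SCRIPT="${TMPDIR:-/tmp}/kernel-debug.gdb"

cat > "$GDB_SCRIPT" <<'EOF'
# Connect to the gdb stub that QEMU opens with -s (TCP port 1234).
target remote 127.0.0.1:1234
# Stop at the start of the kernel's C code in init/main.c.
break start_kernel
continue
EOF

echo "wrote $GDB_SCRIPT"
# With QEMU paused by -S, run this from the kernel source directory:
#   gdb ./vmlinux -x "$GDB_SCRIPT"
```

The vmlinux file in the source directory carries the symbols, which is why debug info was enabled during configuration.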

Attach VS Code to the container, install the C/C++ extension inside it, set a breakpoint in init/main.c, and use the following debug configuration:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "kernel-debug",
            "type": "cppdbg",
            "request": "launch",
            "miDebuggerServerAddress": "127.0.0.1:1234",
            "program": "${workspaceFolder}/vmlinux",
            "args": [],
            "stopAtEntry": false,
            "cwd": "${workspaceFolder}",
            "environment": [],
            "externalConsole": false,
            "logging": {
                "engineLogging": false
            },
            "MIMode": "gdb"
        }
    ]
}

As shown here: [image]

Now you can happily debug the Linux kernel:

[image]

Crosstool-ng

Download crosstool-ng:

# wget http://crosstool-ng.org/download/crosstool-ng/crosstool-ng-1.25.0.tar.xz

# tar -xf ./crosstool-ng-1.25.0.tar.xz
# cd crosstool-ng-1.25.0
# export CROSSTOOL_DIR=`pwd`
# mkdir install
# cd install
# export CROSSTOOL_INSTALL_DIR=`pwd`

Configure and install:

# cd $CROSSTOOL_DIR
# apt-get install flex bison unzip help2man gawk libtool libtool-bin
# ./configure --prefix=$CROSSTOOL_INSTALL_DIR

# make && make install

Select and configure a new toolchain:

# ./ct-ng x86_64-unknown-linux-gnu
# ./ct-ng menuconfig

Allow building as the root user:

Paths and misc options --->
  [*] Try features marked as EXPERIMENTAL
  [*]   Allow building as root user (READ HELP!)
  [*]     Are you sure?

Set the Linux kernel version to 4.9.301, then build:

# ./ct-ng build

Alternatively, build directly with Docker; see https://hub.docker.com/r/bensuperpc/crosstool-ng

Buildroot

Install Buildroot:

# wget https://buildroot.org/downloads/buildroot-2022.02.3.tar.gz
# tar -xzf ./buildroot-2022.02.3.tar.gz
# cd buildroot-2022.02.3
# export BUILDROOT=/home/buildroot-2022.02.3
# export BUILDROOT_BUILD=/home/buildroot-build
# mkdir -p $BUILDROOT_BUILD
# cd $BUILDROOT_BUILD
# touch Config.in external.mk
# echo 'name: mini_linux' > external.desc
# echo 'desc: minimal linux system with buildroot' >> external.desc

# mkdir configs overlay
# cd $BUILDROOT

Tip: press Ctrl-a, then x, to exit QEMU. If nothing happens, press it a few more times.

# make menuconfig
Toolchain ---> Toolchain type ---> External toolchain
Toolchain ---> Toolchain ---> Custom toolchain
Toolchain ---> Toolchain origin ---> Pre-installed toolchain
Toolchain ---> Toolchain path ---> /opt/toolchains/x86_64-unknown-linux-gnu
Toolchain ---> Toolchain prefix ---> x86_64-unknown-linux-gnu
Toolchain ---> External toolchain gcc version ---> 5.x
Toolchain ---> External toolchain kernel headers series ---> 4.3.x
Toolchain ---> External toolchain C library ---> glibc/eglibc
Toolchain ---> Toolchain has C++ support? ---> yes
System configuration ---> System hostname ---> mini_linux
System configuration ---> System banner ---> Welcome to mini_linux
System configuration ---> Run a getty (login prompt) after boot ---> TTY port ---> ttyS0
System configuration ---> Network interface to configure through DHCP --->
System configuration ---> Root filesystem overlay directories ---> $(BR2_EXTERNAL)/overlay
Kernel ---> Linux Kernel ---> no
Filesystem images ---> cpio the root filesystem (for use as an initial RAM filesystem) ---> yes
Filesystem images ---> Compression method ---> gzip

Modifying the source code

System call definitions live in:

arch/x86/include/generated/uapi/asm/unistd_64.h
arch/x86/include/generated/asm/syscalls_64.h

arch/x86/entry/syscalls/syscall_64.tbl

Add a new system call definition (lines marked with + are new):

# arch/x86/entry/syscalls/syscall_64.tbl
# ...
536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
537	x32	recvmmsg		compat_sys_recvmmsg
538	x32	sendmmsg		compat_sys_sendmmsg
539	x32	process_vm_readv	compat_sys_process_vm_readv
540	x32	process_vm_writev	compat_sys_process_vm_writev
541	x32	setsockopt		compat_sys_setsockopt
542	x32	getsockopt		compat_sys_getsockopt
543	x32	io_setup		compat_sys_io_setup
544	x32	io_submit		compat_sys_io_submit
545	x32	execveat		compat_sys_execveat/ptregs
546	x32	preadv2			compat_sys_preadv64v2
547	x32	pwritev2		compat_sys_pwritev64v2
+ 998 common get_cpus     sys_get_cpus

// include/linux/syscalls.h
// ...
asmlinkage long sys_pkey_mprotect(unsigned long start, size_t len,
                  unsigned long prot, int pkey);
asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val);
asmlinkage long sys_pkey_free(int pkey);

+ asmlinkage long sys_get_cpus(void);
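
Each row of syscall_64.tbl has four whitespace-separated fields: number, ABI, name, and entry point. A small sketch that splits rows into those fields, using two rows from the excerpt as inline sample data (the function name and output labels are my own, for illustration only):

```shell
#!/bin/sh
# Print the four fields of each syscall table row; '#' comment lines are skipped.
parse_syscall_rows() {
    awk '!/^#/ && NF >= 4 { printf "nr=%s abi=%s name=%s entry=%s\n", $1, $2, $3, $4 }'
}

parse_syscall_rows <<'EOF'
# <number> <abi> <name> <entry point>
547 x32 pwritev2 compat_sys_pwritev64v2
998 common get_cpus sys_get_cpus
EOF
```

The number in the first field (998 here) is what a user-space program later passes to syscall().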

The system call implementation:

// kernel/sys.c
// ...

+ SYSCALL_DEFINE0(get_cpus) {
+    // Return the number of CPUs present in the system
+    return num_present_cpus();
+ }

num_present_cpus is defined in:

// include/linux/cpumask.h
// ...
#if NR_CPUS > 1
#define num_online_cpus()   cpumask_weight(cpu_online_mask)
#define num_possible_cpus() cpumask_weight(cpu_possible_mask)
#define num_present_cpus()  cpumask_weight(cpu_present_mask)
#define num_active_cpus()   cpumask_weight(cpu_active_mask)
#define cpu_online(cpu)     cpumask_test_cpu((cpu), cpu_online_mask)
#define cpu_possible(cpu)   cpumask_test_cpu((cpu), cpu_possible_mask)
#define cpu_present(cpu)    cpumask_test_cpu((cpu), cpu_present_mask)
#define cpu_active(cpu)     cpumask_test_cpu((cpu), cpu_active_mask)
#else
#define num_online_cpus()   1U
#define num_possible_cpus() 1U
#define num_present_cpus()  1U
#define num_active_cpus()   1U
#define cpu_online(cpu)     ((cpu) == 0)
#define cpu_possible(cpu)   ((cpu) == 0)
#define cpu_present(cpu)    ((cpu) == 0)
#define cpu_active(cpu)     ((cpu) == 0)
#endif
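
cpumask_weight counts the bits set in a CPU mask, so num_present_cpus() is simply the size of the "present" set. The same set is visible from user space as a range list in /sys/devices/system/cpu/present (for example 0-3). A rough shell sketch that counts the entries of such a list; count_cpulist is a hypothetical helper and assumes well-formed input:

```shell
#!/bin/sh
# count_cpulist LIST: count the CPUs named by a list such as "0-3,5".
count_cpulist() {
    echo "$1" | tr ',' '\n' | awk -F- '
        NF == 1 { n += 1 }            # a single CPU id
        NF == 2 { n += $2 - $1 + 1 }  # an inclusive range lo-hi
        END { print n + 0 }'
}

count_cpulist "0-3,5"     # prints 5

# On a Linux machine, apply it to the kernel's own "present" list.
present=/sys/devices/system/cpu/present
if [ -r "$present" ]; then
    echo "present CPUs on this machine: $(count_cpulist "$(cat "$present")")"
fi
```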

Rebuild the kernel inside the container:

# cd linux-4.9.301
# make -j8

# ls ./arch/x86/boot/bzImage 
./arch/x86/boot/bzImage

Once the build finishes, write a small program that invokes sys_get_cpus:

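// get_cpus.c
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Must match the number added to syscall_64.tbl. */
#define SYS_GET_CPUS 998

int main(void)
{
    long number = syscall(SYS_GET_CPUS);
    if (number < 0) {
        /* e.g. ENOSYS when running on a kernel without the new system call */
        perror("get_cpus");
        return 1;
    }
    printf("number of cpu is: %ld\n", number);
    return 0;
}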

Compile it into an executable and add it to the rootfs:

# gcc get_cpus.c -static -o get_cpus
# mv ./get_cpus ./initramfs/usr/bin/
# cd initramfs
# find . -print0 | cpio --null -ov --format=newc | gzip -9 > ../initramfs.cpio.gz
# cd ..
# ls -lh | grep initramfs.cpio.gz
-rw-r--r--  1 root root 1.4M Jul 14 01:29 initramfs.cpio.gz

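Remember: -static is mandatory, because the BusyBox-based initramfs contains no shared libraries or dynamic loader for the binary to use. Now boot the new kernel with QEMU again: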

# qemu-system-x86_64 -kernel ./linux-4.9.301/arch/x86/boot/bzImage -initrd ./initramfs.cpio.gz -append "nokaslr console=ttyS0" -nographic

/ # ls ./usr/bin/ | grep 'cpu'

The new get_cpus executable is present under /usr/bin, so let's run it:

# ./get_cpus 
number of cpu is: 1

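The output is 1. The result is correct, but one thing looks odd: single-core machines are rare nowadays, even among virtual machines, so why is the CPU count exactly 1 here?

The answer lies with QEMU rather than Docker. The patched kernel runs as a QEMU guest and only counts the virtual CPUs that QEMU exposes, and QEMU emulates a single CPU unless -smp is given. Adding, for example, -smp 4 to the qemu-system-x86_64 command line should make get_cpus print 4.

The container's cgroup settings do not explain the number. You can inspect the container's CPU weight with cat:

# cat /sys/fs/cgroup/cpu/cpu.shares 
1024

1024 is just the default value of cpu.shares, a relative scheduling weight that matters when containers compete for CPU time. It is not a CPU count, so the fact that 1024/1024 equals the output is a coincidence. Docker does not cap a container's CPUs unless an option such as --cpus is passed.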
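
For reference, cgroup v1 stores a container's hard CPU cap in cpu.cfs_quota_us and cpu.cfs_period_us, in the same directory as cpu.shares. The cap in CPUs is quota divided by period, and a quota of -1 means no cap. A minimal sketch of that arithmetic; cpu_limit is a made-up helper name, and the paths assume the cgroup v1 layout used by the cat command above:

```shell
#!/bin/sh
# cpu_limit QUOTA_US PERIOD_US: print the CPU cap implied by a CFS quota/period pair.
cpu_limit() {
    if [ "$1" -le 0 ] || [ "$2" -le 0 ]; then
        echo "unlimited"              # a quota of -1 means "no cap"
    else
        awk -v q="$1" -v p="$2" 'BEGIN { printf "%.2f\n", q / p }'
    fi
}

cpu_limit -1 100000       # prints unlimited
cpu_limit 150000 100000   # prints 1.50

# Inside a cgroup v1 container, read the real values if they are available.
cg=/sys/fs/cgroup/cpu
if [ -r "$cg/cpu.cfs_quota_us" ] && [ -r "$cg/cpu.cfs_period_us" ]; then
    echo "this container: $(cpu_limit "$(cat "$cg/cpu.cfs_quota_us")" "$(cat "$cg/cpu.cfs_period_us")")"
fi
```

Neither value influences what a QEMU guest reports; they only bound how much host CPU time the container, and therefore the emulator, may consume.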

References