Browsed by

Kernel Bootup Page Table Initialize Process(x86_64)

Kernel Bootup Page Table Initialize Process(x86_64)

This article will provide detailed information about the kernel bootup page table setup.

In a brief view, the kernel setup page table in three steps:

  1. Setup the 4GB identity mapping
  2. Setup 64bit mode page table early_top_pgt
  3. Setup 64bit mode page table init_top_pgt

The last two steps are both higher mapping: Map the 512MB physical address to virtual address 0xffff80000000 – 0xffff80000000 + 512MB.

Next, we will talk about the details. We will use the 4.14 version code to explain the process.

You need to know the IA32e paging mechanism and relocation to read the article. The Intel manual has a good explaination of IA32e paging

Before decompression

When the kernel is being loaded, it is either decompressed by a third-party bootloader like GRUB2 or by the kernel itself. Now we will talk about the second condition. The code started from arch/x86/boot/header.S . It is in 16bit real mode at the time. Then in code  arch/x86/boot/compressed/head_64.S  We setup the first page table in 32bit mode. We need this page table to take us to do take us to 64bit mode.

The following code is the set-up process

Notice that from the comment above. %ebx contain the address where we move kernel to make a safe decompression. Which means we should treat %ebx as an offset to the compiled binary. The compiled binary start at 0. So we fix-up the difference to reach the real physical address.

The above code setup Top level page directory. This only set the lowest page directory entry to (1007 + pgtable). This is a pointer to the next level page table. And next level page table start at 0x1000 + pgtable. The last line adds %edx to 4+%edi will set encryption masks if SEV is active. Currently, we can omit this line.

Then we look at the next level.

Here, we can see we set up four entries. and each entry point to another page directory.

This is the last level of page directory, these entry will point to a physical page frame directly. Now let’s take a look at the code. It sets up 2048 entries. Each entry with a Page Flag R/W = 1 U/S = 0 PS = 1. This means the page is read / write by kernel only and its size is 2MB. Each PTE(Page Table Entry) is a 8 Byte block data. So one page can contain at most 512 entries. Here kernel setup 4 pages of Level 2 Page Directory. The following image show the current page table structure.

In total we have 2048 * 2MB = 4GB physical address, identity mapped to 0 – 4GB linear address.


Then we use a long return to switch to 64bit mode.

Kernel push the startup_64 and CS register to stack, then perform a long return to enter 64bit mode.  And then after copy the compressed kernel, we jump to symbol relocated

In the relocated code, we do the kernel decompression.

The decompressed kernel is compiled at high address(we take ffffffff81000000 for example). But now we don’t have the correct page table to do the mapping. Fortunately, the extract_kernel function returns the physical address of the decompressed kernel. (Which is %ebp, equals to %ebx). After decompression, %rax contains the kernel physical start address. We jump there to perform the further setup.

Start execution in vmlinux

We now arrived at arch/x86/kernel/head_64.S. Before we continue, we must notice two things first.

  • After decompression, the kernel is placed at physical address %rbp (If we do not set CONFIG_RELOCATABLE it’s equal to 0x1000000
  • After decompression, we now in the kernel code compiled with the virtual address ffffffff81000000(as we mentioned above).

So here is a big pitfall. We cannot access ANY of the symbols in vmlinux currently. Because we only have a basic identity mapping now. But we need to visit the variables. How can we make it? The kernel uses a trick here, I will show it below

This function fixup the symbol virtual address to the real physical address.

“Current Valid Addr” = “Virtual Hi Addr” – “Kernel Virtual Address Base Addr” + “%rax Extracted kernel physical address”.

Now we continue reading the arch/x86/kernel/head_64.S  assembly code, this is where we landed from arch/x86/compressed/head_64.S

The enrty is startup_64:

In this article, we talk about self loading, instead of using a third party 64bit bootloader like GRUB. So as the comment said, we come here from arch/x86/boot/compressed/head_64.S. If we config the kernel with CONFIG_RELOCATABLE, the kernel won’t run at the place we compiled, page table fixup need to be performed. The page table is fixed in __startup_64

We compute the load_delta, and fixup the early_top_pgt. Now we just assume we don’t configure the kernel with CONFIG_RELOCATABLE. Then we can look at the page table built at compile time. First we look at the top level early_top_pgt. It set only the last entry point to level3 page table. which means only virtual address start with 0xff8000000000 will be valid.

Now we look at the next level (We do not use 5 Level Paging).

This level we have two entries, one for kernel address space. One for fixmap address space, fixmap address space is used for IO mapping, DMA, etc. Now we just look at the fixmap address space. It’s at index 510. in binary mode 0b111111110. Combine with the top level we get a smaller linear address space. Only address start from 0xffff80000000 is valid.

Then it’s the last level page directory. level2_kernel_pgt

This level is a mapping to physical address 0 – 512MB (it maps more than that, but we only need 512MB) So we get the current mapping then.

Linear: 0xffff80000000 – 0xffff80000000 + 512MB =====> Physical: 0 – 512MB

You can use a gdb to print the page table and debug it in your own. Here is a simple “it works!” script for parsing the page directory entry

Kernel load the early_top_pgt into cr3 using the following code

The current page table structure is shown below:

Now we are free to visit any kernel symbol without to force convert the address using fixup_address or something else. We can go further to the init/main.c code.

We use a long return to get to get to x86_64_start_kernel

initial_code here is defined as x86_64_start_kernel.


Moving to init/main.c

We are now at arch/x86/kernel/head64.c and in function x86_64_start_kernel 

We set up init_top_pgt[511] same as early_top_pgt[511]  . init_top_pgt is the final kernel page table. From x86_64_start_reservations we get to start_kernel This is a function located at init/main.c

After calling setup_arch, CR3 is loaded with init_top_pgt. Then the kernel page table will not change. I wonder if there is a change to switch kernel page table from 2MB size physical page to 4KB physical page, but it seems that the CR3 remained unchanged, and I examined the page entries, they remain unchanged, too. Even the code has executed into rest_init then do_idle

The following function is a simple debug function to output the current CR3 register since GDB cannot get the CR3 register value, I just print it out to see when it changed.


Kernel Driver btusb Overview

Kernel Driver btusb Overview



btusb_probe is use for hot plug-in for bluetooth usb generic controller, here will explain the function in detail.

First is an interface check mechanism

This special condition is used for supporting apple Macbook 12,8 (2015 early). According to the normal specification, the main interface for USB is 0, and audio (isochronous) is 1, but apple made a change on it, changing the main interface to 2 and audio to 3. The “bInterfaceNumber !=2 ” is for checking hardware for the special case in Apple series product. The macro BTUSB_IFNUM_2 is a driver_info flag, for Macbook devices, this flag will be set, else it will be 0. See the btusb_table for detail.

Then do further check on blacklist devices, some of the blacklist device is because there are specific driver (e.g bcmxxxx) for the device, so they do not use the generic one called btusb. Some of them just because they are not supported, and other reasons.(Not sure what reason are there)


Then we allocate memory for structure btusb_data, use this to store data for the USB interface. Also we need to check the memory remained for the allocation. Then we do the real work: set up currrent interface endpoints for interrupt and bulk (Why only these two?) It go through all the endpoint in the current interface. We get the current_altsetting to get a list of current active(available) endpoints.


usb_endpoint_is_int_in and usb_endpoint_is_bulk_out, usb_endpoint_is_bulk_in are functions use to know what type of the endpoint is it. These info is use to set up driver data at the end of the call. If none of inter_ep, bulk_tx_ep or bulk_rx_ep is set, it will also result in No Device Error(ENODEV)

This part of code is used for URB generation. URB is short for “USB Request Block” According to the Bluetooth v5.0 Specification, When sending an Control URB to AMP, the bRequest field should be 0x2b. Shown in the figure below.


Currently, for the interface to work with kernel to perform different operations. The driver itself need to be convert to device structure. Use the function named interface_to_usbdev Here is a quote from Linux Device Driver 3 :

A USB device driver commonly has to convert data from a given struct usb_interface structure into a struct usb_device structure that the USB core needs for a wide range of function calls. To do this, the function interface_to_usbdev is provided. Hopefully, in the future, all USB calls that currently need a struct usb_device will be converted to take a struct usb_interface parameter and will not require the drivers to do the conversion.


Then we continue with the initialize process.


Here we init the workqueue, data->work and data->waker these are shared workqueue offered by kernel. (Default Shared workqueue). We call schedule_work(data->work) in btusb_notify function to submit a job into workqueue and data->waker is also controlled by other functions

Then these init_usb_anchor calls. In my view, is just a sort of data queue, URB request will be queued(anchored) in certain queue, then processed in serial. Then init the spinlock for the device(interface)


Another special case, for Intel bluetooth usb generic driver, kernel will use special recv handler functions, for other USB generic bluetooth driver, kernel just use the common one.

Then do a lot of device specific set-up, we skip the code and go to the  isochronous setup process.



Here, the usb_driver_claim_interface is used for set up more than one interface binding to the current device driver. It also happens when this is a isochronous or acm(?) interface, here it’s a isochronous interface

Finally we call hci_register_dev to register it , this is one of the function in the Bluetooth Host Controller Interface core function series, from file net/bluetooth/hci/hci_core.c. After that, we set the interface data to intf

Building your own live streaming site using Nginx RTMP & video.js

Building your own live streaming site using Nginx RTMP & video.js

As I said in twitter I will update my blog at least once a week, so now I am writing this week’s blog (Although this article doesn’t contain too much technical detail) I just built my personal live server, for the trail version on bilibili is expired. And I don’t want to send sensitive personal data to that platform, so I decided to build one on my own.

Previously I built a live stream service using my raspberry pi, and only use the most simple configuration of nginx, and it does not play very well. Now I have bought a shiny new VPS from CAT.NET with my partner onion, it’s awesomely fast and fluent, so I use this server to build my live stream service, including a frontend to play the stream.

The tutorial is here: 

The following guide will show how to build one stream server on Archlinux (Yes, archilnux ONLY, but compatible with many other distro), Just follow the basic steps:

Setting up the RTMP Streaming Server with HLS

  1. Install nginx-rtmp from AUR (nginx-mainline + nginx-rtmp-module may also works, but I have problems when compiling the module using makepkg)
  2. If you have previous nginx configuration, install nginx-rtmp will conflict with nginx, just remove it, no worry about the configuration file you wrote, it will be stored at /etc/nginx/nginx.conf.pacsave
  3. If you have a previous installation of nginx, after install nginx-rtmp  exec the command mv /etc/nginx/nginx.conf /etc/nginx/nginx.conf.old and mv /etc/nginx/nginx.conf.pacsave /etc/nginx/nginx.conf
  4. Then restart the nginx server and reload the daemon systemctl daemon-reload  && systemctl restart nginx.conf
  5. This restart shouldn’t generate any error except you have error in your previous nginx config
  6. Then create the rtmp.conf file (or whatever you name it)  example configuration can be found here

Here is my rtmp.conf, remember to include it in nginx.conf OUTSIDE the http block like this:

Then you can use OBS or anything else to push to the livestream, in this example, you should push to

In the key field just write something e.g: test

Remember to mkdir /tmp/hls before using the live stream


Setting up the Live Stream data service

We need hls.js or videojs with hls supported, I choose the latter one.

create a config in /etc/nginx/sites-available


put these lines in the config file

Then when you visit you will get the live stream playlist of live named “test”.

This *.m3u8 file is a text file that describes every segment to play (segments are named in <name>-<id>.ts format), here is a sample file

The videojs / hls.js will recognize format and parse it, fetch *.ts segments from server then play it one by one, making it looks like a live stream.

Create a webpage to show it

I use videojs with hls support for this, just take a look at view-source:// you can get a video player that works.

See for more details




I didn’t find any credential configuration when setting up rtmp stream, this will make it dangerous when someone know my rtmp URI. Bad guy can push nasty live stream / video to my site, currently the URI is complex, and cannot be fetched from frontend (I only expose the hls interface). Future will try to support auth


If you have problems when setting up the live stream service & frontend, just feel free to comment below, I am glad to help





这篇文章写出来的时候,VOID001 已经毕业了(虽然因为学分问题导致毕业证和学位证还拿不到,又因为嫌毕业文化衫太丑代码太挫没有照全专业的合影,还因为毕业这几天一直在和各个老师谈NEUOJ NEUOS的事情导致毕业典礼那天我用来休息和整理东西没有参加,但是我确确实实是毕业了。)。而公元二零一七年六月二十九日又恰好是 VOID001 的阴历生日,所以这篇文章就作为对自己大学生活以及自己这一年的一个总结一并涵盖了。本文中所有人和事都是真实存在且发生过的,文中将人名以本人最熟知的昵称+姓名的汉语拼音首字母标示,后续再次出现仅仅使用昵称。




Read More Read More









i3 + conky 打造实用美观的桌面环境 =w=

i3 + conky 打造实用美观的桌面环境 =w=

看着 #archlinux-cn 的一些小伙伴使用各种 wm (Window Manager),在一段短暂的犹豫之后,窝也开始转向 i3wm 啦 =w=






  • i3wm — 我们要用的就是它 (pacman)
  • conky with lua support — 用来显示状态信息的工具,支持用户自定 lua 脚本(另外不是Condy(逃 (AUR)
  • 卡夫的 conky-i3bar =w=
  • feh — 用来换壁纸哒 OwO (pacman)
  • GIMP — 这个一会儿就知道干什么用的啦( (pacman)
  • InkScape — 矢量画图工具,用来在 conky 上绘制图形 (pacman)
  • compton — 用来让窗口支持透明的混成器 (pacman)

在安装的时候要注意 AUR 里的 conky-lua 已经不可用了,社区有用户自己修改了 PKGBUILD 并 comment 在了这个包的下面,使用这个 PKGBUILD 即可编译通过,conky 就有 lua support 啦 ~

下载这个 PKGBUILD 文件并 makepkg 即可



i3wm 的 wiki 页面已经给了很详细的介绍 在此仅说明几个常用的配置项

  • 快捷键支持,这个只需要使用 bindsym <YourKeys> <YourCommand>即可将按键组合绑定到任何可以执行的命令上,比如这个例子  bindsym $mod+Ctrl+l exec i3lock -i /home/void001/Pictures/WP/lock-ohana.png 就将 Modkey(我的是Meta) + Ctrl + L 绑定为锁屏。
  • i3 启动时命令执行, 这个只需要在 i3 config 里配置  exec <Your Command> 或者 exec_always <Your Command> 前者只会执行一次,只有退出 i3 再次进入才会重新执行,后者只要 restart 一次 i3 就会执行一次
  • 将特定类别的窗口分配到某一个 workspace , 或者置为悬浮状态,我们需要使用 for_window ,语法是这样的 for_window <criteria> <command> 其中 citeria 可以在这里找到 这里举一个最常见的例子,在 obs 启动时,将他的窗口置为 floating   for_window [class="obs"] floating enable
  • 另外,将特定的窗口分配到特定的 workspace 也是一个很重要的功能, 这里的语法是 assign <criteria> <workspace>  这里我们举例将所有的浏览器都放在 workspace 3 里 assign [window_role = "browser"] 3
  • 当然不能忘了字体的设置啦,为了让 i3 的窗口标题栏字体显示看起来适当,我们需要设置字体,我们可以使用系统的字体作为设置,需要加上 pango 前缀 例如:  font pango:DejaVu Sans Mono 20
  • 然后,记得去掉 bar以及大括号内部的那些东西 =w= 因为我们有好看的 conky-i3bar 并不需要用 i3bar 啦(

关于 i3 的使用方法在文档里说的已经很清楚了,如果不想看文档的话,也可以去看 这期视频教程

各种问题 OAO

怎么没有桌面壁纸 OAO

那么现在我们就有一个 Tiling Window Manager 了,开心的登录进来之后发现


WTF!壁纸呢 QAQ (你又没配置它当然不会有了

经过查阅资料得知,X 窗口管理下面,我们的桌面壁纸可以看做是显示在 root window 下的,我们可以通过 xsetroot 设置壁纸,然而这个工具十分难用,而且大概只支持 png 格式的壁纸(之前还写了一个转换jpg到png的程序给它用),这时候就是安利 feh 的时候啦~ 不仅支持各种格式的壁纸图片,还支持随机壁纸哦~ 我们把壁纸都放在 ~/Pictures/Wallpapers 下面,随机切换壁纸的指令如下

feh –recursive –randomize –bg-fill ~/Pictures/WallPapers
我们将它绑定到 i3 的快捷键上 OwO 这样就可以一键换壁纸啦 \w/,然后,我们把终端设置为透明的, 这样我们就不需要终端的壁纸只用桌面壁纸就可以啦(Konsole里面的 Profile 可以修改透明度,不过要先运行 compton)


这里就是 GIMP 发挥作用的时候啦~! GIMP 可以轻松的将特别亮的图变暗哦~



是不是好了很多 OwO ,只要在 GIMP 里打开图片,在 Color->Brightness and Contrast 下将图片的亮度降低,然后导出,即可~

我的 Okular 图标都丢了诶!原来用 KDE 的时候还是好的

这个问题是因为 Okular 检测不到 KDE 的环境变量,因而显示的时候缺少了 KDE 的主题效果,解决办法由 eatradish 提供 OwO

将 export DESKTOP_SESSION=kde 这一行加入到 .xprofile 中即可,在 X 启动时候设置好环境变量。再次使用 Okular Dolphin 都没问题啦~

i3bar 太丑啦! 窝想要个好看的状态栏

尽管 i3bar 支持Unicode Character, Font Awesome 来配置显示,可是它显示的还只能是字符 🙁 为了让状态栏看起来更好看 ,我们来使用 conky! 作为 statusbar !

下载好上面的 conky-i3bar, 安装好 conky-lua 并确定 lua 已经安装在你的系统里,我们先来试试水:

记得先修改 conkyrc.lua 里面的 lua_load 改为你的 i3bar.lua 的地址

conky -c /path/to/conky-i3bar/conkyrc.lua 看看效果…

这是什么鬼啊 (#`Д´)ノ ,看来想偷懒是不行了( ˘・з・) 开始调整吧 OwO,因为是 lua 写的所以调整起来并不是很难,基本上都是调节每一个component的xpos, ypos ,还有调整字体大小,注意字体大小不要在 conkyrc.lua 里设置,i3bar.lua 会对字体进行设置, 修改字体的时候记得修改这里。另外在修改的过程中如果发现图片和文字重叠了而且无法调x y pos解决(比如显示日期的那个矢量图因为字体调整之后显得很鬼畜)就直接打开 InkScape 去修改相应的矢量图即可,修改后保存立即可以看到效果,这点还是很不错的 =w=

不过 conky 的 reload 只有在 i3bar.lua 这个文件被修改的时候才会自动 reload 为了让我们能看到实时的修改效果,写一个脚本不停重启 conky 吧(

经过一番调整之后 我们有了这样的状态栏 (,,・ω・,,)


我还想看 Wifi 状态可是 component 里没有啊 QAQ

没关系, repo 的作者提供了供大家开发使用的工具库 util.lua,并且在 component 里有一系列的例子关于如何编写一个组件,然后窝实现了这样的一个组件

三个小圈圈是信号强度,现在的状态是很强的信号,如果信号衰弱则就会减少圈圈,旁边显示的是我的网络 interface 名称以及我 连接的 Wifi 的名称 OwO



实际上上面用了四个 svg 图

没有圈,一个圈,两个,三个,下面是这个 component 的代码,供对开发组件有兴趣的小伙伴研究


夏娜还是第一次做和”装饰”有关的 Coding 呢,感觉用 InkScape 画画图很有趣呢。调间距,字号什么的虽然很累,不过调整好的时候真的很开心w

[Linux 0.11] Draft 6 IA32架构下多任务的硬件支持

[Linux 0.11] Draft 6 IA32架构下多任务的硬件支持


支持多任务的硬件结构为 Task Register (TR) Task State Segment (TSS), LDT 以及 Task Gate。而最核心的,存储任务上下文信息的就是 Task State Segment, 下面对其进行详细的说明


  • GDT, LDT
  • TSS(Task State Segment)  has its own descriptor called TSS Descriptor
  • Structure of 32 bit TSS, store the context and link to previous task, and 3 different privileged Stack


  • Previous Task Link: 存储的是上一个任务的选择符
  • LDT Segment Selector: 存储的是这个Task使用的LDT
  • I/O Map Base Address: I/O Map的基地址(要对 I/O Map 是什么进行进一步的解读: I/O Map 包含一个权限Map和一个Interrupt redirect Map)

TSS 描述符的一些说明

  • TSS Descriptor 用于描述 Task State Segment 的描述符,当选择符的TI Flag被置位的时候(即指向当前LDT)不可以访问TSS, 这种情况下,使用 CALL, JMP 会引发GP#, 使用IRET 会引发#TS(Invalid TSS Exception), TSS Descriptor只能被放在 GDT 中,不可以放在 LDT IDT 中
  • 下面是TSS描述符的结构

 Type 有两种 1001 1011, 前者表示该TSSD对应的Task处于 inactive 状态下,后者表示该 TSSD 对应的Task正处于Busy状态下


  • 每一个Task只能有一个TSS Descriptor对应
  • Task Gate Descriptor  做切换的时候不需要检查 CPL




  • 当前运行的任务 JMP / CALL 了GDT中的一个TSS , 当 CALL / JMP 以 FAR 模式被调用的时候,如果 Segement Selector 指向的选择符是一个 TSSD, 那么就会发生 Task Switch, 忽略掉 CALL / JMP 的 offset。
  • 同样的,如果调用 JMP / CALL 而且 Selector 为 GDT / LDT 中的一个 Task Gate Descriptor(任务门描述符)也会发生 Task Switch
  • 发生中断/异常,且中断向量指向一个Task Gate Descriptor
  • IRET 指令被调用且 EFLAGS 寄存器的 NT Flag 被置位




  1. 获取 TSS 的 Segment Selector, 如果是 JMP / CALL 则是直接给出的, 或者是通过中断指向的 Task Gate Descriptor 给出,如果是 IRET 指令,则查看当前 TSS 的 Previous Task Link (见上图 TSS结构图)获得 Selector
  2. 检查特权级别, 如果是 CALL / JMP 则需要检查特权级别,中断 和 IRET 不需要检查, INT n 指令引发的中断调用,会检查特权级别
  3. 检查 TSS 的合法性,以及其他必要检查
  4. 清除或者设置相应的标志位
  5. 将当前(old)任务状态保存到当前的TSS内,
  6. 如果任务切换不是因为 IRET 引起的,则将新任务的 TSSD 的 Busy 位置位
  7. 装载新的 Task Descriptor(Selector) 到 Task Register
  8. 将新的 Task State Segment(TSS) 中的上下文装入通用寄存器,段寄存器, EIP EFLAGS 等寄存器(见上图 TSS结构)
  9. 上下文(context)已经设置好了,开始执行新的任务(此时的EIP, CS等 寄存器已经是该TSS中给定的寄存器了,之前的任务完全转移走了)


[Linux 0.11] Draft 5 虚拟内存管理(mm/memory.c)

[Linux 0.11] Draft 5 虚拟内存管理(mm/memory.c)


本文介绍 32 位保护模式下的内存分页, 以及 linux0.11 对物理内存的管理。


既然要使用内存分页,那么就一定有他的优势所在。我们首先来探讨一下使用内存分页的意义, 分页在于为进程提供了虚拟地址空间,通过提供的这个虚拟地址空间,我们可以做到下面的这些:

  1. 精细的权限控制粒度: 内存页大小以 4KB 为一页单位(也可以是4MB, 这里不讨论这种情况), 可以针对每一个内存页设置该4K空间的访问权限。相比分段的粗粒度精细了很多,目前的分段功能只是一个兼容性考虑,分段都是将整个0 – 最大内存空间直接映射为一个段;
  2. 可以使得进程独享地址空间: 通过切换CR3寄存器的内容,可以实现多个进程占用同一个虚拟地址空间。
  3. 有效的解决了进程对大块连续内存的请求: 这里用一个例子来说明一下。

例子是这样的: 如果我们有16MB的物理内存,现在有进程A B C分别占用5MB, 2MB, 3MB内存, 且不共享内存。然后现在进程B退出了,我们还剩余8MB内存,可是如果不使用分页的话,内存连续分配, 我们现在没有办法分配出一个连续的8MB的空间了,只能分配最多最多6MB连续的内存空间, 如下图所示。




我们先来看一下虚拟地址是如何转换为物理地址的 完整的转换过程为

虚拟地址 --> (GDT) --> 线性地址 --> (Page Table) --> 物理地址

关于VirtualAddr -> LinearAddr 的过程我在之前的文章中有过介绍,这里就不做说明,想要跳过那个文章的读者可以简单的理解为目前的虚拟地址和线性地址值是相同的,因而我们直接从线性地址出发。首先来看下图


(图片来源:Intel® 64 and IA-32 Architectures Software Developer’s Manual)

线性地址为32位的地址, 其中高十位为 Page Directory,中间十位为 Page Table,最后十二位是 Offset。过程如下:

  1. 根据 CR3 寄存器(忘记了寄存器的功能的话戳这里)找到了页目录表的地址。
  2. 根据 Linear Address 的高十位与页表目录地址相加,找到了对应的页目录项(PDE)。
  3. 页目录项内存有该页目录的起始地址,将此地址与 Linear Address 21 – 12 位(即中间十位)相加,找到对应的页表项(PTE)
  4. 根据页表项中记载的物理页地址,找到对应的物理页起始地址,再与 Offset 相加,对应到实际的物理地址。



一个页目录项(PTE)和页表项(PDE)均占用 4Bytes 空间

(图片来源:Intel® 64 and IA-32 Architectures Software Developer’s Manual)

如上图所示,这是一个 Page Table Entry (页面大小为4K)

首先我们看 Bit 0 ,这是 Present bit,只有当这个 Bit 为 1 的时候此项才表示一个 PTE(或者PDE),否则就是一个无效的表项。

Bit 1 表示这个页面的读写权限,当这个位被置 1 的时候 表示可读可写, 为 0 则表示只读 ,不过这里要注意一点,对于 Ring0 且 CR0 的 WP 位为 0 情况来说,Ring0 可以对只读页面进行写入,如果想要让 Ring 0 也不可以写入只读页面,需要将 CR0 的 WP 位置1

Bit 2 表示这个页面是系统页面还是用户页面,如果是系统页面则用户态无法使用此页面。

后续的标志位暂时不介绍。我们来看高20位, 这里表示了物理页的起始地址,如果这是一个页目录项而不是页表项,则这里表示的是页目录的起始物理地址


通过上述的过程我们也可以看出, Page Transform 的过程需要多次读取内存,会映像执行效率,为了解决这个问题,CPU对转换信息进行了缓存,如  TLB, PCID  等技术,这些不在本文进行介绍,这里要说明的是,因为缓存的存在,导致如果更新了已被缓存的页表项的信息(如修改FLAG, 或者物理地址),我们需要使缓存失效,具体的做法就是重新装载一次 CR3 寄存器。


说了这么多理论知识,我们来进行几个实际的操作看看。我们基于 linux0.11 的模型来进行下面四个实验(毕设结束后会放出git repo,目前暂无)

  • 使得某个特定的线性地址无效 (下文有介绍)
  • 将线性地址映射到给定的物理地址 (提示,查看 mm.c put_page 函数)
  • 修改该页的权限为只读并验证  (修改 FLAG 的另一个实验)
  • 给出当前线性地址所在页的详细信息 (打印出必要的FLAG信息,以及该线性地址对应的物理页地址)

disable_linear 实现


void disable_linear(unsigned long addr)

这个函数内我们需要对线性地址进行禁用, 根据上文中所介绍的,使得线性地址无效,实际上就是使得那个对应的页表项无效,因而我们需要通过线性地址对应到其页表项,并对该页表项的 Bit 0 清零。如何 Disable 与 Reload TLB 留给大家自行实现

linear_to_pte这个函数的实现过程实际上就是地址转换过程的前三步, 实现的时候要注意移位操作不要出问题,其他的没有什么难点


上面的代码对页目录项是否存在,以及是否是页目录项进行了判断,失败将返回 0 。

下面我们需要对函数的正确性进行验证,由于此代码版本还没有补充缺页异常的处理代码,因而当访问失效的虚拟内存地址时,会引发Page Fault -> Double Fault -> Triple Fault最后导致我们的OS不断重启,我们来验证一下,验证用代码如下:

运行之后, 看到 qemu / bochs 的确在不停的 reboot , 并且可以通过 bochs 的 GUI 界面看到我们的页表由原有的16MB一致映射变为了中间缺少 0xdad000 页。