OS Note - Virtual Memory

In Chapter 9, we discussed various memory-management strategies used in computer systems. All these strategies have the same goal: to keep many processes in memory simultaneously to allow multiprogramming. However, they tend to require that an entire process be in memory before it can execute.

Virtual memory is a technique that allows the execution of processes that are not completely in memory. One major advantage of this scheme is that pro- grams can be larger than physical memory. Further, virtual memory abstracts main memory into an extremely large, uniform array of storage, separating logical memory as viewed by the programmer from physical memory. This technique frees programmers from the concerns of memory-storage limita- tions. Virtual memory also allows processes to share files and libraries, and to implement shared memory. In addition, it provides an efficient mechanism for process creation. Virtual memory is not easy to implement, however, and may substantially decrease performance if it is used carelessly. In this chap- ter, we provide a detailed overview of virtual memory, examine how it is implemented, and explore its complexity and benefits.

1. Demand Paging

valid/invalid = page currently in main memory or not

Page fault

當 page fault 發生時，先去檢查這個 page 是否為 valid(in backing store) or invalid(not exist)，這個檢查表通常存在 PCB 裡
若 page 為 valid，則找到該 page 存於 backing store 的位址
在 free-frame list 中找到一個 free-frame
發起 second storage I/O request 並將 backing store 裡的 data 寫進 main memory
更新 process 的 page tabel（page table entry 對到的 value、valid bit）
重新開始觸發 page fault 的 instruction

More detailed version:

Instruction Restart and its Issue

重新開始時可能原本的資料已被更動過，造成錯誤
解決方法：
- 在開始做任何 instruction 前，先把會碰到的 page 都戳一遍，就可確保在執行期間不會發生 page fault
- 用 register 暫時存取 instruction 開始前各個位址的值，page fault handle 完後再寫回相應的位址

Swap Space

Swap space 是一個在 backing store 特殊的區域，其讀寫速度會快於其他區域
其設計的目的是為了存放時常被改動，但又放不進 Main Memory 的 Data
通常有兩種 manage swap space 的方法：
- 在 process start 時，把整個 file image 都複製到 swap space
  - 在 process start 時有 overhead，並且可能 file 的某些部分完全不會用到
  - process 進行中會有比較小的 page-in overhead
- 一開始什麼都不做，但當有 page 被 page-out（移到 backing store）時，把它寫進 swap space
  - 再次讀取被 page-out 的 page 會比較快
有些 OS 會限制不能將 binary executable 放在 swap space 中，因為通常 executable 不會被改動

2. Copy-on-Write

注意 vfork() 只會 copy，不會 on-write，正確使用方式是 call 完 vfork 馬上 call exec

3. Page Replacement

當我們需要把某個在 backing store 的 page swap in 時，發現當前 main memory 已經滿了，這時候就需要做 page replacement
利用 dirty-bit 記錄檔案是否被更改過，若沒有被更改過，在 page-out 時就不需要寫回 backing-store，就可減少一半的 swapping time

Ａ. FIFO Page Replacement

Belady’s Anomaly: 對於某些 page replacement algorithm, page fault rate 不一定會隨著 Memory size 增加而減少, 因此增加 memory 有時候並不會增加效能

B. Optimal Page Replacement

Replace 未來最久不會用到的 page
需要 future knowledge，沒有實務價值，通常用於與別的 strategy 比較

C. LRU Page Replacment

Replace 過去最久沒有用到的 page
Performence 通常很不錯，經常被選擇
2 implemetation method:
- Counter
  - 在 page table entry 裡多紀錄一個 time-of-use info — 上次 reference 時間（CPU clock time）的訊息
  - Replace time-of-use 最小的 page
  - 在 Replacement 時會有搜尋 page table 的 overhead
- Stack
  - maintain 一個 head / tail pointer 來記錄 stack 的範圍
  - 當某個 page 被 reference 時，把他在 stack 中出現的 entry rmove，再把他 push to top of stack
  - 從 stack 的最下面 remove page
  - Update 時有 overhead 但不需要 search page table
問題：update refernce time 時 overhead 太高（要 update entry 裡的整個 clock field）

* Stack Algorithms

Stack algorithm 須滿足以下下性質：假設 memory 可以存放 N 個 frame，則每個時刻在 memory 的 page number 集合，必須為 memory 可以存放 N+1 個 frame 時對應時刻的 page number 集合的 subset
Optimal / LRU page replacement 皆是 Stack algorithm，因此只要 memory size increase，便可確保效能一定會上升 -> not suffer from Belady’s anomaly

D. LRU-Approximation Page Replacement

改進 LRU 的缺點，使用 reference bit 將 page 分成兩群
以記錄較不精確的資訊換取效率

D-1. Additional-Reference-Bits Algorithm

利用 8 個 bit 紀錄大致的 page 使用狀況
週期性（e.g. every 100 ms）地進行：
- 將 8 個 bits right shift 一格（丟掉最久遠的一格）
- 將該 page 當下的 reference bit shift 進 MSB（left most bit）

D-2. Second-Chance Algorithm

利用 FIFO 與 refernce-bit 的輔助
當 FIFO algorithm 要 page-out 一個 page 時：
- 如果 reference bit == 0，pager out normally
- 如果 reference bit == 1，give it second chance(set reference bit to 0) and check the next page
當一個 page 曾經被 reference 過，他就有一張免死金牌
Worst case: 所有 page 的 reference bit 都是 1，則 second chance algorithm 會 degenerates to FIFO

D-3. Enhanced Second-Chance Algorithm

除了 refernce-bit，再多加一個 modify-bit 紀錄資訊
當 (reference, modify) =
- (0, 0): Best for replacement，因為不需要寫回 backing storage
- (0, 1): not recently used but modified
- (1, 0): recently used but clean
- (1, 1): recently used and modified

E. Counting-Based Page Replacement

Maintain 一個 counter for each page 去紀錄這個 page 被 reference 了幾次
Two kinds of algorithms:
- Least frequently used (LFU) algorithm
- Most frequently used (MFU) algorithm

F. Page-Buffering Algorithms

F-1. Keep a pool of free frames

把需要 page-in 的 page 直接放進 pool 裡面的 free frame，就可以不用花費等待 page-out 的時間
需要犧牲一部分的 free frame 拿來做 pool

F-2. Maintain a list of modified pages

趁 second storage 沒有 I/O event 在進行時，先將 modified page（需要寫回去的 page）寫回 backing storage
額外紀錄的 overhead
可能其他 second storage I/O 會被這個 routine 卡到（可以透過與 free frames pool 混用改進）

F-3. Keep a pool of free frames but remember which page was in each frame

page fault 發生時，檢查該 page 是否對應到某一個 free frame pool 中的 frame
減少 second storage I/O 的次數

4. Frames Allocation

Minimum Number of Frames

每個 process 都需要分配到一定數目的 frames，不然執行一次需要 access 多個 page 的 instruction 只會不斷觸發 page fault 無法完成

Allocation Algorithms

Equal allocation
- 平等的分配現有的 frame 給全部的 process（餘數可做成 free frame pool）
- 問題：每個 process 需要的 frame 數量不一
Proportional allocation
- 依照每個 process 的 page number，按比例分配 frames
- 也可以加上 priority factor，讓比較重要的 process 分配到比較多 frames

Globel vs. Local Allocation

Global replacment
- 當需要 page replacement 時，可以從全部 process 的 frame space 中選擇 victim
- 好處：paging 較有彈性
- 多數 OS 使用這個方法
Local replacement
- 當需要 page replacement 時，只能從該 process 的 frame space 中選擇 victim
- 好處：process 的 performance 比較穩定

Reclaiming Pages

當 system 的 free memory 小於某一個 thershold 時，一個名為 reaper（收割機）的 kernel routine 便會啟動，開始依照 paging algorithm 去取回 process 佔用的 frame
在 Linux 系統中有更極端的例子：當 Out-of-memory 時，系統會根據 OOM score 殺掉 process

Non-Uniform Memory Access (NUMA)

在一些比較先進的 multi-core system 中，每個 CPU 都可以有自己的 memory
在做 page-in 時，應該優先考慮正在執行該 process 的 CPU 的 memory

5. Thrashing

當一個 process 的 paging time 大於 execution time 時，我們稱這種情況叫 thrashing
Thrashing 應該被極力避免，因為會照成 performance 低落

Causing of Thrashing

多個 process 同時進行時，某個 process 產生 page fault 並 swap out 其他 process 的 page，造成其他 process 執行時也會 trigger page fault
上述情況造成 CPU 使用率下降（都在做 paging），因此 CPU 便會排進更多的 process 準備執行，讓情況變得更糟
儘管換成 Local replacement，問題還是沒有完全解決，因為 system 還是不能完全清楚一個 process 需要的 frame number，依舊可能造成 thrashing

Working-Set Model

Locality

Process 在進行時，會從一個 locality 換到另一個 locality（可以想成當前的 memory state），如果能知道每一階段的 locality 大小，便可更彈性且完善的提供每個 process 足夠的 frame

Working-Set strategy 會去看過去 $\Delta$ 次的 page reference，便從這幾次總共 reference 了幾個 page 來決定 locality size
$\Delta$ – Working-set window
問題：存取連續的 Working-set 有較高的 overhead（maintain 最近 $\Delta$ 個的 reference）
Approxamation Version:
- 只記錄在過去 N 個時間段中該 page 是否有被 reference
- 檢查當下的 reference bit 跟 N 個 additional bits

Page-Fault frequency

利用 Page-Fault frequency(PFF) 來決定是否分配或收回 process 的 frames
- PFF > upper bound, 分多一點 frames 給 process
- PFF < lower bound, 收回一些 frames

6. Memory Compression

將一些暫時不需要用到的 frame 壓縮，需要的時候再解壓縮回來

7. Allocating Kernel Memory

我們會為 kernel memory 訂立獨立的 allocation 規則，因為：
- kernel 的 data structure 有不一的大小，並且大部分是 fixed-size，如果都用 page size 存的話會造成嚴重的 internal fragmentation
- 有些 hardware device 會直接與 physical memory 互動，並且需要連續的 physical memory space

Buddy System

將一塊連續的 memory 切成小塊(buddies)，切到可以裝得下需求量的最小單位
好處：連續的 buddies 可以組合
問題：依舊會有 internal fragmentation

Slab Allocation

一個 slab 由一個或多個連續的 physical page 組成
一個 cache 由一個或多個 slab 組成，每一種 kernel data structure 都會有自己的 cache (e.g. PCB, file object…)
當需要 kernel memory 時，slab allocator 會從 cache 中找到一個 free object 並把他 mark 成 used，如果沒有則會重新 allocate 一塊 object 大小的 memory
好處：不會有 internal fragmentation，每次 allocator 都會 return object 剛剛好的大小

8. Other Considerations

Prepaging

在 process 執行前，先預測會使用哪些 page ，並將這些 page load 進 memory
可以避免 process 一開始的大量 paging

Page Size

If page size become larger:
- Page table size decrease
- I/O time decrease
- Page fault frequency decrease
If page size become smaller:
- Less internal fragmentation
- More accurate locality
現今的趨勢是越來越大的 page size