OS Note - I/O System

The two main jobs of a computer are I/O and computing. In many cases, the main job is I/O, and the computing or processing is merely incidental. For instance, when we browse a web page or edit a file, our immediate interest is to read or enter some information, not to compute an answer.

The role of the operating system in computer I/O is to manage and con- trol I/O operations and I/O devices. Although related topics appear in other chapters, here we bring together the pieces to paint a complete picture of I/O. First, we describe the basics of I/O hardware, because the nature of the hardware interface places constraints on the internal facilities of the operating system. Next, we discuss the I/O services provided by the operating system and the embodiment of these services in the application I/O interface. Then, we explain how the operating system bridges the gap between the hardware interface and the application interface. We also discuss the UNIX System V STREAMS mechanism, which enables an application to assemble pipelines of driver code dynamically. Finally, we discuss the performance aspects of I/O and the principles of operating-system design that improve I/O performance.

1. Overview

Device Drivers

因為 I/O devices 有許多種類，OS 需要一個統一的介面才能與他們溝通。Device Drivers 便是扮演這樣的角色，他給出一個統一的 device-access interface to I/O subsystem。（如同 system call 給出一個 application 與 OS 間的溝通介面）

2. I/O Hardware

I/O device 的常見種類有：
- storage device (disks, tapes)
- transmission device (network connections, Bluetooth)
- human-interface device (screen, keyboard, mouse, audio)

I/O Connections

port
- machine 與 device 之間藉由連接點 (connection point) 連接，如 serial port
bus - 多個 device 共享同一組線路，與定義其上訊息傳遞的 protocol，統稱為 bus - PCIe bus 是常見的 PC system bus，連接 processor-memory subsystem 到比較快速的 devices - Expansion bus 則連接較慢的 devices，如 keyboard，mouse 等 - Serial-attached SCSI (SAS) bus 一頭連接多個 disk，另一頭連接 SAS controller

controller - a collection of electronics that can operate a port, a bus, or a device.
daisy chain
- device A has a cable that plugs into device B, and device B has a cable that plugs into device C, and device C plugs into a port on the computer
- daisy chain 通常被當成 bus 來使用

Memory-Mapped I/O

processor 如何與 controller 溝通？透過 controller 內部的 registers。processor 將 bit pattern 寫進對應的 register 或從 register 中讀取資訊。
Memory-mapped I/O 則是將這些 registers map 到 processor 的 address space。因此，在執行 I/O 時，可以不用發起 I/O instruction (將訊息透過 standard data-transfer instructions 寫到 registers 裡)，直接讀寫 physical memory 即可。
Memory-mapped I/O 的好處：讀寫 memory 很有效率，比接連發起 I/O instructions 還要快速，因此現今大多數 I/O 都是採用 memory-mapped I/O

Registers in I/O Device Controller

data-in register - host 從這裡讀資料
data-out register - host 把資料寫進這裡給 controller
status register - 給 host 讀的 register，會存目前的 status 資訊，如當前指令是否已完成、data-in register 是否有資料可以讀取、或是否有錯誤發生等
control(command) register - 給 host 寫的 register，可以讓 host 控制 controller’s mode，如 serial port 是 full-duplex/half-duplex、是否開啟 parity check 等

下圖展示 host 如何與 device controller 進行溝通

在第一個步驟中，host is busy-waiting or polling，這對於一個較慢的 I/O 是一種浪費 CPU 的行為。因此，我們需要一種 device 可以「通知」 CPU 的方法 - Interrupt

Interrupt

在 CPU 中，有一條叫 interrupt-request line 的 wire，CPU 會在每次 instruction 結束時去檢查他。當 CPU 發現這條 wire 上有 signal 時，會「存下目前的狀態」(state save)然後 jump to interrupt-handler routine，這個 routine 會存在 memory 中的一個 fixed-address
Interrupt-handler routine 會辨識 interrupt 的發生原因、執行必要的作業、將剛剛存下的狀態載回來 (state restore)、最後執行 return from interrupt 指令已回到 interrupt 前的 execution state
在這個過程中，我們會使用以下詞彙來描述。The device controller raises an interrupt by asserting a signal on the interrupt request line, the CPU catches the interrupt and dispatches it to the interrupt handler, and the handler clears the interrupt by servicing the device.

這個「簡易版」的 interrupt handling 存在一些問題：

當一個重要的 process 正在執行時，我們不希望他會被 interrupt 影響
需要一個有效率的方法得知是哪個 device 產生這個 interrupt 的，不然要把每個 device 都檢查一遍反而更花時間
同 1. 某些重要 process 發起的 interrupt 也應該擁有更高的 priority，i.e. interrupts 需要有輕重緩急之分
除了 I/O 以外，我們也需要一個其他 instruction 可以打斷 OS 的機制，讓一些無法忽視的錯誤（e.g. page faults, division by zero）可以被及時修正

第四點就是 trap 的概念

Interrupts Mechanism in Modern Computer Hardware

多數 CPU 都會有兩條 interrupt request line，分別管理 maskable / nonmaskable interrupt。
Interrupt 會以 address 的方式傳遞，這個 address 會是 interrupt vector 的 offset (e.g. 0~255)
Interrupt 的種類通常會大於 interrupt vector 的長度，因此就需要做 interrupt chaining，也就是每個 vector entry 存的只是 list head，後面還接著其他 interrupt
每個 interrupt routine 都會有自己的 interrupt priorty level，以讓 CPU 決定要先 handle 哪一個
有些系統會把 interrupt management 做分工：first-level interrupt handler (FLIH) / second-level interrupt handler (SLIH)
由 software 發起的 interrupt 又稱為 trap，注意 trap 的 priority 通常比硬體發起的 interrupt 還低

Direct Memory Access (DMA)

programmed I/O (PIO) - feed data into a controller register one byte at a time.

對於 large transfer 如 disk drive 等，我們需要避免使用 PIO 來佔住太多的 CPU time。DMA 便是解決方法之一。
在一個 DMA transfer 中，host 會將資訊寫進 DMA command block，如 pointer to the source、pointer to the destination、transfer 的大小，DMA controller 再根據 block 的內容執行 transfer
DMA transfer 不需要 CPU 的幫助

3. Application I/O Interface

Different Characteristics of I/O devices

Data-transfer mode : character-stream / block - 一次一個 character 還是一次一個 block (如：terminal / disk)
Access method : seqential / random access (如：modem / CD-ROM)
Transfer schedule : sychronous / asynchronous - 可以預測 response time / 不能預測 response time (如：tape / keyboard)
Sharing : dedicated / sharable - 同時只能給一個 process 使用 / 同時可以給多個 process 使用 (如：tape / keyboard)
Device speed : 每個 device 的 speed 都不同
I/O direction : read only / write only / read-write - (如：CD-ROM / graphics controller / disk)

ioctl() 是一個 back door 的 system call，可以直接向 device 下達命令。他有三個參數：

device identifier，指明哪一個 device
command，通常是一個 integer，指明 device implement 的哪一個指令
memory pointer，讓 application 跟 device 可以在這塊 memory 進行交流

Block and Character Devices

一個 block-device interface 應該支援 read() / write()，若是 random access 還需支援 seek()
有些 system (如 OS 自己、database-management system) 會需要 access a block device as a linear array of blocks，這種模式被稱為 raw I/O
某些 application 會自己做 buffering and locking，不需要 OS 的幫助，若這時 OS 插手，反而會拖慢速度。因此便有 direct I/O 模式，讓 application 可以不經由 OS 直接控制 device

Network Devices

大部分 OS 使用 socket 來進行 network I/O。為了防止 polling，socket interface 會實作 select() (當有連線來時再叫醒 server)

Clocks and Timers

一個 programmable interval timer 會在時間到的時候發起 interrupt 通知 OS，也可以設定讓 timer 週期性的發起 interrupts
應用的例子：
- Disk I/O subsystem 週期性的 flush dirty cache to disk
- Network subsystem 週期性的檢查沒有回應的 process，並把他們殺掉
現代電腦使用 network time protocol(NTP) 來確保時間準確

Nonblocking and Asynchronous I/O

Blocking vs. Nonblocking

當一個 thread 發出 blocking system call 時，他會被 suspend (i.e. move from run queue to wait queue)，當 system call return 時，才會再次回到 run queue
一個 nonblocking system call 會很快 return，並回傳在這段時間裡 how many bytes were transferred (不一定會全部完成)

Synchronous vs. Asynchronous

一個 synchronous system call 會等到相應的 I/O 做完後才 return，就這個觀點上來說，Blocking & Nonblocking I/O 都屬於 Synchronous I/O !
一個 asychronous system call 會發起一個 request 並馬上 return (不會花時間進行 I/O)，不會佔用 CPU time，等到 I/O 處理完後自己 interrupt

Vectored I/O

Vectored I/O 可以一次對多個 buffer 進行 I/O，最後再將每個 buffer 的內容結合起來，這種 scatter-gather 的方法可以有效增進效率

4. Kernel I/O Sybsystem

I/O Scheduling

想像一種情況，磁碟讀取頭目前在 disk 的開始位置，接連有三個 I/O request 發起，第一個讀取要磁碟尾，第二個要讀取磁碟頭，第三個要讀取磁碟中間。照著 2->3->1 的順序會比較有效率。

若我們照著 I/O request 的時間順序進行 I/O，常常會造成資源的浪費，因此 I/O scheduling (重新排 I/O 的順序) 是必要的
每個 device 會有自己的 waiting queue of request，並可能會被 I/O scheduler 重新排列
對於支援 asychronous I/O 的系統，同常會把這類的資訊存於 device-status table

Buffering

buffer 是存在於兩個 devices 之間或 device 與 application 之間的 memory area

需要做 buffering 的原因有三個

A. Speed mismatch

兩個 devices 之間的讀寫速度可能差到千倍那麼多 (如：SSD & network)，因此需要一個 buffer 在中間存放較慢那一方傳進來的東西，並在 buffer 滿的時候再讓快的那一方讀取
這樣會有一個問題，即使快的那一方讀取比較快速，但還是需要一點時間，所以在 buffer 滿時慢的那一方還需要等待 buffer 被讀取完畢才能繼續傳送，造成時間的浪費
Double Buffering - 使用兩個 buffer，讓其中一個 buffer 滿的時候，慢的那一方可以寫到另一個空的 buffer 裡

B. Different Data-transfer Size

在網路中，傳送資料的過程會把資料切成一個一個 packets，到目的地再把 packets 結合起來變成原本的資料
因為 packets 抵達的順序不一定，所以需要一個 buffer 存放收到的 packets，並照原本的順序排好

C. Copy Semantics

Copy semantics - copy 過去的東西不會跟 call copy 之後的改動有關

利用 kernel buffer 可以實作 copy semantics。在某一個 application 需要做 copy 時，kernel 先將資料複製到自己的 buffer，再把主動權還給 application，這樣就可以確保複製到的東西與 call copy 時是一致的

Caching

Cache 跟 buffer 最大的差異是存在 buffer 裡的東西有可能是唯一一份 copy，但 cache 裡的一定會有兩份 copy 以上。作為 cache 的 memory 也通常比較快速

Spooling and Device Reservation

Spooling 也是一種解決 devices 間快慢差異的方法。
例子：當現在電腦有很多個工作要交給 printer (速度較慢) 時，可以先把工作的資訊丟到 spool (一個 buffer) 裡，接下來交給 spooler program 執行即可 (不需要 CPU)

Error Handling

在 I/O 進行中可能會出錯，因此大部分 I/O system 在 I/O 結束時會以某些方式回傳執行的 status (可能是一個 bit、或是像 UNIX system 中的 errno)

I/O Protection

為了要防止 user 進行不合法的 I/O，所有 I/O operation 都應該透過 system call 來達成 (privileged instructions)

5. Transforming I/O Requests to Hardware Operations

當收到一個 I/O request 時，要如何知道該呼叫哪一個 device？

MS-DOS

MS-DOS 的做法是每一個檔案系統裡的路徑都由兩個部分組成，device name & file name，並且用冒號分隔開來，如：C:Windows\System32
這樣的好處是 device name space 跟 file-system name space 是完全分開的，實作上比較方便，不需要經過其他轉換

UNIX

Unix 的做法是把所有 device 都當成檔案來看，對 device 下達指令只要去讀寫相應的檔案就好，也就是只有 file-system name space 沒有 device name space
但要如何知道當前檔案的位置是在哪個 device 上呢？UNIX system 中的 mount table 會存有特定 prefix of path name 對應到的 device (如：/home mount on /dev/sda) 再進行檔案 I/O 前，會先在 mount table 中尋找 the longest matching prefix，便可知道對應的 device 為何
知道要找的 device 後，便可在 file-system structure 中找到一組數字 <major,minor>， major 代表對應的 device driver 號碼，minor 則會傳入 driver 中來 index device table，被 indexed 到的 entry 就會指向相應 device controller 的 port-address 或 memory-mapped address

Life Cycle of a Blocking `read()` Request

一個 process 對一個 file descriptor 發起 blocking read() system call
在 kernel 裡的 system call code 檢查呼叫是否合法 (parameter 是否正確等)，並檢查 cache 中是否有相應的內容，有的話就直接 return
如果沒有的話，process 便會從 run queue 被移到 wait queue 並發起 I/O request，這個 request 會由 I/O subsystem 發給 device driver
Device driver 會 allocate 好 kernel buffer (為了接收待會讀進來的資料)，並將 I/O 放進排程。當 device controller 空閒時，將對應的資訊寫進 device-control registers
Device controller 進行 data transfer
Driver 可能會 polling 的等待 controller 完成他的工作，或建立一個 DMA transfer 讓資料直接寫進 kernel buffer，兩種情況皆會在結束時 raise interrupt
CPU 接收到 interrupt 並從 interrupt-vector 中找到相應的 interrupt handler，handler 處理完該做的是並 return
CPU 接收到 interrupt 結束，通知 device driver，device driver 搞清楚是哪個 I/O 完成後告訴 kernel I/O subsystem
Kernel 將 data 移到目的地的 address space 並將 process 從 wait queue 移到 run queue
CPU schedule 到 process

6. Stream

Stream 是一種 full-duplex connection，連接 device driver 和 user-level process
Stream 由作為 user 介面的 stream head，控制 device 的 driver end，和中間的 streams modules 組成。每一個 compoments 都有兩個 queue：read queue & write queue
Stream modules 是可以動態變化的，若現在的 device 是 USB 就可以把相關的 module push 進來
因為每個 module 的速度不一，所以有些 queue 會支援 flow control，等下一個 queue 有空位時才寫，或等自己有空位時才讀
除了在 stream haed (與 user 接觸的部分是 blocking)，其餘部分的 stream 都是進行 asynchmous I/O
Stream 的好處是
1. 清楚的 framework
2. 模組化的組成，容易實作與管理

7. Performance

其實如果沒有太多時間花在 polling 的話，programmed I/O 的效率是高於 interrupt-driven I/O 的，i.e. interrupt handling is expensive task，其最根本的原因是 interrupt handling 的過程有很多 context switch 造成的 overhead
Network traffic 通常會包含很多次的 context switch (總共需經過兩次 kernel 與兩個 interrupt)，因此效率不佳
I/O subsystem 可藉由以下幾點增進效率
1. 減少 context switch 的次數
2. 減少 device 跟 apllication 間資訊交換的次數 (e.g. cache)
3. 減少 interrupt 的次數 (e.g. large transfer、smart controller、polling)
4. 利用 DMA 或 channel (一種專門做 I/O 的 CPU) 來減少佔用 CPU 的時間
5. 讓 CPU, memory subsystem, bus, and I/O 彼此達到平衡，因為任何一個 overload 都會使其他人必須等待
在實作新的 I/O functionalty 的時候，通常回如下圖所示，從最頂層 (application) 開始實作，因為 application 實作簡單且不易使系統 crash，適合實驗。但因為 application 層的 efficicy 不好，一但實驗成功便會逐漸往下 implement