Linux I/O overview in the 2.6 kernel

Written by Tom on Tuesday 14/08/07


The Linux kernel is a collection of code that runs on one or more processors. The processor's interface to the rest of the system is through the supporting hardware. At its lowest, machine-dependent layer, the kernel communicates with these devices using simple assembly-language instructions.


The processor communicates with surrounding devices through a series of electrical connections. A group of these connections is called a bus.


Buses: a typical device is connected to the processor via an address bus, a data bus, and a control bus.


A bridge is a hardware device that connects two buses together. There are typically two major peripheral devices/controllers which are referred to as the Northbridge and the Southbridge.


The Northbridge connects the high-speed, high performance peripherals, such as the memory controller and the PCI controller. To achieve speed and good performance, the Northbridge bridges the front side bus with the PCI bus and/or the memory bus.


The Southbridge connects to the Northbridge and to a combination of low-performance devices, typically the IDE controller, the USB controller, the real-time clock, and the interrupt controller.


Some architectures use an I/O bus to communicate with the keyboard and the serial and parallel ports. The I/O bus is a type of control bus and is slow. Other architectures use memory-mapped I/O. With memory-mapped I/O, devices are assigned regions of the address space for communication and control.
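The idea of assigning a device a region of the address space can be illustrated with a small user-space model. This is only a sketch: the AddressSpace and ToyUart classes, the register layout, and the base address 0x1000 are all invented for illustration and do not correspond to any real hardware or kernel interface.

```python
class AddressSpace:
    """Toy address space in which devices own address regions."""

    def __init__(self):
        self.regions = []  # list of (start, end_exclusive, device)

    def map_device(self, start, size, device):
        self.regions.append((start, start + size, device))

    def _find(self, addr):
        # Locate the device that owns this address and the offset within it.
        for start, end, device in self.regions:
            if start <= addr < end:
                return device, addr - start
        raise ValueError("no device mapped at 0x%x" % addr)

    def store(self, addr, value):
        device, offset = self._find(addr)
        device.write_register(offset, value)

    def load(self, addr):
        device, offset = self._find(addr)
        return device.read_register(offset)


class ToyUart:
    """Hypothetical serial device: data register at 0, status register at 1."""
    DATA, STATUS = 0, 1

    def __init__(self):
        self.sent = []

    def write_register(self, offset, value):
        if offset == self.DATA:
            self.sent.append(value)

    def read_register(self, offset):
        if offset == self.STATUS:
            return 1  # always "ready" in this toy model
        return 0


bus = AddressSpace()
uart = ToyUart()
bus.map_device(0x1000, 2, uart)   # the base address 0x1000 is made up
bus.store(0x1000, ord("A"))       # an ordinary store becomes device output
```

The point of the model is that the processor needs no special I/O instructions: the same load and store operations used for memory reach the device once its registers occupy part of the address space.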


Newer architectures put more discrete I/O devices into a single integrated circuit called a Super I/O chip. Super I/O functionality is often consolidated into the Southbridge chip.


The newer Intel architecture has moved to the hub concept. The Northbridge is known as the Graphics and Memory Controller Hub (GMCH); it supports a high-performance AGP controller and a DDR memory controller. The Southbridge is known as the I/O Controller Hub (ICH).


Devices


Two kinds of device files exist: block device files (which transfer data in chunks) and character device files (which transfer data one character at a time). There is a third device type, the network device; this special device can transfer data in both block and character mode.
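From user space, the type of a device file can be read from its inode mode. The following is a minimal sketch using Python's standard os and stat modules; the helper names device_kind and kind_of_path are made up for this example.

```python
import os
import stat


def device_kind(mode):
    """Classify a file mode (st_mode) as 'block', 'character', or 'other'."""
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "other"


def kind_of_path(path):
    """Classify a path such as /dev/hda by looking at its inode type."""
    return device_kind(os.stat(path).st_mode)
```

On a system that has the device nodes, kind_of_path("/dev/hda") would be expected to report a block device, while a serial port node would be reported as a character device.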


An example of a block device is a hard drive. The filesystem name for the first IDE disk is /dev/hda.


The device driver registers itself at driver initialization time. This adds the driver to the kernel's driver table, mapping the device number to the block_device_operations structure. The block_device_operations structure contains the functions for starting and stopping a given block device in the system. At initialization time the block device driver also registers a request queue handler (block device manager) with the kernel to facilitate read and write operations on the block device. Initialization also determines the I/O scheduling algorithm to use when a read or a write is attempted on the block device.
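The registration step can be pictured as filling in a lookup table keyed by device number. The sketch below is a user-space model only, not kernel code: BlockDeviceOperations, driver_table, and register_block_driver are hypothetical names, and the device number 3 is just an example value.

```python
class BlockDeviceOperations:
    """Toy stand-in for the kernel's block_device_operations structure."""

    def __init__(self, open_fn, release_fn):
        self.open = open_fn
        self.release = release_fn


driver_table = {}  # device number -> registration record (hypothetical)


def register_block_driver(device_number, ops, request_handler, scheduler="as"):
    """Record the driver's operations, request handler, and I/O scheduler."""
    if device_number in driver_table:
        raise ValueError("device number %d already registered" % device_number)
    driver_table[device_number] = {
        "ops": ops,
        "request_handler": request_handler,
        "scheduler": scheduler,
    }


log = []
ops = BlockDeviceOperations(
    open_fn=lambda: log.append("open"),
    release_fn=lambda: log.append("release"),
)
register_block_driver(3, ops, request_handler=lambda req: log.append(req))

# Later, the driver is looked up by device number and called into.
entry = driver_table[3]
entry["ops"].open()
entry["request_handler"]("read sector 128")
entry["ops"].release()
```

The real kernel structures carry far more state; the model only shows the mapping from device number to operations that registration establishes.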

The I/O scheduling algorithm is selected by the kernel at boot time, with the anticipatory I/O scheduler as the default. By setting the kernel parameter elevator to one of the following values, you can change the type of I/O scheduler.


deadline: For the deadline I/O scheduler.


noop: For the no-operation I/O scheduler.


as: For the anticipatory I/O scheduler.
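As an illustration of how the elevator parameter selects a scheduler, the sketch below picks the value out of a kernel command line string. The function name and the sample command line are invented for this example; on a running system the string could be read from /proc/cmdline. Only the three values listed above are treated as valid here, which is an assumption of the sketch.

```python
VALID_ELEVATORS = {"deadline", "noop", "as"}
DEFAULT_ELEVATOR = "as"  # the anticipatory scheduler is the stated default


def elevator_from_cmdline(cmdline):
    """Return the scheduler selected by elevator=..., or the default."""
    chosen = DEFAULT_ELEVATOR
    for word in cmdline.split():
        if word.startswith("elevator="):
            value = word[len("elevator="):]
            if value in VALID_ELEVATORS:
                chosen = value
    return chosen


# Sample command line, made up for illustration.
example = "root=/dev/hda1 ro elevator=deadline"
print(elevator_from_cmdline(example))
```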


A patch exists that makes the I/O schedulers fully modular. Using modprobe, the user can load the modules and switch between them on the fly. With this patch at least one scheduler must be compiled into the kernel to begin with.


Block devices use request queues to order the many block I/O requests the devices are given. Requests need to be ordered because of the overhead involved in reading and writing. Request queues optimize read and write requests to increase throughput.


No-op I/O scheduler

The no-op scheduler takes a request and scans through its queue to determine whether it can be merged with an existing request. A merge is attempted if the new request is close to an existing request. If the new request is for I/O blocks before an existing request, it is merged onto the front of the existing request. If the new request is for I/O blocks after an existing request, it is merged onto the back of the existing request. If a request cannot be merged, it is placed on the tail of the request queue.
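The merge rules can be sketched in a few lines. This model assumes that "close" means exactly contiguous sectors and represents each request as a [start, count] pair; both choices, and the function name noop_add, are simplifications made for this example rather than the kernel's actual logic.

```python
def noop_add(queue, start, count):
    """Add a request for `count` sectors beginning at sector `start`.

    `queue` is a list of [start, count] entries in arrival order.
    Returns "front", "back", or "tail" to show what happened.
    """
    for entry in queue:
        if start + count == entry[0]:     # new request ends where entry begins
            entry[0] = start
            entry[1] += count
            return "front"
        if entry[0] + entry[1] == start:  # new request begins where entry ends
            entry[1] += count
            return "back"
    queue.append([start, count])          # no merge possible
    return "tail"


queue = []
results = [
    noop_add(queue, 100, 8),  # empty queue: placed on the tail
    noop_add(queue, 108, 8),  # directly after sectors 100..107: back merge
    noop_add(queue, 92, 8),   # directly before sector 100: front merge
    noop_add(queue, 500, 8),  # not adjacent to anything: new tail entry
]
```

After these four calls the queue holds two entries: one merged request covering sectors 92 through 115 and one separate request at sector 500.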


Deadline I/O scheduler

The no-op scheduler suffers from one major problem: given a steady stream of requests close to those already queued, requests for other areas of the disk may never be handled. The deadline scheduler attempts to solve this problem by assigning each request an expiration time. It uses two additional queues to manage time efficiency, alongside a queue similar to the one in the no-op algorithm. Read requests are favored over write requests because reads are usually blocking and writes are non-blocking. A read queue and a write queue are kept in addition to the queue sorted by each request's sector proximity. In the read and write queues, requests are ordered by time (FIFO).

When the deadline scheduler selects a request, it first checks the head of the read queue; if that request has expired, it is handled immediately. The head of the write queue is then checked, and if that request has expired, it is handled immediately. The standard queue is checked only when no read or write has expired, and requests are then handled in nearly the same way as in the no-op algorithm. Read requests also expire faster than write requests: 0.5 seconds for a read and 5 seconds for a write. So read requests can starve write requests. There is a parameter that tells the scheduler the maximum number of times that reads can starve a write.
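The selection order described above can be modeled as follows. This is a simplified sketch, not the kernel's implementation: it leaves out merging and the write-starvation limit, the class name DeadlineSketch is invented, and time is passed in explicitly as a number of seconds so the behavior is easy to follow.

```python
from collections import deque

READ_EXPIRE = 0.5   # seconds, as stated in the text
WRITE_EXPIRE = 5.0  # seconds, as stated in the text


class DeadlineSketch:
    def __init__(self):
        self.fifo = {"read": deque(), "write": deque()}  # ordered by arrival
        self.sorted = []                                 # ordered by sector

    def add(self, kind, sector, now):
        expire = now + (READ_EXPIRE if kind == "read" else WRITE_EXPIRE)
        request = (sector, kind, expire)
        self.fifo[kind].append(request)
        self.sorted.append(request)
        self.sorted.sort()

    def dispatch(self, now):
        """Return the next request to service, or None if nothing is queued."""
        if not self.sorted:
            return None
        chosen = None
        for kind in ("read", "write"):  # the read queue is checked first
            head = self.fifo[kind][0] if self.fifo[kind] else None
            if head is not None and head[2] <= now:
                chosen = head
                break
        if chosen is None:
            chosen = self.sorted[0]     # nothing expired: use sector order
        self.sorted.remove(chosen)
        self.fifo[chosen[1]].remove(chosen)
        return chosen


sched = DeadlineSketch()
sched.add("write", 900, now=0.0)
sched.add("read", 500, now=0.0)
sched.add("read", 100, now=0.4)
first = sched.dispatch(now=0.45)   # nothing expired yet: lowest sector
sched.add("read", 50, now=0.55)
second = sched.dispatch(now=0.6)   # the read at sector 500 has expired
third = sched.dispatch(now=0.7)    # nothing expired: lowest sector again
```

In this run the expired read at sector 500 is serviced ahead of the newer read at sector 50, even though sector order alone would have picked the latter.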


Anticipatory I/O scheduling

A problem with deadline scheduling is that writes can be preempted by reads. Anticipatory I/O scheduling attempts to anticipate what the next operation will be and aims to improve I/O throughput by doing so. The anticipatory scheduler has a read queue and a write queue, each ordered by time (FIFO), and a default queue that is ordered by sector proximity. After a read request, the scheduler waits for 6 ms in anticipation of an additional read. If another read request is received, it is handled immediately; otherwise the scheduler returns to normal operation.
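The anticipation step reduces to a single decision made after each read completes. The sketch below models only that decision, with times in whole milliseconds; the function name, its arguments, and the "new read" label are hypothetical and chosen for illustration.

```python
ANTICIPATION_WINDOW_MS = 6  # wait time after a read, as stated in the text


def next_after_read(read_done_at, next_read_arrives_at, pending):
    """Decide what to service after a read completes.

    read_done_at          time the previous read finished (ms)
    next_read_arrives_at  arrival time of the next read (ms), or None
    pending               requests already queued, in the order normal
                          operation would service them
    Returns (request, start_time_ms).
    """
    deadline = read_done_at + ANTICIPATION_WINDOW_MS
    if next_read_arrives_at is not None and next_read_arrives_at <= deadline:
        # The anticipated read showed up in time: handle it immediately.
        return "new read", max(next_read_arrives_at, read_done_at)
    # No read arrived within the window: return to normal operation.
    return (pending[0] if pending else None), deadline
```

The trade-off is visible in the second return path: when no read arrives, the queued request starts 6 ms later than it otherwise would have, so the wait pays off only when follow-up reads are common.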


Device Operations

The basic generic block device has open, close (release), ioctl, and request functions. The ioctl() interface can be used for debugging and performance measurements by bypassing the various software layers. The request function is called when the file system puts a request on the queue; it extracts the request structure and acts upon it.


Character Device

The character device sends a stream of data. All serial devices are character devices.


I/O rules

All Linux device I/O is based on files.
All Linux device I/O is either block or character.


Network Devices

Network devices have attributes of both character and block devices. At the physical level, data is transmitted serially; at the network layer, packets of data are moved in and out via DMA.


Clock Device

Clocks are I/O devices that count the hardware heartbeat of the system.


Terminal Devices

The main console (configurable at boot time) is the first terminal to come up on a Linux system. Often, a graphical interface is launched, and terminal emulator windows are used thereafter.


DMA

The DMA controller is a hardware device that is situated between an I/O device and the high-speed bus in the system. The purpose of the DMA controller is to move large amounts of data without processor intervention. Many controllers (disk, network, and graphics) have a DMA engine built in and can therefore transfer large amounts of data without using precious processor cycles.


I do not claim any originality in this article.

Source: ‘The Linux Kernel Primer’: Claudia Salzberg Rodriguez, Gordon Fisher, Steven Smolski.