Linux Kernel: Memory management - Overview.

Written by Tom on Thursday 09/08/07

Memory management is the method by which an application running on a computer accesses memory through a combination of hardware and software. The job of the memory management subsystem is to allocate available memory to processes that request it and to reclaim that memory when a process releases it, keeping track of every piece of memory as it is handed out and returned.

The lifespan of the operating system can be split into two phases: bootstrapping and normal execution. The bootstrapping phase makes only temporary use of memory. During normal execution, memory is split between a portion permanently assigned to the kernel's code and data and a second portion used to satisfy dynamic memory requests. Dynamic memory requests come about from process creation and growth.

Virtual memory: Virtual memory is a method adopted to support programs that need to access more memory than is physically available on the system, and to facilitate the efficient sharing of memory among multiple programs. Physical (or core) memory is provided by the RAM in the system. Virtual memory transparently makes use of disk space: because disk space is cheap and has more storage capacity than physical memory, it can be used as an extension of physical memory.

The basic unit of virtual memory is the page. The memory manager swaps pages between disk and physical memory in response to application requests. In physical memory, pages are held in page-sized slots called page frames.

When a program fetches data from memory, it uses addresses to indicate which portion of memory it needs to access. These addresses, called virtual addresses, make up the process's virtual address space. Each process has its own range of virtual addresses, which prevents it from reading or writing over another program's data. Because virtual memory allows a process to use more memory than is physically available, the operating system can afford to give each process its own linear virtual address space.

The memory manager is the part of the operating system that keeps track of the association between virtual addresses and physical addresses and handles paging. To the memory manager, the page is the basic unit of memory. The Memory Management Unit (MMU), a hardware component, performs the actual translation. The kernel maintains page tables, which map virtual pages to the physical page frames that hold them, and the MMU consults these tables when translating an address. The page tables are updated whenever a page is loaded into memory.

As the basic unit of memory managed by the memory manager, a page has a lot of state that must be tracked. To do this, the kernel uses page descriptors (struct page): every physical page frame in memory is assigned one.

A memory zone is a group of page frames (physical pages), and every page frame is allocated from a particular zone. The three zones that exist in Linux are:

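ZONE_DMA (page frames that legacy devices can use for DMA; on x86, the first 16 MiB)

ZONE_NORMAL (non-DMA page frames that are permanently mapped into the kernel's virtual address space)

ZONE_HIGHMEM (page frames that are not permanently mapped into the kernel's virtual address space)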
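
As page frames are repeatedly allocated and deallocated, the system runs into a memory fragmentation problem called external fragmentation: free memory ends up split into many small, non-contiguous blocks, so a request for a large contiguous block can fail even when enough memory is free in total. There are various approaches to reducing external fragmentation; Linux uses an implementation of a memory management algorithm called the buddy system.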

The buddy system maintains a set of lists of free blocks. Each list holds blocks of one size, and every size is a power of two (in page frames). The number of lists depends on the implementation. Page frames are allocated from the list of free blocks of the smallest size that can satisfy the request; if that list is empty, a larger block is split in half, repeatedly if necessary, and the unused halves are placed on the smaller lists. This keeps larger contiguous blocks available for larger requests. When an allocated block is returned, the buddy system searches the free list of the same size for the block's buddy, the adjacent block of the same size from which it was originally split. If the buddy is free, the two are merged into a block twice the size, and the check is repeated at the next size up. These paired blocks are called buddies, hence the name buddy system. This method ensures that larger block sizes become available again as soon as page frames are freed.

To support smaller memory requests, such as those made through calls to kmalloc() and the like, the kernel implements the slab allocator, a layer of the memory manager that obtains whole pages and divides them into smaller memory areas.

The slab allocator seeks to reduce the cost incurred by allocating, initializing, destroying, and freeing memory areas by maintaining a ready cache of commonly used areas.

The slab allocator is made up of many caches, each of which stores memory areas of a particular size. Caches can be specialised or general purpose. Specialised caches store memory areas that hold specific objects, such as descriptors. General-purpose caches hold memory areas of predetermined sizes, ranging from 32 bytes to 131,072 bytes.

You can run cat /proc/slabinfo (root privileges are usually required) to list the existing slab caches.

A cache is further subdivided into containers called slabs. Each slab is made up of one or more contiguous page frames from which the smaller memory areas are allocated. Every cache has a cache descriptor of type kmem_cache_s (struct kmem_cache in later kernels), which holds its information.

Upon creation, a user space process is assigned a virtual address space, which can grow or shrink through the addition or removal of linear address intervals. These intervals represent yet another unit of memory, called a memory region or memory area. Each region has its own access permissions: the program's code (text) is marked read-only and executable, while regions holding variables are readable and writable. Within the kernel, a process's address space, along with all the information relating to it, is kept in an mm_struct descriptor.

Process Image Layout and Linear Address Space

Text: This section, also known as the code segment, holds the executable instructions of the program.

Data: This section holds initialized data, that is, global and statically allocated variables that are given an initial value in the program.

gvar: A global variable that is stored and initialized in this area. This region has read and write attributes but cannot be shared among processes running the same program.

BSS: This section holds uninitialized data: global and static variables that the system initializes to zero when the program starts.

Heap: This region is used to grow the linear address space of a process. When a program uses malloc() to obtain dynamic memory, that memory is typically placed on the heap.

Stack: This holds the local variables of the functions currently executing. When a function is called, its local variables are pushed onto the stack; when the function returns, they are popped off the stack.

The memory map of a process can be read from /proc/<pid>/maps.

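When a process accesses a page that is not currently in physical memory, or accesses memory it does not have permission for, the hardware raises a page fault. A page fault is an exception, and the kernel's page fault handler deals with errors in a program's page accesses. When the fault is a legitimate access to a missing page, the kernel traps the exception, fetches the page from storage (or allocates a new page frame), and lets the process continue. An access that violates the process's permissions is instead reported to the process as an error, typically a segmentation fault.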

I do not claim any originality in this article.

Source: ‘The Linux Kernel Primer’: Claudia Salzberg Rodriguez, Gordon Fisher, Steven Smolski.