Not all memory allocated in the virtual memory space is the same. We can classify it through two axis: the first axis is whether memory is private (specific to that process) or shared, the second axis is whether the memory is file-backed or not (in which case it is said the be anonymous). This creates a classification with 4 memory classes:
Private memory is, as its name says, memory that is specific to the process. Most of the memory you deal with in a program is actually private memory.
Since changes made in private memory are not visible to other processes, it is subject to copy-on-write. As a side-effect, this means that even if the memory is private, several processes might share the same physical memory to store the data. In particular, this is the case for binary files and shared libraries. A common misbelief is that KDE takes a lot of RAM because every single process loads Qt and the KDElibs, however, thanks to the COW mechanism, all the processes will use the exact same physical memory for the read-only parts of those libs.
In case of file-backed private memory, the changes made by the process are not written back to the underlying file, however changes made to the file may or may not be made available to the process.
Shared memory is something designed for inter-process communication. It can only be created by explicitly requesting it using the right mmap()
call or a dedicated call (shm*
). When a process writes in a shared memory, the modification is seen by all the processes that map the same memory.
In case the memory is file-backed, any process mapping the file will see the changes in the file since those changes are propagated through the file itself.
Anonymous memory is purely in RAM. However, the kernel will not actually map that memory to a physical address before it gets actually written. As a consequence, anonymous memory does not add any pressure on the kernel before it is actually used. This allows a process to “reserve” a lot of memory in its virtual memory address space without really using RAM. As a consequence, the kernel lets you reserve more memory than actually available. This behavior is often referenced as over-commit (or memory overcommitment).
When a memory map is file-backed, the data is loaded from the disk. Most of the time, it is loaded on demand, however, you can give hints to the kernel so that it can prefetch memory ahead of read. This helps keeping your program snappy when you know you have a particular pattern of accesses (mostly sequential accesses). In order to avoid using too much RAM, you can also tell the kernel that you don’t care to have the pages in RAM anymore without unmapping the memory. All this is done using the madvise()
system call.
When the system falls short of physical memory, the kernel will try to move some data from RAM to the disk. If the memory is file-backed and shared, this is quite easy. Since the file is the source of the data, it is just removed from RAM, then the next time it will be read, it will be loaded from the file.
The kernel can also choose to remove anonymous/private memory from RAM. In which case that data is written in a specific place on disk. It’s said to be swapped out. On Linux, the swap is usually stored in a specific partition, on other systems this can be specific files. Then, it just works the same way it works for file-backed memory: when it gets accessed, it is read from the disk and reloaded in RAM.