Overcommit Accounting

The Linux kernel supports the following overcommit handling modes:

0
Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
1
Always overcommit. Appropriate for some scientific applications. Classic example is code using sparse arrays and just relying on the virtual memory consisting almost entirely of zero pages.
2

Don’t overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable amount (default is 50%) of physical RAM. Depending on the amount you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.

Useful for applications that want to guarantee their memory allocations will be available in the future without having to initialize every page.

The overcommit policy is set via the sysctl vm.overcommit_memory.
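
As a minimal sketch, the active mode can be read back from procfs. This assumes the sysctl is exposed at /proc/sys/vm/overcommit_memory (the usual procfs path for vm sysctls). The helper name and the short mode labels are illustrative and are not kernel names.

```python
from pathlib import Path

# Short labels for the three modes described above (labels are illustrative).
OVERCOMMIT_MODES = {
    0: "heuristic",
    1: "always",
    2: "never",
}


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return (mode, label) parsed from the sysctl file at `path`."""
    mode = int(Path(path).read_text().strip())
    if mode not in OVERCOMMIT_MODES:
        raise ValueError(f"unexpected overcommit mode: {mode}")
    return mode, OVERCOMMIT_MODES[mode]
```

Taking the path as a parameter keeps the helper usable against a saved copy of the file as well as a live system.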

The overcommit amount can be set via vm.overcommit_ratio (percentage) or vm.overcommit_kbytes (absolute value).
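
The mode 2 limit described above (swap plus a configurable share of physical RAM) can be approximated with a few lines of arithmetic. This is a sketch under stated assumptions: the function and parameter names are illustrative, a nonzero overcommit_kbytes is assumed to take the place of the percentage, and memory reserved for huge pages is ignored.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50, overcommit_kbytes=0):
    """Approximate the mode 2 commit limit in kB.

    Assumptions: a nonzero overcommit_kbytes replaces the percentage,
    and huge page reservations are not subtracted from RAM.
    """
    if overcommit_kbytes:
        allowed = overcommit_kbytes
    else:
        allowed = ram_kb * overcommit_ratio // 100
    return swap_kb + allowed
```

For example, with 8 GiB of RAM (8388608 kB), 2 GiB of swap (2097152 kB) and the default ratio of 50, the sketch gives 6291456 kB, or 6 GiB.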

The current overcommit limit and amount committed are viewable in /proc/meminfo as CommitLimit and Committed_AS respectively.
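
A short sketch of pulling those two fields out of /proc/meminfo text follows. It assumes the usual "Name:   value kB" line layout; the function names are illustrative.

```python
def parse_commit_status(meminfo_text):
    """Extract CommitLimit and Committed_AS (in kB) from /proc/meminfo text."""
    wanted = ("CommitLimit", "Committed_AS")
    status = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.strip() in wanted:
            # Lines look like "CommitLimit:     6291456 kB".
            status[key.strip()] = int(rest.split()[0])
    return status


def headroom_kb(status):
    """Remaining commit headroom in kB; negative when over the limit."""
    return status["CommitLimit"] - status["Committed_AS"]
```

On a live system the text would come from reading /proc/meminfo. The headroom is mainly meaningful in mode 2, since that is the mode in which the limit is enforced.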

Gotchas

The C language stack growth does an implicit mremap. If you want absolute guarantees and run close to the edge you MUST mmap your stack for the largest size you think you will need. For typical stack usage this does not matter much but it’s a corner case if you really really care.

In mode 2 the MAP_NORESERVE flag is ignored.

How It Works

The overcommit is based on the following rules:

For a file backed map

  • SHARED or READ-only - 0 cost (the file is the map not swap)
  • PRIVATE WRITABLE - size of mapping per instance

For an anonymous or /dev/zero map

  • SHARED - size of mapping
  • PRIVATE READ-only - 0 cost (but of little use)
  • PRIVATE WRITABLE - size of mapping per instance

Additional accounting

  • Pages made writable copies by mmap
  • shmfs memory drawn from the same pool
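
The per-mapping rules above can be restated as a small function. This is only an illustration of the rules as listed, not the kernel's implementation; the function and parameter names are hypothetical, and the additional accounting items (pages made writable copies, shmfs) are not modelled.

```python
def commit_cost(size, file_backed, shared, writable):
    """Bytes charged against the commit total for one mapping instance."""
    if file_backed:
        # Shared or read-only file maps are backed by the file, not swap.
        if shared or not writable:
            return 0
        return size  # private writable file map
    # Anonymous or /dev/zero map.
    if shared:
        return size
    # Private: only writable mappings are charged.
    return size if writable else 0
```

For instance, a 4096-byte private writable mapping costs 4096 bytes whether it is file backed or anonymous, while a shared file-backed mapping of any size costs nothing.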

Status

  • We account mmap memory mappings
  • We account mprotect changes in commit
  • We account mremap changes in size
  • We account brk
  • We account munmap
  • We report the commit status in /proc
  • Account and check on fork
  • Review stack handling/building on exec
  • SHMfs accounting
  • Implement actual limit enforcement

To Do

  • Account ptrace pages (this is hard)