In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.
Memory barriers are necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution. This reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but can cause unpredictable behavior in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent and defined by the architecture's memory ordering model. Some architectures provide multiple barriers for enforcing different ordering constraints.
Memory barriers are typically used when implementing low-level machine code that operates on memory shared by multiple devices. Such code includes synchronization primitives and lock-free data structures on multiprocessor systems, and device drivers that communicate with computer hardware.
When a program runs on a single-CPU machine, the hardware performs the necessary bookkeeping to ensure that the program executes as if all memory operations were performed in the order specified by the programmer (program order), so memory barriers are not necessary. However, when the memory is shared with multiple devices, such as other CPUs in a multiprocessor system, or memory-mapped peripherals, out-of-order access may affect program behavior. For example, a second CPU may see memory changes made by the first CPU in a sequence that differs from program order.
A program runs as a process, which can be multi-threaded (i.e. contain software threads such as pthreads, as opposed to hardware threads). Different processes do not share a memory space, so this discussion does not apply to two programs each running in a different process (and hence a different memory space). It applies to two or more software threads running in a single process, where those threads share a single memory space. Multiple software threads within a single process may run concurrently on a multi-core processor.
The following multi-threaded program, running on a multi-core processor, gives an example of how such out-of-order execution can affect program behavior:
Initially, memory locations x and f both hold the value 0. The software thread running on processor #1 loops while the value of f is zero, then it prints the value of x. The software thread running on processor #2 stores the value 42 into x and then stores the value 1 into f. Pseudo-code for the two program fragments is shown below.
The steps of the program correspond to individual processor instructions.
Thread #1 Core #1:
while (f == 0);
// Memory fence required here
println(x);
Thread #2 Core #2:
x = 42;
// Memory fence required here
f = 1;
One might expect the print statement to always print the number "42"; however, if thread #2's store operations are executed out-of-order, it is possible for f to be updated before x, and the print statement might therefore print "0". Similarly, thread #1's load operations may be executed out-of-order, and it is possible for x to be read before f is checked; again, the print statement might therefore print an unexpected value. For most programs neither of these situations is acceptable. A memory barrier must be inserted before thread #2's assignment to f to ensure that the new value of x is visible to other processors at or prior to the change in the value of f. Likewise, a memory barrier must be inserted before thread #1's access to x to ensure the value of x is not read prior to seeing the change in the value of f.
Another example is when a driver performs the following sequence:
// prepare data for a hardware module
// Memory fence required here
// trigger the hardware module to process the data
If the processor's store operations are executed out-of-order, the hardware module may be triggered before data is ready in memory.
For another illustrative example (a non-trivial one that arises in actual practice), see double-checked locking.
In the case of the PowerPC processor, the eieio instruction acts as a memory fence, ensuring that any load or store operations previously initiated by the processor are fully completed with respect to the main memory before any subsequent load or store operations initiated by the processor access the main memory.[1][2]
In the case of the ARM architecture family, the DMB,[3] DSB[4] and ISB[5] instructions are used.[6]
In the case of the RISC-V architecture, the FENCE instruction is used.
In the case of the x86 architecture, the MFENCE, LFENCE, and SFENCE instructions are used.
Multithreaded programs usually use synchronization primitives provided by a high-level programming environment—such as Java or .NET—or an application programming interface (API) such as POSIX Threads or Windows API. Synchronization primitives such as mutexes and semaphores are provided to synchronize access to resources from parallel threads of execution. These primitives are usually implemented with the memory barriers required to provide the expected memory visibility semantics. In such environments explicit use of memory barriers is not generally necessary.
Memory barrier instructions address reordering effects only at the hardware level. Compilers may also reorder instructions as part of the program optimization process. Although the effects on parallel program behavior can be similar in both cases, in general, it is necessary to take separate measures to inhibit compiler reordering optimizations for data that may be shared by multiple threads of execution.
In C and C++, the volatile keyword was intended to allow C and C++ programs to directly access memory-mapped I/O. Memory-mapped I/O generally requires that the reads and writes specified in source code happen in the exact order specified with no omissions. Omissions or reorderings of reads and writes by the compiler would break the communication between the program and the device accessed by memory-mapped I/O. A C or C++ compiler may not omit reads from and writes to volatile memory locations, nor may it reorder reads/writes relative to other such actions for the same volatile location (variable). The keyword volatile does not guarantee a memory barrier to enforce cache-consistency. Therefore, the use of volatile alone is not sufficient to use a variable for inter-thread communication on all systems and processors.[7]
The C and C++ standards prior to C11 and C++11 do not address multiple threads (or multiple processors),[8] and as such, the usefulness of volatile depends on the compiler and hardware. Although volatile guarantees that the volatile reads and volatile writes will happen in the exact order specified in the source code, the compiler may generate code (or the CPU may re-order execution) such that a volatile read or write is reordered with regard to non-volatile reads or writes, thus limiting its usefulness as an inter-thread flag or mutex.