Kernel mode NEON¶
TL;DR summary¶
- Use only NEON instructions, or VFP instructions that don’t rely on supportcode
- Isolate your NEON code in a separate compilation unit, and compile it with‘-march=armv7-a -mfpu=neon -mfloat-abi=softfp’
- Put kernel_neon_begin() and kernel_neon_end() calls around the calls into yourNEON code
- Don’t sleep in your NEON code, and be aware that it will be executed withpreemption disabled
Introduction¶
It is possible to use NEON instructions (and in some cases, VFP instructions) incode that runs in kernel mode. However, for performance reasons, the NEON/VFPregister file is not preserved and restored at every context switch or takenexception like the normal register file is, so some manual intervention isrequired. Furthermore, special care is required for code that may sleep [i.e.,may call schedule()], as NEON or VFP instructions will be executed in anon-preemptible section for reasons outlined below.
Lazy preserve and restore¶
The NEON/VFP register file is managed using lazy preserve (on UP systems) andlazy restore (on both SMP and UP systems). This means that the register file iskept ‘live’, and is only preserved and restored when multiple tasks arecontending for the NEON/VFP unit (or, in the SMP case, when a task migrates toanother core). Lazy restore is implemented by disabling the NEON/VFP unit afterevery context switch, resulting in a trap when subsequently a NEON/VFPinstruction is issued, allowing the kernel to step in and perform the restore ifnecessary.
Any use of the NEON/VFP unit in kernel mode should not interfere with this, soit is required to do an ‘eager’ preserve of the NEON/VFP register file, andenable the NEON/VFP unit explicitly so no exceptions are generated on firstsubsequent use. This is handled by the function kernel_neon_begin(), whichshould be called before any kernel mode NEON or VFP instructions are issued.Likewise, the NEON/VFP unit should be disabled again after use to make sure usermode will hit the lazy restore trap upon next use. This is handled by thefunction kernel_neon_end().
Interruptions in kernel mode¶
For reasons of performance and simplicity, it was decided that there shall be nopreserve/restore mechanism for the kernel mode NEON/VFP register contents. Thisimplies that interruptions of a kernel mode NEON section can only be allowed ifthey are guaranteed not to touch the NEON/VFP registers. For this reason, thefollowing rules and restrictions apply in the kernel:* NEON/VFP code is not allowed in interrupt context;* NEON/VFP code is not allowed to sleep;* NEON/VFP code is executed with preemption disabled.
If latency is a concern, it is possible to put back to back calls tokernel_neon_end() and kernel_neon_begin() in places in your code where none ofthe NEON registers are live. (Additional calls to kernel_neon_begin() should bereasonably cheap if no context switch occurred in the meantime)
VFP and support code¶
Earlier versions of VFP (prior to version 3) rely on software support for thingslike IEEE-754 compliant underflow handling etc. When the VFP unit needs suchsoftware assistance, it signals the kernel by raising an undefined instructionexception. The kernel responds by inspecting the VFP control registers and thecurrent instruction and arguments, and emulates the instruction in software.
Such software assistance is currently not implemented for VFP instructionsexecuted in kernel mode. If such a condition is encountered, the kernel willfail and generate an OOPS.
Separating NEON code from ordinary code¶
The compiler is not aware of the special significance of kernel_neon_begin() andkernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructionsbetween calls to these respective functions. Furthermore, GCC may generate NEONinstructions of its own at -O3 level if -mfpu=neon is selected, and even if thekernel is currently compiled at -O2, future changes may result in NEON/VFPinstructions appearing in unexpected places if no special care is taken.
Therefore, the recommended and only supported way of using NEON/VFP in thekernel is by adhering to the following rules:
- isolate the NEON code in a separate compilation unit and compile it with‘-march=armv7-a -mfpu=neon -mfloat-abi=softfp’;
- issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the callsinto the unit containing the NEON code from a compilation unit which isnotbuilt with the GCC flag ‘-mfpu=neon’ set.
As the kernel is compiled with ‘-msoft-float’, the above will guarantee thatboth NEON and VFP instructions will only ever appear in designated compilationunits at any optimization level.
NEON assembler¶
NEON assembler is supported with no additional caveats as long as the rulesabove are followed.
NEON code generated by GCC¶
The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicitparallelism, and generates NEON code from ordinary C source code. This is fullysupported as long as the rules above are followed.
NEON intrinsics¶
NEON intrinsics are also supported. However, as code using NEON intrinsicsrelies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you shouldobserve the following in addition to the rules above:
- Compile the unit containing the NEON intrinsics with ‘-ffreestanding’ so GCCuses its builtin version of <stdint.h> (this is a C99 header which the kerneldoes not supply);
- Include <arm_neon.h> last, or at least after <linux/types.h>