Assembler Annotations

Copyright (c) 2017-2019 Jiri Slaby

This document describes the new macros for annotation of data and code in assembly. In particular, it contains information about SYM_FUNC_START, SYM_FUNC_END, SYM_CODE_START, and similar.

Rationale

Some code, like entries, trampolines, or boot code, needs to be written in assembly. As in C, such code is grouped into functions and accompanied by data. Standard assemblers do not force users to precisely mark these pieces as code or data, or even to specify their length. Nevertheless, assemblers provide developers with such annotations to aid debugging of assembly code. On top of that, developers also want to mark some functions as global so that they are visible outside of their translation units.

Over time, the Linux kernel has adopted macros from various projects (like binutils) to facilitate such annotations. So for historic reasons, developers have been using ENTRY, END, ENDPROC, and other annotations in assembly. Because these macros were never documented, they are used in the wrong contexts in some places. Clearly, ENTRY was intended to denote the beginning of global symbols (be it data or code). END was used to mark the end of data or the end of special functions with a non-standard calling convention. In contrast, ENDPROC should annotate only the ends of standard functions.

When these macros are used correctly, they help assemblers generate a nice object with both sizes and types set correctly. For example, the symbol table of the object built from arch/x86/lib/putuser.S:

       Num:    Value          Size Type    Bind   Vis      Ndx Name
        25: 0000000000000000    33 FUNC    GLOBAL DEFAULT    1 __put_user_1
        29: 0000000000000030    37 FUNC    GLOBAL DEFAULT    1 __put_user_2
        32: 0000000000000060    36 FUNC    GLOBAL DEFAULT    1 __put_user_4
        35: 0000000000000090    37 FUNC    GLOBAL DEFAULT    1 __put_user_8

This is not only important for debugging purposes. When objects are properly annotated like this, tools can be run on them to generate more useful information. In particular, objtool can be run on properly annotated objects to check and fix them if needed. Currently, objtool can report missing frame pointer setup/destruction in functions. It can also automatically generate annotations for the ORC unwinder for most code. Both of these are especially important to support reliable stack traces, which are in turn necessary for Kernel live patching.

Caveat and Discussion

As one might notice, there were previously only three macros. That is indeed insufficient to cover all the combinations of cases:

  • standard/non-standard function
  • code/data
  • global/local symbol

There was a discussion and, instead of extending the current ENTRY/END* macros, it was decided that brand new macros should be introduced:

"So how about using macro names that actually show the purpose, instead of importing all the crappy, historic, essentially randomly chosen debug symbol macro names from the binutils and older kernels?"

Macros Description

The new macros are prefixed with the SYM_ prefix and can be divided into three main groups:

  1. SYM_FUNC_* – to annotate C-like functions, that is, functions with standard C calling conventions. For example, on x86, this means that the stack contains a return address at the predefined place and a return from the function can happen in a standard way. When frame pointers are enabled, the frame pointer shall also be saved at the start of a function and restored at its end.

    Checking tools like objtool should ensure such marked functions conform to these rules. The tools can also easily annotate these functions with debugging information (like ORC data) automatically.

  2. SYM_CODE_* – special functions called with a special stack, be it interrupt handlers with special stack content, trampolines, or startup functions.

    Checking tools mostly skip these functions, but some debug information can still be generated automatically. For correct debug data, this code needs hints like UNWIND_HINT_REGS provided by developers.

  3. SYM_DATA* – obviously data belonging to .data sections and not to .text. Data do not contain instructions, so they have to be treated specially by the tools: they should not treat the bytes as instructions, nor assign any debug information to them.

Instruction Macros

This section covers SYM_FUNC_* and SYM_CODE_* enumerated above.

  • SYM_FUNC_START and SYM_FUNC_START_LOCAL are supposed to be the most frequent markings. They are used for functions with standard calling conventions – global and local, respectively. Like in C, they both align the functions to architecture-specific __ALIGN bytes. There are also _NOALIGN variants for special cases where developers do not want this implicit alignment.

    SYM_FUNC_START_WEAK and SYM_FUNC_START_WEAK_NOALIGN markings are also offered as an assembler counterpart to the weak attribute known from C.

    All of these shall be coupled with SYM_FUNC_END. First, it marks the sequence of instructions as a function and records its size in the generated object file. Second, it also eases checking and processing such object files, as the tools can trivially find exact function boundaries.

    So in most cases, developers should write something like the following example, with some asm instructions between the macros, of course:

    SYM_FUNC_START(memset)
        ... asm insns ...
    SYM_FUNC_END(memset)

    In fact, this kind of annotation corresponds to the now deprecated ENTRY and ENDPROC macros.

  • SYM_FUNC_START_ALIAS and SYM_FUNC_START_LOCAL_ALIAS serve those who decided to have two or more names for one function. The typical use is:

    SYM_FUNC_START_ALIAS(__memset)
    SYM_FUNC_START(memset)
        ... asm insns ...
    SYM_FUNC_END(memset)
    SYM_FUNC_END_ALIAS(__memset)

    In this example, one can call __memset or memset with the same result, except that the debug information for the instructions is generated into the object file only once – for the non-ALIAS case.

  • SYM_CODE_START and SYM_CODE_START_LOCAL should be used only in special cases – if you know what you are doing. They are used exclusively for interrupt handlers and similar code where the calling convention is not the C one. _NOALIGN variants exist too. The use is the same as for the FUNC category above:

    SYM_CODE_START_LOCAL(bad_put_user)
        ... asm insns ...
    SYM_CODE_END(bad_put_user)

    Again, every SYM_CODE_START* shall be coupled with SYM_CODE_END.

    To some extent, this category corresponds to the deprecated ENTRY and END, except that END had several other meanings too.

  • SYM_INNER_LABEL* is used to denote a label between some SYM_{CODE,FUNC}_START and SYM_{CODE,FUNC}_END. Such labels are very similar to C labels, except they can be made global. An example of use:

    SYM_CODE_START(ftrace_caller)
        /* save_mcount_regs fills in first two parameters */
        ...
    SYM_INNER_LABEL(ftrace_caller_op_ptr, SYM_L_GLOBAL)
        /* Load the ftrace_ops into the 3rd parameter */
        ...
    SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
        call ftrace_stub
        ...
        retq
    SYM_CODE_END(ftrace_caller)

Data Macros

Similar to instructions, there are a couple of macros to describe data in assembly.

  • SYM_DATA_START and SYM_DATA_START_LOCAL mark the start of some data and shall be used in conjunction with either SYM_DATA_END or SYM_DATA_END_LABEL. The latter also adds a label to the end, so that people can use lstack and (local) lstack_end in the following example:

    SYM_DATA_START_LOCAL(lstack)
        .skip 4096
    SYM_DATA_END_LABEL(lstack, SYM_L_LOCAL, lstack_end)

  • SYM_DATA and SYM_DATA_LOCAL are variants for simple, mostly one-line data:

    SYM_DATA(HEAP,     .long rm_heap)
    SYM_DATA(heap_end, .long rm_stack)

    In the end, they expand to SYM_DATA_START with SYM_DATA_END internally.

Support Macros

All of the above ultimately reduce to some invocation of SYM_START, SYM_END, or SYM_ENTRY. Normally, developers should avoid using these directly.

Further, in the above examples, one could see SYM_L_LOCAL. There are also SYM_L_GLOBAL and SYM_L_WEAK. All are intended to denote the linkage of the symbol they mark. They are used either in the _LABEL variants of the earlier macros, or in SYM_START.

Overriding Macros

Architectures can also override any of the macros in their own asm/linkage.h, including macros specifying the type of a symbol (SYM_T_FUNC, SYM_T_OBJECT, and SYM_T_NONE). As every macro described in this document is wrapped in #ifndef + #endif in the generic linux/linkage.h, it is enough to define the macros differently in the aforementioned architecture-dependent header.