Histogram Design Notes

Author:

Tom Zanussi <zanussi@kernel.org>

This document attempts to provide a description of how the ftracehistograms work and how the individual pieces map to the datastructures used to implement them in trace_events_hist.c andtracing_map.c.

Note

All the ftrace histogram command examples assume the workingdirectory is the ftrace /tracing directory. For example:

# cd /sys/kernel/tracing

Also, the histogram output displayed for those commands will begenerally be truncated - only enough to make the point is displayed.

‘hist_debug’ trace event files

If the kernel is compiled with CONFIG_HIST_TRIGGERS_DEBUG set, anevent file named ‘hist_debug’ will appear in each event’ssubdirectory. This file can be read at any time and will display someof the hist trigger internals described in this document. Specificexamples and output will be described in test cases below.

Basic histograms

First, basic histograms. Below is pretty much the simplest thing youcan do with histograms - create one with a single key on a singleevent and cat the output:

# echo 'hist:keys=pid' >> events/sched/sched_waking/trigger# cat events/sched/sched_waking/hist{ pid:      18249 } hitcount:          1{ pid:      13399 } hitcount:          1{ pid:      17973 } hitcount:          1{ pid:      12572 } hitcount:          1...{ pid:         10 } hitcount:        921{ pid:      18255 } hitcount:       1444{ pid:      25526 } hitcount:       2055{ pid:       5257 } hitcount:       2055{ pid:      27367 } hitcount:       2055{ pid:       1728 } hitcount:       2161Totals:  Hits: 21305  Entries: 183  Dropped: 0

What this does is create a histogram on the sched_waking event usingpid as a key and with a single value, hitcount, which even if notexplicitly specified, exists for every histogram regardless.

The hitcount value is a per-bucket value that’s automaticallyincremented on every hit for the given key, which in this case is thepid.

So in this histogram, there’s a separate bucket for each pid, and eachbucket contains a value for that bucket, counting the number of timessched_waking was called for that pid.

Each histogram is represented by a hist_data struct.

To keep track of each key and value field in the histogram, hist_datakeeps an array of these fields named fields[]. The fields[] array isan array containingstructhist_field representations of eachhistogram val and key in the histogram (variables are also includedhere, but are discussed later). So for the above histogram we have onekey and one value; in this case the one value is the hitcount value,which all histograms have, regardless of whether they define thatvalue or not, which the above histogram does not.

Eachstructhist_field contains a pointer to the ftrace_event_fieldfrom the event’s trace_event_file along with various bits related tothat such as the size, offset, type, and a hist_field_fn_t function,which is used to grab the field’s data from the ftrace event buffer(in most cases - some hist_fields such as hitcount don’t directly mapto an event field in the trace buffer - in these cases the functionimplementation gets its value from somewhere else). The flags fieldindicates which type of field it is - key, value, variable, variablereference, etc., with value being the default.

The other important hist_data data structure in addition to thefields[] array is the tracing_map instance created for the histogram,which is held in the .map member. The tracing_map implements thelock-free hash table used to implement histograms (seekernel/trace/tracing_map.h for much more discussion about thelow-level data structures implementing the tracing_map). For thepurposes of this discussion, the tracing_map contains a number ofbuckets, each bucket corresponding to a particular tracing_map_eltobject hashed by a given histogram key.

Below is a diagram the first part of which describes the hist_data andassociated key and value fields for the histogram described above. Asyou can see, there are two fields in the fields array, one val fieldfor the hitcount and one key field for the pid key.

Below that is a diagram of a run-time snapshot of what the tracing_mapmight look like for a given run. It attempts to show therelationships between the hist_data fields and the tracing_mapelements for a couple hypothetical keys and values.:

+------------------+| hist_data        |+------------------+     +----------------+  | .fields[]      |---->| val = hitcount |----------------------------+  +----------------+     +----------------+                            |  | .map           |       | .size        |                            |  +----------------+       +--------------+                            |                           | .offset      |                            |                           +--------------+                            |                           | .fn()        |                            |                           +--------------+                            |                                 .                                     |                                 .                                     |                                 .                                     |                         +----------------+ <--- n_vals                |                         | key = pid      |----------------------------|--+                         +----------------+                            |  |                           | .size        |                            |  |                           +--------------+                            |  |                           | .offset      |                            |  |                           +--------------+                            |  |                           | .fn()        |                            |  |                         +----------------+ <--- n_fields              |  |                         | unused         |                            |  |                         +----------------+                            |  |                           |              |                            |  |                           +--------------+                            |  |                           |              |                            |  |                           +--------------+                            |  |                           |              |                            |  |                           +--------------+                            |  |                                          n_keys = n_fields - n_vals   |  |

The hist_data n_vals and n_fields delineate the extent of the fields[]array and separate keys from values for the rest of the code.

Below is a run-time representation of the tracing_map part of thehistogram, with pointers from various parts of the fields[] arrayto corresponding parts of the tracing_map.

The tracing_map consists of an array of tracing_map_entrys and a setof preallocated tracing_map_elts (abbreviated below as map_entry andmap_elt). The total number of map_entrys in the hist_data.map array =map->max_elts (actually map->map_size but only max_elts of those areused. This is a property required by themap_insert() algorithm).

If a map_entry is unused, meaning no key has yet hashed into it, its.key value is 0 and its .val pointer is NULL. Once a map_entry hasbeen claimed, the .key value contains the key’s hash value and the.val member points to a map_elt containing the full key and an entryfor each key or value in the map_elt.fields[] array. There is anentry in the map_elt.fields[] array corresponding to each hist_fieldin the histogram, and this is where the continually aggregated sumscorresponding to each histogram value are kept.

The diagram attempts to show the relationship between thehist_data.fields[] and the map_elt.fields[] with the links drawnbetween diagrams:

+-----------+                                                          |  || hist_data |                                                          |  |+-----------+                                                          |  |  | .fields |                                                          |  |  +---------+     +-----------+                                        |  |  | .map    |---->| map_entry |                                        |  |  +---------+     +-----------+                                        |  |                    | .key    |---> 0                                  |  |                    +---------+                                        |  |                    | .val    |---> NULL                               |  |                  +-----------+                                        |  |                  | map_entry |                                        |  |                  +-----------+                                        |  |                    | .key    |---> pid = 999                          |  |                    +---------+    +-----------+                       |  |                    | .val    |--->| map_elt   |                       |  |                    +---------+    +-----------+                       |  |                         .           | .key    |---> full key *        |  |                         .           +---------+    +---------------+  |  |                         .           | .fields |--->| .sum (val)    |<-+  |                  +-----------+      +---------+    | 2345          |  |  |                  | map_entry |                     +---------------+  |  |                  +-----------+                     | .offset (key) |<----+                    | .key    |---> 0               | 0             |  |  |                    +---------+                     +---------------+  |  |                    | .val    |---> NULL                    .          |  |                  +-----------+                             .          |  |                  | map_entry |                             .          |  |                  +-----------+                     +---------------+  |  |                    | .key    |                     | .sum (val) or |  |  |                    +---------+    +---------+      | .offset (key) |  |  |                    | .val    |--->| map_elt |      +---------------+  |  |                  +-----------+    +---------+      | .sum (val) or |  |  |                  | map_entry |                     | .offset (key) |  |  |                  +-----------+                     +---------------+  |  |                    | .key    |---> pid = 4444                         |  |                    +---------+    +-----------+                       |  |                    | .val    |    | map_elt   |                       |  |                    +---------+    +-----------+                       |  |                                     | .key    |---> full key *        |  |                                     +---------+    +---------------+  |  |                                     | .fields |--->| .sum (val)    |<-+  |                                     +---------+    | 65523         |     |                                                    +---------------+     |                                                    | .offset (key) |<----+                                                    | 0             |                                                    +---------------+                                                            .                                                            .                                                            .                                                    +---------------+                                                    | .sum (val) or |                                                    | .offset (key) |                                                    +---------------+                                                    | .sum (val) or |                                                    | .offset (key) |                                                    +---------------+

Abbreviations used in the diagrams:

hist_data = struct hist_trigger_datahist_data.fields = struct hist_fieldfn = hist_field_fn_tmap_entry = struct tracing_map_entrymap_elt = struct tracing_map_eltmap_elt.fields = struct tracing_map_field

Whenever a new event occurs and it has a hist trigger associated withit,event_hist_trigger() is called.event_hist_trigger() first dealswith the key: for each subkey in the key (in the above example, thereis just one subkey corresponding to pid), the hist_field thatrepresents that subkey is retrieved from hist_data.fields[] and thehist_field_fn_tfn() associated with that field, along with thefield’s size and offset, is used to grab that subkey’s data from thecurrent trace record.

Once the complete key has been retrieved, it’s used to look that keyup in the tracing_map. If there’s no tracing_map_elt associated withthat key, an empty one is claimed and inserted in the map for the newkey. In either case, the tracing_map_elt associated with that key isreturned.

Once a tracing_map_elt available,hist_trigger_elt_update() is called.As the name implies, this updates the element, which basically meansupdating the element’s fields. There’s a tracing_map_field associatedwith each key and value in the histogram, and each of these correspondto the key and value hist_fields created when the histogram wascreated.hist_trigger_elt_update() goes through each value hist_fieldand, as for the keys, uses the hist_field’sfn() and size and offsetto grab the field’s value from the current trace record. Once it hasthat value, it simply adds that value to that field’scontinually-updated tracing_map_field.sum member. Some hist_fieldfn()s, such as for the hitcount, don’t actually grab anything from thetrace record (the hitcountfn() just increments the counter sum by 1),but the idea is the same.

Once all the values have been updated,hist_trigger_elt_update() isdone and returns. Note that there are also tracing_map_fields foreach subkey in the key, buthist_trigger_elt_update() doesn’t look atthem or update anything - those exist only for sorting, which canhappen later.

Basic histogram test

This is a good example to try. It produces 3 value fields and 2 keyfields in the output:

# echo 'hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger

To see the debug data, cat the kmem/kmalloc’s ‘hist_debug’ file. Itwill show the trigger info of the histogram it corresponds to, alongwith the address of the hist_data associated with the histogram, whichwill become useful in later examples. It then displays the number oftotal hist_fields associated with the histogram along with a count ofhow many of those correspond to keys and how many correspond to values.

It then goes on to display details for each field, including thefield’s flags and the position of each field in the hist_data’sfields[] array, which is useful information for verifying that thingsinternally appear correct or not, and which again will become evenmore useful in further examples:

# cat events/kmem/kmalloc/hist_debug# event histogram## trigger info: hist:keys=common_pid,call_site.sym:vals=hitcount,bytes_req,bytes_alloc:sort=hitcount:size=2048 [active]#hist_data: 000000005e48c9a5n_vals: 3n_keys: 2n_fields: 5val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      VAL: normal u64 value    ftrace_event_field name: bytes_req    type: size_t    size: 8    is_signed: 0  hist_data->fields[2]:    flags:      VAL: normal u64 value    ftrace_event_field name: bytes_alloc    type: size_t    size: 8    is_signed: 0key fields:  hist_data->fields[3]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: common_pid    type: int    size: 8    is_signed: 1  hist_data->fields[4]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: call_site    type: unsigned long    size: 8    is_signed: 0

The commands below can be used to clean things up for the next test:

# echo '!hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger

Variables

Variables allow data from one hist trigger to be saved by one histtrigger and retrieved by another hist trigger. For example, a triggeron the sched_waking event can capture a timestamp for a particularpid, and later a sched_switch event that switches to that pid eventcan grab the timestamp and use it to calculate a time delta betweenthe two events:

# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >>        events/sched/sched_waking/trigger# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >>        events/sched/sched_switch/trigger

In terms of the histogram data structures, variables are implementedas another type of hist_field and for a given hist trigger are addedto the hist_data.fields[] array just after all the val fields. Todistinguish them from the existing key and val fields, they’re given anew flag type, HIST_FIELD_FL_VAR (abbreviated FL_VAR) and they alsomake use of a new .var.idx field member instructhist_field, whichmaps them to an index in a new map_elt.vars[] array added to themap_elt specifically designed to store and retrieve variable values.The diagram below shows those new elements and adds a new variableentry, ts0, corresponding to the ts0 variable in the sched_wakingtrigger above.

sched_waking histogram

+------------------+| hist_data        |<-------------------------------------------------------++------------------+   +-------------------+                                |  | .fields[]      |-->| val = hitcount    |                                |  +----------------+   +-------------------+                                |  | .map           |     | .size           |                                |  +----------------+     +-----------------+                                |                         | .offset         |                                |                         +-----------------+                                |                         | .fn()           |                                |                         +-----------------+                                |                         | .flags          |                                |                         +-----------------+                                |                         | .var.idx        |                                |                       +-------------------+                                |                       | var = ts0         |                                |                       +-------------------+                                |                         | .size           |                                |                         +-----------------+                                |                         | .offset         |                                |                         +-----------------+                                |                         | .fn()           |                                |                         +-----------------+                                |                         | .flags & FL_VAR |                                |                         +-----------------+                                |                         | .var.idx        |----------------------------+-+ |                         +-----------------+                            | | |                                  .                                     | | |                                  .                                     | | |                                  .                                     | | |                       +-------------------+ <--- n_vals                | | |                       | key = pid         |                            | | |                       +-------------------+                            | | |                         | .size           |                            | | |                         +-----------------+                            | | |                         | .offset         |                            | | |                         +-----------------+                            | | |                         | .fn()           |                            | | |                         +-----------------+                            | | |                         | .flags & FL_KEY |                            | | |                         +-----------------+                            | | |                         | .var.idx        |                            | | |                       +-------------------+ <--- n_fields              | | |                       | unused            |                            | | |                       +-------------------+                            | | |                         |                 |                            | | |                         +-----------------+                            | | |                         |                 |                            | | |                         +-----------------+                            | | |                         |                 |                            | | |                         +-----------------+                            | | |                         |                 |                            | | |                         +-----------------+                            | | |                         |                 |                            | | |                         +-----------------+                            | | |                                           n_keys = n_fields - n_vals   | | |                                                                        | | |

This is very similar to the basic case. In the above diagram, we cansee a new .flags member has been added to thestructhist_fieldstruct, and a new entry added to hist_data.fields representing the ts0variable. For a normal val hist_field, .flags is just 0 (modulomodifier flags), but if the value is defined as a variable, the .flagscontains a set FL_VAR bit.

As you can see, the ts0 entry’s .var.idx member contains the indexinto the tracing_map_elts’ .vars[] array containing variable values.This idx is used whenever the value of the variable is set or read.The map_elt.vars idx assigned to the given variable is assigned andsaved in .var.idx bycreate_tracing_map_fields() after it callstracing_map_add_var().

Below is a representation of the histogram at run-time, whichpopulates the map, along with correspondence to the above hist_data andhist_field data structures.

The diagram attempts to show the relationship between thehist_data.fields[] and the map_elt.fields[] and map_elt.vars[] withthe links drawn between diagrams. For each of the map_elts, you cansee that the .fields[] members point to the .sum or .offset of a keyor val and the .vars[] members point to the value of a variable. Thearrows between the two diagrams show the linkages between thosetracing_map members and the field definitions in the correspondinghist_data fields[] members.:

+-----------+                                                           | | || hist_data |                                                           | | |+-----------+                                                           | | |  | .fields |                                                           | | |  +---------+     +-----------+                                         | | |  | .map    |---->| map_entry |                                         | | |  +---------+     +-----------+                                         | | |                    | .key    |---> 0                                   | | |                    +---------+                                         | | |                    | .val    |---> NULL                                | | |                  +-----------+                                         | | |                  | map_entry |                                         | | |                  +-----------+                                         | | |                    | .key    |---> pid = 999                           | | |                    +---------+    +-----------+                        | | |                    | .val    |--->| map_elt   |                        | | |                    +---------+    +-----------+                        | | |                         .           | .key    |---> full key *         | | |                         .           +---------+    +---------------+   | | |                         .           | .fields |--->| .sum (val)    |   | | |                         .           +---------+    | 2345          |   | | |                         .        +--| .vars   |    +---------------+   | | |                         .        |  +---------+    | .offset (key) |   | | |                         .        |                 | 0             |   | | |                         .        |                 +---------------+   | | |                         .        |                         .           | | |                         .        |                         .           | | |                         .        |                         .           | | |                         .        |                 +---------------+   | | |                         .        |                 | .sum (val) or |   | | |                         .        |                 | .offset (key) |   | | |                         .        |                 +---------------+   | | |                         .        |                 | .sum (val) or |   | | |                         .        |                 | .offset (key) |   | | |                         .        |                 +---------------+   | | |                         .        |                                     | | |                         .        +---------------->+---------------+   | | |                         .                          | ts0           |<--+ | |                         .                          | 113345679876  |   | | |                         .                          +---------------+   | | |                         .                          | unused        |   | | |                         .                          |               |   | | |                         .                          +---------------+   | | |                         .                                  .           | | |                         .                                  .           | | |                         .                                  .           | | |                         .                          +---------------+   | | |                         .                          | unused        |   | | |                         .                          |               |   | | |                         .                          +---------------+   | | |                         .                          | unused        |   | | |                         .                          |               |   | | |                         .                          +---------------+   | | |                         .                                              | | |                  +-----------+                                         | | |                  | map_entry |                                         | | |                  +-----------+                                         | | |                    | .key    |---> pid = 4444                          | | |                    +---------+    +-----------+                        | | |                    | .val    |--->| map_elt   |                        | | |                    +---------+    +-----------+                        | | |                         .           | .key    |---> full key *         | | |                         .           +---------+    +---------------+   | | |                         .           | .fields |--->| .sum (val)    |   | | |                                     +---------+    | 2345          |   | | |                                  +--| .vars   |    +---------------+   | | |                                  |  +---------+    | .offset (key) |   | | |                                  |                 | 0             |   | | |                                  |                 +---------------+   | | |                                  |                         .           | | |                                  |                         .           | | |                                  |                         .           | | |                                  |                 +---------------+   | | |                                  |                 | .sum (val) or |   | | |                                  |                 | .offset (key) |   | | |                                  |                 +---------------+   | | |                                  |                 | .sum (val) or |   | | |                                  |                 | .offset (key) |   | | |                                  |                 +---------------+   | | |                                  |                                     | | |                                  |                 +---------------+   | | |                                  +---------------->| ts0           |<--+ | |                                                    | 213499240729  |     | |                                                    +---------------+     | |                                                    | unused        |     | |                                                    |               |     | |                                                    +---------------+     | |                                                            .             | |                                                            .             | |                                                            .             | |                                                    +---------------+     | |                                                    | unused        |     | |                                                    |               |     | |                                                    +---------------+     | |                                                    | unused        |     | |                                                    |               |     | |                                                    +---------------+     | |

For each used map entry, there’s a map_elt pointing to an array of.vars containing the current value of the variables associated withthat histogram entry. So in the above, the timestamp associated withpid 999 is 113345679876, and the timestamp variable in the same.var.idx for pid 4444 is 213499240729.

sched_switch histogram

The sched_switch histogram paired with the above sched_wakinghistogram is shown below. The most important aspect of thesched_switch histogram is that it references a variable on thesched_waking histogram above.

The histogram diagram is very similar to the others so far displayed,but it adds variable references. You can see the normal hitcount andkey fields along with a new wakeup_lat variable implemented in thesame way as the sched_waking ts0 variable, but in addition there’s anentry with the new FL_VAR_REF (short for HIST_FIELD_FL_VAR_REF) flag.

Associated with the new var ref field are a couple of new hist_fieldmembers, var.hist_data and var_ref_idx. For a variable reference, thevar.hist_data goes with the var.idx, which together uniquely identifya particular variable on a particular histogram. The var_ref_idx isjust the index into the var_ref_vals[] array that caches the values ofeach variable whenever a hist trigger is updated. Those resultingvalues are then finally accessed by other code such as trace actioncode that uses the var_ref_idx values to assign param values.

The diagram below describes the situation for the sched_switchhistogram referred to before:

 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >>         events/sched/sched_switch/trigger                                                                           | | +------------------+                                                      | | | hist_data        |                                                      | | +------------------+   +-----------------------+                          | |   | .fields[]      |-->| val = hitcount        |                          | |   +----------------+   +-----------------------+                          | |   | .map           |     | .size               |                          | |   +----------------+     +---------------------+                          | |+--| .var_refs[]    |     | .offset             |                          | ||  +----------------+     +---------------------+                          | ||                         | .fn()               |                          | ||   var_ref_vals[]        +---------------------+                          | ||  +-------------+        | .flags              |                          | ||  | $ts0        |<---+   +---------------------+                          | ||  +-------------+    |   | .var.idx            |                          | ||  |             |    |   +---------------------+                          | ||  +-------------+    |   | .var.hist_data      |                          | ||  |             |    |   +---------------------+                          | ||  +-------------+    |   | .var_ref_idx        |                          | ||  |             |    | +-----------------------+                          | ||  +-------------+    | | var = wakeup_lat      |                          | ||         .           | +-----------------------+                          | ||         .           |   | .size               |                          | ||         .           |   +---------------------+                          | ||  +-------------+    |   | .offset             |                          | ||  |             |    |   +---------------------+                          | ||  +-------------+    |   | .fn()               |                          | ||  |             |    |   +---------------------+                          | ||  +-------------+    |   | .flags & FL_VAR     |                          | ||                     |   +---------------------+                          | ||                     |   | .var.idx            |                          | ||                     |   +---------------------+                          | ||                     |   | .var.hist_data      |                          | ||                     |   +---------------------+                          | ||                     |   | .var_ref_idx        |                          | ||                     |   +---------------------+                          | ||                     |             .                                      | ||                     |             .                                      | ||                     |             .                                      | ||                     | +-----------------------+ <--- n_vals              | ||                     | | key = pid             |                          | ||                     | +-----------------------+                          | ||                     |   | .size               |                          | ||                     |   +---------------------+                          | ||                     |   | .offset             |                          | ||                     |   +---------------------+                          | ||                     |   | .fn()               |                          | ||                     |   +---------------------+                          | ||                     |   | .flags              |                          | ||                     |   +---------------------+                          | ||                     |   | .var.idx            |                          | ||                     | +-----------------------+ <--- n_fields            | ||                     | | unused                |                          | ||                     | +-----------------------+                          | ||                     |   |                     |                          | ||                     |   +---------------------+                          | ||                     |   |                     |                          | ||                     |   +---------------------+                          | ||                     |   |                     |                          | ||                     |   +---------------------+                          | ||                     |   |                     |                          | ||                     |   +---------------------+                          | ||                     |   |                     |                          | ||                     |   +---------------------+                          | ||                     |                         n_keys = n_fields - n_vals | ||                     |                                                    | ||                     |                                                    | ||                     | +-----------------------+                          | |+---------------------->| var_ref = $ts0        |                          | |                      | +-----------------------+                          | |                      |   | .size               |                          | |                      |   +---------------------+                          | |                      |   | .offset             |                          | |                      |   +---------------------+                          | |                      |   | .fn()               |                          | |                      |   +---------------------+                          | |                      |   | .flags & FL_VAR_REF |                          | |                      |   +---------------------+                          | |                      |   | .var.idx            |--------------------------+ |                      |   +---------------------+                            |                      |   | .var.hist_data      |----------------------------+                      |   +---------------------+                      +---| .var_ref_idx        |                          +---------------------+

Abbreviations used in the diagrams:

hist_data = struct hist_trigger_datahist_data.fields = struct hist_fieldfn = hist_field_fn_tFL_KEY = HIST_FIELD_FL_KEYFL_VAR = HIST_FIELD_FL_VARFL_VAR_REF = HIST_FIELD_FL_VAR_REF

When a hist trigger makes use of a variable, a new hist_field iscreated with flag HIST_FIELD_FL_VAR_REF. For a VAR_REF field, thevar.idx and var.hist_data take the same values as the referencedvariable, as well as the referenced variable’s size, type, andis_signed values. The VAR_REF field’s .name is set to the name of thevariable it references. If a variable reference was created using theexplicit system.event.$var_ref notation, the hist_field’s system andevent_name variables are also set.

So, in order to handle an event for the sched_switch histogram,because we have a reference to a variable on another histogram, weneed to resolve all variable references first. This is done via theresolve_var_refs() calls made fromevent_hist_trigger(). What thisdoes is grabs the var_refs[] array from the hist_data representing thesched_switch histogram. For each one of those, the referencedvariable’s var.hist_data along with the current key is used to look upthe corresponding tracing_map_elt in that histogram. Once found, thereferenced variable’s var.idx is used to look up the variable’s valueusing tracing_map_read_var(elt, var.idx), which yields the value ofthe variable for that element, ts0 in the case above. Note that boththe hist_fields representing both the variable and the variablereference have the same var.idx, so this is straightforward.

Variable and variable reference test

This example creates a variable on the sched_waking event, ts0, anduses it in the sched_switch trigger. The sched_switch trigger alsocreates its own variable, wakeup_lat, but nothing yet uses it:

# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger

Looking at the sched_waking ‘hist_debug’ output, in addition to thenormal key and value hist_fields, in the val fields section we see afield with the HIST_FIELD_FL_VAR flag, which indicates that that fieldrepresents a variable. Note that in addition to the variable name,contained in the var.name field, it includes the var.idx, which is theindex into the tracing_map_elt.vars[] array of the actual variablelocation. Note also that the output shows that variables live in thesame part of the hist_data->fields[] array as normal values:

# cat events/sched/sched_waking/hist_debug# event histogram## trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]#hist_data: 000000009536f554n_vals: 2n_keys: 1n_fields: 3val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    var.name: ts0    var.idx (into tracing_map_elt.vars[]): 0    type: u64    size: 8    is_signed: 0key fields:  hist_data->fields[2]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: pid    type: pid_t    size: 8    is_signed: 1

Moving on to the sched_switch trigger hist_debug output, in additionto the unused wakeup_lat variable, we see a new section displayingvariable references. Variable references are displayed in a separatesection because in addition to being logically separate fromvariables and values, they actually live in a separate hist_dataarray, var_refs[].

In this example, the sched_switch trigger has a reference to avariable on the sched_waking trigger, $ts0. Looking at the details,we can see that the var.hist_data value of the referenced variablematches the previously displayed sched_waking trigger, and the var.idxvalue matches the previously displayed var.idx value for thatvariable. Also displayed is the var_ref_idx value for that variablereference, which is where the value for that variable is cached foruse when the trigger is invoked:

# cat events/sched/sched_switch/hist_debug# event histogram## trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global [active]#hist_data: 00000000f4ee8006n_vals: 2n_keys: 1n_fields: 3val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    var.name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 0    type: u64    size: 0    is_signed: 0key fields:  hist_data->fields[2]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: next_pid    type: pid_t    size: 8    is_signed: 1variable reference fields:  hist_data->var_refs[0]:    flags:      HIST_FIELD_FL_VAR_REF    name: ts0    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 000000009536f554    var_ref_idx (into hist_data->var_refs[]): 0    type: u64    size: 8    is_signed: 0

The commands below can be used to clean things up for the next test:

# echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

Actions and Handlers

Adding onto the previous example, we will now do something with thatwakeup_lat variable, namely send it and another field as a syntheticevent.

Theonmatch() action below basically says that whenever we have asched_switch event, if we have a matching sched_waking event, in thiscase if we have a pid in the sched_waking histogram that matches thenext_pid field on this sched_switch event, we retrieve thevariables specified in thewakeup_latency() trace action, and usethem to generate a new wakeup_latency event into the trace stream.

Note that the way the trace handlers such aswakeup_latency() (whichcould equivalently be written trace(wakeup_latency,$wakeup_lat,next_pid)are implemented, the parameters specified to the trace handler must bevariables. In this case, $wakeup_lat is obviously a variable, butnext_pid isn’t, since it’s just naming a field in the sched_switchtrace event. Since this is something that almost everytrace() andsave() action does, a special shortcut is implemented to allow fieldnames to be used directly in those cases. How it works is that underthe covers, a temporary variable is created for the named field, andthis variable is what is actually passed to the trace handler. In thecode and documentation, this type of variable is called a ‘fieldvariable’.

Fields on other trace event’s histograms can be used as well. In thatcase we have to generate a new histogram and an unfortunately named‘synthetic_field’ (the use of synthetic here has nothing to do withsynthetic events) and use that special histogram field as a variable.

The diagram below illustrates the new elements described above in thecontext of the sched_switch histogram using theonmatch() handler andthetrace() action.

First, we define the wakeup_latency synthetic event:

# echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events

Next, the sched_waking hist trigger as before:

# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >>        events/sched/sched_waking/trigger

Finally, we create a hist trigger on the sched_switch event thatgenerates awakeup_latency() trace event. In this case we passnext_pid into the wakeup_latency synthetic event invocation, whichmeans it will be automatically converted into a field variable:

# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \        onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid)' >>        /sys/kernel/tracing/events/sched/sched_switch/trigger

The diagram for the sched_switch event is similar to previous examplesbut shows the additional field_vars[] array for hist_data and showsthe linkages between the field_vars and the variables and referencescreated to implement the field variables. The details are discussedbelow:

  +------------------+  | hist_data        |  +------------------+   +-----------------------+    | .fields[]      |-->| val = hitcount        |    +----------------+   +-----------------------+    | .map           |     | .size               |    +----------------+     +---------------------++---| .field_vars[]  |     | .offset             ||   +----------------+     +---------------------+|+--| .var_refs[]    |     | .offset             |||  +----------------+     +---------------------+||                         | .fn()               |||   var_ref_vals[]        +---------------------+||  +-------------+        | .flags              |||  | $ts0        |<---+   +---------------------+||  +-------------+    |   | .var.idx            |||  | $next_pid   |<-+ |   +---------------------+||  +-------------+  | |   | .var.hist_data      |||+>| $wakeup_lat |  | |   +---------------------+||| +-------------+  | |   | .var_ref_idx        |||| |             |  | | +-----------------------+||| +-------------+  | | | var = wakeup_lat      ||||        .         | | +-----------------------+|||        .         | |   | .size               ||||        .         | |   +---------------------+||| +-------------+  | |   | .offset             |||| |             |  | |   +---------------------+||| +-------------+  | |   | .fn()               |||| |             |  | |   +---------------------+||| +-------------+  | |   | .flags & FL_VAR     ||||                  | |   +---------------------+|||                  | |   | .var.idx            ||||                  | |   +---------------------+|||                  | |   | .var.hist_data      ||||                  | |   +---------------------+|||                  | |   | .var_ref_idx        ||||                  | |   +---------------------+|||                  | |              .|||                  | |              .|||                  | |              .|||                  | |              .||| +--------------+ | |              .+-->| field_var    | | |              . || +--------------+ | |              . ||   | var        | | |              . ||   +------------+ | |              . ||   | val        | | |              . || +--------------+ | |              . || | field_var    | | |              . || +--------------+ | |              . ||   | var        | | |              . ||   +------------+ | |              . ||   | val        | | |              . ||   +------------+ | |              . ||         .        | |              . ||         .        | |              . ||         .        | | +-----------------------+ <--- n_vals || +--------------+ | | | key = pid             | || | field_var    | | | +-----------------------+ || +--------------+ | |   | .size               | ||   | var        |--+|   +---------------------+ ||   +------------+ |||   | .offset             | ||   | val        |-+||   +---------------------+ ||   +------------+ |||   | .fn()               | ||                  |||   +---------------------+ ||                  |||   | .flags              | ||                  |||   +---------------------+ ||                  |||   | .var.idx            | ||                  |||   +---------------------+ <--- n_fields ||                  ||| ||                  |||                           n_keys = n_fields - n_vals ||                  ||| +-----------------------+ ||                  |+->| var = next_pid        | ||                  | | +-----------------------+ ||                  | |   | .size               | ||                  | |   +---------------------+ ||                  | |   | .offset             | ||                  | |   +---------------------+ ||                  | |   | .flags & FL_VAR     | ||                  | |   +---------------------+ ||                  | |   | .var.idx            | ||                  | |   +---------------------+ ||                  | |   | .var.hist_data      | ||                  | | +-----------------------+ ||                  +-->| val for next_pid      | ||                  | | +-----------------------+ ||                  | |   | .size               | ||                  | |   +---------------------+ ||                  | |   | .offset             | ||                  | |   +---------------------+ ||                  | |   | .fn()               | ||                  | |   +---------------------+ ||                  | |   | .flags              | ||                  | |   +---------------------+ ||                  | |   |                     | ||                  | |   +---------------------+ ||                  | | ||                  | | ||                  | | +-----------------------+ +|------------------|-|>| var_ref = $ts0        |  |                  | | +-----------------------+  |                  | |   | .size               |  |                  | |   +---------------------+  |                  | |   | .offset             |  |                  | |   +---------------------+  |                  | |   | .fn()               |  |                  | |   +---------------------+  |                  | |   | .flags & FL_VAR_REF |  |                  | |   +---------------------+  |                  | +---| .var_ref_idx        |  |                  |   +-----------------------+  |                  |   | var_ref = $next_pid   |  |                  |   +-----------------------+  |                  |     | .size               |  |                  |     +---------------------+  |                  |     | .offset             |  |                  |     +---------------------+  |                  |     | .fn()               |  |                  |     +---------------------+  |                  |     | .flags & FL_VAR_REF |  |                  |     +---------------------+  |                  +-----| .var_ref_idx        |  |                      +-----------------------+  |                      | var_ref = $wakeup_lat |  |                      +-----------------------+  |                        | .size               |  |                        +---------------------+  |                        | .offset             |  |                        +---------------------+  |                        | .fn()               |  |                        +---------------------+  |                        | .flags & FL_VAR_REF |  |                        +---------------------+  +------------------------| .var_ref_idx        |                           +---------------------+

As you can see, for a field variable, two hist_fields are created: onerepresenting the variable, in this case next_pid, and one to actuallyget the value of the field from the trace stream, like a normal valfield does. These are created separately from normal variablecreation and are saved in the hist_data->field_vars[] array. Seebelow for how these are used. In addition, a reference hist_field isalso created, which is needed to reference the field variables such as$next_pid variable in thetrace() action.

Note that $wakeup_lat is also a variable reference, referencing thevalue of the expression common_timestamp-$ts0, and so also needs tohave a hist field entry representing that reference created.

Whenhist_trigger_elt_update() is called to get the normal key andvalue fields, it also callsupdate_field_vars(), which goes througheach field_var created for the histogram, and available fromhist_data->field_vars and calls val->fn() to get the data from thecurrent trace record, and then uses the var’s var.idx to set thevariable at the var.idx offset in the appropriate tracing_map_elt’svariable at elt->vars[var.idx].

Once all the variables have been updated,resolve_var_refs() can becalled fromevent_hist_trigger(), and not only can our $ts0 and$next_pid references be resolved but the $wakeup_lat reference aswell. At this point, thetrace() action can simply access the valuesassembled in the var_ref_vals[] array and generate the trace event.

The same process occurs for the field variables associated with thesave() action.

Abbreviations used in the diagram:

hist_data = struct hist_trigger_datahist_data.fields = struct hist_fieldfield_var = struct field_varfn = hist_field_fn_tFL_KEY = HIST_FIELD_FL_KEYFL_VAR = HIST_FIELD_FL_VARFL_VAR_REF = HIST_FIELD_FL_VAR_REF

trace() action field variable test

This example adds to the previous test example by finally making useof the wakeup_lat variable, but in addition also creates a couple offield variables that then are all passed to thewakeup_latency() traceaction via theonmatch() handler.

First, we create the wakeup_latency synthetic event:

# echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events

Next, the sched_waking trigger from previous examples:

# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

Finally, as in the previous test example, we calculate and assign thewakeup latency using the $ts0 reference from the sched_waking triggerto the wakeup_lat variable, and finally use it along with a couplesched_switch event fields, next_pid and next_comm, to generate awakeup_latency trace event. The next_pid and next_comm event fieldsare automatically converted into field variables for this purpose:

# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger

The sched_waking hist_debug output shows the same data as in theprevious test example:

# cat events/sched/sched_waking/hist_debug# event histogram## trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]#hist_data: 00000000d60ff61fn_vals: 2n_keys: 1n_fields: 3val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    var.name: ts0    var.idx (into tracing_map_elt.vars[]): 0    type: u64    size: 8    is_signed: 0key fields:  hist_data->fields[2]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: pid    type: pid_t    size: 8    is_signed: 1

The sched_switch hist_debug output shows the same key and value fieldsas in the previous test example - note that wakeup_lat is still in theval fields section, but that the new field variables are not there -although the field variables are variables, they’re held separately inthe hist_data’s field_vars[] array. Although the field variables andthe normal variables are located in separate places, you can see thatthe actual variable locations for those variables in thetracing_map_elt.vars[] do have increasing indices as expected:wakeup_lat takes the var.idx = 0 slot, while the field variables fornext_pid and next_comm have values var.idx = 1, and var.idx = 2. Notealso that those are the same values displayed for the variablereferences corresponding to those variables in the variable referencefields section. Since there are two triggers and thus two hist_dataaddresses, those addresses also need to be accounted for when doingthe matching - you can see that the first variable refers to the 0var.idx on the previous hist trigger (see the hist_data addressassociated with that trigger), while the second variable refers to the0 var.idx on the sched_switch hist trigger, as do all the remainingvariable references.

Finally, the action tracking variables section just shows the systemand event name for theonmatch() handler:

# cat events/sched/sched_switch/hist_debug# event histogram## trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm) [active]#hist_data: 0000000008f551b7n_vals: 2n_keys: 1n_fields: 3val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    var.name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 0    type: u64    size: 0    is_signed: 0key fields:  hist_data->fields[2]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: next_pid    type: pid_t    size: 8    is_signed: 1variable reference fields:  hist_data->var_refs[0]:    flags:      HIST_FIELD_FL_VAR_REF    name: ts0    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 00000000d60ff61f    var_ref_idx (into hist_data->var_refs[]): 0    type: u64    size: 8    is_signed: 0  hist_data->var_refs[1]:    flags:      HIST_FIELD_FL_VAR_REF    name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 0000000008f551b7    var_ref_idx (into hist_data->var_refs[]): 1    type: u64    size: 0    is_signed: 0  hist_data->var_refs[2]:    flags:      HIST_FIELD_FL_VAR_REF    name: next_pid    var.idx (into tracing_map_elt.vars[]): 1    var.hist_data: 0000000008f551b7    var_ref_idx (into hist_data->var_refs[]): 2    type: pid_t    size: 4    is_signed: 0  hist_data->var_refs[3]:    flags:      HIST_FIELD_FL_VAR_REF    name: next_comm    var.idx (into tracing_map_elt.vars[]): 2    var.hist_data: 0000000008f551b7    var_ref_idx (into hist_data->var_refs[]): 3    type: char[16]    size: 256    is_signed: 0field variables:  hist_data->field_vars[0]:    field_vars[0].var:    flags:      HIST_FIELD_FL_VAR    var.name: next_pid    var.idx (into tracing_map_elt.vars[]): 1    field_vars[0].val:    ftrace_event_field name: next_pid    type: pid_t    size: 4    is_signed: 1  hist_data->field_vars[1]:    field_vars[1].var:    flags:      HIST_FIELD_FL_VAR    var.name: next_comm    var.idx (into tracing_map_elt.vars[]): 2    field_vars[1].val:    ftrace_event_field name: next_comm    type: char[16]    size: 256    is_signed: 0action tracking variables (for onmax()/onchange()/onmatch()):  hist_data->actions[0].match_data.event_system: sched  hist_data->actions[0].match_data.event: sched_waking

The commands below can be used to clean things up for the next test:

# echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger# echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events

action_data and the trace() action

As mentioned above, when thetrace() action generates a syntheticevent, all the parameters to the synthetic event either already arevariables or are converted into variables (via field variables), andfinally all those variable values are collected via references to theminto a var_ref_vals[] array.

The values in the var_ref_vals[] array, however, don’t necessarilyfollow the same ordering as the synthetic event params. To addressthat,structaction_data contains another array, var_ref_idx[] thatmaps the trace action params to the var_ref_vals[] values. Below is adiagram illustrating that for thewakeup_latency() synthetic event:

+------------------+     wakeup_latency()| action_data      |       event params               var_ref_vals[]+------------------+    +-----------------+        +-----------------+  | .var_ref_idx[] |--->| $wakeup_lat idx |---+    |                 |  +----------------+    +-----------------+   |    +-----------------+  | .synth_event   |    | $next_pid idx   |---|-+  | $wakeup_lat val |  +----------------+    +-----------------+   | |  +-----------------+                                 .            | +->| $next_pid val   |                                 .            |    +-----------------+                                 .            |           .                        +-----------------+   |           .                        |                 |   |           .                        +-----------------+   |    +-----------------+                                              +--->| $wakeup_lat val |                                                   +-----------------+

Basically, how this ends up getting used in the synthetic event probefunction,trace_event_raw_event_synth(), is as follows:

for each field i in .synth_event  val_idx = .var_ref_idx[i]  val = var_ref_vals[val_idx]

action_data and the onXXX() handlers

The hist triggeronXXX() actions other thanonmatch(), such asonmax()andonchange(), also make use of and internally create hiddenvariables. This information is contained in theaction_data.track_data struct, and is also visible in the hist_debugoutput as will be described in the example below.

Typically, theonmax() oronchange() handlers are used in conjunctionwith thesave() andsnapshot() actions. For example:

# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \        onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >>        /sys/kernel/tracing/events/sched/sched_switch/trigger

or:

# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \        onmax($wakeup_lat).snapshot()' >>        /sys/kernel/tracing/events/sched/sched_switch/trigger

save() action field variable test

For this example, instead of generating a synthetic event, thesave()action is used to save field values whenever anonmax() handlerdetects that a new max latency has been hit. As in the previousexample, the values being saved are also field values, but in thiscase, are kept in a separate hist_data array named save_vars[].

As in previous test examples, we set up the sched_waking trigger:

# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

In this case, however, we set up the sched_switch trigger to save somesched_switch field values whenever we hit a new maximum latency. Forboth theonmax() handler andsave() action, variables will be created,which we can use the hist_debug files to examine:

# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger

The sched_waking hist_debug output shows the same data as in theprevious test examples:

# cat events/sched/sched_waking/hist_debug## trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]#hist_data: 00000000e6290f48n_vals: 2n_keys: 1n_fields: 3val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    var.name: ts0    var.idx (into tracing_map_elt.vars[]): 0    type: u64    size: 8    is_signed: 0key fields:  hist_data->fields[2]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: pid    type: pid_t    size: 8    is_signed: 1

The output of the sched_switch trigger shows the same val and keyvalues as before, but also shows a couple new sections.

First, the action tracking variables section now shows theactions[].track_data information describing the special trackingvariables and references used to track, in this case, the runningmaximum value. The actions[].track_data.var_ref member contains thereference to the variable being tracked, in this case the $wakeup_latvariable. In order to perform theonmax() handler function, therealso needs to be a variable that tracks the current maximum by gettingupdated whenever a new maximum is hit. In this case, we can see thatan auto-generated variable named ‘ __max’ has been created and isvisible in the actions[].track_data.track_var variable.

Finally, in the new ‘save action variables’ section, we can see thatthe 4 params to thesave() function have resulted in 4 field variablesbeing created for the purposes of saving the values of the namedfields when the max is hit. These variables are kept in a separatesave_vars[] array off of hist_data, so are displayed in a separatesection:

# cat events/sched/sched_switch/hist_debug# event histogram## trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) [active]#hist_data: 0000000057bcd28dn_vals: 2n_keys: 1n_fields: 3val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    var.name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 0    type: u64    size: 0    is_signed: 0key fields:  hist_data->fields[2]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: next_pid    type: pid_t    size: 8    is_signed: 1variable reference fields:  hist_data->var_refs[0]:    flags:      HIST_FIELD_FL_VAR_REF    name: ts0    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 00000000e6290f48    var_ref_idx (into hist_data->var_refs[]): 0    type: u64    size: 8    is_signed: 0  hist_data->var_refs[1]:    flags:      HIST_FIELD_FL_VAR_REF    name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 0000000057bcd28d    var_ref_idx (into hist_data->var_refs[]): 1    type: u64    size: 0    is_signed: 0action tracking variables (for onmax()/onchange()/onmatch()):  hist_data->actions[0].track_data.var_ref:    flags:      HIST_FIELD_FL_VAR_REF    name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 0000000057bcd28d    var_ref_idx (into hist_data->var_refs[]): 1    type: u64    size: 0    is_signed: 0  hist_data->actions[0].track_data.track_var:    flags:      HIST_FIELD_FL_VAR    var.name: __max    var.idx (into tracing_map_elt.vars[]): 1    type: u64    size: 8    is_signed: 0save action variables (save() params):  hist_data->save_vars[0]:    save_vars[0].var:    flags:      HIST_FIELD_FL_VAR    var.name: next_comm    var.idx (into tracing_map_elt.vars[]): 2    save_vars[0].val:    ftrace_event_field name: next_comm    type: char[16]    size: 256    is_signed: 0  hist_data->save_vars[1]:    save_vars[1].var:    flags:      HIST_FIELD_FL_VAR    var.name: prev_pid    var.idx (into tracing_map_elt.vars[]): 3    save_vars[1].val:    ftrace_event_field name: prev_pid    type: pid_t    size: 4    is_signed: 1  hist_data->save_vars[2]:    save_vars[2].var:    flags:      HIST_FIELD_FL_VAR    var.name: prev_prio    var.idx (into tracing_map_elt.vars[]): 4    save_vars[2].val:    ftrace_event_field name: prev_prio    type: int    size: 4    is_signed: 1  hist_data->save_vars[3]:    save_vars[3].var:    flags:      HIST_FIELD_FL_VAR    var.name: prev_comm    var.idx (into tracing_map_elt.vars[]): 5    save_vars[3].val:    ftrace_event_field name: prev_comm    type: char[16]    size: 256    is_signed: 0

The commands below can be used to clean things up for the next test:

# echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

A couple special cases

While the above covers the basics of the histogram internals, thereare a couple of special cases that should be discussed, since theytend to create even more confusion. Those are field variables on otherhistograms, and aliases, both described below through example testsusing the hist_debug files.

Test of field variables on other histograms

This example is similar to the previous examples, but in this case,the sched_switch trigger references a hist trigger field on anotherevent, namely the sched_waking event. In order to accomplish this, afield variable is created for the other event, but since an existinghistogram can’t be used, as existing histograms are immutable, a newhistogram with a matching variable is created and used, and we’ll seethat reflected in the hist_debug output shown below.

First, we create the wakeup_latency synthetic event. Note theaddition of the prio field:

# echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events

As in previous test examples, we set up the sched_waking trigger:

# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

Here we set up a hist trigger on sched_switch to send a wakeup_latencyevent using an onmatch handler naming the sched_waking event. Notethat the third param being passed to thewakeup_latency() is prio,which is a field name that needs to have a field variable created forit. There isn’t however any prio field on the sched_switch event soit would seem that it wouldn’t be possible to create a field variablefor it. The matching sched_waking event does have a prio field, so itshould be possible to make use of it for this purpose. The problemwith that is that it’s not currently possible to define a new variableon an existing histogram, so it’s not possible to add a new prio fieldvariable to the existing sched_waking histogram. It is howeverpossible to create an additional new ‘matching’ sched_waking histogramfor the same event, meaning that it uses the same key and filters, anddefine the new prio field variable on that.

Here’s the sched_switch trigger:

# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger

And here’s the output of the hist_debug information for thesched_waking hist trigger. Note that there are two histogramsdisplayed in the output: the first is the normal sched_wakinghistogram we’ve seen in the previous examples, and the second is thespecial histogram we created to provide the prio field variable.

Looking at the second histogram below, we see a variable with the namesynthetic_prio. This is the field variable created for the prio fieldon that sched_waking histogram:

# cat events/sched/sched_waking/hist_debug# event histogram## trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]#hist_data: 00000000349570e4n_vals: 2n_keys: 1n_fields: 3val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    var.name: ts0    var.idx (into tracing_map_elt.vars[]): 0    type: u64    size: 8    is_signed: 0key fields:  hist_data->fields[2]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: pid    type: pid_t    size: 8    is_signed: 1# event histogram## trigger info: hist:keys=pid:vals=hitcount:synthetic_prio=prio:sort=hitcount:size=2048 [active]#hist_data: 000000006920cf38n_vals: 2n_keys: 1n_fields: 3val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    ftrace_event_field name: prio    var.name: synthetic_prio    var.idx (into tracing_map_elt.vars[]): 0    type: int    size: 4    is_signed: 1key fields:  hist_data->fields[2]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: pid    type: pid_t    size: 8    is_signed: 1

Looking at the sched_switch histogram below, we can see a reference tothe synthetic_prio variable on sched_waking, and looking at theassociated hist_data address we see that it is indeed associated withthe new histogram. Note also that the other references are to anormal variable, wakeup_lat, and to a normal field variable, next_pid,the details of which are in the field variables section:

# cat events/sched/sched_switch/hist_debug# event histogram## trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio) [active]#hist_data: 00000000a73b67dfn_vals: 2n_keys: 1n_fields: 3val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    var.name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 0    type: u64    size: 0    is_signed: 0key fields:  hist_data->fields[2]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: next_pid    type: pid_t    size: 8    is_signed: 1variable reference fields:  hist_data->var_refs[0]:    flags:      HIST_FIELD_FL_VAR_REF    name: ts0    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 00000000349570e4    var_ref_idx (into hist_data->var_refs[]): 0    type: u64    size: 8    is_signed: 0  hist_data->var_refs[1]:    flags:      HIST_FIELD_FL_VAR_REF    name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 00000000a73b67df    var_ref_idx (into hist_data->var_refs[]): 1    type: u64    size: 0    is_signed: 0  hist_data->var_refs[2]:    flags:      HIST_FIELD_FL_VAR_REF    name: next_pid    var.idx (into tracing_map_elt.vars[]): 1    var.hist_data: 00000000a73b67df    var_ref_idx (into hist_data->var_refs[]): 2    type: pid_t    size: 4    is_signed: 0  hist_data->var_refs[3]:    flags:      HIST_FIELD_FL_VAR_REF    name: synthetic_prio    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 000000006920cf38    var_ref_idx (into hist_data->var_refs[]): 3    type: int    size: 4    is_signed: 1field variables:  hist_data->field_vars[0]:    field_vars[0].var:    flags:      HIST_FIELD_FL_VAR    var.name: next_pid    var.idx (into tracing_map_elt.vars[]): 1    field_vars[0].val:    ftrace_event_field name: next_pid    type: pid_t    size: 4    is_signed: 1action tracking variables (for onmax()/onchange()/onmatch()):  hist_data->actions[0].match_data.event_system: sched  hist_data->actions[0].match_data.event: sched_waking

The commands below can be used to clean things up for the next test:

# echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger# echo '!wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events

Alias test

This example is very similar to previous examples, but demonstratesthe alias flag.

First, we create the wakeup_latency synthetic event:

# echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events

Next, we create a sched_waking trigger similar to previous examples,but in this case we save the pid in the waking_pid variable:

# echo 'hist:keys=pid:waking_pid=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

For the sched_switch trigger, instead of using $waking_pid directly inthe wakeup_latency synthetic event invocation, we create an alias of$waking_pid named $woken_pid, and use that in the synthetic eventinvocation instead:

# echo 'hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger

Looking at the sched_waking hist_debug output, in addition to thenormal fields, we can see the waking_pid variable:

# cat events/sched/sched_waking/hist_debug# event histogram## trigger info: hist:keys=pid:vals=hitcount:waking_pid=pid,ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]#hist_data: 00000000a250528cn_vals: 3n_keys: 1n_fields: 4val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR    ftrace_event_field name: pid    var.name: waking_pid    var.idx (into tracing_map_elt.vars[]): 0    type: pid_t    size: 4    is_signed: 1  hist_data->fields[2]:    flags:      HIST_FIELD_FL_VAR    var.name: ts0    var.idx (into tracing_map_elt.vars[]): 1    type: u64    size: 8    is_signed: 0key fields:  hist_data->fields[3]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: pid    type: pid_t    size: 8    is_signed: 1

The sched_switch hist_debug output shows that a variable namedwoken_pid has been created but that it also has theHIST_FIELD_FL_ALIAS flag set. It also has the HIST_FIELD_FL_VAR flagset, which is why it appears in the val field section.

Despite that implementation detail, an alias variable is actually morelike a variable reference; in fact it can be thought of as a referenceto a reference. The implementation copies the var_ref->fn() from thevariable reference being referenced, in this case, the waking_pidfn(), which ishist_field_var_ref() and makes that thefn() of thealias. Thehist_field_var_ref()fn() requires the var_ref_idx of thevariable reference it’s using, so waking_pid’s var_ref_idx is alsocopied to the alias. The end result is that when the value of aliasis retrieved, in the end it just does the same thing the originalreference would have done and retrieves the same value from thevar_ref_vals[] array. You can verify this in the output by notingthat the var_ref_idx of the alias, in this case woken_pid, is the sameas the var_ref_idx of the reference, waking_pid, in the variablereference fields section.

Additionally, once it gets that value, since it is also a variable, itthen saves that value into its var.idx. So the var.idx of thewoken_pid alias is 0, which it fills with the value from var_ref_idx 0when itsfn() is called to update itself. You’ll also notice thatthere’s a woken_pid var_ref in the variable refs section. That is thereference to the woken_pid alias variable, and you can see that itretrieves the value from the same var.idx as the woken_pid alias, 0,and then in turn saves that value in its own var_ref_idx slot, 3, andthe value at this position is finally what gets assigned to the$woken_pid slot in the trace event invocation:

# cat events/sched/sched_switch/hist_debug# event histogram## trigger info: hist:keys=next_pid:vals=hitcount:woken_pid=$waking_pid,wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm) [active]#hist_data: 0000000055d65ed0n_vals: 3n_keys: 1n_fields: 4val fields:  hist_data->fields[0]:    flags:      VAL: HIST_FIELD_FL_HITCOUNT    type: u64    size: 8    is_signed: 0  hist_data->fields[1]:    flags:      HIST_FIELD_FL_VAR      HIST_FIELD_FL_ALIAS    var.name: woken_pid    var.idx (into tracing_map_elt.vars[]): 0    var_ref_idx (into hist_data->var_refs[]): 0    type: pid_t    size: 4    is_signed: 1  hist_data->fields[2]:    flags:      HIST_FIELD_FL_VAR    var.name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 1    type: u64    size: 0    is_signed: 0key fields:  hist_data->fields[3]:    flags:      HIST_FIELD_FL_KEY    ftrace_event_field name: next_pid    type: pid_t    size: 8    is_signed: 1variable reference fields:  hist_data->var_refs[0]:    flags:      HIST_FIELD_FL_VAR_REF    name: waking_pid    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 00000000a250528c    var_ref_idx (into hist_data->var_refs[]): 0    type: pid_t    size: 4    is_signed: 1  hist_data->var_refs[1]:    flags:      HIST_FIELD_FL_VAR_REF    name: ts0    var.idx (into tracing_map_elt.vars[]): 1    var.hist_data: 00000000a250528c    var_ref_idx (into hist_data->var_refs[]): 1    type: u64    size: 8    is_signed: 0  hist_data->var_refs[2]:    flags:      HIST_FIELD_FL_VAR_REF    name: wakeup_lat    var.idx (into tracing_map_elt.vars[]): 1    var.hist_data: 0000000055d65ed0    var_ref_idx (into hist_data->var_refs[]): 2    type: u64    size: 0    is_signed: 0  hist_data->var_refs[3]:    flags:      HIST_FIELD_FL_VAR_REF    name: woken_pid    var.idx (into tracing_map_elt.vars[]): 0    var.hist_data: 0000000055d65ed0    var_ref_idx (into hist_data->var_refs[]): 3    type: pid_t    size: 4    is_signed: 1  hist_data->var_refs[4]:    flags:      HIST_FIELD_FL_VAR_REF    name: next_comm    var.idx (into tracing_map_elt.vars[]): 2    var.hist_data: 0000000055d65ed0    var_ref_idx (into hist_data->var_refs[]): 4    type: char[16]    size: 256    is_signed: 0field variables:  hist_data->field_vars[0]:    field_vars[0].var:    flags:      HIST_FIELD_FL_VAR    var.name: next_comm    var.idx (into tracing_map_elt.vars[]): 2    field_vars[0].val:    ftrace_event_field name: next_comm    type: char[16]    size: 256    is_signed: 0action tracking variables (for onmax()/onchange()/onmatch()):  hist_data->actions[0].match_data.event_system: sched  hist_data->actions[0].match_data.event: sched_waking

The commands below can be used to clean things up for the next test:

# echo '!hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger# echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events