Using ftrace to hook to functions
Written for: 4.14
Introduction
The ftrace infrastructure was originally created to attach callbacks to the beginning of functions in order to record and trace the flow of the kernel. But callbacks to the start of a function can have other use cases, such as live kernel patching or security monitoring. This document describes how to use ftrace to implement your own function callbacks.
The ftrace context
Warning
The ability to add a callback to almost any function within the kernel comes with risks. A callback can be called from any context (normal, softirq, irq, and NMI). Callbacks can also be called just before going to idle, during CPU bring up and takedown, or when going to user space. This requires extra care about what is done inside a callback. A callback can be called outside the protective scope of RCU.
The ftrace infrastructure has some protections against recursion and RCU, but one must still be very careful about how the callbacks are used.
The ftrace_ops structure
To register a function callback, an ftrace_ops is required. This structure tells ftrace which function should be called as the callback, as well as which protections the callback will perform itself and not require ftrace to handle.
Only one field needs to be set when registering an ftrace_ops with ftrace:
struct ftrace_ops ops = {
      .func    = my_callback_func,
      .flags   = MY_FTRACE_FLAGS,
      .private = any_private_data_structure,
};
Both .flags and .private are optional. Only .func is required.
To enable tracing call:
register_ftrace_function(&ops);
To disable tracing call:
unregister_ftrace_function(&ops);
Both functions are made available by including the header:
#include <linux/ftrace.h>
The registered callback will start being called some time after register_ftrace_function() is called and before it returns. The exact time that callbacks start being called depends upon the architecture and the scheduling of services. The callback itself will have to handle any synchronization if it must begin at an exact moment.
The unregister_ftrace_function() call guarantees that the callback is no longer being called by functions after unregister_ftrace_function() returns. Note that to provide this guarantee, unregister_ftrace_function() may take some time to finish.
The callback function
The prototype of the callback function is as follows (as of v4.14):
void callback_func(unsigned long ip, unsigned long parent_ip,
                   struct ftrace_ops *op, struct pt_regs *regs);
- @ip
- This is the instruction pointer of the function that is being traced (where the fentry or mcount is within the function).
- @parent_ip
- This is the instruction pointer of the function that called the function being traced (where the call of the function occurred).
- @op
- This is a pointer to the ftrace_ops that was used to register the callback. This can be used to pass data to the callback via the private pointer.
- @regs
- If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED flags are set in the ftrace_ops structure, then this will point to the pt_regs structure as it would be if a breakpoint were placed at the start of the function that ftrace was tracing. Otherwise it either contains garbage or is NULL.
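The parameter descriptions above can be pictured with a small user-space model. This is only a sketch and does not use the kernel API: struct model_ops, struct model_regs, struct hit_counter and count_hits() are hypothetical stand-ins invented for illustration. It shows two habits the descriptions call for: reaching per-ops data through the private pointer, and checking regs for NULL before relying on it.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for struct pt_regs. */
struct model_regs {
	unsigned long ip;
};

/* Hypothetical stand-in for struct ftrace_ops: a callback plus private data. */
struct model_ops {
	void (*func)(unsigned long ip, unsigned long parent_ip,
		     struct model_ops *op, struct model_regs *regs);
	void *private;
};

/* Per-ops state that the callback reaches through op->private. */
struct hit_counter {
	unsigned long hits;
	unsigned long hits_with_regs;
	unsigned long last_ip;
};

/* Same parameter order as the callback prototype described above. */
static void count_hits(unsigned long ip, unsigned long parent_ip,
		       struct model_ops *op, struct model_regs *regs)
{
	struct hit_counter *c = op->private;

	(void)parent_ip;	/* unused in this sketch */
	c->hits++;
	c->last_ip = ip;

	/* regs may be NULL when register saving is not available. */
	if (regs)
		c->hits_with_regs++;
}
```

In a real callback the same pattern would apply to op->private on the ftrace_ops that was registered; the counter structure here is purely illustrative.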
The ftrace FLAGS
The ftrace_ops flags are all defined and documented in include/linux/ftrace.h. Some of the flags are used for internal infrastructure of ftrace, but the ones that users should be aware of are the following:
- FTRACE_OPS_FL_SAVE_REGS
- If the callback requires reading or modifying the pt_regs passed to the callback, then it must set this flag. Registering an ftrace_ops with this flag set on an architecture that does not support passing of pt_regs to the callback will fail.
- FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
- Similar to SAVE_REGS, but registering an ftrace_ops on an architecture that does not support passing of regs will not fail with this flag set. The callback must check whether regs is NULL to determine if the architecture supports it.
- FTRACE_OPS_FL_RECURSION_SAFE
By default, a wrapper is added around the callback to make sure that recursion of the function does not occur. That is, if a function that is called as a result of the callback’s execution is also traced, ftrace will prevent the callback from being called again. But this wrapper adds some overhead, and if the callback is safe from recursion, it can set this flag to disable the ftrace protection.
Note, if this flag is set, and recursion does occur, it could cause the system to crash, and possibly reboot via a triple fault.
It is OK if another callback traces a function that is called by a callback that is marked recursion safe. Recursion safe callbacks must never trace any functions that are called by the callback itself or any nested functions that those functions call.
If this flag is set, it is possible that the callback will also be called with preemption enabled (when CONFIG_PREEMPTION is set), but this is not guaranteed.
- FTRACE_OPS_FL_IPMODIFY
Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to “hijack” the traced function (have another function called instead of the traced function), it requires setting this flag. This is what live kernel patching uses. Without this flag, pt_regs->ip cannot be modified.
Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be registered to any given function at a time.
- FTRACE_OPS_FL_RCU
If this is set, then the callback will only be called by functions where RCU is “watching”. This is required if the callback function performs any rcu_read_lock() operation.
RCU stops watching when the system goes idle, during the time when a CPU is taken down and comes back online, and when entering from kernel to user space and back to kernel space. During these transitions, a callback may be executed and RCU synchronization will not protect it.
- FTRACE_OPS_FL_PERMANENT
If this is set on any ftrace ops, then the tracing cannot be disabled by writing 0 to the proc sysctl ftrace_enabled. Equally, a callback with the flag set cannot be registered if ftrace_enabled is 0.
Livepatch uses this flag so that the function redirection is not lost and the system stays protected.
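The recursion protection described under FTRACE_OPS_FL_RECURSION_SAFE can be pictured with a user-space model. This is a simplified sketch, not the kernel's implementation: a single hypothetical flag stands in for ftrace's recursion tracking, and every name in it (guarded_call(), traced_helper(), my_callback()) is invented for illustration.

```c
static int in_callback;		/* set while the callback is running */
static int callback_runs;	/* times the callback body executed */
static int helper_runs;		/* times the traced helper body executed */

static void my_callback(void);

/* Model of the default wrapper: drop nested invocations of the callback. */
static void guarded_call(void (*cb)(void))
{
	if (in_callback)
		return;		/* recursion detected: skip the callback */
	in_callback = 1;
	cb();
	in_callback = 0;
}

/* A traced function: its entry hook fires first, then its body runs. */
static void traced_helper(void)
{
	guarded_call(my_callback);
	helper_runs++;
}

/* The callback itself calls a traced function, which would recurse. */
static void my_callback(void)
{
	callback_runs++;
	traced_helper();
}
```

In this model, one outside call to traced_helper() runs the callback once; the nested call made from inside the callback still executes the helper body but does not re-enter the callback. Without the guard, my_callback() and traced_helper() would call each other without end, which is the failure the flag's description warns about.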
Filtering which functions to trace
If a callback is only to be called from specific functions, a filter must be set up. The filters are added by name, or by ip if it is known.
int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                      int len, int reset);
- @ops
- The ops to set the filter with
- @buf
- The string that holds the function filter text.
- @len
- The length of the string.
- @reset
- Non-zero to reset all filters before applying this filter.
Filters denote which functions should be enabled when tracing is enabled. If @buf is NULL and @reset is set, all functions will be enabled for tracing.
The @buf can also be a glob expression to enable all functions that match a specific pattern.
See Filter Commands in Documentation/trace/ftrace.rst.
To just trace the schedule function:
ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);
To add more functions, call ftrace_set_filter() more than once with the @reset parameter set to zero. To remove the current filter set and replace it with new functions defined by @buf, have @reset be non-zero.
To remove all the filtered functions and trace all functions:
ret = ftrace_set_filter(&ops, NULL, 0, 1);
Sometimes more than one function has the same name. To trace just a specific function in this case, ftrace_set_filter_ip() can be used.
ret = ftrace_set_filter_ip(&ops, ip, 0, 0);
Note that the ip must be the address where the call to fentry or mcount is located in the function. This function is used by perf and kprobes, which get the ip address from the user (usually using debug info from the kernel).
If a glob is used to set the filter, functions can be added to a “notrace” list that will prevent those functions from calling the callback. The “notrace” list takes precedence over the “filter” list. If the two lists are non-empty and contain the same functions, the callback will not be called by any function.
An empty “notrace” list means to allow all functions defined by the filter to be traced.
int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                       int len, int reset);
This takes the same parameters as ftrace_set_filter() but will add the functions it finds to the list of functions not to be traced. This is a separate list from the filter list, and this function does not modify the filter list.
A non-zero @reset will clear the “notrace” list before adding functions that match @buf to it.
Clearing the “notrace” list is the same as clearing the filter list:
ret = ftrace_set_notrace(&ops, NULL, 0, 1);
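How the filter and “notrace” lists combine can be modeled in user space. The sketch below is an approximation under stated assumptions: would_call_callback() is a hypothetical helper, each list is reduced to a single pattern, and POSIX fnmatch() stands in for ftrace's own glob matching, which may differ in detail.

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical model of the decision made for one function name.
 * A NULL pattern models an empty list.
 */
static int would_call_callback(const char *func, const char *filter,
			       const char *notrace)
{
	/* An empty filter list enables every function. */
	if (filter && fnmatch(filter, func, 0) != 0)
		return 0;

	/* The notrace list takes precedence over the filter list. */
	if (notrace && fnmatch(notrace, func, 0) == 0)
		return 0;

	return 1;
}
```

With a filter of "sched*" and a notrace pattern of "*timeout", this model lets schedule through but rejects schedule_timeout, because the notrace match overrides the filter match.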
The filter and notrace lists may be changed at any time. If only a set of functions should call the callback, it is best to set the filters before registering the callback. But the changes may also happen after the callback has been registered.
If a filter is in place, @reset is non-zero, and @buf contains a glob that matches functions, the switch will happen during the ftrace_set_filter() call. At no time will all functions call the callback.
ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);
register_ftrace_function(&ops);
msleep(10);
ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);
is not the same as:
ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);
register_ftrace_function(&ops);
msleep(10);
ftrace_set_filter(&ops, NULL, 0, 1);
ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);
The latter has a short window, between the reset and the setting of the new filter, during which all functions will call the callback.
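The difference between the two sequences can be traced through a user-space model of a filter list. This is a sketch only: struct model_filter, model_set_filter() and model_traces() are hypothetical names, the model matches exact names rather than globs, and it treats each call as one indivisible step, in line with the guarantee described above for a single ftrace_set_filter() call.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 4

/* Hypothetical model of one ops' filter list; empty means "all functions". */
struct model_filter {
	const char *names[MAX_ENTRIES];
	size_t count;
};

/* Model of a filter update: optional reset, then optional add. */
static void model_set_filter(struct model_filter *f, const char *name,
			     int reset)
{
	if (reset)
		f->count = 0;
	if (name && f->count < MAX_ENTRIES)
		f->names[f->count++] = name;
}

/* Would a call to func reach the callback under this filter? */
static int model_traces(const struct model_filter *f, const char *func)
{
	size_t i;

	if (f->count == 0)
		return 1;	/* empty filter: every function is traced */
	for (i = 0; i < f->count; i++)
		if (strcmp(f->names[i], func) == 0)
			return 1;
	return 0;
}
```

Replacing "schedule" with "try_to_wake_up" in one call with reset set never leaves the list empty between calls. Resetting with a NULL name first leaves an empty list, so an unrelated function such as vfs_read would be traced until the second call adds the new entry.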