Incomputer science, thefunarg problem(function argument problem) refers to the difficulty in implementingfirst-class functions (functions asfirst-class objects) in programming language implementations so as to usestack-based memory allocation of the functions.
The difficulty only arises if the body of anested function refers directly (i.e., not by argument passing) to identifiers defined in the environment in which the function is defined, but not in the environment of the function call.[1] A standard resolution is either to forbid such references or to createclosures.[2]
There are two subtly different versions of the funarg problem. Theupwards funarg problem arises from returning (or otherwise transmitting "upwards") a function from a function call. Thedownwards funarg problem arises from passing a function as a parameter to another function call.
When one function calls another during a typical program's execution, the local state of the caller (includingparameters andlocal variables) must be preserved in order for execution to proceed after the callee returns. In most compiled programs, this local state is stored on thecall stack in a data structure called astack frame oractivation record. This stack frame is pushed, or allocated, as prelude to calling another function, and is popped, or deallocated, when the other function returns to the function that did the call. The upwards funarg problem arises when the calling function refers to the called/exited function's state after that function has returned. Therefore, the stack frame containing the called function's state variables must not be deallocated when the function returns, violating thestack-based function call paradigm.
One solution to the upwards funarg problem is to simply allocate all activation records from theheap instead of the stack and rely on some form ofgarbage collection orreference counting to deallocate them when they are no longer needed. Managing activation records on the heap has historically been perceived to be less efficient than on the stack (although this is partially contradicted[3]) and has been perceived to impose significant implementation complexity. Most functions in typical programs (less so for programs infunctional programming languages) do not create upwards funargs, adding to concerns about potential overhead associated with their implementation. Furthermore, this approach is genuinely difficult in languages that do not support garbage collection.
Some efficiency-minded compilers employ a hybrid approach in which the activation records for a function are allocated from the stack if the compiler is able to deduce, throughstatic program analysis, that the function creates no upwards funargs. Otherwise, the activation records are allocated from the heap.
Another solution is to simply copy the value of the variables into the closure at the time the closure is created. This will cause a different behavior in the case of mutable variables, because the state will no longer be shared between closures. But if it is known that the variables are constant, then this approach will be equivalent. TheML languages take this approach, since variables in those languages are bound to values—i.e. variables cannot be changed.Java also takes this approach with respect to anonymous classes (and lambdas since Java 8), in that it only allows one to refer to variables in the enclosing scope that are effectivelyfinal
(i.e. constant).
Some languages allow the programmer to explicitly choose between the two behaviors.PHP 5.3's anonymous functions require one to specify which variables to include in the closure using theuse ()
clause; if the variable is listed by reference, it includes a reference to the original variable; otherwise, it passes the value. In Apple'sBlocks anonymous functions, captured local variables are by default captured by value; if one wants to share the state between closures or between the closure and the outside scope, the variable must be declared with the__block
modifier, in which case that variable is allocated on the heap.
The followingHaskell-likepseudocode definesfunction composition:
composefg=λx→f(gx)
λ
is the operator for constructing a new function, which in this case has one argument,x
, and returns the result of first applyingg
tox
, then applyingf
to that. This λ function carries the functionsf
andg
(or pointers to them) as internal state.
The problem in this case exists if the compose function allocates the parameter variablesf
andg
on the stack. Whencompose
returns, the stack frame containingf
andg
is discarded. When the internal functionλx
attempts to accessg
, it will access a discarded memory area.
A downwards funarg may also refer to a function's state when that function is not actually executing. However, because, by definition, the existence of a downwards funarg is contained in the execution of the function that creates it, the stack frame for the function can usually still be stored on the stack. Nonetheless, the existence of downwards funargs implies a tree structure of closures and stack frames that can complicate human and machine reasoning about the program state.
The downwards funarg problem complicates the efficient compilation oftail calls and code written incontinuation-passing style. In these special cases, the intent of the programmer is (usually) that the function run in limited stack space, so the "faster" behavior may actually be undesirable.[clarification needed]
Historically, the upwards funarg problem has proven to be more difficult. For example, thePascal programming language allows functions to be passed as arguments but not returned as results; thus implementations of Pascal are required to address the downwards funarg problem but not the upwards one. TheModula-2 andOberon programming languages (descendants of Pascal) allow functions both as parameters and return values, but the assigned function may not be a nested function. TheC programming language historically avoids the main difficulty of the funarg problem by not allowing function definitions to be nested; because the environment of every function is the same, containing just the statically allocated global variables and functions, a pointer to a function's code describes the function completely.Apple has proposed and implemented aclosure syntax for C that solves the upwards funarg problem by dynamically moving closures from the stack to the heap as necessary.[citation needed] TheJava programming language deals with it by requiring that context used by nested functions in anonymous inner and local classes be declaredfinal
, and context used bylambda expressions be effectively final.C# andD have lambdas (closures) that encapsulate a function pointer and related variables.
Infunctional languages, functions are first-class values that can be passed anywhere. Thus, implementations ofScheme orStandard ML must address both the upwards and downwards funarg problems. This is usually accomplished by representing function values asheap-allocated closures, as previously described. TheOCaml compiler employs a hybrid technique (based onstatic program analysis) to maximize efficiency.[citation needed]