Movatterモバイル変換

[0]ホーム

Jump to content

Chain rule

Edit links

From Wikipedia, the free encyclopedia

Formula in calculus

For other uses, seeChain rule (disambiguation).

Part of a series of articles about

Calculus

\int _{a}^{b}f'(t)\,dt=f(b)-f(a)

Fundamental theorem

Differential

Definitions
Derivative (generalizations) Differential infinitesimal of a function total
Concepts
Differentiation notation Second derivative Implicit differentiation Logarithmic differentiation Related rates Taylor's theorem
Rules and identities
Sum Product Chain Power Quotient L'Hôpital's rule Inverse General Leibniz Faà di Bruno's formula Reynolds

Integral

Definitions
Lists of integrals Integral transform Leibniz integral rule
Antiderivative Integral (improper) Riemann integral Lebesgue integration Contour integration Integral of inverse functions
Integration by
Parts Discs Cylindrical shells Substitution (trigonometric,tangent half-angle,Euler) Euler's formula Partial fractions (Heaviside's method) Changing order Reduction formulae Differentiating under the integral sign Risch algorithm

Series

Convergence tests
Geometric (arithmetico-geometric) Harmonic Alternating Power Binomial Taylor
Summand limit (term test) Ratio Root Integral Direct comparison Limit comparison Alternating series Cauchy condensation Dirichlet Abel

Vector

Theorems
Gradient Divergence Curl Laplacian Directional derivative Identities
Gradient Green's Stokes' Divergence Generalized Stokes Helmholtz decomposition

Multivariable

Formalisms
Matrix Tensor Exterior Geometric
Definitions
Partial derivative Multiple integral Line integral Surface integral Volume integral Jacobian Hessian

Advanced

Specialized

Miscellanea

Incalculus, thechain rule is aformula that expresses thederivative of thecomposition of twodifferentiable functionsf andg in terms of the derivatives off andg. More precisely, if $h=f\circ g$ is the function such that $h(x)=f(g(x))$ for everyx, then the chain rule is, inLagrange's notation, $h'(x)=f'(g(x))g'(x).$ or, equivalently, $h'=(f\circ g)'=(f'\circ g)\cdot g'.$

The chain rule may also be expressed inLeibniz's notation. If a variablez depends on the variabley, which itself depends on the variablex (that is,y andz aredependent variables), thenz depends onx as well, via the intermediate variabley. In this case, the chain rule is expressed as ${\frac {dz}{dx}}={\frac {dz}{dy}}\cdot {\frac {dy}{dx}},$ and $\left.{\frac {dz}{dx}}\right|_{x}=\left.{\frac {dz}{dy}}\right|_{y(x)}\cdot \left.{\frac {dy}{dx}}\right|_{x},$ for indicating at which points the derivatives have to be evaluated.

Inintegration, the counterpart to the chain rule is thesubstitution rule.

Intuitive explanation

[edit]

Intuitively, the chain rule states that knowing the instantaneous rate of change ofz relative toy and that ofy relative tox allows one to calculate the instantaneous rate of change ofz relative tox as the product of the two rates of change.

As put byGeorge F. Simmons: "If a car travels twice as fast as a bicycle and the bicycle is four times as fast as a walking man, then the car travels 2 × 4 = 8 times as fast as the man."^[1]^[2]

The relationship between this example and the chain rule is as follows. Letz,y andx be the (variable) positions of the car, the bicycle, and the walking man, respectively. The rate of change of relative positions of the car and the bicycle is ${\textstyle {\frac {dz}{dy}}=2.}$ Similarly, ${\textstyle {\frac {dy}{dx}}=4.}$ So, the rate of change of the relative positions of the car and the walking man is ${\frac {dz}{dx}}={\frac {dz}{dy}}\cdot {\frac {dy}{dx}}=2\cdot 4=8.$

The rate of change of positions is the ratio of the speeds, and the speed is the derivative of the position with respect to the time; that is, ${\frac {dz}{dx}}={\frac {\frac {dz}{dt}}{\frac {dx}{dt}}},$ or, equivalently, ${\frac {dz}{dt}}={\frac {dz}{dx}}\cdot {\frac {dx}{dt}},$ which is also an application of the chain rule.

History

[edit]

The chain rule seems to have first been used byGottfried Wilhelm Leibniz. He used it to calculate the derivative of ${\sqrt {a+bz+cz^{2}}}$ as the composite of the square root function and the function $a+bz+cz^{2}\!$ . He first mentioned it in a 1676 memoir (with a sign error in the calculation).^[3] The common notation of the chain rule is due to Leibniz.^[4]Guillaume de l'Hôpital used the chain rule implicitly in hisAnalyse des infiniment petits. The chain rule does not appear in any ofLeonhard Euler's analysis books, even though they were written over a hundred years after Leibniz's discovery.^{[citation needed]}. It is believed that the first "modern" version of the chain rule appears in Lagrange's 1797Théorie des fonctions analytiques; it also appears in Cauchy's 1823Résumé des Leçons données a L’École Royale Polytechnique sur Le Calcul Infinitesimal.^[4]

Statement

[edit]

The simplest form of the chain rule is for real-valued functions of onereal variable. It states that ifg is a function that is differentiable at a pointc (i.e. the derivativeg′(c) exists) andf is a function that is differentiable atg(c), then the composite function $f\circ g$ is differentiable atc, and the derivative is^[5] $(f\circ g)'(c)=f'(g(c))\cdot g'(c).$ The rule is sometimes abbreviated as $(f\circ g)'=(f'\circ g)\cdot g'.$

Ify =f(u) andu =g(x), then this abbreviated form is written inLeibniz notation as: ${\frac {dy}{dx}}={\frac {dy}{du}}\cdot {\frac {du}{dx}}.$

The points where the derivatives are evaluated may also be stated explicitly: $\left.{\frac {dy}{dx}}\right|_{x=c}=\left.{\frac {dy}{du}}\right|_{u=g(c)}\cdot \left.{\frac {du}{dx}}\right|_{x=c}.$

Carrying the same reasoning further, givenn functions $f_{1},\ldots ,f_{n}\!$ with the composite function $f_{1}\circ (f_{2}\circ \cdots (f_{n-1}\circ f_{n}))\!$ , if each function $f_{i}\!$ is differentiable at its immediate input, then the composite function is also differentiable by the repeated application of Chain Rule, where the derivative is (in Leibniz's notation): ${\frac {df_{1}}{dx}}={\frac {df_{1}}{df_{2}}}{\frac {df_{2}}{df_{3}}}\cdots {\frac {df_{n}}{dx}}.$

Applications

[edit]

Composites of more than two functions

[edit]

The chain rule can be applied to composites of more than two functions. To take the derivative of a composite of more than two functions, notice that the composite off,g, andh (in that order) is the composite off withg ∘h. The chain rule states that to compute the derivative off ∘g ∘h, it is sufficient to compute the derivative off and the derivative ofg ∘h. The derivative off can be calculated directly, and the derivative ofg ∘h can be calculated by applying the chain rule again.^{[citation needed]}

For concreteness, consider the function $y=e^{\sin(x^{2})}.$ This can be decomposed as the composite of three functions: ${\begin{aligned}y&=f(u)=e^{u},\\u&=g(v)=\sin v,\\v&=h(x)=x^{2}.\end{aligned}}$ So that $y=f(g(h(x)))$ .

Their derivatives are: ${\begin{aligned}{\frac {dy}{du}}&=f'(u)=e^{u},\\{\frac {du}{dv}}&=g'(v)=\cos v,\\{\frac {dv}{dx}}&=h'(x)=2x.\end{aligned}}$

The chain rule states that the derivative of their composite at the pointx =a is: ${\begin{aligned}(f\circ g\circ h)'(a)&=f'((g\circ h)(a))\cdot (g\circ h)'(a)\\&=f'((g\circ h)(a))\cdot g'(h(a))\cdot h'(a)\\&=(f'\circ g\circ h)(a)\cdot (g'\circ h)(a)\cdot h'(a).\end{aligned}}$

InLeibniz's notation, this is: ${\frac {dy}{dx}}=\left.{\frac {dy}{du}}\right|_{u=g(h(a))}\cdot \left.{\frac {du}{dv}}\right|_{v=h(a)}\cdot \left.{\frac {dv}{dx}}\right|_{x=a},$ or for short, ${\frac {dy}{dx}}={\frac {dy}{du}}\cdot {\frac {du}{dv}}\cdot {\frac {dv}{dx}}.$ The derivative function is therefore: ${\frac {dy}{dx}}=e^{\sin(x^{2})}\cdot \cos(x^{2})\cdot 2x.$

Another way of computing this derivative is to view the composite functionf ∘g ∘h as the composite off ∘g andh. Applying the chain rule in this manner would yield: ${\begin{aligned}(f\circ g\circ h)'(a)&=(f\circ g)'(h(a))\cdot h'(a)\\&=f'(g(h(a)))\cdot g'(h(a))\cdot h'(a).\end{aligned}}$

This is the same as what was computed above. This should be expected because(f ∘g) ∘h =f ∘ (g ∘h).

Sometimes, it is necessary to differentiate an arbitrarily long composition of the form $f_{1}\circ f_{2}\circ \cdots \circ f_{n-1}\circ f_{n}\!$ . In this case, define $f_{a\,.\,.\,b}=f_{a}\circ f_{a+1}\circ \cdots \circ f_{b-1}\circ f_{b}$ where $f_{a\,.\,.\,a}=f_{a}$ and $f_{a\,.\,.\,b}(x)=x$ when $b<a$ . Then the chain rule takes the form ${\begin{aligned}Df_{1\,.\,.\,n}&=(Df_{1}\circ f_{2\,.\,.\,n})(Df_{2}\circ f_{3\,.\,.\,n})\cdots (Df_{n-1}\circ f_{n\,.\,.\,n})Df_{n}\\&=\prod _{k=1}^{n}\left[Df_{k}\circ f_{(k+1)\,.\,.\,n}\right]\end{aligned}}$ or, in the Lagrange notation, ${\begin{aligned}f_{1\,.\,.\,n}'(x)&=f_{1}'\left(f_{2\,.\,.\,n}(x)\right)\;f_{2}'\left(f_{3\,.\,.\,n}(x)\right)\cdots f_{n-1}'\left(f_{n\,.\,.\,n}(x)\right)\;f_{n}'(x)\\[1ex]&=\prod _{k=1}^{n}f_{k}'\left(f_{(k+1\,.\,.\,n)}(x)\right)\end{aligned}}$

Quotient rule

[edit]

Derivatives of inverse functions

[edit]

Main article:Inverse functions and differentiation

Suppose thaty =g(x) has aninverse function. Call its inverse functionf so that we havex =f(y). There is a formula for the derivative off in terms of the derivative ofg. To see this, note thatf andg satisfy the formula $f(g(x))=x.$

And because the functions $f(g(x))$ andx are equal, their derivatives must be equal. The derivative ofx is the constant function with value 1, and the derivative of $f(g(x))$ is determined by the chain rule. Therefore, we have that: $f'(g(x))g'(x)=1.$

To expressf' as a function of an independent variabley, we substitute $f(y)$ forx wherever it appears. Then we can solve forf'. ${\begin{aligned}f'(g(f(y)))g'(f(y))&=1\\f'(y)g'(f(y))&=1\\f'(y)={\frac {1}{g'(f(y))}}.\end{aligned}}$

For example, consider the functiong(x) =e^x. It has an inversef(y) = lny. Becauseg′(x) =e^x, the above formula says that ${\frac {d}{dy}}\ln y={\frac {1}{e^{\ln y}}}={\frac {1}{y}}.$

This formula is true wheneverg is differentiable and its inversef is also differentiable. This formula can fail when one of these conditions is not true. For example, considerg(x) =x³. Its inverse isf(y) =y^1/3, which is not differentiable at zero. If we attempt to use the above formula to compute the derivative off at zero, then we must evaluate1/g′(f(0)). Sincef(0) = 0 andg′(0) = 0, we must evaluate 1/0, which is undefined. Therefore, the formula fails in this case. This is not surprising becausef is not differentiable at zero.

Back propagation

[edit]

The chain rule forms the basis of theback propagation algorithm, which is used ingradient descent ofneural networks indeep learning (artificial intelligence).^[6]

Higher derivatives

[edit]

Faà di Bruno's formula generalizes the chain rule to higher derivatives. Assuming thaty =f(u) andu =g(x), then the first few derivatives are: ${\begin{aligned}{\frac {dy}{dx}}&={\frac {dy}{du}}{\frac {du}{dx}}\\{\frac {d^{2}y}{dx^{2}}}&={\frac {d^{2}y}{du^{2}}}\left({\frac {du}{dx}}\right)^{2}+{\frac {dy}{du}}{\frac {d^{2}u}{dx^{2}}}\\{\frac {d^{3}y}{dx^{3}}}&={\frac {d^{3}y}{du^{3}}}\left({\frac {du}{dx}}\right)^{3}+3\,{\frac {d^{2}y}{du^{2}}}{\frac {du}{dx}}{\frac {d^{2}u}{dx^{2}}}+{\frac {dy}{du}}{\frac {d^{3}u}{dx^{3}}}\\{\frac {d^{4}y}{dx^{4}}}&={\frac {d^{4}y}{du^{4}}}\left({\frac {du}{dx}}\right)^{4}+6\,{\frac {d^{3}y}{du^{3}}}\left({\frac {du}{dx}}\right)^{2}{\frac {d^{2}u}{dx^{2}}}+{\frac {d^{2}y}{du^{2}}}\left(4\,{\frac {du}{dx}}{\frac {d^{3}u}{dx^{3}}}+3\,\left({\frac {d^{2}u}{dx^{2}}}\right)^{2}\right)+{\frac {dy}{du}}{\frac {d^{4}u}{dx^{4}}}.\end{aligned}}$

Proofs

[edit]

First proof

[edit]

One proof of the chain rule begins by defining the derivative of the composite functionf ∘g, where we take thelimit of thedifference quotient forf ∘g asx approachesa: $(f\circ g)'(a)=\lim _{x\to a}{\frac {f(g(x))-f(g(a))}{x-a}}.$

Assume for the moment that $g(x)\!$ does not equal $g(a)$ for any $x {\displaystyle x}$ near $a {\displaystyle a}$ . Then the previous expression is equal to the product of two factors: $\lim _{x\to a}{\frac {f(g(x))-f(g(a))}{g(x)-g(a)}}\cdot {\frac {g(x)-g(a)}{x-a}}.$

If $g {\displaystyle g}$ oscillates neara, then it might happen that no matter how close one gets toa, there is always an even closerx such thatg(x) =g(a). For example, this happens neara = 0 for thecontinuous functiong defined byg(x) = 0 forx = 0 andg(x) =x² sin(1/x) otherwise. Whenever this happens, the above expression is undefined because it involvesdivision by zero. To work around this, introduce a function $Q {\displaystyle Q}$ as follows: $Q(y)={\begin{cases}\displaystyle {\frac {f(y)-f(g(a))}{y-g(a)}},&y\neq g(a),\\f'(g(a)),&y=g(a).\end{cases}}$ We will show that the difference quotient forf ∘g is always equal to: $Q(g(x))\cdot {\frac {g(x)-g(a)}{x-a}}.$

Wheneverg(x) is not equal tog(a), this is clear because the factors ofg(x) −g(a) cancel. Wheng(x) equalsg(a), then the difference quotient forf ∘g is zero becausef(g(x)) equalsf(g(a)), and the above product is zero because it equalsf′(g(a)) times zero. So the above product is always equal to the difference quotient, and to show that the derivative off ∘g ata exists and to determine its value, we need only show that the limit asx goes toa of the above product exists and determine its value.

To do this, recall that the limit of a product exists if the limits of its factors exist. When this happens, the limit of the product of these two factors will equal the product of the limits of the factors. The two factors areQ(g(x)) and(g(x) −g(a)) / (x −a). The latter is the difference quotient forg ata, and becauseg is differentiable ata by assumption, its limit asx tends toa exists and equalsg′(a).

As forQ(g(x)), notice thatQ is defined whereverf is. Furthermore,f is differentiable atg(a) by assumption, soQ is continuous atg(a), by definition of the derivative. The functiong is continuous ata because it is differentiable ata, and thereforeQ ∘g is continuous ata. So its limit asx goes toa exists and equalsQ(g(a)), which isf′(g(a)).

This shows that the limits of both factors exist and that they equalf′(g(a)) andg′(a), respectively. Therefore, the derivative off ∘g ata exists and equalsf′(g(a))g′(a).

Second proof

[edit]

Another way of proving the chain rule is to measure the error in the linear approximation determined by the derivative. This proof has the advantage that it generalizes to several variables. It relies on the following equivalent definition of differentiability at a point: A functiong is differentiable ata if there exists a real numberg′(a) and a functionε(h) that tends to zero ash tends to zero, and furthermore $g(a+h)-g(a)=g'(a)h+\varepsilon (h)h.$ Here the left-hand side represents the true difference between the value ofg ata and ata +h, whereas the right-hand side represents the approximation determined by the derivative plus an error term.

In the situation of the chain rule, such a functionε exists becauseg is assumed to be differentiable ata. Again by assumption, a similar function also exists forf atg(a). Calling this functionη, we have $f(g(a)+k)-f(g(a))=f'(g(a))k+\eta (k)k.$ The above definition imposes no constraints onη(0), even though it is assumed thatη(k) tends to zero ask tends to zero. If we setη(0) = 0, thenη is continuous at 0.

Proving the theorem requires studying the differencef(g(a +h)) −f(g(a)) ash tends to zero. The first step is to substitute forg(a +h) using the definition of differentiability ofg ata: $f(g(a+h))-f(g(a))=f(g(a)+g'(a)h+\varepsilon (h)h)-f(g(a)).$ The next step is to use the definition of differentiability off atg(a). This requires a term of the formf(g(a) +k) for somek. In the above equation, the correctk varies withh. Setk_h =g′(a)h +ε(h)h and the right hand side becomesf(g(a) +k_h) −f(g(a)). Applying the definition of the derivative gives: $f(g(a)+k_{h})-f(g(a))=f'(g(a))k_{h}+\eta (k_{h})k_{h}.$ To study the behavior of this expression ash tends to zero, expandk_h. After regrouping the terms, the right-hand side becomes: $f'(g(a))g'(a)h+[f'(g(a))\varepsilon (h)+\eta (k_{h})g'(a)+\eta (k_{h})\varepsilon (h)]h.$ Becauseε(h) andη(k_h) tend to zero ash tends to zero, the first two bracketed terms tend to zero ash tends to zero. Applying the same theorem on products of limits as in the first proof, the third bracketed term also tends zero. Because the above expression is equal to the differencef(g(a +h)) −f(g(a)), by the definition of the derivativef ∘g is differentiable ata and its derivative isf′(g(a))g′(a).

The role ofQ in the first proof is played byη in this proof. They are related by the equation: $Q(y)=f'(g(a))+\eta (y-g(a)).$ The need to defineQ atg(a) is analogous to the need to defineη at zero.

Third proof

[edit]

Constantin Carathéodory's alternative definition of the differentiability of a function can be used to give an elegant proof of the chain rule.^[7]

Under this definition, a functionf is differentiable at a pointa if and only if there is a functionq, continuous ata and such thatf(x) −f(a) =q(x)(x −a). There is at most one such function, and iff is differentiable ata thenf ′(a) =q(a).

Given the assumptions of the chain rule and the fact that differentiable functions and compositions of continuous functions are continuous, we have that there exist functionsq, continuous atg(a), andr, continuous ata, and such that, $f(g(x))-f(g(a))=q(g(x))(g(x)-g(a))$ and $g(x)-g(a)=r(x)(x-a).$ Therefore, $f(g(x))-f(g(a))=q(g(x))r(x)(x-a),$ but the function given byh(x) =q(g(x))r(x) is continuous ata, and we get, for thisa $(f(g(a)))'=q(g(a))r(a)=f'(g(a))g'(a).$ A similar approach works for continuously differentiable (vector-)functions of many variables. This method of factoring also allows a unified approach to stronger forms of differentiability, when the derivative is required to beLipschitz continuous,Hölder continuous, etc. Differentiation itself can be viewed as thepolynomial remainder theorem (the littleBézout theorem, or factor theorem), generalized to an appropriate class of functions.^{[citation needed]}

Multivariable case

[edit]

The full generalization of the chain rule tomulti-variable functions (such as $f\colon \mathbb {R} ^{m}\to \mathbb {R} ^{n}$ ) is rather technical. However, it is simpler to write in the case of functions of the form $f(g_{1}(x),\dots ,g_{k}(x)),$ where $f\colon \mathbb {R} ^{k}\to \mathbb {R}$ , and $g_{i}\colon \mathbb {R} \to \mathbb {R}$ for each $i=1,2,\dots ,k.$

As this case occurs often in the study of functions of a single variable, it is worth describing it separately.

Case of scalar-valued multivariate functions

[edit]

Let $f\colon \mathbb {R} ^{k}\to \mathbb {R}$ , and $g_{i}\colon \mathbb {R} \to \mathbb {R}$ for each $i=1,2,\dots ,k.$ To write the chain rule for the composition of functions $x\mapsto f(g_{1}(x),\dots ,g_{k}(x)),$ one needs thepartial derivatives off with respect to itsk arguments. The usual notations for partial derivatives involve names for the arguments of the function. As these arguments are not named in the above formula, it is simpler and clearer to useD-Notation, and to denote by $D_{i}f$ the partial derivative off with respect to itsith argument, and by $D_{i}f(z)$ the value of this derivative atz.

With this notation, the chain rule is ${\frac {d}{dx}}f(g_{1}(x),\dots ,g_{k}(x))=\sum _{i=1}^{k}\left({\frac {d}{dx}}{g_{i}}(x)\right)D_{i}f(g_{1}(x),\dots ,g_{k}(x)).$

Example: arithmetic operations

[edit]

If the functionf is addition, that is, if $f(u,v)=u+v,$ then ${\textstyle D_{1}f={\frac {\partial f}{\partial u}}=1}$ and ${\textstyle D_{2}f={\frac {\partial f}{\partial v}}=1}$ . Thus, the chain rule gives ${\frac {d}{dx}}(g(x)+h(x))=\left({\frac {d}{dx}}g(x)\right)D_{1}f+\left({\frac {d}{dx}}h(x)\right)D_{2}f={\frac {d}{dx}}g(x)+{\frac {d}{dx}}h(x).$

For multiplication $f(u,v)=uv,$ the partials are $D_{1}f=v$ and $D_{2}f=u$ . Thus, ${\frac {d}{dx}}(g(x)h(x))=h(x){\frac {d}{dx}}g(x)+g(x){\frac {d}{dx}}h(x).$

The case of exponentiation $f(u,v)=u^{v}$ is slightly more complicated, as $D_{1}f=vu^{v-1},$ and, as $u^{v}=e^{v\ln u},$ $D_{2}f=u^{v}\ln u.$ It follows that ${\frac {d}{dx}}\left(g(x)^{h(x)}\right)=h(x)g(x)^{h(x)-1}{\frac {d}{dx}}g(x)+g(x)^{h(x)}\ln g(x)\,{\frac {d}{dx}}h(x).$

General rule: Vector-valued multivariate functions

[edit]

The simplest way for writing the chain rule in the general case is to use thetotal derivative, which is a linear transformation that captures alldirectional derivatives in a single formula. Consider differentiable functionsf :R^m →R^k andg :Rⁿ →R^m, and a pointa inRⁿ. LetD_ag denote the total derivative ofg ata andD_g(a)f denote the total derivative off atg(a). These two derivatives are linear transformationsRⁿ →R^m andR^m →R^k, respectively, so they can be composed. The chain rule for total derivatives is that their composite is the total derivative off ∘g ata: $D_{\mathbf {a} }(f\circ g)=D_{g(\mathbf {a} )}f\circ D_{\mathbf {a} }g,$ or for short, $D(f\circ g)=Df\circ Dg.$ The higher-dimensional chain rule can be proved using a technique similar to the second proof given above.^[8]

Because the total derivative is a linear transformation, the functions appearing in the formula can be rewritten as matrices. The matrix corresponding to a total derivative is called aJacobian matrix, and the composite of two derivatives corresponds to the product of their Jacobian matrices. From this perspective the chain rule therefore says: $J_{f\circ g}(\mathbf {a} )=J_{f}(g(\mathbf {a} ))J_{g}(\mathbf {a} ),$ or for short, $J_{f\circ g}=(J_{f}\circ g)J_{g}.$

That is, the Jacobian of a composite function is the product of the Jacobians of the composed functions (evaluated at the appropriate points).

The higher-dimensional chain rule is a generalization of the one-dimensional chain rule. Ifk,m, andn are 1, so thatf :R →R andg :R →R, then the Jacobian matrices off andg are1 × 1. Specifically, they are: ${\begin{aligned}J_{g}(a)&={\begin{pmatrix}g'(a)\end{pmatrix}},\\J_{f}(g(a))&={\begin{pmatrix}f'(g(a))\end{pmatrix}}.\end{aligned}}$ The Jacobian off ∘g is the product of these1 × 1 matrices, so it isf′(g(a))⋅g′(a), as expected from the one-dimensional chain rule. In the language of linear transformations,D_a(g) is the function which scales a vector by a factor ofg′(a) andD_g(a)(f) is the function which scales a vector by a factor off′(g(a)). The chain rule says that the composite of these two linear transformations is the linear transformationD_a(f ∘g), and therefore it is the function that scales a vector byf′(g(a))⋅g′(a).

Another way of writing the chain rule is used whenf andg are expressed in terms of their components asy =f(u) = (f₁(u), …,f_k(u)) andu =g(x) = (g₁(x), …,g_m(x)). In this case, the above rule for Jacobian matrices is usually written as: ${\frac {\partial (y_{1},\ldots ,y_{k})}{\partial (x_{1},\ldots ,x_{n})}}={\frac {\partial (y_{1},\ldots ,y_{k})}{\partial (u_{1},\ldots ,u_{m})}}{\frac {\partial (u_{1},\ldots ,u_{m})}{\partial (x_{1},\ldots ,x_{n})}}.$

The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in thei-th coordinate direction is found by multiplying the Jacobian matrix by thei-th basis vector. By doing this to the formula above, we find: ${\frac {\partial (y_{1},\ldots ,y_{k})}{\partial x_{i}}}={\frac {\partial (y_{1},\ldots ,y_{k})}{\partial (u_{1},\ldots ,u_{m})}}{\frac {\partial (u_{1},\ldots ,u_{m})}{\partial x_{i}}}.$ Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get: ${\frac {\partial (y_{1},\ldots ,y_{k})}{\partial x_{i}}}=\sum _{\ell =1}^{m}{\frac {\partial (y_{1},\ldots ,y_{k})}{\partial u_{\ell }}}{\frac {\partial u_{\ell }}{\partial x_{i}}}.$ More conceptually, this rule expresses the fact that a change in thex_i direction may change all ofg₁ throughg_m, and any of these changes may affectf.

In the special case wherek = 1, so thatf is a real-valued function, then this formula simplifies even further: ${\frac {\partial y}{\partial x_{i}}}=\sum _{\ell =1}^{m}{\frac {\partial y}{\partial u_{\ell }}}{\frac {\partial u_{\ell }}{\partial x_{i}}}.$ This can be rewritten as adot product. Recalling thatu = (g₁, …,g_m), the partial derivative∂u / ∂x_i is also a vector, and the chain rule says that: ${\frac {\partial y}{\partial x_{i}}}=\nabla y\cdot {\frac {\partial \mathbf {u} }{\partial x_{i}}}.$

Example

[edit]

Givenu(x,y) =x² + 2y wherex(r,t) =r sin(t) andy(r,t) = sin²(t), determine the value of∂u / ∂r and∂u / ∂t using the chain rule.^{[citation needed]} ${\frac {\partial u}{\partial r}}={\frac {\partial u}{\partial x}}{\frac {\partial x}{\partial r}}+{\frac {\partial u}{\partial y}}{\frac {\partial y}{\partial r}}=(2x)(\sin(t))+(2)(0)=2r\sin ^{2}(t),$ and ${\begin{aligned}{\frac {\partial u}{\partial t}}&={\frac {\partial u}{\partial x}}{\frac {\partial x}{\partial t}}+{\frac {\partial u}{\partial y}}{\frac {\partial y}{\partial t}}\\&=(2x)(r\cos(t))+(2)(2\sin(t)\cos(t))\\&=(2r\sin(t))(r\cos(t))+4\sin(t)\cos(t)\\&=2(r^{2}+2)\sin(t)\cos(t)\\&=(r^{2}+2)\sin(2t).\end{aligned}}$

Higher derivatives of multivariable functions

[edit]

Main article:Faà di Bruno's formula § Multivariate version

Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. Ify =f(u) is a function ofu =g(x) as above, then the second derivative off ∘g is: ${\frac {\partial ^{2}y}{\partial x_{i}\partial x_{j}}}=\sum _{k}\left({\frac {\partial y}{\partial u_{k}}}{\frac {\partial ^{2}u_{k}}{\partial x_{i}\partial x_{j}}}\right)+\sum _{k,\ell }\left({\frac {\partial ^{2}y}{\partial u_{k}\partial u_{\ell }}}{\frac {\partial u_{k}}{\partial x_{i}}}{\frac {\partial u_{\ell }}{\partial x_{j}}}\right).$

Further generalizations

[edit]

All extensions of calculus have a chain rule. In most of these, the formula remains the same, though the meaning of that formula may be vastly different.

One generalization is tomanifolds. In this situation, the chain rule represents the fact that the derivative off ∘g is the composite of the derivative off and the derivative ofg. This theorem is an immediate consequence of the higher dimensional chain rule given above, and it has exactly the same formula.

The chain rule is also valid forFréchet derivatives inBanach spaces. The same formula holds as before.^[9] This case and the previous one admit a simultaneous generalization toBanach manifolds.

Indifferential algebra, the derivative is interpreted as a morphism of modules ofKähler differentials. Aring homomorphism ofcommutative ringsf :R →S determines a morphism of Kähler differentialsDf : Ω_R → Ω_S which sends an elementdr tod(f(r)), the exterior differential off(r). The formulaD(f ∘g) =Df ∘Dg holds in this context as well.

The common feature of these examples is that they are expressions of the idea that the derivative is part of afunctor. A functor is an operation on spaces and functions between them. It associates to each space a new space and to each function between two spaces a new function between the corresponding new spaces. In each of the above cases, the functor sends each space to itstangent bundle and it sends each function to its derivative. For example, in the manifold case, the derivative sends aC^r-manifold to aC^r−1-manifold (its tangent bundle) and aC^r-function to its total derivative. There is one requirement for this to be a functor, namely that the derivative of a composite must be the composite of the derivatives. This is exactly the formulaD(f ∘g) =Df ∘Dg.

There are also chain rules instochastic calculus. One of these,Itō's lemma, expresses the composite of an Itō process (or more generally asemimartingale)dX_t with a twice-differentiable functionf. In Itō's lemma, the derivative of the composite function depends not only ondX_t and the derivative off but also on the second derivative off. The dependence on the second derivative is a consequence of the non-zeroquadratic variation of the stochastic process, which broadly speaking means that the process can move up and down in a very rough way. This variant of the chain rule is not an example of a functor because the two functions being composed are of different types.

References

[edit]

^George F. Simmons,Calculus with Analytic Geometry (1985), p. 93.
^Simmons, George F. (1996).Calculus with Analytic Geometry. McGraw-Hill. p. 115. ISBN 9780070575295.Internet Archive
^Child, J. M. (1917)."THE MANUSCRIPTS OF LEIBNIZ ON HIS DISCOVERY OF THE DIFFERENTIAL CALCULUS. PART II (Continued)".The Monist.27 (3):411–454.doi:10.5840/monist191727324.ISSN 0026-9662.JSTOR 27900650.
^^a ^bRodríguez, Omar Hernández; López Fernández, Jorge M. (2010)."A Semiotic Reflection on the Didactics of the Chain Rule".The Mathematics Enthusiast.7 (2):321–332.doi:10.54870/1551-3440.1191.S2CID 29739148. Retrieved2019-08-04.
^Apostol, Tom (1974).Mathematical analysis (2nd ed.). Addison Wesley. Theorem 5.5.
^Goodfellow, Ian;Bengio, Yoshua; Courville, Aaron (2016),Deep learning, MIT, pp=197–217.
^Kuhn, Stephen (1991). "The Derivative á la Carathéodory".The American Mathematical Monthly.98 (1):40–44.doi:10.2307/2324035.JSTOR 2324035.
^Spivak, Michael (1965).Calculus on Manifolds. Boston: Addison-Wesley. pp. 19–20.ISBN 0-8053-9021-9.
^Cheney, Ward (2001). "The Chain Rule and Mean Value Theorems".Analysis for Applied Mathematics. New York: Springer. pp. 121–125.ISBN 0-387-95279-9.

External links

[edit]

"Leibniz rule",Encyclopedia of Mathematics,EMS Press, 2001 [1994]
Weisstein, Eric W."Chain Rule".MathWorld.

Calculus

Precalculus

Limits

Differential calculus

Integral calculus

Vector calculus

Derivatives
Basic theorems

Multivariable calculus

Sequences and series

Special functions
and numbers

History of calculus

Lists

Integrals	rational functions irrational algebraic functions exponential functions logarithmic functions hyperbolic functions inverse trigonometric functions inverse Secant Secant cubed
List of limits List of derivatives