where we used the fact that the Kronecker delta $\delta_{ip}$ ensures that only a single term contributes to the sum.
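As a concrete check of how the Kronecker delta collapses a sum to a single term, here is a minimal NumPy sketch (the array values and size are illustrative):

```python
import numpy as np

# the Kronecker delta delta_{ip} is just the identity matrix:
# delta[i, p] = 1 if i == p, else 0
delta = np.eye(4)

a = np.array([2.0, 3.0, 5.0, 7.0])

# sum_i delta_{ip} a_i: every term with i != p vanishes,
# so the sum reduces to the single term a_p
p = 2
collapsed = np.sum(delta[:, p] * a)

assert collapsed == a[p]
```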
```{note}
Observe that:
Now ${\bf z}$ and ${\bf y}^k$ are both vectors of size $N_\mathrm{out} \times 1$ and ${\bf x}^k$ is a vector of size $N_\mathrm{in} \times 1$, so we can write this expression for the matrix as a whole as:
where the operator $\circ$ represents _element-by-element_ multiplication (the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices))).
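In NumPy, the `*` operator on two arrays of the same shape is exactly this Hadamard product; a small sketch with illustrative vectors:

```python
import numpy as np

# two vectors of the same size (values are illustrative)
z = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Hadamard product: matching entries are multiplied,
# (z o y)_i = z_i * y_i
hadamard = z * y  # NumPy's * on arrays is already element-by-element

assert np.allclose(hadamard, [4.0, 10.0, 18.0])
```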