Gated recurrent unit

From Wikipedia, the free encyclopedia
Memory unit used in neural networks

In artificial neural networks, the gated recurrent unit (GRU) is a gating mechanism used in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al.[1] The GRU is like a long short-term memory (LSTM) with a gating mechanism to input or forget certain features,[2] but lacks a context vector or output gate, resulting in fewer parameters than the LSTM.[3] The GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of the LSTM.[4][5] GRUs showed that gating is indeed helpful in general, and Bengio's team came to no concrete conclusion on which of the two gating units was better.[6][7]

Architecture


There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, as well as a simplified form called the minimal gated unit.[8]

In the following, the operator $\odot$ denotes the Hadamard product.

Fully gated unit

Gated Recurrent Unit, fully gated version

Initially, for $t = 0$, the output vector is $h_0 = 0$.

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\hat{h}_t &= \phi(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{aligned}
$$

Variables ($d$ denotes the number of input features and $e$ the number of output features):

$x_t$: input vector
$h_t$: output vector
$\hat{h}_t$: candidate activation vector
$z_t$: update gate vector
$r_t$: reset gate vector
$W \in \mathbb{R}^{e \times d}$, $U \in \mathbb{R}^{e \times e}$ and $b \in \mathbb{R}^{e}$: parameter matrices and vector

Activation functions

$\sigma$: the original is the sigmoid function.
$\phi$: the original is the hyperbolic tangent.

Alternative activation functions are possible, provided that $\sigma(x) \in [0,1]$.
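The update rule above translates directly into code. The following NumPy sketch (function and parameter names are illustrative, not taken from the cited papers) runs one fully gated GRU step per input vector; element-wise multiplication of arrays plays the role of the Hadamard product $\odot$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, p):
    """One step of the fully gated GRU defined above.

    x_t    : input vector x_t, shape (d,)
    h_prev : previous output vector h_{t-1}, shape (e,)
    p      : dict with matrices W_* (e, d), U_* (e, e) and biases b_* (e,)
    """
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])            # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])            # reset gate
    h_hat = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])  # candidate activation
    return (1.0 - z_t) * h_prev + z_t * h_hat                               # new output vector h_t

# Usage on random data: d = 3 input features, e = 2 output features.
rng = np.random.default_rng(0)
d, e = 3, 2
p = {k: rng.standard_normal((e, d)) for k in ("W_z", "W_r", "W_h")}
p.update({k: rng.standard_normal((e, e)) for k in ("U_z", "U_r", "U_h")})
p.update({k: np.zeros(e) for k in ("b_z", "b_r", "b_h")})

h = np.zeros(e)                         # h_0 = 0, as stated above
for x in rng.standard_normal((5, d)):   # a length-5 input sequence
    h = gru_cell(x, h, p)
print(h)
```

Setting $z_t = 1$ replaces the state entirely with the candidate $\hat{h}_t$, while $z_t = 0$ carries $h_{t-1}$ forward unchanged.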

Alternate forms can be created by changing $z_t$ and $r_t$:[9]

Type 1, each gate depends only on the previous hidden state and the bias.
Type 2, each gate depends only on the previous hidden state.
Type 3, each gate is computed using only the bias.

Minimal gated unit


The minimal gated unit (MGU) is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also implies that the equation for the output vector must be changed:[10]

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\hat{h}_t &= \phi(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - f_t) \odot h_{t-1} + f_t \odot \hat{h}_t
\end{aligned}
$$

Variables are as in the fully gated unit, with the single forget gate vector $f_t$ taking the place of the update and reset gate vectors $z_t$ and $r_t$.
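As a sketch of how the merge simplifies the computation, the following NumPy function (names are illustrative) performs one MGU step with a single gate and one fewer group of weight matrices than the fully gated unit above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_cell(x_t, h_prev, W_f, U_f, b_f, W_h, U_h, b_h):
    """One step of the minimal gated unit: a single forget gate f_t
    replaces the update and reset gates of the fully gated GRU."""
    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)            # forget gate
    h_hat = np.tanh(W_h @ x_t + U_h @ (f_t * h_prev) + b_h)  # candidate activation
    return (1.0 - f_t) * h_prev + f_t * h_hat                # new output vector h_t
```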

Light gated recurrent unit


The light gated recurrent unit (LiGRU)[4] removes the reset gate altogether, replaces tanh with the ReLU activation, and applies batch normalization (BN):

$$
\begin{aligned}
z_t &= \sigma(\operatorname{BN}(W_z x_t) + U_z h_{t-1}) \\
\tilde{h}_t &= \operatorname{ReLU}(\operatorname{BN}(W_h x_t) + U_h h_{t-1}) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
$$
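A minimal NumPy sketch of one LiGRU step follows (names are illustrative; batch normalization is shown in its training-time form, computed over a batch of inputs, with no learned scale/shift and no running statistics):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_norm(a, eps=1e-5):
    # Per-feature normalization over the batch axis (training-time sketch only).
    return (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)

def ligru_cell(x_t, h_prev, W_z, U_z, W_h, U_h):
    """One LiGRU step for a batch of inputs x_t (batch, d) and states h_prev (batch, e).
    There is no reset gate; BN is applied to the input projections only."""
    z_t = sigmoid(batch_norm(x_t @ W_z.T) + h_prev @ U_z.T)              # update gate
    h_tilde = np.maximum(0.0, batch_norm(x_t @ W_h.T) + h_prev @ U_h.T)  # ReLU candidate
    return z_t * h_prev + (1.0 - z_t) * h_tilde                          # new state h_t
```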

LiGRU has been studied from a Bayesian perspective.[11] This analysis yielded a variant called light Bayesian recurrent unit (LiBRU), which showed slight improvements over the LiGRU on speech recognition tasks.

References

  1. Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP): 1724–1734. arXiv:1406.1078. doi:10.3115/v1/D14-1179.
  2. Felix Gers; Jürgen Schmidhuber; Fred Cummins (1999). "Learning to forget: Continual prediction with LSTM". 9th International Conference on Artificial Neural Networks: ICANN '99. Vol. 1999. pp. 850–855. doi:10.1049/cp:19991218. ISBN 0-85296-721-7.
  3. "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. 2015-10-27. Archived from the original on 2021-11-10. Retrieved May 18, 2016.
  4. Ravanelli, Mirco; Brakel, Philemon; Omologo, Maurizio; Bengio, Yoshua (2018). "Light Gated Recurrent Units for Speech Recognition". IEEE Transactions on Emerging Topics in Computational Intelligence. 2 (2): 92–102. arXiv:1803.10225. Bibcode:2018ITECI...2...92R. doi:10.1109/TETCI.2017.2762739. S2CID 4402991.
  5. Su, Yuahang; Kuo, Jay (2019). "On extended long short-term memory and dependent bidirectional recurrent neural network". Neurocomputing. 356: 151–161. arXiv:1803.01686. doi:10.1016/j.neucom.2019.04.044. S2CID 3675055.
  6. Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
  7. Gruber, N.; Jockisch, A. (2020). "Are GRU cells more specific and LSTM cells more sensitive in motive classification of text?". Frontiers in Artificial Intelligence. 3: 40. doi:10.3389/frai.2020.00040. PMC 7861254. PMID 33733157. S2CID 220252321.
  8. Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
  9. Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
  10. Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].
  11. Bittar, Alexandre; Garner, Philip N. (May 2021). "A Bayesian Interpretation of the Light Gated Recurrent Unit". ICASSP 2021 – 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, ON, Canada: IEEE. pp. 2965–2969. doi:10.1109/ICASSP39728.2021.9414259.