
Visual temporal attention

From Wikipedia, the free encyclopedia

Video frames of the Parallel Bars action category in the UCF-101 dataset.^[1] (a) The four highest-ranking frames by video temporal attention weight, in which the athlete is performing on the parallel bars; (b) the four lowest-ranking frames by video temporal attention weight, in which the athlete is standing on the ground. All weights are predicted by the ATW CNN algorithm.^[2] The highly weighted video frames generally capture the most distinctive movements relevant to the action category.

Visual temporal attention is a special case of visual attention that involves directing attention to specific instants of time. Similar to its spatial counterpart, visual spatial attention, these attention modules have been widely implemented in video analytics in computer vision to provide enhanced performance and human-interpretable explanations^[3] of deep learning models.

Just as visual spatial attention mechanisms allow human and/or computer vision systems to focus on semantically more substantial regions in space, visual temporal attention modules enable machine learning algorithms to place more emphasis on critical video frames in video analytics tasks, such as human action recognition. In convolutional neural network-based systems, the prioritization introduced by the attention mechanism is regularly implemented as a linear weighting layer with parameters determined by labeled training data.^[3]
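The weighting idea in the paragraph above can be illustrated with a minimal Python sketch. It is not taken from the cited sources: the softmax normalization, the function names, and the toy numbers are all assumptions chosen for illustration. Each frame contributes its class scores in proportion to a temporal weight, and the weights can also be used to rank frames by importance.

```python
import math


def softmax(logits):
    """Turn unnormalized scores into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_weighted_scores(frame_scores, attention_logits):
    """Combine per-frame class scores into one video-level score vector.

    frame_scores: list of T lists, each holding C class scores for one frame.
    attention_logits: list of T scalars (hypothetical learnable parameters).
    """
    weights = softmax(attention_logits)
    num_classes = len(frame_scores[0])
    video_scores = [
        sum(w * scores[c] for w, scores in zip(weights, frame_scores))
        for c in range(num_classes)
    ]
    return video_scores, weights


# Toy input (made up): 3 frames, 2 classes; the middle frame has the largest logit.
frame_scores = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]
attention_logits = [0.0, 2.0, 0.0]

video_scores, weights = temporal_weighted_scores(frame_scores, attention_logits)

# Rank frame indices from most to least attended.
ranked_frames = sorted(range(len(weights)), key=lambda t: weights[t], reverse=True)
```

In this toy case the middle frame dominates the video-level scores, so the video is assigned to the class that frame favors. In a trained system the logits would be learned from labeled data rather than set by hand.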

Application in Action Recognition


ATW CNN architecture.^[4] Three CNN streams process spatial RGB images, temporal optical flow images, and temporal warped optical flow images, respectively. An attention model assigns temporal weights between snippets for each stream/modality. A weighted sum fuses the predictions from the three streams/modalities.
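The two levels of weighting named in this caption (temporal weights over snippets within a stream, then a weighted sum across streams) can be sketched in Python as follows. This is a rough illustration and not the authors' implementation: the stream names, the helper functions, and every number are assumed for the example.

```python
def weighted_sum(vectors, weights):
    """Return sum_i weights[i] * vectors[i], computed element-wise."""
    length = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(length)]


def fuse_streams(stream_snippet_scores, temporal_weights, stream_weights):
    """Two-level weighted fusion of snippet-level class scores.

    stream_snippet_scores: {stream name: list of per-snippet class-score lists}
    temporal_weights: {stream name: one weight per snippet}
    stream_weights: {stream name: scalar fusion weight}
    """
    names = list(stream_snippet_scores)
    # Level 1: collapse the snippets of each stream with its temporal weights.
    per_stream = [
        weighted_sum(stream_snippet_scores[name], temporal_weights[name])
        for name in names
    ]
    # Level 2: fuse the per-stream predictions with the stream weights.
    return weighted_sum(per_stream, [stream_weights[name] for name in names])


# Toy values (made up): 3 streams, 2 snippets per stream, 2 classes.
scores = {
    "rgb": [[0.6, 0.4], [0.8, 0.2]],
    "flow": [[0.3, 0.7], [0.9, 0.1]],
    "warped_flow": [[0.5, 0.5], [0.7, 0.3]],
}
temporal_weights = {
    "rgb": [0.5, 0.5],
    "flow": [0.25, 0.75],
    "warped_flow": [0.5, 0.5],
}
stream_weights = {"rgb": 0.5, "flow": 0.25, "warped_flow": 0.25}

fused = fuse_streams(scores, temporal_weights, stream_weights)
predicted_class = max(range(len(fused)), key=lambda c: fused[c])
```

Here the optical flow stream trusts its second snippet three times as much as its first, which moves that stream's prediction toward class 0 before the streams are fused.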

Recent video segmentation algorithms often exploit both spatial and temporal attention mechanisms.^[2]^[4] Research in human action recognition has accelerated significantly since the introduction of powerful tools such as Convolutional Neural Networks (CNNs). However, effective methods for incorporating temporal information into CNNs are still being actively explored. Motivated by the popular recurrent attention models in natural language processing, the Attention-aware Temporal Weighted CNN (ATW CNN) was proposed^[4] for action recognition in videos; it embeds a visual attention model into a temporal weighted multi-stream CNN. This attention model is implemented as temporal weighting, and it effectively boosts the recognition performance of video representations. In addition, each stream in the ATW CNN framework can be trained end to end, with both network parameters and temporal weights optimized by stochastic gradient descent (SGD) with back-propagation. Experimental results show that the ATW CNN attention mechanism contributes substantially to the performance gains by focusing on the more relevant, more discriminative video snippets.
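How gradient descent can learn temporal weights is illustrated by the toy Python sketch below. It rests on several simplifying assumptions that are not from the cited paper: the per-snippet probabilities are held fixed (in ATW CNN the network parameters are trained as well), the weights are a softmax over learnable logits, the loss is the negative log of the weighted probability of the correct class, and the gradient is derived by hand instead of by automatic differentiation.

```python
import math


def softmax(logits):
    """Turn unnormalized scores into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(logits, true_class_probs, learning_rate):
    """One gradient step on the temporal logits for a single video.

    Loss:     L = -log(P), with P = sum_t w_t * p_t and w = softmax(logits).
    Gradient: dL/da_k = -w_k * (p_k - P) / P.
    Returns the updated logits and the loss measured before the update.
    """
    weights = softmax(logits)
    video_prob = sum(w * p for w, p in zip(weights, true_class_probs))
    grads = [
        -w * (p - video_prob) / video_prob
        for w, p in zip(weights, true_class_probs)
    ]
    new_logits = [a - learning_rate * g for a, g in zip(logits, grads)]
    return new_logits, -math.log(video_prob)


# Hypothetical probabilities a CNN assigns to the correct class, one per snippet.
true_class_probs = [0.9, 0.2, 0.6]
logits = [0.0, 0.0, 0.0]  # start from uniform temporal weights

losses = []
for _ in range(50):
    logits, loss = sgd_step(logits, true_class_probs, learning_rate=0.5)
    losses.append(loss)

final_weights = softmax(logits)
```

After these updates the first snippet, which is the most discriminative one in this toy setup, holds the largest temporal weight, and the loss is lower than at the start.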

Literature


Seibold VC, Balke J and Rolke B (2023): Temporal attention. Front. Cognit. 2:1168320. doi: 10.3389/fcogn.2023.1168320.

References


1. Center, UCF (2013-10-17). "UCF101 - Action Recognition Data Set". CRCV. Retrieved 2018-09-12.
2. Zang, Jinliang; Wang, Le; Liu, Ziyi; Zhang, Qilin; Hua, Gang; Zheng, Nanning (2018). "Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition". IFIP Advances in Information and Communication Technology. Cham: Springer International Publishing. pp. 97–108. arXiv:1803.07179. doi:10.1007/978-3-319-92007-8_9. ISBN 978-3-319-92006-1. ISSN 1868-4238. S2CID 4058889.
3. "NIPS 2017". Interpretable ML Symposium. 2017-10-20. Retrieved 2018-09-12.
4. Wang, Le; Zang, Jinliang; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-06-21). "Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network" (PDF). Sensors. 18 (7). MDPI AG: 1979. Bibcode:2018Senso..18.1979W. doi:10.3390/s18071979. ISSN 1424-8220. PMC 6069475. PMID 29933555. Material was copied from this source, which is available under a Creative Commons Attribution 4.0 International License.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Visual_temporal_attention&oldid=1329658045"
