- Zixi Jia1 (ORCID: orcid.org/0000-0002-6110-9240),
- Jiao Li1,
- Zhengjun Du2,
- Jingyu Ru1,
- Yating Wang1,
- Chengdong Wu1,
- Yutong Zhang1,
- Shuangjiang Yu1,
- Zhou Wang1,
- Changsheng Sun1 &
- Ao Lyu1
Abstract
Semantic sentence matching is a crucial task in natural language processing. However, it has mainly been applied in the text domain and has been explored far less for video clip and mixing. Existing methods for video clip and mixing mainly focus on mapping text and video into latent spaces, but their extractors lack the ability to capture effective information. We therefore present a Multi-Feature Fusion semantic sentence matching model (MFF), which forms a double filtering mechanism. The double filtering is designed to filter similar semantic fragments in video clip and mixing, reducing the burden of heavy manual video editing. Experiments are conducted on two datasets, namely SNLI and Quora Question Pairs, to verify that MFF can significantly improve accuracy. Results show that MFF improves accuracy on the SNLI and Quora Question Pairs datasets to 75.3% and 76.7%, respectively.
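For readers unfamiliar with the task setup, the sketch below illustrates semantic sentence matching as generic sentence-pair classification (a siamese BiLSTM encoder over two token-ID sequences, as on SNLI or Quora Question Pairs). It is a minimal illustration under assumed placeholder hyperparameters, not the MFF architecture or the double-filtering mechanism described in the paper.

```python
# Minimal sketch of sentence-pair matching as classification (PyTorch).
# NOT the MFF model from the paper; it only illustrates the task setup:
# encode two sentences, combine their representations, predict a label
# (entailment/contradiction/neutral on SNLI, duplicate/not on Quora).
import torch
import torch.nn as nn


class SiameseMatcher(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # Classify the common matching features [u; v; |u - v|; u * v].
        self.classifier = nn.Sequential(
            nn.Linear(8 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def encode(self, ids):
        out, _ = self.encoder(self.embed(ids))   # (B, T, 2H)
        return out.max(dim=1).values             # max-pool over time -> (B, 2H)

    def forward(self, ids_a, ids_b):
        u, v = self.encode(ids_a), self.encode(ids_b)
        feats = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        return self.classifier(feats)            # logits over classes


if __name__ == "__main__":
    model = SiameseMatcher(vocab_size=10000)
    a = torch.randint(1, 10000, (4, 12))         # batch of 4 sentence pairs
    b = torch.randint(1, 10000, (4, 15))
    print(model(a, b).shape)                     # torch.Size([4, 3])
```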
Acknowledgements
This research was funded by the Shandong Province Major Science and Technology Innovation Project (Grant 2019JZZY0101128), the National Natural Science Foundation of China (61872073), the Fundamental Research Funds for the Central Universities (N2126005, N2126002), and the National Natural Science Foundation of Liaoning (2021-MS-101).
Author information
Authors and Affiliations
Faculty of Robot Science and Engineering, Northeastern University, Shenyang, 110819, China
Zixi Jia, Jiao Li, Jingyu Ru, Yating Wang, Chengdong Wu, Yutong Zhang, Shuangjiang Yu, Zhou Wang, Changsheng Sun & Ao Lyu
SIASUN Robot Automation CO. Ltd., Shenyang, 110819, China
Zhengjun Du
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Jia, Z., Li, J., Du, Z. et al. Automatic video clip and mixing based on semantic sentence matching. Appl Intell 53, 2133–2146 (2023). https://doi.org/10.1007/s10489-022-03226-8