Abstract
Recent advances in learning and teaching methodology have experimented with virtual reality (VR)-based presentation forms to create immersive learning and training environments. The quality of such educational VR applications relies not only on the virtual model but also on 2D presentation materials such as text, diagrams and figures. However, manually designing or searching for these educational resources is both labor-intensive and time-consuming. In this paper, we introduce a new automatic algorithm to detect and extract presentation slides in educational videos, which provides abundant resources for creating slide-based immersive presentation environments. The proposed approach involves five core components: shot boundary detection, training instance collection, shot classification, slide region detection and slide transition detection. We conducted a comparison experiment to evaluate the performance of the proposed method. The results indicate that, in comparison with a peer method, the proposed method improves the precision of slide detection from 81.6% to 92.6% and the recall from 74.7% to 86.3% on average. With the detected slides, a content analyzer can be employed to further extract reusable elements, which can be used for developing VR-based educational applications.
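As an illustration of the first component named above, shot boundary detection, the following is a minimal sketch of a common histogram-difference baseline. It is not the authors' algorithm, which the abstract does not specify. The function names, the 16-bin histogram, the L1 distance and the threshold of 0.5 are all assumptions chosen for the example, and frames are assumed to be non-empty flattened lists of grayscale values in the range 0 to 255.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, values 0..255 (assumed layout)


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram with `bins` equal-width bins."""
    counts = [0] * bins
    for pixel in frame:
        # Map 0..255 onto 0..bins-1; clamp guards against out-of-range input.
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_shot_boundaries(
    frames: Sequence[Frame], threshold: float = 0.5, bins: int = 16
) -> List[int]:
    """Return every index i at which frame i is judged to start a new shot.

    A boundary is declared when the L1 distance between the histograms of
    consecutive frames exceeds `threshold` (a hand-picked, hypothetical value).
    """
    boundaries: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None:
            distance = sum(abs(a - b) for a, b in zip(previous, current))
            if distance > threshold:
                boundaries.append(index)
        previous = current
    return boundaries


# Synthetic example: three dark frames followed by three bright frames.
dark = [20] * 64
bright = [230] * 64
video = [dark, dark, dark, bright, bright, bright]
print(detect_shot_boundaries(video))  # prints [3]
```

A fixed global threshold like this detects hard cuts only. Gradual transitions such as fades generally need a windowed or adaptive criterion.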
Change history
15 May 2024
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s00521-024-09984-5
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Nos. 61572531, 61232011, 61502546, 61402546) and the Science and Technology Planning Project of Zhongshan (No. 2016A1044).
Author information
Authors and Affiliations
National Engineering Research Center of Digital Life, School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, China
Baoquan Zhao, Xin Qi & Ruomei Wang
School of Communication and Design, Sun Yat-sen University, Guangzhou, 510006, China
Shujin Lin
School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
Xiaonan Luo
Corresponding author
Correspondence to Xiaonan Luo.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Electronic supplementary material
Below is the link to the electronic supplementary material.
About this article
Cite this article
Zhao, B., Lin, S., Qi, X. et al. RETRACTED ARTICLE: A novel approach to automatic detection of presentation slides in educational videos. Neural Comput & Applic 29, 1369–1382 (2018). https://doi.org/10.1007/s00521-017-3276-1