Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2728)
Abstract
We propose a novel framework for video content understanding that uses rules constructed from knowledge bases and multimedia ontologies. The framework is an expert system that combines a rule-based engine, domain knowledge, visual detectors (for objects and scenes), and metadata (text from automatic speech recognition, related text, etc.). We introduce the idea of modal keywords, which are keywords that represent perceptual concepts in the following categories: visual (e.g., sky), aural (e.g., scream), olfactory (e.g., vanilla), tactile (e.g., feather), and taste (e.g., candy). We present a method that automatically classifies keywords from speech recognition, queries, or related text into these categories using WordNet and TGM I. For video understanding, the following operations are performed automatically: scene cut detection, automatic speech recognition, feature extraction, and visual detection (e.g., sky, face, indoor). The results of these operations are passed to a rule-based engine that uses context information (e.g., text from speech) to enhance the visual detection results. We discuss the semi-automatic construction of multimedia ontologies and present experiments in which visual detector outputs are modified by simple rules that use context information available with the video.
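The two ideas in the abstract, classifying keywords into perceptual categories and using context text to adjust visual detector outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the hypernym table below is a toy stand-in for WordNet and TGM I lookups, and the names `MODAL_ROOTS`, `HYPERNYMS`, `classify_keyword`, `apply_context_rule`, and the `boost` value are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical root concepts standing for the five perceptual categories.
MODAL_ROOTS: Dict[str, str] = {
    "visible_thing": "visual",
    "sound": "aural",
    "smell": "olfactory",
    "texture": "tactile",
    "food": "taste",
}

# Toy hypernym links (child -> parent). An assumption for this sketch:
# a real system would query a lexical resource such as WordNet instead.
HYPERNYMS: Dict[str, str] = {
    "sky": "visible_thing",
    "scream": "cry",
    "cry": "sound",
    "vanilla": "smell",
    "feather": "texture",
    "candy": "food",
}


def classify_keyword(word: str) -> Optional[str]:
    """Walk up the toy hypernym chain until a modal root is reached."""
    seen = set()
    node = word.lower()
    while node not in seen:
        if node in MODAL_ROOTS:
            return MODAL_ROOTS[node]
        seen.add(node)
        if node not in HYPERNYMS:
            return None  # keyword is not perceptual in this toy table
        node = HYPERNYMS[node]
    return None  # cycle guard


def apply_context_rule(
    scores: Dict[str, float], transcript: List[str], boost: float = 0.25
) -> Dict[str, float]:
    """Simple rule: if a visual modal keyword appears in the speech
    transcript and a detector exists for it, raise that detector's score."""
    adjusted = dict(scores)
    for concept in {w.lower() for w in transcript}:
        if classify_keyword(concept) == "visual" and concept in adjusted:
            adjusted[concept] = min(1.0, adjusted[concept] + boost)
    return adjusted


if __name__ == "__main__":
    detector_scores = {"sky": 0.5, "face": 0.4}
    words = ["the", "sky", "was", "clear", "scream"]
    print(apply_context_rule(detector_scores, words))
```

In this sketch only the `sky` detector is adjusted, because `scream` is classified as aural and `face` is not mentioned in the transcript. The additive, capped boost is one arbitrary choice; the rules described in the paper may combine evidence differently.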
Author information
Authors and Affiliations
Pervasive Media Management, IBM T.J. Watson Research Center, Hawthorne, NY, 10532, USA
Alejandro Jaimes, Belle L. Tseng & John R. Smith
Editor information
Editors and Affiliations
LIACS Media Lab, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
Erwin M. Bakker & Michael S. Lew
Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 N. Mathews Avenue, Urbana, IL, 61801, USA
Thomas S. Huang
University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
Nicu Sebe
Siemens Corporate Research, 755 College Road East, Princeton, NJ, 08540, USA
Xiang Sean Zhou
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jaimes, A., Tseng, B.L., Smith, J.R. (2003). Modal Keywords, Ontologies, and Reasoning for Video Understanding. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds) Image and Video Retrieval. CIVR 2003. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45113-7_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40634-1
Online ISBN: 978-3-540-45113-6
eBook Packages: Springer Book Archive