Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7675)
Abstract
This paper addresses the problem of classifying documents with kernel methods based on topic sequences. The string kernel uses ordered subsequences of characters as features, and the word sequence kernel was later proposed to use subsequences of words instead. Both, however, suffer from high computational complexity because of the large number of symbols (characters or words). This paper therefore proposes using sequences of topics rather than characters or words, which reduces the number of symbols and thus improves computational efficiency. Documents that exhibit similar posterior topic proportions are expected to have similar topic sequences and should therefore be classified into the same category. Experiments conducted on the Reuters-21578 dataset confirm this hypothesis.
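To make the idea concrete, the following is a minimal sketch of a sequence kernel applied to topic sequences. It is not the authors' implementation: it assumes each document has already been reduced to a list of integer topic ids (for example by assigning every word its most likely topic under some topic model), and it uses the standard gap-weighted subsequence kernel recursion known from string kernels. The names `subsequence_kernel`, `normalized_kernel`, the subsequence length `n`, and the decay factor `lam` are illustrative assumptions.

```python
import math


def subsequence_kernel(s, t, n, lam):
    """Gap-weighted subsequence kernel of order n between sequences s and t.

    s and t are sequences of hashable symbols (here: topic ids, an assumption).
    lam in (0, 1] penalises the span a matching subsequence covers.
    """
    len_s, len_t = len(s), len(t)
    if min(len_s, len_t) < n:
        return 0.0

    # kp[i][a][b] holds the auxiliary value K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (len_t + 1) for _ in range(len_s + 1)] for _ in range(n)]
    for a in range(len_s + 1):
        for b in range(len_t + 1):
            kp[0][a][b] = 1.0  # K'_0 is 1 for every pair of prefixes

    for i in range(1, n):
        for a in range(i, len_s + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b])
            for b in range(i, len_t + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[i - 1][a - 1][b - 1]
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Sum over all pairs of positions where the last symbols match.
    total = 0.0
    for a in range(n, len_s + 1):
        for b in range(n, len_t + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalized_kernel(s, t, n, lam):
    """Cosine-style normalisation so that a sequence compared with itself gives 1."""
    k_ss = subsequence_kernel(s, s, n, lam)
    k_tt = subsequence_kernel(t, t, n, lam)
    if k_ss == 0.0 or k_tt == 0.0:
        return 0.0
    return subsequence_kernel(s, t, n, lam) / math.sqrt(k_ss * k_tt)


if __name__ == "__main__":
    # Hypothetical topic sequences for three short documents (made-up data).
    doc_a = [0, 2, 2, 1, 0]
    doc_b = [0, 2, 1, 1, 0]
    doc_c = [3, 4, 3, 4, 3]
    print(normalized_kernel(doc_a, doc_b, n=2, lam=0.5))
    print(normalized_kernel(doc_a, doc_c, n=2, lam=0.5))
```

The dynamic program costs O(n·|s|·|t|) for one pair of sequences whatever the alphabet size, so in this sketch a small topic alphabet does not shorten the recursion itself. The sequences here share no symbols across `doc_a` and `doc_c`, so their kernel value is zero, while `doc_a` and `doc_b` score higher because they share ordered topic pairs.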
Author information
Authors and Affiliations
Department of Computing, The Hong Kong Polytechnic University, Hong Kong
Jian Xu, Qin Lu, Zhengzhong Liu & Junyi Chai
Editor information
Editors and Affiliations
School of Computer Science and Technology, Tianjin University, Tianjin, 300072, China
Yuexian Hou
DIRO, University of Montreal, CP. 6128, succursale Centre-ville, H3C 3J7, Montreal, QC, Canada
Jian-Yun Nie
Institute of Software, Storage & Information Retrieval Laboratory, Chinese Academy of Sciences, 100190, Beijing, China
Le Sun
School of Computer Science and Technology, Tianjin University, 300072, Tianjin, China
Bo Wang
School of Computing, Robert Gordon University, St Andrew Street, AB25 1HG, Aberdeen, UK
Peng Zhang
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, J., Lu, Q., Liu, Z., Chai, J. (2012). Topic Sequence Kernel. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)