Automatic Categorization of Web Pages and User Clustering with Mixtures of Hidden Markov Models

Alexander Ypma & Tom Heskes

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2703)

Included in the following conference series:

International Workshop on Mining Web Data for Discovering Usage Patterns and Profiles

457 Accesses
47 Citations

Abstract

We propose mixtures of hidden Markov models (HMMs) for modelling the clickstreams of web surfers. In this approach the page categorization is learned from the data, without the need for a (possibly cumbersome) manual categorization. We provide an EM algorithm for training a mixture of HMMs and show that additional static user data can easily be incorporated to possibly enhance the labelling of users. Furthermore, we use prior knowledge to enhance generalization and avoid numerical problems. We use parameter tying to decrease the danger of overfitting and to reduce computational overhead. We put a flat prior on the parameters to handle transitions between page categories that occur very seldom or not at all, ensuring that the transition probability between such categories nonetheless remains nonzero. In applications to artificial data and real-world web logs we demonstrate the usefulness of our approach. We train a mixture of HMMs on artificial navigation patterns and show that the correct model is learned. Moreover, we show that the use of static 'satellite data' may enhance the labelling of shorter navigation patterns. When applying a mixture of HMMs to real-world web logs from a large Dutch commercial web site, we demonstrate that sensible page categorizations are learned.
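The EM training, parameter tying and flat prior described in the abstract can be illustrated with a small pure-Python sketch of one EM iteration. It rests on assumptions that are ours, not confirmed by the abstract: hidden states stand for page categories and observations are page indices; the tied parameter is the category-to-page emission matrix, shared by all mixture components, while each component keeps its own initial and transition probabilities; and the flat prior is modelled as a constant pseudocount added to the expected counts (the MAP estimate under a symmetric Dirichlet prior). The function names `forward_backward`, `user_posterior`, `normalize` and `em_step`, and the parameters `prior`, `static` and `static_logp`, are hypothetical.

```python
import math


def forward_backward(seq, init, trans, emit):
    """Scaled forward-backward pass for one discrete-output HMM.

    seq: list of page indices; init[s], trans[r][s], emit[s][page].
    Returns (log-likelihood, gamma[t][s], xi_sum[r][s]).
    """
    S, T = len(init), len(seq)
    alpha, scales, prev = [], [], None
    for t, obs in enumerate(seq):
        if t == 0:
            a = [init[s] * emit[s][obs] for s in range(S)]
        else:
            a = [sum(prev[r] * trans[r][s] for r in range(S)) * emit[s][obs]
                 for s in range(S)]
        c = sum(a)  # scaling factor keeps alpha from underflowing
        prev = [x / c for x in a]
        alpha.append(prev)
        scales.append(c)
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        nxt = seq[t + 1]
        beta[t] = [sum(trans[s][r] * emit[r][nxt] * beta[t + 1][r] for r in range(S))
                   / scales[t + 1] for s in range(S)]
    # State posteriors P(state_t = s | seq) and expected transition counts.
    gamma = [[alpha[t][s] * beta[t][s] for s in range(S)] for t in range(T)]
    xi_sum = [[0.0] * S for _ in range(S)]
    for t in range(T - 1):
        nxt = seq[t + 1]
        for r in range(S):
            for s in range(S):
                xi_sum[r][s] += (alpha[t][r] * trans[r][s] * emit[s][nxt]
                                 * beta[t + 1][s] / scales[t + 1])
    return sum(math.log(c) for c in scales), gamma, xi_sum


def user_posterior(seq, weights, inits, transs, emit, static_logp=None):
    """P(component k | clickstream), optionally combined with static data.

    static_logp is hypothetical: static_logp[k] = log p(static data | k),
    supplied by some separate model of the user's static attributes.
    """
    K = len(weights)
    results = [forward_backward(seq, inits[k], transs[k], emit) for k in range(K)]
    logs = [math.log(weights[k]) + results[k][0]
            + (static_logp[k] if static_logp is not None else 0.0) for k in range(K)]
    m = max(logs)
    lse = m + math.log(sum(math.exp(x - m) for x in logs))  # log-sum-exp
    return [math.exp(x - lse) for x in logs], results, lse


def normalize(row, prior):
    """Add a constant pseudocount, then renormalize, so that transitions
    never observed in the data keep a nonzero probability."""
    total = sum(row) + prior * len(row)
    return [(x + prior) / total for x in row]


def em_step(seqs, weights, inits, transs, emit, prior=0.1, static=None):
    """One EM iteration for a mixture of K HMMs. Each component has its own
    init/transition parameters; emit[s][page] is tied (shared by all k)."""
    K, S, V = len(weights), len(emit), len(emit[0])
    w_acc = [0.0] * K
    init_acc = [[0.0] * S for _ in range(K)]
    trans_acc = [[[0.0] * S for _ in range(S)] for _ in range(K)]
    emit_acc = [[0.0] * V for _ in range(S)]
    total_ll = 0.0  # log-likelihood under the current (pre-update) parameters
    for n, seq in enumerate(seqs):
        post, results, lse = user_posterior(
            seq, weights, inits, transs, emit, static[n] if static else None)
        total_ll += lse
        for k, (_, gamma, xi_sum) in enumerate(results):
            resp = post[k]  # responsibility of component k for this user
            w_acc[k] += resp
            for r in range(S):
                init_acc[k][r] += resp * gamma[0][r]
                for s in range(S):
                    trans_acc[k][r][s] += resp * xi_sum[r][s]
            for t, obs in enumerate(seq):
                for s in range(S):
                    emit_acc[s][obs] += resp * gamma[t][s]  # pooled over k (tying)
    new_weights = [x / len(seqs) for x in w_acc]
    new_inits = [normalize(row, prior) for row in init_acc]
    new_transs = [[normalize(row, prior) for row in trans_acc[k]] for k in range(K)]
    new_emit = [normalize(row, prior) for row in emit_acc]
    return new_weights, new_inits, new_transs, new_emit, total_ll
```

To label a user, one could take the component with the largest entry of the posterior returned by `user_posterior`. The optional `static_logp` vector simply adds the log-likelihood of the user's static 'satellite' data to the clickstream evidence; this sketch does not fit the model that produces it, and this is only one plausible reading of how such data can be incorporated. With `prior > 0`, EM increases the penalized objective rather than the plain log-likelihood, so the returned `total_ll` need not rise monotonically.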



Author information

Authors and Affiliations

SNN, University of Nijmegen, Geert Grooteplein 21, 6525 EZ, Nijmegen, The Netherlands
Alexander Ypma & Tom Heskes


Editor information

Editors and Affiliations

University of Alberta, Canada
Osmar R. Zaïane
University of Minnesota, Minneapolis, MN, USA
Jaideep Srivastava
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou
Data Miners Inc., 77 North Washington Street, Boston, MA 02114, USA
Brij Masand


About this paper

Cite this paper

Ypma, A., Heskes, T. (2003). Automatic Categorization of Web Pages and User Clustering with Mixtures of Hidden Markov Models. In: Zaïane, O.R., Srivastava, J., Spiliopoulou, M., Masand, B. (eds) WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles. WebKDD 2002. Lecture Notes in Computer Science, vol 2703. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39663-5_3


DOI: https://doi.org/10.1007/978-3-540-39663-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20304-9
Online ISBN: 978-3-540-39663-5
eBook Packages: Springer Book Archive
