Mining Taxonomies from Web Menus: Rule-Based Concepts and Algorithms

  • Conference paper
Web Engineering (ICWE 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7977)


Abstract

The logical hierarchies of Web sites (i.e., Web site taxonomies) are obvious to humans, because humans can distinguish different menu levels and their relationships. Such accurate information about the logical structure is not yet available to machines, however. Many applications would benefit if Web site taxonomies could be mined from menus, but in the past this was an almost unsolvable problem. Although a tag newly introduced in HTML5 and novel mining methods now make it possible to distinguish menus from other content, how the underlying taxonomies can be extracted from given menus has not yet been researched. In this paper we present the first detailed analysis of the problem and introduce rule-based concepts for addressing each identified subproblem. We report on a large-scale study on mining hierarchical menus of 350 randomly selected domains. Our methods extract Web site taxonomy information that was not previously available, with high precision and high recall.
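
To make the task concrete, the following is a minimal sketch of the simplest case only: a menu that is already marked up as nested lists inside an HTML5 `<nav>` element (assumed here to be the tag the abstract refers to). It is not the authors' rule-based method, which is not described on this page. The names `NavMenuParser`, `mine_menu`, `paths`, and the `SAMPLE` markup are hypothetical, and the sketch assumes well-formed HTML in which every `<li>` is closed explicitly.

```python
from html.parser import HTMLParser


class NavMenuParser(HTMLParser):
    """Collects nested <li>/<a> items found inside <nav> into a tree.

    Illustrative assumption: menu levels are expressed purely by
    <li> nesting; real-world menus often need additional rules.
    """

    def __init__(self):
        super().__init__()
        self.nav_depth = 0  # greater than 0 while inside a <nav> element
        self.root = {"label": "ROOT", "href": None, "children": []}
        self.stack = [self.root]  # chain of currently open menu items
        self.in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif self.nav_depth and tag == "li":
            # Each <li> becomes a child of the innermost open item.
            node = {"label": "", "href": None, "children": []}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)
        elif (self.nav_depth and tag == "a" and len(self.stack) > 1
              and self.stack[-1]["href"] is None):
            # Only the first link of an item is used as its label and target.
            self.stack[-1]["href"] = dict(attrs).get("href")
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "nav":
            self.nav_depth = max(0, self.nav_depth - 1)
        elif self.nav_depth and tag == "li" and len(self.stack) > 1:
            self.stack.pop()
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.stack[-1]["label"] += data.strip()


def mine_menu(html):
    """Return the menu tree found in the <nav> elements of `html`."""
    parser = NavMenuParser()
    parser.feed(html)
    parser.close()
    return parser.root


def paths(node, prefix=()):
    """Yield every taxonomy path (tuple of labels) below `node`, depth first."""
    for child in node["children"]:
        current = prefix + (child["label"],)
        yield current
        yield from paths(child, current)


SAMPLE = """
<nav><ul>
  <li><a href="/products">Products</a>
    <ul>
      <li><a href="/products/a">Widget A</a></li>
      <li><a href="/products/b">Widget B</a></li>
    </ul>
  </li>
  <li><a href="/about">About</a></li>
</ul></nav>
<ul><li><a href="/x">Not a menu</a></li></ul>
"""

if __name__ == "__main__":
    for path in paths(mine_menu(SAMPLE)):
        print(" > ".join(path))
```

The list outside `<nav>` is ignored on purpose, which mirrors the abstract's distinction between menus and other content. Menus built from flat markup, CSS, or scripts would not be handled by this sketch.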



Author information

Authors and Affiliations

  1. Steinbuch Centre for Computing, Karlsruhe Institute of Technology, D-76128, Karlsruhe, Germany

    Matthias Keller & Hannes Hartenstein

Authors
  1. Matthias Keller


  2. Hannes Hartenstein


Editor information

Editors and Affiliations

  1. University of Trento, Via Sommarive 5, 38123, Povo, TN, Italy

    Florian Daniel

  2. Department of Computer Science, Aalborg University, Selma Lagerloefs Vej 300, 9220, Aalborg, Denmark

    Peter Dolog

  3. Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong, China

    Qing Li

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Keller, M., Hartenstein, H. (2013). Mining Taxonomies from Web Menus: Rule-Based Concepts and Algorithms. In: Daniel, F., Dolog, P., Li, Q. (eds) Web Engineering. ICWE 2013. Lecture Notes in Computer Science, vol 7977. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39200-9_23
