Architecture for a Parallel Focused Crawler for Clickstream Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6591)

Abstract

The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including irrelevant answers among search results and coverage and scaling problems caused by the sheer size of the World Wide Web. At the same time, better algorithms are in demand to yield more precise and relevant search results within a reasonable amount of time. Link-based Web page importance metrics do not by themselves guarantee that the overall search system identifies the best answer set, and employing them within a multi-process crawler imposes considerable communication overhead on the overall system. A link-independent Web page importance metric is therefore required to govern the priority rule within the queue of fetched URLs. This paper proposes a modest weighted architecture for a focused, structured parallel crawler in which credit is assigned to discovered URLs by a combined metric based on clickstream analysis and on the text similarity of each Web page to the specified mapped topic(s).
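
The abstract does not give the formula behind the combined metric, so the following is only a minimal sketch of how such a priority rule might look. It assumes a linear blend of a clickstream score (here, a page's share of logged visits) and a cosine text similarity to the topic. The weight `alpha`, the function names and the `Frontier` class are all hypothetical and are not taken from the paper.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(text, topic):
    """Cosine similarity between term-frequency vectors of two strings."""
    a = Counter(text.lower().split())
    b = Counter(topic.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def clickstream_score(visits, total_visits):
    """Assumed clickstream metric: share of all logged visits for this page."""
    return visits / total_visits if total_visits else 0.0


def combined_score(visits, total_visits, text, topic, alpha=0.5):
    """Assumed linear blend; alpha weights clickstream against text similarity."""
    return (alpha * clickstream_score(visits, total_visits)
            + (1 - alpha) * cosine_similarity(text, topic))


class Frontier:
    """URL queue that always returns the highest-scoring URL first."""

    def __init__(self):
        self._heap = []

    def push(self, url, score):
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score


if __name__ == "__main__":
    topic = "focused web crawler"
    frontier = Frontier()
    # Hypothetical pages: (url, visits, page text)
    pages = [
        ("http://example.org/a", 10, "cooking recipes and kitchen tips"),
        ("http://example.org/b", 40, "a focused web crawler for topic discovery"),
    ]
    total = sum(visits for _, visits, _ in pages)
    for url, visits, text in pages:
        frontier.push(url, combined_score(visits, total, text, topic))
    print(frontier.pop())
```

Because neither score in this sketch depends on the link graph, each crawler process could compute it locally from a shared access log and its own fetched text, which is consistent with the link-independence argument in the abstract.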

Author information

Authors and Affiliations

  1. Software Engineering Research Group, UTM Knowledge Economy Research Alliance & Software Engineering Department, Faculty of Computer Science & Information Systems, Universiti Teknologi Malaysia (UTM), 81310, UTM Johor Baharu Campus, Johor, Malaysia

    Ali Selamat & Fatemeh Ahmadi-Abkenari

Editor information

Editors and Affiliations

  1. Wroclaw University of Technology, 50-370, Wroclaw, Poland

    Ngoc Thanh Nguyen

  2. Department of Computer Engineering, Yeungnam University, 712-749, Dae-Dong, Gyeungsan, Korea

    Chong-Gun Kim

  3. Institute of Informatics, Automation and Robotics, Wroclaw University of Technology, 50-370, Wrocław, Poland

    Adam Janiak

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Selamat, A., Ahmadi-Abkenari, F. (2011). Architecture for a Parallel Focused Crawler for Clickstream Analysis. In: Nguyen, N.T., Kim, CG., Janiak, A. (eds) Intelligent Information and Database Systems. ACIIDS 2011. Lecture Notes in Computer Science, vol 6591. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20039-7_3
