Simple web crawler
1. Objective
I use a simple web crawler to measure aspects of a crawl, study the characteristics of the crawl, download web pages from the crawl, and gather webpage metadata, all from pre-selected news websites.
2. Preliminaries
To begin, I use an existing open-source Java web crawler called crawler4j. The crawler in this project is built on the crawler4j library, which is hosted on GitHub. For complete details on downloading and compiling it, see https://github.com/yasserg/crawler4j

For help installing Eclipse and crawler4j, see http://www-scf.usc.edu/~csci572/2017Spring/hw2/Crawler4jinstallation.pdf
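Because the crawl is limited to pre-selected news websites, the crawler needs a rule for deciding which discovered links to follow. crawler4j lets a crawler subclass make that decision in its `shouldVisit` method. The sketch below shows one possible version of such a rule using only the Java standard library, so it compiles without crawler4j on the classpath. The class name `CrawlScope`, the host `example.com`, and the list of skipped file extensions are assumptions made for illustration, not taken from this project.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of crawler4j): decides whether a URL belongs to the crawl. */
final class CrawlScope {
    // Assumed list of non-page resources to skip; adjust to the crawl's needs.
    private static final Pattern SKIP = Pattern.compile(".*\\.(css|js|json|xml|ico)$");

    private final String siteHost;

    CrawlScope(String siteHost) {
        this.siteHost = siteHost.toLowerCase(Locale.ROOT);
    }

    /** Returns true for http(s) URLs on the site (or a subdomain) that are not skipped resources. */
    boolean inScope(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: treat as out of scope
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return false; // relative or otherwise unusable URL
        }
        if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        boolean sameSite = host.equals(siteHost) || host.endsWith("." + siteHost);
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        return sameSite && !SKIP.matcher(path).matches();
    }
}
```

With `new CrawlScope("example.com")`, a link such as `https://www.example.com/news/story.html` is accepted, while links to other hosts or to stylesheets and scripts are rejected. Inside a crawler4j crawler, the same check could be called from `shouldVisit` with the URL string of the candidate link.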