Elegant Scraper and Crawler Framework for Golang
fresh8/colly
Lightning Fast and Elegant Scraping Framework for Gophers
Colly provides a clean interface to write any kind of crawler/scraper/spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.
- Clean API
- Fast (>1k requests/sec on a single core)
- Manages request delays and maximum concurrency per domain
- Automatic cookie and session handling
- Sync/async/parallel scraping
- Caching
- Automatic encoding of non-unicode responses
- Robots.txt support
- Distributed scraping
- Configuration via environment variables
- Extensions
```go
func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}
```

See the examples folder for more detailed examples.
Add colly to your go.mod file:

```
module github.com/x/y

go 1.14

require (
	github.com/fresh8/colly latest
)
```
Bugs or suggestions? Visit the issue tracker or join #colly on freenode.
Below is a list of public, open source projects that use Colly:
- greenpeace/check-my-pages Scraping script to test the Spanish Greenpeace web archive.
- altsab/gowap Wappalyzer implementation in Go.
- jesuiscamille/goquotes A quotes scraper, making your day a little better!
- jivesearch/jivesearch A search engine that doesn't track you.
- Leagify/colly-draft-prospects A scraper for future NFL Draft prospects.
- lucasepe/go-ps4 Search the PlayStation Store for your favorite PS4 games using the command line.
- yringler/inside-chassidus-scraper Scrapes Rabbi Paltiel's web site for lesson metadata.
- gamedb/gamedb A database of Steam games.
- lawzava/scrape CLI for email scraping from any website.
- eureka101v/WeiboSpiderGo A Sina Weibo (Chinese Twitter) scraper.
- Go-phie/gophie Search, Download and Stream movies from your terminal
- imthaghost/goclone Clone websites to your computer within seconds.
- superiss/spidy Crawl the web and collect expired domains.
- docker-slim/docker-slim Optimize your Docker containers to make them smaller and better.
- seversky/gachifinder An agent for asynchronous scraping, parsing, and writing to storage backends (Elasticsearch for now).
- eval-exec/goodreads Crawls all tags and all pages of quotes from Goodreads.
If you are using Colly in a project, please send a pull request to add it to the list.
This project exists thanks to all the people who contribute. [Contribute]
Thank you to all our backers! 🙏 [Become a backer]
Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]