A tool that crawls websites to find domain names and checks their availability.
```sh
git clone https://github.com/twiny/spidy.git
cd ./spidy

# build
go build -o bin/spidy -v cmd/spidy/main.go

# run
./bin/spidy -c config/config.yaml -u https://github.com
```
```
NAME:
   Spidy - Domain name scraper

USAGE:
   spidy [global options] command [command options] [arguments...]

VERSION:
   2.0.0

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --config path, -c path  path to config file
   --help, -h              show help (default: false)
   --urls urls, -u urls    urls of page to scrape (accepts multiple inputs)
   --version, -v           print the version (default: false)
```
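Since `--urls` accepts multiple inputs, the flag can most likely be repeated to seed the crawler with several start pages (the second URL below is a placeholder, and the repeated-flag form assumes the usual CLI convention for multi-value flags):

```sh
./bin/spidy -c config/config.yaml -u https://github.com -u https://example.com
```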
```yaml
# main crawler config
crawler:
  max_depth: 10 # max depth of pages to visit per website.
  # filter: [] # regexp filter
  rate_limit: "1/5s" # 1 request per 5 sec
  max_body_size: "20MB" # max page body size
  user_agents: # array of user-agents
    - "Spidy/2.1; +https://github.com/twiny/spidy"
  # proxies: [] # array of proxy. http(s), SOCKS5

# Logs
log:
  rotate: 7 # log rotation
  path: "./log" # log directory

# Store
store:
  ttl: "24h" # keep cache for 24h
  path: "./store" # store directory

# Results
result:
  path: ./result # result directory

parralle: 3 # number of concurrent workers
timeout: "5m" # request timeout
tlds: ["biz", "cc", "com", "edu", "info", "net", "org", "tv"] # array of domain extension to check.
```
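For illustration only, the sketch below shows how a config file with this shape could be loaded in Go using `gopkg.in/yaml.v3`. The `Config` struct and its field names are assumptions made for this example, not spidy's internal types.

```go
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Config mirrors the YAML layout above. Struct and field names are
// illustrative assumptions, not spidy's actual types.
type Config struct {
	Crawler struct {
		MaxDepth    int      `yaml:"max_depth"`
		RateLimit   string   `yaml:"rate_limit"`
		MaxBodySize string   `yaml:"max_body_size"`
		UserAgents  []string `yaml:"user_agents"`
	} `yaml:"crawler"`
	Log struct {
		Rotate int    `yaml:"rotate"`
		Path   string `yaml:"path"`
	} `yaml:"log"`
	Store struct {
		TTL  string `yaml:"ttl"`
		Path string `yaml:"path"`
	} `yaml:"store"`
	Result struct {
		Path string `yaml:"path"`
	} `yaml:"result"`
	Workers int      `yaml:"parralle"` // key name kept exactly as it appears in the config file
	Timeout string   `yaml:"timeout"`
	TLDs    []string `yaml:"tlds"`
}

func main() {
	raw, err := os.ReadFile("config/config.yaml")
	if err != nil {
		panic(err)
	}

	var cfg Config
	if err := yaml.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}

	fmt.Printf("max depth %d, %d workers, TLDs %v\n", cfg.Crawler.MaxDepth, cfg.Workers, cfg.TLDs)
}
```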
NOTE: This package is provided "as is" with no guarantee. Use it at your own risk and always test it yourself before using it in a production environment. If you find any issues, please create a new issue.