hakluke/hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
Fast golang web crawler for gathering URLs and JavaScript file locations. This is basically a simple implementation of the awesome Gocolly library.
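To give a sense of what that means, here is a minimal, illustrative Gocolly sketch (not hakrawler's actual source) that visits a single page and prints the links and JavaScript file locations it finds:

```go
// Illustrative only: a bare-bones Gocolly crawler that prints hrefs and
// script sources, similar in spirit to what hakrawler does at scale.
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector(
		colly.MaxDepth(2), // hakrawler's default crawl depth is also 2
	)

	// Anchor tags yield new URLs/endpoints.
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		fmt.Println(e.Request.AbsoluteURL(e.Attr("href")))
	})

	// Script tags yield JavaScript file locations.
	c.OnHTML("script[src]", func(e *colly.HTMLElement) {
		fmt.Println(e.Request.AbsoluteURL(e.Attr("src")))
	})

	if err := c.Visit("https://example.com"); err != nil {
		fmt.Println("error:", err)
	}
}
```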
Single URL:
```
echo https://google.com | hakrawler
```

Multiple URLs:

```
cat urls.txt | hakrawler
```

Timeout for each line of stdin after 5 seconds:

```
cat urls.txt | hakrawler -timeout 5
```

Send all requests through a proxy:

```
cat urls.txt | hakrawler -proxy http://localhost:8080
```

Include subdomains:

```
echo https://google.com | hakrawler -subs
```

Note: a common issue is that the tool returns no URLs. This usually happens when a domain is specified (https://example.com), but it redirects to a subdomain (https://www.example.com). The subdomain is not included in the scope, so no URLs are printed. To overcome this, either specify the final URL in the redirect chain or use the -subs option to include subdomains.
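To illustrate the scoping behaviour described in the note above, here is a small, hypothetical Go sketch of an in-scope check (the function and logic are illustrative, not hakrawler's actual code): without subdomains, only the exact host is in scope, so a redirect to www.example.com falls outside it.

```go
// Illustrative scope check: exact-host match by default, suffix match
// when subdomains are included.
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// inScope reports whether rawURL belongs to the crawl scope defined by host.
func inScope(rawURL, host string, includeSubs bool) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	if includeSubs {
		// www.example.com, api.example.com, ... all count.
		return u.Hostname() == host || strings.HasSuffix(u.Hostname(), "."+host)
	}
	return u.Hostname() == host
}

func main() {
	fmt.Println(inScope("https://www.example.com/login", "example.com", false)) // false
	fmt.Println(inScope("https://www.example.com/login", "example.com", true))  // true
}
```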
Example workflow: get all subdomains of google, find the ones that respond to http(s), and crawl them all:
```
echo google.com | haktrails subdomains | httpx | hakrawler
```

Installation: first, you'll need to install Go.
Then run this command to download + compile hakrawler:
```
go install github.com/hakluke/hakrawler@latest
```

You can now run `~/go/bin/hakrawler`. If you'd like to just run `hakrawler` without the full path, you'll need to run `export PATH="~/go/bin/:$PATH"`. You can also add this line to your `~/.bashrc` file if you'd like this to persist.
Alternatively, run it straight from Docker Hub:

```
echo https://www.google.com | docker run --rm -i hakluke/hakrawler:v2 -subs
```

It's much easier to use the Docker Hub method above, but if you'd prefer to build and run it locally:
```
git clone https://github.com/hakluke/hakrawler
cd hakrawler
sudo docker build -t hakluke/hakrawler .
sudo docker run --rm -i hakluke/hakrawler --help
```

Note: the following will install an older version of hakrawler without all the features, and it may be buggy. I recommend using one of the other methods.

```
sudo apt install hakrawler
```
Then, to run hakrawler:
```
echo https://www.google.com | docker run --rm -i hakluke/hakrawler -subs
```

Full list of options:

```
Usage of hakrawler:
  -d int
        Depth to crawl. (default 2)
  -dr
        Disable following HTTP redirects.
  -h string
        Custom headers separated by two semi-colons. E.g. -h "Cookie: foo=bar;;Referer: http://example.com/"
  -i    Only crawl inside path
  -insecure
        Disable TLS verification.
  -json
        Output as JSON.
  -proxy string
        Proxy URL. E.g. -proxy http://127.0.0.1:8080
  -s    Show the source of URL based on where it was found. E.g. href, form, script, etc.
  -size int
        Page size limit, in KB. (default -1)
  -subs
        Include subdomains for crawling.
  -t int
        Number of threads to utilise. (default 8)
  -timeout int
        Maximum time to crawl each URL from stdin, in seconds. (default -1)
  -u    Show only unique urls.
  -w    Show at which link the URL is found.
```
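With -json, the output can be consumed programmatically. Below is a small, illustrative Go sketch that reads hakrawler's JSON lines from stdin and filters them. The field names (Source, URL) and the filename filter.go are assumptions for the example, not documented guarantees, so check the JSON your version actually emits and adjust the struct tags.

```go
// Sketch: filter hakrawler -json output, keeping only URLs found in script tags.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

type result struct {
	Source string `json:"Source"` // assumed field name
	URL    string `json:"URL"`    // assumed field name
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		var r result
		if err := json.Unmarshal(scanner.Bytes(), &r); err != nil {
			continue // skip lines that aren't valid JSON
		}
		if r.Source == "script" {
			fmt.Println(r.URL)
		}
	}
}
```

You could then pipe hakrawler into it, e.g. `cat urls.txt | hakrawler -json | go run filter.go`, where filter.go is the hypothetical file above.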