Posted on • Originally published atcalebhearth.com on
Send a From Header When You Crawl
Sending a From header is part of building a polite crawler, along with respecting Robots.txt and sending a unique User-Agent. The From header simply contains an email address that can be used by the site’s owner to reach out if your bot is creating any issues for them.
RFC 9110 which describes HTTP Semantics says thata From header SHOULD be sent for robotic user-agents. MDN says that theFrom header “must” be sent in that circumstance, but doesn’t have a citation for a spec that defines that.
I’ve been sending From headers for a while now, and I’m even including it as a requirement in a Swift type-driven API client I’ve been using across a few projects. I’m probably sending it in more places than I need to or that it would be expected in, but as I use scraping and API requests toautomate my /now page, it seems prudent in case that causes issues forthe services I’m using.
Setting a From header is easy. Its syntax isFrom: <email>
. If you’re building any kind of scraper.
Top comments(0)
For further actions, you may consider blocking this person and/orreporting abuse