archive.today (formerly archive.is) is a web archiving website that saves snapshots on demand. It has support for JavaScript-heavy sites such as Google Maps and Twitter.[3] Archive.today records two snapshots: one replicates the original webpage including any functional live links; the other is a screenshot of the page.[4]
Archive.today was founded in 2012. The site originally branded itself as archive.today, but changed the primary mirror to archive.is in May 2015.[5] It began to deprecate the archive.is domain in favor of other mirrors in January 2019.[6]
In 2021, archive.today had saved about 500 million pages.[7]
Archive.today can capture individual pages in response to explicit user requests.[8][9][10] Since its beginning, it has supported crawling pages with URLs containing the now-deprecated hash-bang fragment (#!).[11]
Archive.today records only text and images, excluding XML, RTF, spreadsheet (xls or ods) and other non-static content. However, videos for certain sites, like Twitter, are saved.[12] It keeps track of the history of snapshots saved, requesting confirmation before adding a new snapshot of an already saved page.[13][14]
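A hash-bang URL keeps its application state in the fragment, which browsers do not send to the server in the HTTP request, so a crawler has to recognize the fragment and let the page's JavaScript act on it. The following is a minimal sketch of detecting such a URL; the function name `split_hashbang` and the example URLs are hypothetical and do not describe archive.today's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def split_hashbang(url):
    """Split a URL into (base URL without fragment, hash-bang state or None).

    The state is whatever follows "#!" in the URL. A plain "#anchor"
    fragment is not a hash-bang, so None is returned for it.
    """
    parts = urlsplit(url)
    # Rebuild the URL without its fragment; this is what the server sees.
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    if parts.fragment.startswith("!"):
        return base, parts.fragment[1:]
    return base, None


base, state = split_hashbang("https://example.com/app?x=1#!/user/42")
print(base)   # https://example.com/app?x=1
print(state)  # /user/42
```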
Pages are captured at a browser width of 1,024 pixels. CSS is converted to inline CSS, removing responsive web design and selectors such as :hover and :active. Content generated using JavaScript during the crawling process appears in a frozen state.[15] HTML class names are preserved inside the old-class attribute. When text is selected, a JavaScript applet generates a URL fragment seen in the browser's address bar that automatically highlights that portion of the text when visited again.
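The loss of :hover, :active and responsive rules follows from how inline styles work: a style attribute holds only declarations for that one element and cannot express pseudo-class selectors or media queries. The sketch below illustrates the idea on a single element's attributes, handling simple class selectors only. The function `inline_element` and the sample rules are hypothetical; this is not archive.today's actual conversion code.

```python
def inline_element(attrs, class_rules):
    """Return a copy of attrs with class styles inlined.

    attrs:       mapping of attribute name -> value for one element
    class_rules: mapping of class name -> CSS declarations for that class
    The original class list is kept in an "old-class" attribute.
    """
    result = {name: value for name, value in attrs.items() if name != "class"}
    class_value = attrs.get("class", "")

    # Collect declarations for every class that has a matching rule.
    declarations = [class_rules[c] for c in class_value.split() if c in class_rules]
    if attrs.get("style"):
        # An existing inline style goes last so that it still takes precedence.
        declarations.append(attrs["style"])

    if declarations:
        result["style"] = "; ".join(d.strip().rstrip(";") for d in declarations)
    if class_value:
        result["old-class"] = class_value
    return result


rules = {"btn": "padding: 4px;", "primary": "color: white"}
print(inline_element({"class": "btn primary", "id": "go"}, rules))
```

A rule such as `.btn:hover { ... }` has no place in this output, which is why interactive states disappear from the archived copy.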
Web pages can be duplicated from archive.today to web.archive.org as a second-level backup, but archive.today does not save its snapshots in WARC format. The reverse, from web.archive.org to archive.today, is also possible,[16] but the copy usually takes more time than a direct capture. Historically, website owners had the option to opt out of the Wayback Machine through the use of the robots exclusion standard (robots.txt), and these exclusions were also applied retroactively.[17] Archive.today does not obey robots.txt because it acts "as a direct agent of the human user."[10] As of 2019, the Wayback Machine also no longer obeys robots.txt.
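For context, a crawler that does obey the robots exclusion standard checks a site's robots.txt rules before fetching a page. The sketch below shows such a check using Python's standard `urllib.robotparser` module. The user-agent names `ExampleArchiveBot` and `OtherBot`, the rules and the URL are hypothetical and are not taken from any real archiving service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes one crawler from the whole site.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch(ROBOTS_TXT, "ExampleArchiveBot", "https://example.com/page"))  # False
print(may_fetch(ROBOTS_TXT, "OtherBot", "https://example.com/page"))           # True
```

A service that fetches a page on behalf of a requesting user, as archive.today describes itself, simply skips this check.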
The search toolbar supports advanced keyword operators, using * as the wildcard character. A pair of quotation marks restricts the search to an exact sequence of keywords present in the title or in the body of the webpage, whereas the insite operator restricts it to a specific Internet domain.[18]
Once a web page is archived, it cannot be deleted directly by any Internet user.[19]
Advertisements and pop-ups can be removed from archived pages, and links expanded, by asking the site's owner to do so via the owner's blog.[20]
While saving a dynamic list, the archive.today search box shows only a result that links the previous and the following section of the list (e.g. 20 links per page).[21] The other web pages saved are filtered, and sometimes may be found by one of their occurrences.[13][clarification needed]
The search feature is backed by Google Custom Search. If it delivers no results, archive.today attempts to utilize Yandex Search.[22]
While saving a page, a list of URLs for individual page elements and their content sizes, HTTP statuses and MIME types is shown. This list can only be viewed during the crawling process.[citation needed]
Users can download archived pages as a ZIP file, except for pages archived since 29 November 2019,[23] when archive.today changed its browser engine from PhantomJS to Chromium (non-headless).[24]
In March 2019, the site was blocked for six months by several internet providers in Australia and New Zealand in the aftermath of the Christchurch mosque shootings in an attempt to limit distribution of the footage of the attack.[27][28]
According to GreatFire.org, archive.today has been blocked in mainland China since March 2016,[29] archive.li since September 2017,[30] archive.fo since July 2018,[31] and archive.ph since December 2019.[32]
On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they did this in order to avoid escalating a dispute they allegedly had with the Finnish government.[33]
Starting in May 2018,[36][37] Cloudflare's 1.1.1.1 DNS service would not resolve archive.today's web addresses, making the site inaccessible to users of the Cloudflare DNS service. Both organizations claimed the other was responsible for the issue. Cloudflare staff stated that the problem lay in archive.today's DNS infrastructure, as its authoritative nameservers returned invalid records when Cloudflare's network systems made requests to archive.today. Archive.today countered that the issue was due to Cloudflare requests not being compliant with DNS standards, as Cloudflare does not send EDNS Client Subnet information in its DNS requests.[38][39]
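EDNS Client Subnet (ECS), defined in RFC 7871, is an optional field a resolver can attach to a DNS query to tell the authoritative nameserver which network the original client is in. As a rough illustration of what that field contains, the sketch below encodes the option's wire format by hand. The function name `encode_ecs_option` and the example subnet (a documentation address range) are hypothetical, and the code reflects neither Cloudflare's nor archive.today's software.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code assigned to Client Subnet (RFC 7871)


def encode_ecs_option(subnet):
    """Encode a client subnet such as "203.0.113.0/24" as an ECS option.

    Layout: option code (2 bytes), option length (2 bytes), address family
    (2 bytes), source prefix length (1 byte), scope prefix length (1 byte,
    zero in queries), then only as many address bytes as the prefix needs.
    """
    network = ipaddress.ip_network(subnet, strict=False)
    family = 1 if network.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    prefix_len = network.prefixlen
    # Keep just enough bytes to cover the prefix; host bits are already zero.
    address = network.network_address.packed[: (prefix_len + 7) // 8]
    payload = struct.pack("!HBB", family, prefix_len, 0) + address
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


print(encode_ecs_option("203.0.113.0/24").hex())  # 0008000700011800cb0071
```

A resolver that omits this option, as the text says Cloudflare does, reveals only its own address to the authoritative nameserver rather than any part of the client's.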