Link rot

From Wikipedia, the free encyclopedia

URLs ceasing to function

"Broken Link" redirects here. For the Star Trek: Deep Space Nine episode, see Broken Link (Star Trek: Deep Space Nine).

For link rot on Wikipedia, see Wikipedia:Link rot.

Page Not Found: a broken link usually leads to an error message.

Link rot (also called link death, link breaking, or reference rot) is the tendency of hyperlinks to stop pointing, over time, to their originally targeted file, web page, or server, because that resource has been relocated to a new address or has become permanently unavailable. A link that no longer resolves to its intended target may be called broken or dead.

The rate of link rot is a subject of study and research because of its significance to the internet's ability to preserve information. Estimates of that rate vary dramatically between studies. Information professionals have warned that link rot could make important archival data disappear, potentially affecting the legal system and scholarship.

Prevalence

A number of studies have examined the prevalence of link rot within the World Wide Web, in academic literature that uses URLs to cite web content, and within digital libraries.

A 2023 study of the external links on the Million Dollar Homepage found that 27% of the links led to a site that loaded with no redirects, 45% were redirected, and 28% returned various error messages.^[1]

A 2002 study suggested that link rot within digital libraries is considerably slower than on the web. It found that about 3% of the objects were no longer accessible after one year,^[2] equating to a half-life of nearly 23 years.

A 2003 study found that on the Web, about one link out of every 200 broke each week,^[3] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.^[4]
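
The half-life figures quoted above follow from the reported loss rates if one assumes that a constant fraction of the surviving links breaks in each period (exponential decay). That assumption, and the function name `half_life`, are illustrative and are not taken from the cited studies. Under it, the half-life is ln(0.5) / ln(1 − r), where r is the fraction lost per period:

```python
import math


def half_life(loss_per_period: float) -> float:
    """Number of periods until half of the links have broken.

    Assumes a constant fraction of the surviving links breaks each period.
    """
    if not 0.0 < loss_per_period < 1.0:
        raise ValueError("loss_per_period must be strictly between 0 and 1")
    return math.log(0.5) / math.log(1.0 - loss_per_period)


# 2002 digital-library study: about 3% of objects lost per year.
digital_library_years = half_life(0.03)

# 2003 Web study: about one link in 200 lost per week.
web_weeks = half_life(1 / 200)

print(f"{digital_library_years:.1f} years")  # 22.8 years
print(f"{web_weeks:.0f} weeks")              # 138 weeks
```

The results, roughly 22.8 years and 138 weeks, match the "nearly 23 years" and "138 weeks" stated in the text.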

A 2004 study showed that subsets of Web links (such as those targeting specific file types or those hosted by academic institutions) could have dramatically different half-lives.^[5] The URLs selected for publication appear to have greater longevity than the average URL. A 2015 study by Weblock analyzed more than 180,000 links from references in the full-text corpora of three major open access publishers and found a half-life of about 14 years,^[6] generally confirming a 2005 study that found that half of the URLs cited in D-Lib Magazine articles were active 10 years after publication.^[7] Other studies have found higher rates of link rot in academic literature but typically suggest a half-life of four years or greater.^[8]^[9] A 2013 study in BMC Bioinformatics analyzed nearly 15,000 links in abstracts from Thomson Reuters's Web of Science citation index and found that the median lifespan of web pages was 9.3 years, and just 62% were archived.^[10] A 2021 study of external links in New York Times articles published between 1996 and 2019 found a half-life of about 15 years (with significant variance among content topics) but noted that 13% of functional links no longer led to the original content, a phenomenon called content drift.^[11]

A 2013 study found that 49% of links in U.S. Supreme Court opinions were dead.^[12]

A 2023 study of United States COVID-19 dashboards found that 23% of the state dashboards available in February 2021 were no longer available at their previous URLs in April 2023.^[13]

Pew Research found that, as of 2023, 38% of web pages from 2013 had gone missing. It also found that, in 2023, 54% of English Wikipedia articles had a dead link in the "References" section and 23% of news articles linked to a dead URL.^[14]

Causes

Link rot can occur for several reasons. A target web page may be removed. The server that hosts the target page could fail, be removed from service, or relocate to a new domain name. As far back as 1999, it was noted that with the amount of material that can be stored on a hard drive, "a single disk failure could be like the burning of the library at Alexandria."^[15] A domain name's registration may lapse or be transferred to another party. Some causes result in the link failing to find any target and returning an error such as HTTP 404. Other causes result in the link targeting content other than what the link's author intended.

Prevention and detection

Strategies for preventing link rot can focus on placing content where its likelihood of persisting is higher, authoring links that are less likely to be broken, taking steps to preserve existing links, or repairing links whose targets have been relocated or removed.^{[citation needed]}

The creation of URLs that will not change with time is the fundamental method of preventing link rot. Preventive planning has been championed by Tim Berners-Lee and other web pioneers.^[16]

Strategies pertaining to the authorship of links include:

linking to primary rather than secondary sources^{[citation needed]} and prioritizing stable sites^[5]
avoiding links that point to resources on researchers' personal pages^[7]
using clean URLs or otherwise employing URL normalization or URL canonicalization^[17]
using permalinks and persistent identifiers such as ARKs, DOIs, Handle System references, PURLs,^{[citation needed]} or content addressing^[18]
avoiding linking to documents other than web pages^[17]
avoiding deep linking^{[citation needed]}
linking to web archives such as the Internet Archive,^[19] WebCite,^[20] archive.today, Perma.cc,^[21] Amber,^[22] or Arweave^[23]
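
As an illustration of the URL normalization mentioned in the list above, the following is a minimal sketch using Python's standard `urllib.parse` module. The function name `normalize_url`, the choice of rules, and the `TRACKING_PARAMS` set are assumptions made for this example, not a standard. Sorting query parameters in particular is only safe on sites where parameter order carries no meaning.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Hypothetical list of parameters that do not identify the resource.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str) -> str:
    """Return one canonical spelling for equivalent http(s) URLs.

    Simplifications: user info is dropped and IPv6 hosts are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # host names are case-insensitive

    # Keep the port only when it is not the scheme's default.
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"

    path = parts.path or "/"  # an empty path is equivalent to "/"

    # Drop tracking parameters and put the rest in a stable order.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(pairs))

    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))


print(normalize_url("HTTP://Example.COM:80/a?b=2&a=1&utm_source=x#frag"))
# http://example.com/a?a=1&b=2
```

Storing or publishing only the normalized form reduces the number of distinct spellings of one address that can later break independently.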

Strategies pertaining to the protection of existing links include:

using redirection mechanisms such as HTTP 301 to automatically refer browsers and crawlers to relocated content^{[citation needed]}
using content management systems which can automatically update links when content within the same site is relocated or automatically replace links with canonical URLs^[24]
integrating search resources into HTTP 404 pages^[25]
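
Two of the strategies above, permanent redirects and search-assisted 404 pages, can be sketched as a single request handler. The example below is framework-independent and uses only the standard library. The `MOVED` and `PAGES` tables, their paths, and the function name `respond` are hypothetical. Fuzzy path matching stands in for a real site search.

```python
from difflib import get_close_matches

# Hypothetical site data: paths that exist now, and old paths that moved.
PAGES = {"/reports/2020/summary", "/reports/2021/summary", "/about"}
MOVED = {"/old-report.html": "/reports/2020/summary"}


def respond(path: str) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for a requested path."""
    if path in PAGES:
        return 200, {}, f"content of {path}"

    # Relocated content: a 301 tells browsers and crawlers the move is permanent.
    if path in MOVED:
        return 301, {"Location": MOVED[path]}, ""

    # Unknown path: return a real 404 status, plus suggestions for the visitor.
    suggestions = get_close_matches(path, sorted(PAGES), n=3, cutoff=0.5)
    body = "Not found."
    if suggestions:
        body += " Did you mean: " + ", ".join(suggestions)
    return 404, {}, body


print(respond("/old-report.html"))
print(respond("/abuot"))
```

Returning the 404 status code together with the suggestions, rather than a 200 page that merely says "not found", keeps the response detectable by automated link checkers.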

The detection of broken links may be done manually or automatically. Automated methods include plug-ins for content management systems as well as standalone broken-link checkers such as Xenu's Link Sleuth. Automatic checking may not detect links that return a soft 404 or links that return a 200 OK response but point to content that has changed.^[26]
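
The limits of automatic checking described above can be illustrated with a small classifier. This sketch assumes the page has already been fetched by some other component and, optionally, that a text snapshot taken when the link was created is available. The `Fetch` type, the phrase list, and the similarity threshold are hypothetical heuristics, not a description of how any particular checker works.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Fetch:
    status: int       # final HTTP status code
    redirected: bool  # True if any redirect was followed
    body: str         # text of the final response


# Assumed phrases that suggest an error page served with a 200 status.
SOFT_404_PHRASES = ("page not found", "no longer available")


def classify(fetch: Fetch, snapshot: Optional[str] = None,
             drift_threshold: float = 0.5) -> str:
    """Label a fetched link; heuristic and prone to false positives."""
    if fetch.status >= 400:
        return "broken"

    # Soft 404: the server reports success but the page says otherwise.
    text = fetch.body.lower()
    if any(phrase in text for phrase in SOFT_404_PHRASES):
        return "soft-404"

    # Content drift: the page loads but no longer resembles the snapshot.
    if snapshot is not None:
        similarity = SequenceMatcher(None, snapshot, fetch.body).ratio()
        if similarity < drift_threshold:
            return "content-drift"

    return "redirected" if fetch.redirected else "ok"


print(classify(Fetch(404, False, "")))
print(classify(Fetch(200, False, "Sorry, page not found")))
print(classify(Fetch(200, True, "Annual report 2020")))
```

A checker that inspects only the status code would report the second example as healthy, which is the gap the paragraph above describes.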