| Web ARChive | |
|---|---|
| Filename extensions | warc |
| Internet media type | application/warc |
| Extended from | ARC[1] |
| Standard | ISO 28500:2017[2] |
| Open format? | Yes |
| Website | iipc |
The WARC (Web ARChive) archive format specifies a method for combining multiple digital resources into an aggregate archive file together with related information. These combined resources are saved as a WARC file, which can be replayed using appropriate software such as ReplayWeb.page, or used by archive websites such as the Wayback Machine.
The WARC format is a revision of the Internet Archive's ARC_IA File Format[3] that has traditionally been used to store "web crawls" as sequences of content blocks harvested from the World Wide Web. The WARC format generalizes the older format to better support the harvesting, access, and exchange needs of archiving organizations. Besides the primary content currently recorded, the revision accommodates related secondary content, such as assigned metadata, abbreviated duplicate detection events (see §7.6 "revisit"), and later-date transformations.[4] The WARC format is inspired by HTTP/1.0 streams, with a similar header and the use of CRLFs as delimiters, which makes it well suited to crawler implementations.
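
The HTTP-like layout described above can be illustrated with a minimal sketch of a reader for a single uncompressed record. It assumes the record consists of a version line, CRLF-terminated named fields, a blank line, a content block of `Content-Length` bytes, and two trailing CRLFs. The function name `read_record` and the sample record are hypothetical and exist only for illustration; the sample omits fields that a conforming record would carry, such as `WARC-Record-ID`, and the sketch does not handle compressed files or malformed input.

```python
import io

CRLF = b"\r\n"


def read_record(stream):
    """Read one uncompressed WARC record from a binary stream.

    Returns (version, headers, body), or None at end of stream.
    Illustrative sketch only: no validation, no gzip support.
    """
    version_line = stream.readline()
    if not version_line:
        return None  # nothing left to read
    version = version_line.rstrip(CRLF).decode("utf-8")

    headers = {}
    while True:
        line = stream.readline()
        if line in (CRLF, b""):
            break  # a blank line ends the header block
        name, _, value = line.rstrip(CRLF).decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()

    # The content block is exactly Content-Length bytes long.
    body = stream.read(int(headers["Content-Length"]))
    stream.read(4)  # consume the two CRLFs that terminate the record
    return version, headers, body


# Hypothetical, simplified record used only to exercise the sketch.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Date: 2008-01-01T00:00:00Z\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

stream = io.BytesIO(sample)
record = read_record(stream)
```

Because each record states its own length, a reader can call `read_record` repeatedly on the same stream to walk a file record by record, or skip a content block without interpreting it.
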
First specified in 2008,[5] WARC is now recognised by most national library systems as the standard to follow for web archiving,[6] though some have also started to list WACZ as an acceptable format.[7][8]