WARC (file format)

From Wikipedia, the free encyclopedia
Not to be confused with ARC (file format) or WAR (file format).
For other uses, see Web archive (disambiguation).
Web ARChive
Filename extension: .warc
Internet media type: application/warc
Extended from: ARC[1]
Standard: ISO 28500:2017[2]
Open format?: Yes
Website: iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1-annotated/

The WARC (Web ARChive) archive format specifies a method for combining multiple digital resources into an aggregate archive file together with related information. These combined resources are saved as a WARC file, which can be replayed using appropriate software such as ReplayWeb.page, or used by archive websites such as the Wayback Machine.
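As a rough illustration of how several resources can be combined into one aggregate file, the following Python sketch serializes each resource as a record (a version line, named header fields, a blank line, then the content block) and concatenates the records. The function names `build_record` and `build_archive`, the example URIs, and the default content type are assumptions made for this sketch; it is not a complete or validated WARC writer, and it omits details such as per-record compression and `warcinfo` records.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_record(record_type: str, target_uri: str, payload: bytes,
                 content_type: str = "application/octet-stream") -> bytes:
    """Serialize one WARC-style record (hypothetical helper, not a library API)."""
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.1" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line ends the header; two CRLFs follow the content block.
    return head + CRLF + payload + CRLF + CRLF


def build_archive(resources) -> bytes:
    """Concatenate one 'resource' record per (uri, body) pair."""
    return b"".join(build_record("resource", uri, body) for uri, body in resources)


archive = build_archive([
    ("http://example.com/a", b"hello"),
    ("http://example.com/b", b"world!"),
])
```

The resulting bytes could be written to a file with a `.warc` extension; real tools usually add further metadata and often compress each record individually.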

The WARC format is a revision of the Internet Archive's ARC_IA File Format[3] that has traditionally been used to store "web crawls" as sequences of content blocks harvested from the World Wide Web. The WARC format generalizes the older format to better support the harvesting, access, and exchange needs of archiving organizations. Besides the primary content currently recorded, the revision accommodates related secondary content, such as assigned metadata, abbreviated duplicate detection events (see §7.6 "revisit"), and later-date transformations.[4] The WARC format is inspired by HTTP/1.0 streams: it uses a similar header and CRLFs as delimiters, which makes it well suited to crawler implementations.
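The HTTP-like layout can be illustrated with a minimal Python sketch that walks an uncompressed byte stream record by record: it reads the CRLF-delimited header lines up to the blank line, uses `Content-Length` to slice out the content block, and skips the two CRLFs that close the record. The name `parse_records` and the `SAMPLE` data are hypothetical, and the sketch makes simplifying assumptions: no gzip handling, no folded header lines, and field names are matched exactly as written.

```python
def parse_records(data: bytes):
    """Yield (version, fields, block) for each record in an uncompressed stream."""
    pos = 0
    while pos < len(data):
        # The header ends at the first blank line (CRLF CRLF).
        header_end = data.index(b"\r\n\r\n", pos)
        lines = data[pos:header_end].decode("utf-8").split("\r\n")
        version = lines[0]  # e.g. "WARC/1.1"
        fields = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            fields[name.strip()] = value.strip()
        # Content-Length gives the size of the content block in bytes.
        length = int(fields["Content-Length"])
        block_start = header_end + 4
        block = data[block_start:block_start + length]
        yield version, fields, block
        # Skip the block and the two CRLFs that terminate the record.
        pos = block_start + length + 4


# Hand-written sample with only a few header fields, for illustration.
SAMPLE = (
    b"WARC/1.1\r\nWARC-Type: resource\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n"
    b"WARC/1.1\r\nWARC-Type: metadata\r\nContent-Length: 2\r\n\r\nhi\r\n\r\n"
)

records = list(parse_records(SAMPLE))
```

Because the block length is declared up front, a reader can skip over payloads without scanning them, much as with an HTTP message body.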

First specified in 2008,[5] WARC is now recognised by most national library systems as the standard to follow for web archiving,[6] though some have also started to list WACZ as an acceptable format.[7][8]

Software


See also


References

  1. "Introduction". SourceForge. Archived from the original on 16 February 2015. Retrieved 5 March 2015.
  2. "Information and documentation -- WARC file format". Retrieved 16 March 2018.
  3. "ARC_IA, Internet Archive ARC file format". www.digitalpreservation.gov. 14 February 2008. Retrieved 2015-05-09.
  4. "WARC, Web ARChive file format". www.digitalpreservation.gov. 31 August 2009. Retrieved 2015-05-09.
  5. Arvidson, Allan; Kunze, John; Mohr, Gordon; Stack, Michael (5 July 2008). "The WARC File Format". IETF. Retrieved 2021-04-29.
  6. Allegrezza, Stefano (21 April 2016). "Nuove prospettive per il Web archiving: Gli standard ISO 28500 (Formato WARC) e ISO/TR 14873 sulla qualità del Web archiving". Digitalia (in Italian). 2015: 49–61.
  7. "Web Archive Collection Zipped". www.loc.gov. 2023-05-19. Retrieved 2025-03-28.
  8. "Preferred file formats". digitalpreservation.no. 2024-12-05. Retrieved 2025-03-28.
  9. "ArchiveBox". ArchiveBox. Retrieved 2025-03-06.
  10. "ArchiveWeb.page • Webrecorder". Webrecorder. 2025-01-10. Retrieved 2025-03-28.
  11. "Frequently Asked Questions". Conifer User Guide. Retrieved 2025-03-27.
  12. webrecorder/har2warc, Webrecorder, 2025-01-25, retrieved 2025-03-28.
  13. "User Guide - Replay Webpage Docs". replayweb.page. Retrieved 2025-03-28.
  14. harvard-lil/scoop, Harvard Library Innovation Laboratory, 2025-03-26, retrieved 2025-03-28.
  15. Scrivano, Giuseppe (August 6, 2012). "GNU wget 1.14 released". Free Software Foundation, Inc. Retrieved February 25, 2016.

Retrieved from "https://en.wikipedia.org/w/index.php?title=WARC_(file_format)&oldid=1323065112"