Defining Web pages, Web sites and Web captures

Posted on October 23, 2016 by Vinay Goel

The Internet Archive has been archiving the web for 20 years and has preserved billions of webpages from millions of websites. These webpages are often made up of, and link to, many images, videos, style sheets, scripts and other web objects. Over the years, the Archive has saved over 510 billion such time-stamped web objects, which we term web captures.

We define a webpage as a valid web capture that is an HTML document, a plain text document, or a PDF.
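As a rough illustration of this definition, the sketch below classifies a capture by its HTTP status and content type. The post does not say what makes a capture "valid", so treating a 200 status as valid is an assumption here, and the function name `is_webpage` and the MIME type set are hypothetical choices rather than the Archive's actual rules.

```python
# MIME types standing in for "HTML document, plain text document, or PDF".
WEBPAGE_MIME_TYPES = {"text/html", "text/plain", "application/pdf"}


def is_webpage(status_code, content_type):
    """Return True if a capture counts as a webpage under this sketch.

    Assumption: a "valid" capture is one served with HTTP status 200.
    """
    if status_code != 200:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    return mime_type in WEBPAGE_MIME_TYPES
```

Under these assumptions, a capture of an image or a style sheet is still a web capture, but it is not counted as a webpage.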

A domain on the web is an owned section of the internet namespace, such as google.com or archive.org or bbc.co.uk. A host on the web is identified by a fully qualified domain name (FQDN) that specifies its exact location in the tree hierarchy of the Domain Name System. The FQDN consists of two parts: the hostname and the domain name. For example, in the case of the host blog.archive.org, the hostname is blog and the host is located within the domain archive.org.
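The split of an FQDN into hostname and domain can be sketched as follows. Because domains such as bbc.co.uk sit under a multi-label suffix, the split needs a list of known suffixes; the tiny `SAMPLE_SUFFIXES` set and the function name `split_fqdn` are illustrative assumptions, and a real system would likely rely on a complete list such as the Public Suffix List.

```python
# Tiny illustrative suffix set; an assumption, not a complete list.
SAMPLE_SUFFIXES = {"com", "org", "co.uk"}


def split_fqdn(fqdn, suffixes=SAMPLE_SUFFIXES):
    """Split an FQDN into (hostname, domain).

    The domain is the longest known suffix plus the one label to its left.
    """
    labels = fqdn.lower().rstrip(".").split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                raise ValueError(f"{fqdn!r} is a bare suffix, not a domain")
            domain = ".".join(labels[i - 1:])
            hostname = ".".join(labels[:i - 1])
            return hostname, domain
    raise ValueError(f"no known suffix in {fqdn!r}")


print(split_fqdn("blog.archive.org"))
print(split_fqdn("www.bbc.co.uk"))
```

With this sketch, a host that is itself the domain (for example archive.org) yields an empty hostname.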

We define a website to be a host that has served webpages and has at least one incoming link from a webpage belonging to a different domain.
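The website definition can be read as a filter over a link graph. The following is a minimal sketch under stated assumptions: the inputs (`hosts_with_webpages`, a list of `(source_host, target_host)` links taken from webpages, and a `domain_of` mapping) and the function name `find_websites` are hypothetical, and nothing here reflects how the Archive actually computes its counts.

```python
def find_websites(hosts_with_webpages, links, domain_of):
    """Return the hosts that qualify as websites.

    hosts_with_webpages: set of hosts that have served at least one webpage.
    links: iterable of (source_host, target_host) pairs found on webpages.
    domain_of: dict mapping each host to its domain.
    """
    websites = set()
    for source_host, target_host in links:
        if target_host not in hosts_with_webpages:
            continue
        # Only links that cross a domain boundary count as incoming links.
        if domain_of[source_host] != domain_of[target_host]:
            websites.add(target_host)
    return websites


domain_of = {
    "blog.archive.org": "archive.org",
    "web.archive.org": "archive.org",
    "www.bbc.co.uk": "bbc.co.uk",
}
links = [
    ("web.archive.org", "blog.archive.org"),  # same domain, does not count
    ("www.bbc.co.uk", "blog.archive.org"),    # different domain, counts
]
print(find_websites({"blog.archive.org", "web.archive.org"}, links, domain_of))
```

In this example web.archive.org has served webpages but has no link from another domain, so only blog.archive.org qualifies.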

As of today, the Internet Archive officially holds 273 billion webpages from over 361 million websites, taking up 15 petabytes of storage.

4 thoughts on “Defining Web pages, Web sites and Web captures”

Mihai Pintilie, October 24, 2016 at 11:09 am
Good job guys! Interesting facts about archiving!
