US20060112089A1 - Methods and apparatus for assessing web page decay - Google Patents

Methods and apparatus for assessing web page decay

Info

Publication number
US20060112089A1
Authority
US
United States
Prior art keywords
web page
hyperlink
assessing
web
testing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/995,770
Inventor
Andrei Broder
Ziv Bar-Yossef
Shanmagasundaram Ravikumar
Andrew Tomkins
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2004-11-22
Filing date
2004-11-22
Publication date
2006-05-25
Application filed by International Business Machines Corp
Priority to US10/995,770
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignment of assignors interest (see document for details). Assignors: BAR-YOSSEF, ZIV; BRODER, ANDREI ZARY; RAVIKUMAR, SHANMAGASUNDARAM; TOMKINS, ANDREW
Publication of US20060112089A1
Priority to US11/955,458 (US7818312B2)
Priority to US11/955,481 (US20080097978A1)
Priority to US11/955,471 (US20080097977A1)
Abandoned (current legal status)

Abstract

Systems and methods are herein disclosed for assessing the staleness of a web page. In particular, in one method of the present invention, the staleness of a web page is assessed by examining internal date references within the web page. In another method of the present invention, the staleness of a web page is assessed by examining the meta-data associated with the web page. In a further method of the present invention, the staleness of a hyperlinked web page is determined by examining the link status of the hyperlinks. If the web page has a relatively large number of dead links, it is assessed as being a stale web page. In a still further method of the present invention, the link status of web pages in the neighborhood of the web page being assessed is likewise examined.
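
As a rough illustration of two of the assessments summarized above (date information compared against a threshold, and the share of live hyperlinks compared against a threshold), a minimal Python sketch might look like the following. The function names, the use of percentages, and the treatment of a page with no links are assumptions made for this example and are not taken from the application; the link tester is passed in as a callable so that the sketch does not depend on any particular HTTP client.

```python
from datetime import date
from typing import Callable, Iterable


def is_stale_by_date(page_date: date, date_threshold: date) -> bool:
    """A page whose extracted date is older than the threshold is not current."""
    return page_date < date_threshold


def active_link_percentage(links: Iterable[str],
                           is_active: Callable[[str], bool]) -> float:
    """Percentage of hyperlinks on a page that return an active page."""
    links = list(links)
    if not links:
        # Assumption for this sketch: a page without links has no dead links.
        return 100.0
    active = sum(1 for url in links if is_active(url))
    return 100.0 * active / len(links)


def is_stale_by_links(links: Iterable[str],
                      is_active: Callable[[str], bool],
                      link_threshold: float) -> bool:
    """A page is assessed as stale if too few of its links are active."""
    return active_link_percentage(links, is_active) < link_threshold
```

In use, `is_active` would wrap whatever link test is chosen (time out, HTTP code, redirect count); here it is left abstract so that the threshold comparison itself stays visible.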

Claims (54)

1. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform operations for assessing the currency of a web page, the operations comprising:
establishing a date threshold, wherein web pages older than the date threshold will be assessed as not being current;
accessing a web page;
extracting date information from the web page identifying the age of the web page; and
comparing the date information extracted from the web page to the date threshold.
2. The signal-bearing medium of claim 1 further comprising:
identifying the web page as lacking currency if the date information identifying the age of the web page is older than the date threshold.
3. The signal-bearing medium of claim 1 further comprising:
identifying the web page as being current if the date information identifying the age of the web page is younger than the date threshold.
4. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform operations for assessing the currency of a web page, the operations comprising:
receiving a user-specified topicality threshold, where the topicality threshold concerns the topicality of material content of the web page;
accessing a web page;
extracting topicality information from the web page; and
comparing the topicality information extracted from the web page to the topicality threshold.
5. The signal-bearing medium of claim 4 further comprising:
identifying the web page as lacking currency if the topicality information extracted from the web page lacks topicality when compared to the topicality threshold.
6. The signal-bearing medium of claim 4 further comprising:
identifying the web page as being current if the topicality information extracted from the web page is topical when compared to the topicality threshold.
7. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform operations for assessing the currency of a web page, the operations comprising:
establishing a link threshold, wherein a web page will be assessed as lacking currency if a percentage of hyperlinks contained in the web page that link to an active page is less than the link threshold;
accessing a web page containing hyperlinks;
testing the hyperlinks;
calculating the percentage of hyperlinks that return active web pages; and
comparing the percentage of hyperlinks that return active web pages with the link threshold.
8. The signal-bearing medium of claim 7 where the operations further comprise:
identifying the web page as lacking currency if the percentage of hyperlinks that return active web pages is less than the link threshold.
9. The signal-bearing medium of claim 7 where the operations further comprise:
identifying the web page as being current if the percentage of hyperlinks that return active web pages is greater than the link threshold.
10. The signal-bearing medium of claim 7 where testing the hyperlinks further comprises:
establishing a time out limit for testing a hyperlink, where when the hyperlink is tested, the hyperlink will be assessed as linking to a dead web page if the time out limit is exceeded;
selecting a hyperlink; and
monitoring the elapsed time until a web page is returned, if at all.
11. The signal-bearing medium of claim 10 where testing the hyperlinks further comprises:
assessing the hyperlink as linking to a dead page if the time out limit is exceeded.
12. The signal-bearing medium of claim 7 where testing the hyperlinks further comprises:
selecting a hyperlink; and
assessing the hyperlink as linking to a dead page based on an HTTP code returned in response to an HTTP request targeting the selected hyperlink.
13. The signal-bearing medium of claim 7 where testing the hyperlinks further comprises:
establishing a redirect limit for testing a hyperlink, where when the hyperlink is tested, the hyperlink will be assessed as linking to a dead web page if the redirect limit is exceeded;
selecting a hyperlink; and
monitoring the number of redirects before the desired web page is returned, if at all.
14. The signal-bearing medium of claim 13 where testing the hyperlinks further comprises:
assessing the hyperlink as linking to a dead page if the redirect limit is exceeded.
15. The signal-bearing medium of claim 7 where testing the hyperlinks further comprises:
selecting a first hyperlink;
saving the web page returned in response to the selection of the first hyperlink;
formulating a web page request to a host of the first hyperlink, where the web page request is of a form that will not return an active web page with a high degree of probability; and
issuing the web page request.
16. The signal-bearing medium of claim 15 where testing the hyperlinks further comprises:
assessing the first hyperlink as linking to an active web page if an HTTP error code is returned in response to the web page request.
17. The signal-bearing medium of claim 15 where testing the hyperlinks further comprises:
saving a web page returned in response to the web page request;
comparing the web page returned in response to the web page request to the web page returned in response to the selection of the first hyperlink; and
assessing the first hyperlink as linking to a dead web page if the web page returned in response to the selection of the first hyperlink is identical to the web page returned in response to the web page request.
18. The signal-bearing medium of claim 7 where the currency of a web page is assessed by additionally testing the link status of hyperlinks contained in web pages linked through a chain of at least one hyperlink to the web page whose currency is being tested, and where:
establishing a link threshold further comprises applying a sliding scale weighting factor to hyperlinks contained in web pages linked to the web page whose currency is being tested, where the weight given to a dead link decreases with the distance of the web page containing the dead link from the web page whose currency is being tested in terms of intermediate web pages; and where
testing the hyperlinks further comprises testing hyperlinks in web pages linked to the web page whose currency is being tested.
19. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform operations for assessing the decay of a web page, the operations comprising:
accessing a subject web page containing hyperlinks;
assessing the decay of the subject web page by following a random walk away from the subject web page, where the random walk consists of a testing of links on the subject web page and web pages linked to the subject web page under test; and
assigning a decay score to the subject web page in dependence on dead links encountered in the random walk, wherein the decay score is a weighted sliding scale, where a dead link encountered relatively close in the random walk to the subject web page in terms of intermediate web pages results in a higher decay score than a dead link encountered relatively farther away from the subject web page.
20. The signal-bearing medium of claim 19 where testing of links further comprises:
establishing a time out limit for testing a hyperlink, where when the hyperlink is tested, the hyperlink will be assessed as linking to a dead web page if the time out limit is exceeded;
selecting a hyperlink; and
monitoring the elapsed time until a web page is returned, if at all.
21. The signal-bearing medium of claim 20 where testing the links further comprises:
assessing the hyperlink as linking to a dead page if the time out limit is exceeded.
22. The signal-bearing medium of claim 19 where testing the links further comprises:
selecting a hyperlink; and
assessing the hyperlink as linking to a dead page based on an HTTP code returned in response to an HTTP request targeting the selected hyperlink.
23. The signal-bearing medium of claim 19 where testing the links further comprises:
establishing a redirect limit for testing a hyperlink, where when the hyperlink is tested, the hyperlink will be assessed as linking to a dead web page if the redirect limit is exceeded;
selecting a hyperlink; and
monitoring the number of redirects before the desired web page is returned, if at all.
24. The signal-bearing medium of claim 23 where testing the links further comprises:
assessing the hyperlink as linking to a dead page if the redirect limit is exceeded.
25. The signal-bearing medium of claim 19 where testing the links further comprises:
selecting a first hyperlink;
saving the web page returned in response to the selection of the first hyperlink;
formulating a web page request to the host of the first hyperlink, where the request is of a form that will not return an active web page with a high degree of probability; and
issuing the web page request.
26. The signal-bearing medium of claim 25 where testing the links further comprises:
assessing the first hyperlink as linking to an active web page if an HTTP error code is returned in response to the web page request.
27. The signal-bearing medium of claim 25 where testing the links further comprises:
saving a web page returned in response to the web page request;
comparing the web page returned in response to the web page request to the web page returned in response to the selection of the first hyperlink; and
assessing the first hyperlink as linking to a dead web page if the web page returned in response to the selection of the first hyperlink is identical to the web page returned in response to the web page request.
28. A computer system for assessing the currency of a web page, the computer system comprising:
an internet connection for connecting to the internet and for accessing web pages available on the internet;
at least one memory to store web pages retrieved from the internet and at least one program of machine-readable instructions, where the at least one program performs operations to assess the currency of a web page;
at least one processor coupled to the internet connection and the at least one memory, where the at least one processor performs the following operations when the at least one program is executed:
retrieving a date threshold, wherein web pages older than the date threshold will be assessed as not being current;
accessing a web page;
extracting date information from the web page identifying the age of the web page; and
comparing the date information extracted from the web page to the date threshold.
29. The computer system of claim 28 where the operations further comprise:
identifying the web page as lacking currency if the date information identifying the age of the web page is older than the date threshold.
30. The computer system of claim 28 where the operations further comprise:
identifying the web page as being current if the date information identifying the age of the web page is younger than the date threshold.
31. A computer system for assessing the currency of a web page, the computer system comprising:
an internet connection for connecting to the internet and for accessing web pages available on the internet;
at least one memory to store web pages retrieved from the internet and at least one program of machine-readable instructions, where the at least one program performs operations to assess the currency of a web page;
at least one processor coupled to the internet connection and the at least one memory, where the at least one processor performs the following operations when the at least one program is executed:
retrieving a predetermined topicality threshold, where the topicality threshold concerns the topicality of material comprising a web page;
extracting topicality information from the web page; and
comparing the topicality information extracted from the web page to the topicality threshold.
32. The computer system of claim 31 where the operations further comprise:
identifying the web page as lacking currency if the topicality information extracted from the web page lacks topicality when compared to the topicality threshold.
33. The computer system of claim 31 where the operations further comprise:
identifying the web page as being current if the topicality information extracted from the web page is topical when compared to the topicality threshold.
34. A computer system for assessing the currency of a web page, the computer system comprising:
an internet connection for connecting to the internet and for accessing web pages available on the internet;
at least one memory to store web pages retrieved from the internet and at least one program of machine-readable instructions, where the at least one program performs operations to assess the currency of a web page;
at least one processor coupled to the internet connection and the at least one memory, where the at least one processor performs the following operations when the at least one program is executed:
establishing a link threshold, wherein a web page will be assessed as lacking currency if a percentage of hyperlinks contained in the web page that link to an active page is less than the link threshold;
accessing a web page containing hyperlinks;
testing the hyperlinks;
calculating the percentage of hyperlinks that return active web pages; and
comparing the percentage of hyperlinks that return active web pages with the link threshold.
35. The computer system for assessing the currency of a web page of claim 34 where the operations further comprise:
identifying the web page as lacking currency if the percentage of hyperlinks that return active web pages is less than the link threshold.
36. The computer system for assessing the currency of a web page of claim 34 where the operations further comprise:
identifying the web page as being current if the percentage of hyperlinks that return active web pages is greater than the link threshold.
37. The computer system for assessing the currency of a web page of claim 34 where testing the hyperlinks further comprises:
establishing a time out limit for testing a hyperlink, where when the hyperlink is tested, the hyperlink will be assessed as linking to a dead web page if the time out limit is exceeded;
selecting a hyperlink; and
monitoring the elapsed time until a web page is returned, if at all.
38. The computer system for assessing the currency of a web page of claim 37 where testing the hyperlinks further comprises:
assessing the hyperlink as linking to a dead page if the time out limit is exceeded.
39. The computer system for assessing the currency of a web page of claim 34 where testing the hyperlinks further comprises:
selecting a hyperlink; and
assessing the hyperlink as linking to a dead page based on an HTTP code returned in response to an HTTP request targeting the selected hyperlink.
40. The computer system for assessing the currency of a web page of claim 34 where testing the hyperlinks further comprises:
establishing a redirect limit for testing a hyperlink, where when the hyperlink is tested, the hyperlink will be assessed as linking to a dead web page if the redirect limit is exceeded;
selecting a hyperlink; and
monitoring a number of redirects before the desired web page is returned, if at all.
41. The computer system for assessing the currency of a web page of claim 40 where testing the hyperlinks further comprises:
assessing the hyperlink as linking to a dead web page if the redirect limit is exceeded.
42. The computer system for assessing the currency of a web page of claim 34 where testing the hyperlinks further comprises:
selecting a first hyperlink;
saving the web page returned in response to the selection of the first hyperlink;
formulating a web page request to the parent directory of the address corresponding to the first hyperlink, where the web page request is of a form that will not return an active web page with a high degree of probability; and
issuing the web page request.
43. The computer system for assessing the currency of a web page of claim 42 where testing the hyperlinks further comprises:
assessing the first hyperlink as linking to an active web page if an HTTP error code is returned in response to the web page request.
44. The computer system for assessing the currency of a web page of claim 42 where testing the hyperlinks further comprises:
saving a web page returned in response to the web page request;
comparing the web page returned in response to the web page request to the web page returned in response to the selection of the first hyperlink; and
assessing the first hyperlink as linking to a dead web page if the web page returned in response to the selection of the first hyperlink is identical to the web page returned in response to the web page request.
45. The computer system for assessing the currency of a web page of claim 34 where the currency of a web page is assessed by additionally testing the link status of hyperlinks contained in web pages linked through a chain of at least one hyperlink to the web page whose currency is being tested, and where:
establishing a link threshold further comprises applying a sliding scale weighting factor to hyperlinks contained in web pages linked to the web page whose currency is being tested, where the weight given to a dead link decreases with the distance of the web page containing the dead link from the web page whose currency is being tested in terms of intermediate web pages; and where
testing the hyperlinks further comprises testing hyperlinks in web pages linked from the web page whose currency is being tested.
46. A computer system for assessing the decay of a web page comprising:
an internet connection for connecting to the internet and for accessing web pages available on the internet;
at least one memory to store web pages retrieved from the internet and at least one program of machine-readable instructions, where the at least one program performs operations to assess the decay of a web page;
at least one processor coupled to the internet connection and the at least one memory, where the at least one processor performs the following operations when the at least one program is executed:
accessing a subject web page containing hyperlinks;
assessing the decay of the subject web page by following a random walk away from the subject web page, where the random walk consists of a testing of links on the subject web page and web pages linked to the subject web page under test; and
assigning a decay score to the subject web page in dependence on dead links encountered in the random walk, wherein the decay score is a weighted sliding scale, where a dead link encountered relatively close in the random walk to the subject web page in terms of intermediate web pages results in a higher decay score than a dead link encountered relatively farther away from the subject web page.
47. The computer system for assessing the decay of a web page of claim 46 where testing of links further comprises:
establishing a time out limit for testing a hyperlink, where when the hyperlink is tested, the hyperlink will be assessed as linking to a dead web page if the time out limit is exceeded;
selecting a hyperlink; and
monitoring the elapsed time until a web page is returned, if at all.
48. The computer system for assessing the decay of a web page of claim 47 where testing the links further comprises:
assessing the hyperlink as linking to a dead page if the time out limit is exceeded.
49. The computer system for assessing the decay of a web page of claim 46 where testing the links further comprises:
selecting a hyperlink; and
assessing the hyperlink as linking to a dead page based on the HTTP code returned in response to an HTTP request targeting the selected hyperlink.
50. The computer system for assessing the decay of a web page of claim 46 where testing the links further comprises:
establishing a redirect limit for testing a hyperlink, where when the hyperlink is tested, the hyperlink will be assessed as linking to a dead web page if the redirect limit is exceeded;
selecting a hyperlink; and
monitoring the number of redirects before the desired web page is returned, if at all.
51. The computer system for assessing the decay of a web page of claim 50 where testing the links further comprises:
assessing the hyperlink as linking to a dead page if the redirect limit is exceeded.
52. The computer system for assessing the decay of a web page of claim 46 where testing the links further comprises:
selecting a first hyperlink;
saving the web page returned in response to the selection of the first hyperlink;
formulating a web page request to the host of the first hyperlink, where the request is of a form that will not return an active web page with a high degree of probability; and
issuing the web page request.
53. The computer system for assessing the decay of a web page of claim 52 where testing the links further comprises:
assessing the first hyperlink as linking to an active web page if an HTTP error code is returned in response to the web page request.
54. The computer system for assessing the decay of a web page of claim 52 where testing the links further comprises:
saving a web page returned in response to the web page request;
comparing the web page returned in response to the web page request to the web page returned in response to the selection of the first hyperlink; and
assessing the first hyperlink as linking to a dead web page if the web page returned in response to the selection of the first hyperlink is identical to the web page returned in response to the web page request.
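
The link tests and the random-walk decay score recited in the claims can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Response` record, the caller-supplied `probe_url` (a request that should almost certainly not match a real page), the default redirect limit, the number of walks, and the geometric weight `damping ** distance` are all hypothetical choices for this example. The claims require only that a dead link nearer the subject page count for more than one farther away, and do not prescribe a formula.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Response:
    status: Optional[int]  # None models a request that exceeded the time out limit
    body: str = ""
    redirects: int = 0     # number of redirects followed before the final response


Fetch = Callable[[str], Response]


def is_dead(url: str, fetch: Fetch, probe_url: str, redirect_limit: int = 5) -> bool:
    """Classify a hyperlink as dead using the tests described in the claims."""
    first = fetch(url)
    if first.status is None:             # time out limit exceeded
        return True
    if first.status >= 400:              # HTTP error code returned
        return True
    if first.redirects > redirect_limit:  # redirect limit exceeded
        return True
    # Probe the same host with a request that should not return a real page.
    probe = fetch(probe_url)
    if probe.status is not None and probe.status >= 400:
        return False  # the host reports errors, so the first page is taken as active
    # Identical pages for a real and a bogus request suggest a disguised error page.
    return probe.status is not None and probe.body == first.body


def decay_score(start: str,
                out_links: Callable[[str], List[str]],
                dead: Callable[[str], bool],
                walks: int = 200,
                max_steps: int = 10,
                damping: float = 0.5,
                rng: Optional[random.Random] = None) -> float:
    """Average distance-weighted penalty for dead links met on random walks."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(walks):
        page = start
        for distance in range(max_steps):
            links = out_links(page)
            if not links:
                break
            nxt = rng.choice(links)
            if dead(nxt):
                # Assumed weighting: the penalty shrinks geometrically with distance.
                total += damping ** distance
                break
            page = nxt
    return total / walks
```

With these assumptions the score lies between 0 and 1: a subject page whose only link is dead scores 1, and with `damping=0.5` a page whose only path reaches a dead link one page away scores 0.5. In practice `fetch` would wrap a real HTTP client and `dead` would typically be a partial application of `is_dead`.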
Application US10/995,770 | Priority date 2004-11-22 | Filing date 2004-11-22 | Methods and apparatus for assessing web page decay | Abandoned | US20060112089A1 (en)

Priority Applications (4)

Application Number | Publication | Priority Date | Filing Date | Title
US10/995,770 | US20060112089A1 (en) | 2004-11-22 | 2004-11-22 | Methods and apparatus for assessing web page decay
US11/955,458 | US7818312B2 (en) | 2004-11-22 | 2007-12-13 | Methods and apparatus for assessing web page decay
US11/955,481 | US20080097978A1 (en) | 2004-11-22 | 2007-12-13 | Methods and Apparatus for Assessing Web Page Decay
US11/955,471 | US20080097977A1 (en) | 2004-11-22 | 2007-12-13 | Methods and Apparatus for Assessing Web Page Decay

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
US10/995,770 | US20060112089A1 (en) | 2004-11-22 | 2004-11-22 | Methods and apparatus for assessing web page decay

Related Child Applications (3)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
US11/955,481 | Division | US20080097978A1 (en) | 2004-11-22 | 2007-12-13 | Methods and Apparatus for Assessing Web Page Decay
US11/955,458 | Division | US7818312B2 (en) | 2004-11-22 | 2007-12-13 | Methods and apparatus for assessing web page decay
US11/955,471 | Division | US20080097977A1 (en) | 2004-11-22 | 2007-12-13 | Methods and Apparatus for Assessing Web Page Decay

Publications (1)

Publication Number | Publication Date
US20060112089A1 (en) | 2006-05-25

Family

ID=36462123

Family Applications (4)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US10/995,770 | Abandoned | US20060112089A1 (en) | 2004-11-22 | 2004-11-22 | Methods and apparatus for assessing web page decay
US11/955,458 | Expired - Fee Related | US7818312B2 (en) | 2004-11-22 | 2007-12-13 | Methods and apparatus for assessing web page decay
US11/955,481 | Abandoned | US20080097978A1 (en) | 2004-11-22 | 2007-12-13 | Methods and Apparatus for Assessing Web Page Decay
US11/955,471 | Abandoned | US20080097977A1 (en) | 2004-11-22 | 2007-12-13 | Methods and Apparatus for Assessing Web Page Decay

Family Applications After (3)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US11/955,458 | Expired - Fee Related | US7818312B2 (en) | 2004-11-22 | 2007-12-13 | Methods and apparatus for assessing web page decay
US11/955,481 | Abandoned | US20080097978A1 (en) | 2004-11-22 | 2007-12-13 | Methods and Apparatus for Assessing Web Page Decay
US11/955,471 | Abandoned | US20080097977A1 (en) | 2004-11-22 | 2007-12-13 | Methods and Apparatus for Assessing Web Page Decay

Country Status (1)

Country | Link
US (4) | US20060112089A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20060190225A1 (en) * | 2005-02-18 | 2006-08-24 | Brand Matthew E | Collaborative filtering using random walks of Markov chains
US20060294052A1 (en) * | 2005-06-28 | 2006-12-28 | Parashuram Kulkami | Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages
US20090006424A1 (en) * | 2005-05-05 | 2009-01-01 | Gregory Frank Coppola | System, method and program product for determining if a user has received a redirected web page
US7536389B1 (en) | 2005-02-22 | 2009-05-19 | Yahoo! Inc. | Techniques for crawling dynamic web content
US20090157597A1 (en) * | 2007-12-13 | 2009-06-18 | Yahoo! Inc. | Reduction of annotations to extract structured web data
US20090157607A1 (en) * | 2007-12-12 | 2009-06-18 | Yahoo! Inc. | Unsupervised detection of web pages corresponding to a similarity class
US20090171986A1 (en) * | 2007-12-27 | 2009-07-02 | Yahoo! Inc. | Techniques for constructing sitemap or hierarchical organization of webpages of a website using decision trees
US20110302147A1 (en) * | 2007-12-05 | 2011-12-08 | Yahoo! Inc. | Methods and apparatus for computing graph similarity via sequence similarity
US8671108B2 (en) | 2011-09-02 | 2014-03-11 | Mastercard International Incorporated | Methods and systems for detecting website orphan content
US8881018B2 (en) | 2011-08-29 | 2014-11-04 | Mastercard International Incorporated | Method and system for remediating nonfunctional website content
US20150058335A1 (en) * | 2006-11-07 | 2015-02-26 | At&T Intellectual Property I, Lp | Determining sort order by distance
US9569504B1 (en) * | 2005-05-31 | 2017-02-14 | Google Inc. | Deriving and using document and site quality signals from search query streams
US10454807B2 (en) * | 2016-10-13 | 2019-10-22 | Futurewei Technologies, Inc. | Connection minimization for distributed system
US20190325073A1 (en) * | 2018-04-18 | 2019-10-24 | Google Llc | Systems and Methods for Providing Content Items in Situations Involving Suboptimal Network Conditions

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7809705B2 (en) * | 2007-02-13 | 2010-10-05 | Yahoo! Inc. | System and method for determining web page quality using collective inference based on local and global information
US20080270549A1 (en) * | 2007-04-26 | 2008-10-30 | Microsoft Corporation | Extracting link spam using random walks and spam seeds
US9990652B2 (en) | 2010-12-15 | 2018-06-05 | Facebook, Inc. | Targeting social advertising to friends of users who have interacted with an object associated with the advertising
US20120203831A1 (en) | 2011-02-03 | 2012-08-09 | Kent Schoen | Sponsored Stories Unit Creation from Organic Activity Stream
US8799068B2 (en) | 2007-11-05 | 2014-08-05 | Facebook, Inc. | Social advertisements and other informational messages on a social networking website, and advertising model for same
US8296279B1 (en) * | 2008-06-03 | 2012-10-23 | Google Inc. | Identifying results through substring searching
US8296722B2 (en) * | 2008-10-06 | 2012-10-23 | International Business Machines Corporation | Crawling of object model using transformation graph
US8001462B1 (en) * | 2009-01-30 | 2011-08-16 | Google Inc. | Updating search engine document index based on calculated age of changed portions in a document
US8566332B2 (en) * | 2009-03-02 | 2013-10-22 | Hewlett-Packard Development Company, L.P. | Populating variable content slots on web pages
US8332408B1 (en) | 2010-08-23 | 2012-12-11 | Google Inc. | Date-based web page annotation
US8966486B2 (en) * | 2011-05-03 | 2015-02-24 | Microsoft Corporation | Distributed multi-phase batch job processing
US9043434B1 (en) | 2011-09-12 | 2015-05-26 | Polyvore, Inc. | Alternate page determination for a requested target page
KR20140113153A (en) * | 2013-03-15 | 2014-09-24 | 삼성전자주식회사 | Method and System for Statistical Equivalence Test
US9690760B2 (en) | 2014-05-15 | 2017-06-27 | International Business Machines Corporation | Bidirectional hyperlink synchronization for managing hypertexts in social media and public data repository
US10394939B2 (en) * | 2015-03-31 | 2019-08-27 | Fujitsu Limited | Resolving outdated items within curated content
CN108304395B (en) * | 2016-02-05 | 2022-09-06 | 北京迅奥科技有限公司 | Webpage cheating detection
WO2020040718A1 (en) * | 2018-08-20 | 2020-02-27 | Google, Llc | Resource pre-fetch using age threshold
JP7113518B2 (en) * | 2019-06-28 | 2022-08-05 | サイレックス・テクノロジー株式会社 | WWW server and communication control method
US11531822B1 (en) | 2020-06-30 | 2022-12-20 | Amazon Technologies, Inc. | Training models and using the trained models to indicate staleness of content items

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5860071A (en) * | 1997-02-07 | 1999-01-12 | At&T Corp | Querying and navigating changes in web repositories
US20050256860A1 (en) * | 2004-05-15 | 2005-11-17 | International Business Machines Corporation | System and method for ranking nodes in a network

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP2729356B2 (en) * | 1994-09-01 | 1998-03-18 | 日本アイ・ビー・エム株式会社 | Information retrieval system and method
US6272507B1 (en) * | 1997-04-09 | 2001-08-07 | Xerox Corporation | System for ranking search results from a collection of documents using spreading activation techniques
US6665838B1 (en) * | 1999-07-30 | 2003-12-16 | International Business Machines Corporation | Web page thumbnails and user configured complementary information provided from a server
US6990238B1 (en) * | 1999-09-30 | 2006-01-24 | Battelle Memorial Institute | Data processing, analysis, and visualization system for use with disparate data types
US7085736B2 (en) * | 2001-02-27 | 2006-08-01 | Alexa Internet | Rules-based identification of items represented on web pages
US7398271B1 (en) * | 2001-04-16 | 2008-07-08 | Yahoo! Inc. | Using network traffic logs for search enhancement
US7346839B2 (en) * | 2003-09-30 | 2008-03-18 | Google Inc. | Information retrieval based on historical data
US7587398B1 (en) * | 2004-06-30 | 2009-09-08 | Google Inc. | System and method of accessing a document efficiently through multi-tier web caching
US7707229B2 (en) * | 2007-12-12 | 2010-04-27 | Yahoo! Inc. | Unsupervised detection of web pages corresponding to a similarity class

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20060190225A1 (en)* | 2005-02-18 | 2006-08-24 | Brand Matthew E | Collaborative filtering using random walks of Markov chains
US7536389B1 (en) | 2005-02-22 | 2009-05-19 | Yahoo ! Inc. | Techniques for crawling dynamic web content
US20090006424A1 (en)* | 2005-05-05 | 2009-01-01 | Gregory Frank Coppola | System, method and program product for determining if a user has received a redirected web page
US9569504B1 (en)* | 2005-05-31 | 2017-02-14 | Google Inc. | Deriving and using document and site quality signals from search query streams
US20060294052A1 (en)* | 2005-06-28 | 2006-12-28 | Parashuram Kulkami | Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages
US7610267B2 (en)* | 2005-06-28 | 2009-10-27 | Yahoo! Inc. | Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages
US9449108B2 (en)* | 2006-11-07 | 2016-09-20 | At&T Intellectual Property I, L.P. | Determining sort order by distance
US20150058335A1 (en)* | 2006-11-07 | 2015-02-26 | At&T Intellectual Property I, Lp | Determining sort order by distance
US20110302147A1 (en)* | 2007-12-05 | 2011-12-08 | Yahoo! Inc. | Methods and apparatus for computing graph similarity via sequence similarity
US8417657B2 (en)* | 2007-12-05 | 2013-04-09 | Yahoo! Inc. | Methods and apparatus for computing graph similarity via sequence similarity
US20090157607A1 (en)* | 2007-12-12 | 2009-06-18 | Yahoo! Inc. | Unsupervised detection of web pages corresponding to a similarity class
US7941421B2 (en) | 2007-12-12 | 2011-05-10 | Yahoo! Inc. | Unsupervised detection of web pages corresponding to a similarity class
US7707229B2 (en) | 2007-12-12 | 2010-04-27 | Yahoo! Inc. | Unsupervised detection of web pages corresponding to a similarity class
US8046360B2 (en) | 2007-12-13 | 2011-10-25 | Yahoo! Inc. | Reduction of annotations to extract structured web data
US20090157597A1 (en)* | 2007-12-13 | 2009-06-18 | Yahoo! Inc. | Reduction of annotations to extract structured web data
US20090171986A1 (en)* | 2007-12-27 | 2009-07-02 | Yahoo! Inc. | Techniques for constructing sitemap or hierarchical organization of webpages of a website using decision trees
US8881018B2 (en) | 2011-08-29 | 2014-11-04 | Mastercard International Incorporated | Method and system for remediating nonfunctional website content
US8671108B2 (en) | 2011-09-02 | 2014-03-11 | Mastercard International Incorporated | Methods and systems for detecting website orphan content
US10454807B2 (en)* | 2016-10-13 | 2019-10-22 | Futurewei Technologies, Inc. | Connection minimization for distributed system
US20190325073A1 (en)* | 2018-04-18 | 2019-10-24 | Google Llc | Systems and Methods for Providing Content Items in Situations Involving Suboptimal Network Conditions
US11288336B2 (en)* | 2018-04-18 | 2022-03-29 | Google Llc | Systems and methods for providing content items in situations involving suboptimal network conditions

Also Published As

Publication number | Publication date
US20080097977A1 (en) | 2008-04-24
US20080097978A1 (en) | 2008-04-24
US20080097988A1 (en) | 2008-04-24
US7818312B2 (en) | 2010-10-19

Similar Documents

Publication | Publication Date | Title
US7818312B2 (en) | | Methods and apparatus for assessing web page decay
Bar-Yossef et al. | | Sic transit gloria telae: towards an understanding of the web's decay
US6910071B2 (en) | | Surveillance monitoring and automated reporting method for detecting data changes
US6029192A (en) | | System and method for locating resources on a network using resource evaluations derived from electronic messages
US8203952B2 (en) | | Using network traffic logs for search enhancement
US6895551B1 (en) | | Network quality control system for automatic validation of web pages and notification of author
US8606781B2 (en) | | Systems and methods for personalized search
US7668812B1 (en) | | Filtering search results using annotations
US7603350B1 (en) | | Search result ranking based on trust
Chen et al. | | Local methods for estimating pagerank values
US8271486B2 (en) | | System and method for searching a bookmark and tag database for relevant bookmarks
Goh et al. | | Link decay in leading information science journals
US7809736B2 (en) | | Importance ranking for a hierarchical collection of objects
US6505197B1 (en) | | System and method for automatically and iteratively mining related terms in a document through relations and patterns of occurrences
US20050234877A1 (en) | | System and method for searching using a temporal dimension
US20080040313A1 (en) | | System and method for providing tag-based relevance recommendations of bookmarks in a bookmark and tag database
US20130144834A1 (en) | | Uniform resource locator canonicalization
Baeza-Yates et al. | | Crawling the infinite Web: five levels are enough
WO2002010955A2 (en) | | Computer method and apparatus for determining content owner of a website
US7424472B2 (en) | | Search query dominant location detection
Baeza-Yates | | Web usage mining in search engines
US20080313167A1 (en) | | System And Method For Intelligently Indexing Internet Resources
Nunes et al. | | Using neighbors to date web documents
Baeza-Yates et al. | | Crawling the infinite web
Pant et al. | | Predicting web page status

Legal Events

Date | Code | Title | Description
 | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: BRODER, ANDREI ZARY; BAR-YOSSEF, ZIV; RAVIKUMAR, SHANMAGASUNDARAM; AND OTHERS; REEL/FRAME: 015445/0114; SIGNING DATES FROM 20041118 TO 20041120
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE

