US20080313178A1 - Determining searchable criteria of network resources based on commonality of content - Google Patents

Determining searchable criteria of network resources based on commonality of content

Info

Publication number
US20080313178A1
Authority
US
United States
Prior art keywords
words
keywords
hyperlinks
document
making
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/196,949
Inventor
Cary L. Bates
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US12/196,949
Publication of US20080313178A1
Current legal status: Abandoned

Abstract

A method, article of manufacture, and apparatus for determining keywords to be used by a search engine. In one embodiment, a list of hyperlinks contained in an electronic document is identified by a searching program. The searching program then accesses the resource content (e.g., HTML) from each resource pointed to by the hyperlinks. The resource content of each resource is examined to determine whether a commonality exists in a manner directed to identifying keywords for each resource. These keywords may then be used by a search engine to return more accurate results to user queries.
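As an illustration of the first step in the abstract (identifying a list of hyperlinks in a document), the following is a minimal sketch and not the applicant's implementation. It assumes that a "list of hyperlinks" is marked by HTML `<ul>` or `<ol>` elements; the names `LinkListParser` and `find_hyperlink_lists` are hypothetical and do not appear in the application.

```python
from html.parser import HTMLParser


class LinkListParser(HTMLParser):
    """Collects the href values found inside each <ul> or <ol> element."""

    # Assumption: these tags mark the beginning of a list of hyperlinks.
    LIST_TAGS = {"ul", "ol"}

    def __init__(self):
        super().__init__()
        self.lists = []    # finished lists, each a list of href strings
        self._stack = []   # one entry per list element that is still open

    def handle_starttag(self, tag, attrs):
        if tag in self.LIST_TAGS:
            self._stack.append([])
        elif tag == "a" and self._stack:
            href = dict(attrs).get("href")
            if href:
                # A link belongs to the innermost open list only.
                self._stack[-1].append(href)

    def handle_endtag(self, tag):
        if tag in self.LIST_TAGS and self._stack:
            links = self._stack.pop()
            if links:
                self.lists.append(links)


def find_hyperlink_lists(html_text):
    """Return one list of hrefs per <ul>/<ol> that contains links."""
    parser = LinkListParser()
    parser.feed(html_text)
    parser.close()
    return parser.lists
```

Links outside any list element are ignored in this sketch, since the method described operates on identified lists of hyperlinks rather than on every link in the page.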

Claims (22)

1. A method for determining keywords representative of the content of an electronic document located at a source network address, comprising:
analyzing a document definition of the electronic document located at the source network address to identify one or more lists of hyperlinks to other electronic documents located at respective target network addresses;
for at least two hyperlinks in at least one identified list of hyperlinks, accessing the respective electronic documents pointed to by the at least two respective hyperlinks;
for each respective electronic document accessed, analyzing the respective document definition of the respective electronic document to determine a set of words representative of the respective electronic document;
comparing each determined set of words to determine whether a commonality exists between the set of words based on some predefined criteria; and
if the commonality exists, making at least a portion of the set of words keywords for the one or more of respective electronic documents defined by the respective document definitions.
8. A computer readable storage medium containing a program which, when executed, performs an operation for determining keywords representative of the content of an electronic document located at a source network address, the operation comprising:
analyzing a document definition of the electronic document located at the source network address to identify one or more lists of hyperlinks to other electronic documents located at respective target network addresses;
for at least two hyperlinks in at least one identified list of hyperlinks, accessing the respective electronic documents pointed to by the at least two respective hyperlinks;
for each respective electronic document accessed, analyzing the respective document definition of the respective electronic document to determine a set of words representative of the respective electronic document;
comparing each determined set of words to determine whether a commonality exists between the set of words based on some predefined criteria; and
if the commonality exists, making at least a portion of the set of words keywords for one or more of respective electronic documents defined by the respective document definitions.
15. A computer system, comprising:
a processor configured to:
analyze a document definition of an electronic document located at a source network address to identify one or more lists of hyperlinks to other electronic documents located at respective target network addresses;
for at least two hyperlinks in at least one identified list of hyperlinks, access the respective electronic documents pointed to by the at least two respective hyperlinks;
for each respective electronic document accessed, analyze the respective document definition of the respective electronic document to determine a set of words representative of the respective electronic document;
compare each determined set of words to determine whether a commonality exists between the set of words based on some predefined criteria; and
if the commonality exists, make at least a portion of the set of words keywords for one or more of respective electronic documents defined by the respective document definitions.
22. A method for determining keywords representative of the content of an electronic document located at a source network address, comprising:
analyzing a document definition of the electronic document located at the source network address to identify one or more lists of hyperlinks to other electronic documents located at respective target network addresses; wherein analyzing the document definition to identify one or more lists of hyperlinks comprises scanning the document definition for predefined markup language tags that define the beginning of a list of hyperlinks;
in response to identifying one or more lists of hyperlinks of the analyzed document:
for at least two hyperlinks in at least one identified list of hyperlinks, accessing the respective electronic documents pointed to by the at least two respective hyperlinks;
for each respective electronic document accessed, analyzing the respective document definition of the respective electronic document to determine a set of words which occur above a predefined frequency in the respective electronic document wherein the determined set of words is representative of the respective electronic document;
comparing each determined set of words to determine whether a commonality exists between the set of words based on some predefined criteria; and
if the commonality exists, making at least a portion of the set of words keywords for the one or more of respective electronic documents defined by the respective document definitions; wherein making at least the portion of the set of words keywords is conditioned on:
comparing the number of the hyperlinks in the at least one identified list to a predefined threshold number; and
determining that the number of the hyperlinks in the at least one identified list is greater than the predefined threshold number,
wherein determining whether a commonality exists between the set of words comprises determining whether there is a common word set between the set of words; and wherein making at least the portion of the set of words keywords comprises making the words in the common word set keywords.
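The keyword-selection logic recited in claim 22 can be sketched roughly as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the documents are assumed to be already fetched and reduced to plain text, words are assumed to be runs of ASCII letters, and the function names and default threshold values are hypothetical.

```python
import re
from collections import Counter


def frequent_words(text, frequency_threshold=1):
    """Return the words that occur more than frequency_threshold times."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return {word for word, n in counts.items() if n > frequency_threshold}


def common_keywords(documents, link_threshold=2, frequency_threshold=1):
    """Return the common word set for the documents behind one hyperlink list.

    documents: one plain-text string per hyperlink in the identified list.
    Keywords are produced only if the list holds more than link_threshold
    hyperlinks; otherwise the result is an empty set.
    """
    if not documents or len(documents) <= link_threshold:
        return set()
    word_sets = [frequent_words(text, frequency_threshold) for text in documents]
    # The common word set is the intersection of every per-document set.
    return set.intersection(*word_sets)


docs = [
    "jazz jazz guitar guitar lessons",
    "jazz jazz guitar guitar chords chords",
    "guitar guitar jazz jazz history",
]
keywords = common_keywords(docs)
```

A strict intersection is only one possible "predefined criteria" for commonality. On real pages, very frequent function words would also pass this test, so a practical variant would likely filter stop words first.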
US12/196,949 | Priority date: 2006-04-13 | Filing date: 2008-08-22 | Determining searchable criteria of network resources based on commonality of content | Abandoned | US20080313178A1

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
US12/196,949 | US20080313178A1 | 2006-04-13 | 2008-08-22 | Determining searchable criteria of network resources based on commonality of content

Applications Claiming Priority (2)

Application Number | Publication | Priority Date | Filing Date | Title
US11/279,610 | US7447684B2 | 2006-04-13 | 2006-04-13 | Determining searchable criteria of network resources based on a commonality of content
US12/196,949 | US20080313178A1 | 2006-04-13 | 2008-08-22 | Determining searchable criteria of network resources based on commonality of content

Related Parent Applications (1)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
US11/279,610 | Continuation | US7447684B2 | 2006-04-13 | 2006-04-13 | Determining searchable criteria of network resources based on a commonality of content

Publications (1)

Publication Number | Publication Date
US20080313178A1 | 2008-12-18

Family

ID=38606027

Family Applications (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US11/279,610 | Expired - Fee Related | US7447684B2 | 2006-04-13 | 2006-04-13 | Determining searchable criteria of network resources based on a commonality of content
US12/196,949 | Abandoned | US20080313178A1 | 2006-04-13 | 2008-08-22 | Determining searchable criteria of network resources based on commonality of content

Family Applications Before (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US11/279,610 | Expired - Fee Related | US7447684B2 | 2006-04-13 | 2006-04-13 | Determining searchable criteria of network resources based on a commonality of content

Country Status (1)

Country | Link
US (2) | US7447684B2

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20080256049A1 * | 2007-01-19 | 2008-10-16 | Niraj Katwala | Method and system for establishing document relevance
US20090327266A1 * | 2008-06-27 | 2009-12-31 | Microsoft Corporation | Index Optimization for Ranking Using a Linear Model
US20100121838A1 * | 2008-06-27 | 2010-05-13 | Microsoft Corporation | Index optimization for ranking using a linear model
US20100191725A1 * | 2009-01-23 | 2010-07-29 | Mehmet Kivanc Ozonat | A system and method for discovering providers
US20110161323A1 * | 2009-12-25 | 2011-06-30 | Takehiro Hagiwara | Information Processing Device, Method of Evaluating Degree of Association, and Program
US20110179041A1 * | 2010-01-15 | 2011-07-21 | Souto Farlon De Alencar | Matching service entities with candidate resources
US20110208715A1 * | 2010-02-23 | 2011-08-25 | Microsoft Corporation | Automatically mining intents of a group of queries
US20130238584A1 * | 2011-05-10 | 2013-09-12 | Geoff Hendry | Systems and methods for performing search and retrieval of electronic documents using a big index
US8650173B2 | 2010-06-23 | 2014-02-11 | Microsoft Corporation | Placement of search results using user intent
US9396276B2 | 2011-05-10 | 2016-07-19 | Uber Technologies, Inc. | Key-value database for geo-search and retrieval of point of interest records
US10339187B2 * | 2014-06-27 | 2019-07-02 | Yandex Europe Ag | System and method for conducting a search

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
EP1846815A2 * | 2005-01-31 | 2007-10-24 | Textdigger, Inc. | Method and system for semantic search and retrieval of electronic documents
US8468445B2 * | 2005-03-30 | 2013-06-18 | The Trustees Of Columbia University In The City Of New York | Systems and methods for content extraction
US9400838B2 | 2005-04-11 | 2016-07-26 | Textdigger, Inc. | System and method for searching for a query
WO2007081681A2 | 2006-01-03 | 2007-07-19 | Textdigger, Inc. | Search system with query refinement and search method
WO2007114932A2 | 2006-04-04 | 2007-10-11 | Textdigger, Inc. | Search system and method with text function tagging
US7447684B2 * | 2006-04-13 | 2008-11-04 | International Business Machines Corporation | Determining searchable criteria of network resources based on a commonality of content
US7716179B1 | 2009-10-29 | 2010-05-11 | Wowd, Inc. | DHT-based distributed file system for simultaneous use by millions of frequently disconnected, world-wide users
US8850219B2 * | 2010-05-13 | 2014-09-30 | Salesforce.Com, Inc. | Secure communications
US8892584B1 * | 2011-03-28 | 2014-11-18 | Symantec Corporation | Systems and methods for identifying new words from a meta tag
US8484218B2 * | 2011-04-21 | 2013-07-09 | Google Inc. | Translating keywords from a source language to a target language
US10530746B2 * | 2017-10-17 | 2020-01-07 | Servicenow, Inc. | Deployment of a custom address to a remotely managed computational instance

Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5864863A * | 1996-08-09 | 1999-01-26 | Digital Equipment Corporation | Method for parsing, indexing and searching world-wide-web pages
US20020087515A1 * | 2000-11-03 | 2002-07-04 | Swannack Christopher Martyn | Data acquisition system
US20020194161A1 * | 2001-04-12 | 2002-12-19 | Mcnamee J. Paul | Directed web crawler with machine learning
US20050138079A1 * | 2003-12-17 | 2005-06-23 | International Business Machines Corporation | Processing, browsing and classifying an electronic document
US20050171932A1 * | 2000-02-24 | 2005-08-04 | Nandhra Ian R. | Method and system for extracting, analyzing, storing, comparing and reporting on data stored in web and/or other network repositories and apparatus to detect, prevent and obfuscate information removal from information servers
US7130848B2 * | 2000-08-09 | 2006-10-31 | Gary Martin Oosta | Methods for document indexing and analysis
US20070244855A1 * | 2006-04-13 | 2007-10-18 | Bates Cary L | Determining Searchable Criteria of Network Resources Based on a Commonality of Content

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5864863A * | 1996-08-09 | 1999-01-26 | Digital Equipment Corporation | Method for parsing, indexing and searching world-wide-web pages
US20050171932A1 * | 2000-02-24 | 2005-08-04 | Nandhra Ian R. | Method and system for extracting, analyzing, storing, comparing and reporting on data stored in web and/or other network repositories and apparatus to detect, prevent and obfuscate information removal from information servers
US7130848B2 * | 2000-08-09 | 2006-10-31 | Gary Martin Oosta | Methods for document indexing and analysis
US20020087515A1 * | 2000-11-03 | 2002-07-04 | Swannack Christopher Martyn | Data acquisition system
US20020194161A1 * | 2001-04-12 | 2002-12-19 | Mcnamee J. Paul | Directed web crawler with machine learning
US20050138079A1 * | 2003-12-17 | 2005-06-23 | International Business Machines Corporation | Processing, browsing and classifying an electronic document
US20070244855A1 * | 2006-04-13 | 2007-10-18 | Bates Cary L | Determining Searchable Criteria of Network Resources Based on a Commonality of Content
US7447684B2 * | 2006-04-13 | 2008-11-04 | International Business Machines Corporation | Determining searchable criteria of network resources based on a commonality of content

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7844602B2 * | 2007-01-19 | 2010-11-30 | Healthline Networks, Inc. | Method and system for establishing document relevance
US20080256049A1 * | 2007-01-19 | 2008-10-16 | Niraj Katwala | Method and system for establishing document relevance
US8161036B2 * | 2008-06-27 | 2012-04-17 | Microsoft Corporation | Index optimization for ranking using a linear model
US20090327266A1 * | 2008-06-27 | 2009-12-31 | Microsoft Corporation | Index Optimization for Ranking Using a Linear Model
US20100121838A1 * | 2008-06-27 | 2010-05-13 | Microsoft Corporation | Index optimization for ranking using a linear model
US8171031B2 | 2008-06-27 | 2012-05-01 | Microsoft Corporation | Index optimization for ranking using a linear model
US20100191725A1 * | 2009-01-23 | 2010-07-29 | Mehmet Kivanc Ozonat | A system and method for discovering providers
US20110161323A1 * | 2009-12-25 | 2011-06-30 | Takehiro Hagiwara | Information Processing Device, Method of Evaluating Degree of Association, and Program
US20110179041A1 * | 2010-01-15 | 2011-07-21 | Souto Farlon De Alencar | Matching service entities with candidate resources
US8260763B2 * | 2010-01-15 | 2012-09-04 | Hewlett-Packard Devlopment Company, L.P. | Matching service entities with candidate resources
US20110208715A1 * | 2010-02-23 | 2011-08-25 | Microsoft Corporation | Automatically mining intents of a group of queries
US8650173B2 | 2010-06-23 | 2014-02-11 | Microsoft Corporation | Placement of search results using user intent
US20130238584A1 * | 2011-05-10 | 2013-09-12 | Geoff Hendry | Systems and methods for performing search and retrieval of electronic documents using a big index
US9396276B2 | 2011-05-10 | 2016-07-19 | Uber Technologies, Inc. | Key-value database for geo-search and retrieval of point of interest records
US9646108B2 * | 2011-05-10 | 2017-05-09 | Uber Technologies, Inc. | Systems and methods for performing geo-search and retrieval of electronic documents using a big index
US10198530B2 | 2011-05-10 | 2019-02-05 | Uber Technologies, Inc. | Generating and providing spelling correction suggestions to search queries using a confusion set based on residual strings
US10210282B2 | 2011-05-10 | 2019-02-19 | Uber Technologies, Inc. | Search and retrieval of electronic documents using key-value based partition-by-query indices
US10339187B2 * | 2014-06-27 | 2019-07-02 | Yandex Europe Ag | System and method for conducting a search

Also Published As

Publication number | Publication date
US7447684B2 | 2008-11-04
US20070244855A1 | 2007-10-18

Legal Events

Date | Code | Title | Description
(not listed) | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION