BACKGROUND 1. Field of the Invention
This invention pertains in general to advertising on the Internet and in particular to selecting an advertising message for display in response to a given search query.
2. Background Art
Many web sites provide search engines which accept a keyword-based query from a user and produce a list of matching categories or documents to the user in return. For example, the ALTAVISTA web site contains a search engine that enables a user to execute a query on multiple keywords and receive matching information, such as references to web pages, in response.
Such web sites typically generate revenue by displaying advertising messages, such as banner ads, to the user along with the returned search results. However, users often ignore the ads and may even find the ads irritating. Therefore, web sites strive to display ads related to the search query on the assumption that a user is more likely to show interest in such ads. For example, if the user searches for the phrase “Tahoe ski areas,” then the user may tend to respond to ads for products such as skis, hotels near Lake Tahoe, etc. because the user is presumably interested in skiing.
To provide this functionality, web sites typically sell keywords to advertisers. An advertiser's ad is displayed when the user searches on a keyword owned by that advertiser. Since multiple advertisers may desire the same keyword, the web sites often auction desirable keywords to the highest bidders. For example, a web site might auction generic keywords such as “car” and “automobile.”
Still, web sites often display ads that are inappropriate in view of the query received from the user. Sometimes, a particular ad is generally appropriate for many queries and the advertiser therefore buys a large number of keywords for the ad. As a result, other advertisers are blocked from using the keywords for their own, possibly more relevant, ads. In a competitive advertising scenario, for example, a dealer of a first make of automobile might buy the keywords corresponding to other makes, thereby intending to entice buyers away from competing makes. As a result, a user may be annoyed to see ads for the first make of automobile when the user is seeking information about another make.
Moreover, advertisers often have little guidance when selecting keywords and seldom know the characteristics of real-world user queries. As a result, advertisers frequently buy overly general keywords like “cars,” on which users seldom query. Consequently, often no advertisers have purchased the exact keywords utilized by the users. Therefore, the web site displays general rotation ads instead of targeted ads.
Therefore, there is a need in the art for a way to increase the relevancy of the selected ads to the query terms received from users. Preferably, a solution to this need will overcome the deficiencies of selecting ads via keywords.
DISCLOSURE OF THE INVENTION The above need is met by a search engine that probabilistically maps a user query into a category, and then uses the category to select a targeted message. The search engine receives the query terms from the user's client computer via the Internet. In response, the search engine executes a search on a web directory to locate zero or more documents that match the query terms. The search engine may also search other directories in order to identify web pages or other documents that match the query terms.
If the search engine has located one or more documents matching the query terms, the search engine determines the categories corresponding to the retrieved documents. In a preferred embodiment of the present invention, each document in the web directory is assigned to a category in a hierarchical directory. In one embodiment, the hierarchical directory is derived from information available through the Open Directory Project (ODP). In general, the ODP is a hierarchical directory of web pages assembled by human operators who review the web pages and assign the pages to certain categories.
In an alternative embodiment of the present invention, the hierarchy in the web directory is completely or substantially flat. A flatter hierarchy simplifies the process of selecting a message because a documents belongs to only one or a few categories, instead of belonging to a long chain of ancestor categories.
The search engine preferably next selects one of the categories corresponding to the retrieved documents. In one embodiment of the present invention, the search engine selects the category of one of the returned documents at random. Since categories having more documents matching the query terms are more likely to be selected, the random selection probabilistically selects the category best matching the query terms. In alternative embodiments, the search engine uses one or more other heuristics to bias the category selection.
Once the search engine selects a category, it preferably accesses a message database and selects a message associated with the selected category. The message database preferably holds multiple messages, with each message belonging to one or more of the categories enumerated in the web directory. In a preferred embodiment, the messages are banner ads provided by advertisers. Preferably, the advertisers purchase “shares” in one or more of the categories for each banner ad. When the search engine selects the category in which a given ad has shares, the search engine selects that ad a number of times proportional to the shares of that category owned by the ad.
The search engine preferably returns a web page containing the document references retrieved from the web directory and the one or more messages selected from the message database to the client. Typically, the web page includes a subset of the document references (e.g., the10 best matches) and at least one banner ad (i.e., the message). Obviously, the web page can include other message types in addition to or instead of the banner ad and/or a different number of references or messages.
BRIEF DESCRIPTION OF THE DRAWINGSFIG. 1 a block diagram illustrating a typical computing environment according to an embodiment of the present invention;
FIG. 2 is a high-level block diagram illustrating an exemplary embodiment of a computer system for use as a client or web server;
FIG. 3 is a block diagram illustrating a lower-level view of the web server containing the search engine according to an embodiment of the present invention; and
FIG. 4 is a flowchart illustrating steps performed by the search engine according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTSFIG. 1 is a block diagram illustrating atypical computing environment100 according to an embodiment of the present invention.FIG. 1 illustrates threeclient computers110A,110B,110C in communication with the Internet112 using known communications technologies. Multipleother web servers114,116 are also in communication with the Internet using known communications technologies. One of theweb servers114 includes asearch engine118. As is known in the art, the Internet112 is a publicly accessible network of computers and supports communications among the various computers in communication with it.
AlthoughFIG. 1 illustrates only three clients110, embodiments of the present invention can have thousands, or even millions, of clients simultaneously in communication with theweb servers114,116 via Internet112. As is known in the art, a client is typically a personal computer system such as an IBM PC- or Apple Macintosh-compatible computer. A client110 typically use a web browser120 such as NETSCAPE NAVIGATOR from Netscape Communications Corp. or INTERNET EXPLORER from Microsoft Corp. to exchange information with theweb servers114,116. In a preferred embodiment of the present invention, this information takes the form of hypertext markup language (HTML) web pages transmitted via the hypertext transport protocol (HTTP). However, alternative embodiments of the present invention utilize different transmission techniques and/or exchange different types of information.
As is known in the art, aweb server114,116 is typically comprised of one or more computer systems adapted to simultaneously interact with multiple clients110 via theInternet112. The web servers preferably contain one or more documents referenced via uniform resource locators (URLs), such as “http://www.altavista.com”. In one embodiment, a document is a web page containing text, images and/or other types of media. Those of ordinary,skill in the art will recognize that the present invention can also be utilized with other types of documents referenced using techniques other than URLs.
As described above, one114 of the web servers preferably contains asearch engine118. A user preferably uses the web browser120 to download a web page from thesearch engine118 containing a search form. The web browser120 allows the user to enter query terms into the search form and then transmit the query terms to thesearch engine118 via HTTP or another communications protocol. The query terms are typically one or more words or other text strings, but can include other information indicating the types of documents desired by the user.
Upon receiving the search terms, thesearch engine118 preferably searches for documents matching the query terms and provides references (e.g., URLs) to the matching documents to the client110. In a preferred embodiment of the present invention, thesearch engine118 derives the search results from documents available fromweb servers114,116 connected to the Internet. However, the exact data domains searched by thesearch engine118 are not material to the present invention. The search engine is preferably the ALTAVISTA search engine from AltaVista Co. and available at http://www.altavista.com.
Thesearch engine118 preferably provides the search results to the client110 in the form of one or more web pages containing the references to the matching documents. The web pages are typically displayed by the browsing software120 at the client110. In addition to the search results, each web page preferably contains one or more messages selected by thesearch engine118. A “message” is a communication including text, graphics, sounds and/or any other type of information.
Preferably, one or more of the messages are “targeted” to the query terms in order to make the messages more relevant to the user who initially provided the query. As used herein, the term “targeted” means that the messages are selected and/or generated such that the messages are related to the query terms and/or other information known about the user or client110.
In a preferred embodiment, the targeted messages contain advertising and are displayed on behalf of one or more advertisers. In one embodiment, the messages are displayed as banner ads that accompany the search results. In alternative embodiments, the messages are displayed in a “popup” window separate from the search results, displayed as search results, and/or displayed in any other practical location or manner.
FIG. 2 is a high-level block diagram illustrating an exemplary embodiment of a computer system for use as a client110 orweb server114,116. Illustrated are at least oneprocessor202 coupled to abus204. Also coupled to thebus204 are amemory206, astorage device208, akeyboard210, agraphics adapter212, apointing device214, and anetwork adapter216. Adisplay218 is coupled to thegraphics adapter212.
The at least oneprocessor202 may be any specific or general-purpose processor such as an INTEL x86 or POWERPC-compatible central processing unit (CPU). Thestorage device208 may be any device capable of holding large amounts of data, like a hard drive, compact disk read-only memory (CD-ROM), DVD, or some other form of fixed or removable storage device. Thememory206 holds instructions and data used by theprocessor202. Thepointing device214 may be a mouse, track ball, light pen, touch-sensitive display, or other type of pointing device and is used in combination with thekeyboard210 to input data into thecomputer system200. Thenetwork adapter216 couples thecomputer system200 to theInternet112 or another network.
Modules220 for providing the functionality described herein are preferably stored on thestorage device208, loaded into thememory206, and executed by theprocessor202. Alternatively, hardware or software modules may be stored elsewhere within thecomputer system200. As used herein, the term “module” refers to computer program logic and/or any hardware or circuitry utilized to provide the functionality attributed to the modules. The types of hardware and software within thecomputer system200 may vary depending upon how the computer system is utilized. For example, a computer system used as aweb server114,116 is likely to have greater processing power and storage capacity than a computer system used as a client110.
FIG. 3 is a block diagram illustrating a lower level view of theweb server114 containing thesearch engine118.FIG. 4 is a flowchart illustrating steps performed by thesearch engine118 according to an embodiment of the present invention. It will be understood by those of skill in the art that thesearch engine118 is a module and that the actions and functionality attributed to the search engine herein can be performed by the search engine itself or by other modules within theweb server114. Accordingly, the term “search engine” is intended to include any module or other entity that performs the attributed functions.
Initially, thesearch engine118 receives410 the query terms. In response, the search engine executes412 a search on aweb directory312 to locate zero or more documents that match the query terms. As used herein, a document “matches” the query terms if the document is in some way related to the query terms. For example, in one embodiment, documents that contain one or more of the query terms match the query terms. Those of skill in the art will recognize that different embodiments of the present invention can use different thresholds of relatedness to determine whether a document matches the query terms.
Preferably, theweb directory312 contains a searchable index of terms contained in documents available fromweb servers114,116 on theInternet112, and references to those documents. In one embodiment, theweb directory312 contains only a small subset of the total number of documents available. Theweb directory312 is preferably optimized to allow fast retrieval of the references to documents matching the query terms.
In a preferred embodiment of the present invention, each document in theweb directory312 is assigned to a category in a hierarchical directory. In one embodiment, the hierarchical directory and category assignments are derived from information available through the Open Directory Project (ODP). In general, the ODP is a hierarchical directory of web pages assembled by human operators who review the web pages and assign the pages to certain categories. More information about the ODP is available at http://www.dmoz.org. In another embodiment, theweb directory312 is derived from one or more alternative data sets.
For example, a document related to snow skiing might be located at “Sports: Skiing: Regional: North America” in the hierarchy. Likewise, a document related to automobiles might be located at “Home: Consumer Information: Automobiles.” Note that the hierarchies may be arranged so that certain documents are stored at multiple locations. For example, documents related to “skis” can be found in the categories “Home: Consumer Information: Sports and Recreation: Skiing” and “Sports: Skiing: Backcountry: Telemark: Equipment.”
In an alternative embodiment of the present invention, the hierarchy in theweb directory312 is completely or substantially flat. One embodiment flattens the hierarchy by removing regional- and/or geography-based branches. Likewise, another embodiment trims the hierarchy by removing certain broad categories into which many documents fall.
In another embodiment of the present invention, theweb directory312 holds documents in a specialized message hierarchy. The message hierarchy resembles the ODP hierarchy described above except that the hierarchical levels and categories are defined in view of the types of messages available for inclusion with the search results. Thus, in the embodiment wherein the messages contain advertising, the hierarchy categories depend upon the types of ads available. For example, if the ads are placed in broad categories such as “healthcare,” “sporting equipment,” and/or “automobiles,” then theweb directory312 holds the web pages in similar categories. As described below, more specific message categories can lead to more targeted messages.
Thesearch412 performed by thesearch engine118 identifies414 zero or more documents matching the query terms and references to those documents. A focused query might return documents from only a single category, while less focused queries will probably return documents from several categories. Since queries are often vague, the latter type of query should occur more frequently than the former.
Thesearch engine118 preferably next selects418 one or more of the categories corresponding to the identified documents. In one embodiment of the present invention, thesearch engine118 randomly selects a category from the identified documents. This embodiment is referred to as the “dart board” because it operates in a manner similar to simply throwing a dart at the identified documents and selecting the category of the document hit by the dart.
An advantage of the dart board approach is that the probability that a category is selected corresponds to the degree to which the query terms match the category. Consider an extreme example: If the query terms match99 documents in a first category and one document in a second category, then the query terms can be said to match the first category much better than the second category. Correspondingly, there is a 99% chance that thesearch engine118 will select the first category and a one percent chance that it will select the second category. Since the category is selected probabilistically, the search engine will occasionally select different categories in response to the same query terms (assuming matching documents are found in multiple categories).
In alternative embodiments, thesearch engine118 uses one or more other heuristics to select418 the category. In one embodiment, the documents and/or categories in theweb directory312 have weight values that thesearch engine118 uses to bias the selection. For example, the heuristic can bias the selection towards the category containing the greatest number of documents. In another example, the heuristic can bias the selection by utilizing weight values determined from the number of times a document is selected by a user after being returned as part of a search result. In yet another example, the results of the dartboard approach can be monitored and the selections biased to ensure that each category is selected a certain percentage of the time.
Another alternative embodiment of the present invention uses a heuristic that occasionally selects a hierarchical ancestor of the category in which a document was identified. For example, if a document is found in the category “Home: Consumer Information: Automobiles,” this embodiment occasionally selects the “Consumer Information” category. This heuristic accounts for the potential situation wherein a category having children is rarely selected because the query terms frequently match documents in the child category.
Once a category is selected, thesearch engine118 preferably accesses a message database314 and selects420 a message from the selected category. The message database314 preferably holds multiple messages, with each message belonging to one or more of the categories enumerated in theweb directory312. Since the category is selected probabilistically, and the message is selected from the category, the same query terms can result in messages selected from different categories.
In a preferred embodiment, the messages are banner ads provided by advertisers. Preferably, the advertisers purchase “shares” in one or more of the categories for each banner ad. Then, when the category in which the ad owns shares is selected, thesearch engine118 selects the ad a number of times proportional to the number of shares owned by the ad. For example, if an ad owns 10% of the shares in the “Snow Skiing” category, thesearch engine118 will select that ad 10% of the time that the “Snow Skiing” category is selected.
If422 no messages are associated with the category selected by thesearch engine118, or the search engine does not identify414 any documents that match the query terms, in one embodiment of the present invention the search engine selects424 a message from a “general” category. In this embodiment, messages own shares in the general category in addition to the other categories. In this manner, thesearch engine118 is adapted to provide a non-targeted message if there are no documents or messages related to the query terms.
In an alternative embodiment of the present invention, if no messages own shares in the selected category, thesearch engine118 searches the category's hierarchical ancestors for messages. If thesearch engine118 does not find a message through this search, then it preferably selects a general message.
After selecting the message, a preferred embodiment of the search engine also searches426 one or moreother databases316 to identify references to documents matching the query terms. Theseother databases316 are similar to theweb directory312, except that the indexed documents in the other databases are not necessarily associated with categories. In this embodiment, theweb directory312 may contain only a small subset of the documents available fromservers114,116 on theInternet112 while the other databases contain a much larger subset of the documents. Therefore, thesearch engine118 determines the category of the ad from the small set of categorized documents but identifies the majority of the search results from one ormore databases316 of uncategorized documents.
As discussed above, thesearch engine118 preferably returns428 a web page containing the references to the documents matching the query terms and the one or more messages selected from the message database314 to the client110. Typically, the web page includes a subset of the document references (e.g., the 10 best matches) and at least one banner ad (i.e., the message). In an alternative embodiment of the present invention, the references returned by thesearch engine118 are sorted by category and the selected message is targeted to the category of the returned references. For example, if thesearch engine118 identifies100 references in eight different categories, the search engine returns to the client110 all or some of the documents in the first selected category along with a message selected from that category. If the user requests additional results pages, thesearch engine118 returns documents from a second category, along with a message selected from the second category, etc.
In sum, the present invention selects messages by mapping a practically infinite set of potential query terms to a finite set of categories. The probability that the present invention selects a category is determined by how well the category matches the query terms, and the probability that the present invention selects a message is determined by the number of shares in the selected category owned by the message. Thus, the present invention provides targeted messages without the deficiencies that occur when merely associating ads with keywords.
The above description is included to illustrate the operation of the preferred embodiments and is not meant to limit the scope of the invention. The scope of the invention is to be limited only by the following claims. From the above discussion, many variations will be apparent to one skilled in the relevant art that would yet be encompassed by the spirit and scope of the invention.