RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 11/702,509, filed on Feb. 6, 2007, the contents of which are hereby incorporated by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method and system for conducting searches. More particularly, the present invention relates to a method and system for conducting a search for information by selecting search terms from amongst all the terms in the search results themselves.
2. Background of the Related Art
A keyword search of the Internet or other electronic media, using well designed all-purpose search engines such as Google (http://www.google.com), AOL (http://www.aol.com), or Yahoo (http://www.yahoo.com), often returns thousands or even millions of hits. In part, this is because users enter only general search terms when they are looking for information.
When users do not anticipate the breadth of the search terms they submit, the search results or “hits” may be unfocused. The hits often span more content than is wanted. Only some of the selected content will be on topic for a particular user at a particular time. For example, a search for “file folder” may return hits about how computers are organized and lists of office supply stores; a search for “bass” returns hits about fish mixed with hits about musical instruments and a particular chain of shoe stores. When users skim such lists of hits, they may become aware that their search was too broad. They know that their results cover several topics and that some of them are not of interest.
Several methods exist for reducing hit lists when hit lists are heterogeneous. The user can construct and enter a new search string that includes words that discriminate between what they want at that particular time and what they do not want. For example, “file folder” might be changed to “file folder computer” and “bass” might be changed to “fish bass.” Even though these phrases are not syntactical, both produce more focused search results.
Google puts a search-within-results option at the bottom of its pages. A user can reduce their hits by typing words that modify their original search. In addition, with most browsers, a user can highlight words that appear on a page of hits and drag them to the search box where they can be added to the original search string. For example, it might be efficient to add “muscular dystrophy” to the keyword, “heart,” by highlighting the phrase when it appears in a hit list and dragging it to the search box. This strategy is slow but it prevents typographical errors.
At this time, there are at least two technologies that help users reduce and focus hit lists without entering new information into the search box. Google provides a link to “Similar pages” at or near the end of the listings on a page of hits generated by the Google search engine. Clicking on the “Similar pages” hyperlink produces a list of 30-35 hits that have content that is similar to the one hit that has been identified as relevant by the user.
However, the user does not have a choice of what words are used to identify hits that are defined as “similar” when they click on Google's “Similar pages” hyperlink. They do not even know for certain what word or words are used by Google to define “similar” hits.
Another technology for helping users select content is to put a list of specific sub-topics, sometimes called clusters or facets, on the page of search results. For example, Yahoo often puts suggested sub-topics at the top of a hit list if the search words submitted by a user are very general. Also, Vivisimo, Inc. (http://www.vivisiomo.com), Endeca, Inc. (www.endeca.com), Siderean, Inc (www.siderean.com) and many other search engines display lists of specific sub-topics as a basic design feature of their search technology. Their sub-topics are developed to meet business and design goals in a variety of situations. For example, LexisNexis Academic (www.lexisnexis.com) recently announced a new user interface, available in the summer of 2007, that will cluster news, legal and business information by subject, industry and company (www.econtentmag.com).
Sub-topics are defined by the algorithms of a particular search engine and their display is under the control of the search engine. The user can select a specific topic that is consistent with the search they intended to make if one of those displayed topics expresses what they are searching for. Doing so will reduce the hit count and focus their search.
Lists of sub-topics (clusters), as they are shown by sites such as Yahoo, Vivisimo, Endeca and Siderean, also have disadvantages. Users may not find a choice that is helpful. There is a practical limit to the number of sub-topics that can reasonably be displayed on a first page. Designers often truncate lists or direct users to other pages by putting a “more” hyperlink at the bottom of a short list of popular sub-topics in order to accommodate as many sub-topics as possible. The user then must take the time to page back and forth to see all their choices. Even when additional pages are used, all possible sub-topics cannot be listed if the hit count is large or if the search results are heterogeneous. One of the ways search engines control the number of sub-topics they display is to display only the most popular ones.
U.S. Pat. No. 5,278,980 to Pedersen, et al, describes a “phrase oriented” search technique to help reduce and focus the hits returned from a keyword search. The technology identifies one search term in each hit (called a non-stop word) that is immediately adjacent to the keyword used to produce the list of hits. For any of the hits, the user can either select the adjacent search term and reduce the hit list or execute a particular function key to add the next most adjacent search term in that particular hit to the display. Pedersen et al. also include a variety of rules that account for situations where a keyword search starts with multiple words. The process of either selecting or “extending” a display in order to refine a search can be repeated multiple times.
Pedersen et al. seek to “disambiguate” the meaning of the keywords and to avoid distracting the reader by cluttering the display. However, the fragmentation of the display and the interactions required of the user are not as easy to understand or to use as other techniques for reducing hit lists.
SUMMARY OF THE INVENTION
The present invention allows users to select words or other terms they see on an electronic medium and use them to search electronic data simply by clicking on the terms they want to use when they see them in an original display of information.
One characteristic of the present invention is that it does not rely on predefined categories of information that are produced by the search engines to reduce a list of search results. With this invention, the user picks terms directly from the text that is presented for the purpose of summarizing the content of the search results. The user is guided to reduce the quantity of information in a hit list by identifying and selecting search terms directly from the text that is presented to summarize the content. The user can reduce the size of the search by selecting search terms they recognize in the display of the text. The text may be a hit list or some other form of information, such as a news summary from an online newspaper.
This invention has at least four advantages over and above the advantages of other technologies:
1.) More unique combinations of search terms are possible with these self-constructed search strings than with methods that pre-define sub-topics.
2.) Users know exactly what terms are being used to refine their search.
3.) Choices are made with a click of a mouse, so that typing on a keyboard is not required.
4.) No additional space is required to show the user how to focus the hit list and reduce the hit count. That is, repetitive prompts, such as a “Similar Pages” prompt at the end of each hit, or a separate list of sub-topic choices, such as those shown by Vivisimo or Endeca, are not required. Saving display space is especially advantageous where display space is limited, such as when search results are displayed on a handheld device.
The number of search terms that can be applied to a search is limited only by the data itself; there is no technical or practical limit to the number of search terms that are available to the user. The user has much more opportunity to apply their own meaning and interpretation to a search than they have using other methods to reduce hit counts and focus information.
Selecting from choices that appear in context rather than selecting choices that appear in sub-topic lists makes search more transparent than other technologies. It gives users the latitude to select groups of hits based on their own knowledge of what they want from the content. Selecting in context, from existing sentences and descriptions, also preserves phrases and multi-part names that may be fragmented or overlooked when sub-topics are constructed by extracting only high-count words and listing them for users. Examples will be given in connection with the material in FIGS. 4 and 5.
There are many ways to visually indicate that terms are selectable. In the preferred embodiment of the current invention, selectable search terms are identified by a change in the shape of the cursor when the cursor hovers over selectable words in the text. Since a large number of terms are selectable from the brief descriptions that are shown, it is a design preference to indicate selectable words with only a change in the cursor's appearance and to avoid changing the appearance of the selectable terms by adding font or color changes, bolding or underlining. Multiple changes tend to clutter the appearance of the display when a large number of choices are available. Positioning techniques, timing features and other details of a cursor that is positioned over a word in a display have been described, for example, by Todd, et al, in U.S. Pat. No. 7,100,123.
Also, the visual features that indicate words can be search terms are always different from the visual features that indicate a user can hyperlink to other information, because the two technologies may appear side-by-side in the text. Terms that cannot be used to refine a search are not identified by distinctive features.
As an optional feature, the number of hits that will be selected can be shown to the user before a selection is made. The user will know how many hits they will be selecting in advance of selecting the search term. This is a useful feature because sometimes users choose one term over another because they prefer the smaller or larger number of hits selected by one rather than the other term. One way to display hit counts is through the use of anchor tags. The technology is described online, for example, by the W3C consortium at http://www.w3schools.com/tags/tag_a.asp.
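As an illustration of the anchor-tag approach, the following minimal sketch renders a selectable term as an HTML anchor whose `title` attribute carries the hit count, which most browsers show as a tooltip on hover. It is an assumption made for illustration only: the function name `render_selectable_term`, the `refine` query parameter and the `selectable-term` class are hypothetical and do not come from the specification.

```python
from html import escape
from urllib.parse import quote_plus


def render_selectable_term(term: str, hit_count: int, search_string: list[str]) -> str:
    """Return an HTML anchor for a selectable term, with its hit count as a tooltip."""
    # Hypothetical URL scheme: the refined search is the current string plus the term.
    query = quote_plus(" ".join(search_string + [term]))
    tooltip = f"{hit_count:,} hits"
    return (
        f'<a class="selectable-term" href="?refine={query}" '
        f'title="{escape(tooltip, quote=True)}">{escape(term)}</a>'
    )


print(render_selectable_term("Steinberger", 1115, ["bass"]))
```

A style sheet could suppress underlining and color changes on the hypothetical `selectable-term` class so that only the cursor changes, in keeping with the design preference described above.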
Another optional feature is to allow the user to click an icon and reverse a search decision that has been made once they see its effects. One way of doing this is illustrated at www.clusteredhits.com where, after reducing a hit list, a user can click an image that says “Undo Last.”
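One simple way to support such an undo feature is to keep a history of search strings and step back one entry on request. The sketch below assumes this history-based approach; the class and method names (`SearchSession`, `add_term`, `undo_last`) are hypothetical and are not taken from the specification or from the site mentioned above.

```python
class SearchSession:
    """Tracks the current search string and lets the last refinement be undone."""

    def __init__(self, keywords: list[str]) -> None:
        # The first history entry is the original keyword search.
        self._history: list[tuple[str, ...]] = [tuple(keywords)]

    @property
    def search_string(self) -> tuple[str, ...]:
        return self._history[-1]

    def add_term(self, term: str) -> None:
        """Record a refinement made by clicking a selectable term."""
        self._history.append(self.search_string + (term,))

    def undo_last(self) -> None:
        """Reverse the most recent refinement; the original search is never removed."""
        if len(self._history) > 1:
            self._history.pop()


session = SearchSession(["bass"])
session.add_term("guitar")
session.add_term("Steinberger")
session.undo_last()
print(session.search_string)  # ('bass', 'guitar')
```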
The present invention allows for the selection of obvious, common sense terms such as adding “guitar” to a keyword search using “bass.” The invention also allows for the selection of unusual and unlikely refinements such as adding a location, a style description, a brand, the year built or the ownership history to the search of “bass guitar.” If lists are displayed using ranking algorithms, popular search terms tend to be at the top of the list.
The present invention has particular advantages for people searching for unique combinations of terms. They are not limited to popular or obvious pre-created sub-topics. For example, medical researchers and other scientists may find topics that rarely occur in databases and develop new areas of exploration. This new technology encourages serendipitous discovery in all fields of knowledge.
Search strings are self-constructed. Users work from their own experience and choose only the best terms from their own point of view. Users are not limited to search terms that are identified by someone else or pre-selected automatically by computer algorithms. This flexibility allows for more combinations of search terms than other techniques. Users pick the terms that meet their own interests. For ease of description, this very flexible process is sometimes called cherry picking. The user is able to reduce the hit list many times and in many ways to produce an individualized, useful sub-list of hits.
These and other objects of the invention, as well as many intended advantages thereof, will become more readily apparent when reference is made to the following description, taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 shows a general configuration of computer hardware and software used in accordance with the present invention.
FIG. 2 is a flow chart showing the decision rules for implementing the invention.
FIG. 3 shows four contiguous hits displayed when the word, “bass” was used as a keyword to search a large, general purpose search site.
FIG. 4 shows four contiguous hits that were displayed when the words “business intelligence” were entered as keywords on a smaller, special purpose search site and then filtered to display only non-fiction hits.
FIG. 5 shows part of a newspaper page in accordance with the invention.
FIG. 6 shows instructions displayed to users.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
In describing the preferred embodiments of the invention, specific terminology will be used for the sake of clarity. However, the invention is not intended to be limited to the specific terms used. It is understood that each specific term includes all technical equivalents that operate in a similar manner to accomplish a similar purpose.
It is also understood that even though Internet examples are generally used in this description to illustrate the invention, the same technology applies to intranets and other non-public electronic search environments. The same technology applies to a self-contained electronic device where the communication link is between the display portion of the device and the processing portion of the device. Also, the technology can be applied as the sole method of selecting search terms or it can be used in combination with other methods of selecting search terms.
The following definitions are used to illustrate the invention and to facilitate description of the invention, but are not intended to limit the scope of the invention or the meaning of the claims. Other language could be used. A “term” refers to any symbol, image or word that conveys information. A “stop word” is any small and/or frequently occurring word that is not used as a search word, such as “and” and “the.” A “search string” is one or more terms used to conduct a specific search.
The system and method of the present invention is implemented by computer software that permits the accessing of data from an electronic information source. The software and the information in accordance with the invention may be within a single, free-standing computer or it may be in a central computer networked to a group of other computers or other electronic devices. The information may be stored on a computer hard drive, on a CD ROM disk or on any other appropriate data storage device.
FIG. 1 shows the overall implementation of the invention. The system 100 preferably is implemented as a computer network having a plurality of client computers 120 networked to one or more remotely located servers 110 by a communication link 150 and bi-directional communication lines 130. The client computers 120 have a memory 121, processing capacity 123, a display device 125 and a pointing device 127. The server(s) 110 have a storage capacity 112, memory 116 and processing capability 119. As a practical matter, the servers 110 also have a display device 117 and a keyboard (not shown). The communication link 150 is preferably the Internet, the display devices 125, 117 may be monitors or the like, and the pointing device 127 may be a mouse. The information that is to be searched can be stored in the server 110 or can be information that is available from other locations.
The communication link 150 and the communication lines 130 provide two-way communication between the clients 120 and the server 110. The link 150 is established when a client 120 accesses the server at its electronic address 118. This is done, for example, by entering the Internet address 118 of the server using a Web browser. Memory 116 in the server is optionally allocated so that the server 110 may retain the status of search requests generated by individual computers 120 during any individual search session.
FIG. 2 is a flow chart depicting a preferred operation of the system in accordance with the invention. The user has entered a keyword search or submitted a query by another means from the client 120. Information is transmitted to the server 110. The server conducts a search and identifies the information, preferably in the form of a list of hits. At least part of the list is transmitted to the client 120.
The system starts at step 202 in FIG. 2. At step 204, each term generated from the search is examined. The server 110 preferably examines each term in turn and determines whether or not each term is a selectable search term based on four decisions, steps 208-214 in FIG. 2.
If the word is a stop word (Yes at step208), it is not selectable. One way to make this determination is to compare all terms in the text of the hit list to a list of stop words maintained for that purpose.
If the term is not a stop word (No at step208) the term is then examined to see if the term is already in the current search string. If it is in the current search string (Yes at step210), then it is not selectable. This decision is made to avoid search redundancy. If a term has already been used in a search, there is no reason to search on it again.
If the term is not in the search string (No at step 210), the term is examined to see if searching on it will have an effect on the hit list that is displayed, step 212. One way to make this determination is to generate hit counts for each term that is not a stop word and compare the hit count of the target term with the total hit count of the current search string. If they are the same, the term will not be designated selectable (No at step 212) because searching with it will have no effect on the outcome of the search.
If the term will have an effect on the hit list (Yes at step 212), the term is examined to see if it should be disregarded for any other reason. As an example, in the preferred embodiment, words are disregarded if they change the hit list by only a very small number of hits. For instance, if the current hit list consists of 500,001 or more hits and if searching on a specific term will reduce the hit list by 100 or fewer hits, a Yes decision is made at step 214. Setting the lower search limit in this way allows search companies to prevent users from choosing terms that make very little difference in the outcome of their search. It also gives search companies some control over the volume of search activity on their servers. Another example of a situation-specific reason for not making terms selectable (Yes at step 214) is shown in FIG. 5. If a newspaper wants to separate news content from editorial opinion and encourage the selection of the former, the invention can disregard adjectives and adverbs that appear in an online searchable newspaper.
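The small-change rule in this example can be written as a single predicate. The sketch below is illustrative only: the thresholds 500,001 and 100 come from the example above, while the function name and parameter names are hypothetical, and a real deployment would presumably tune both limits.

```python
def makes_too_little_difference(
    current_hits: int,
    hits_with_term: int,
    large_list_floor: int = 500_001,
    max_small_reduction: int = 100,
) -> bool:
    """Example rule for step 214: True means the term is disregarded (not selectable)."""
    # Number of hits that would be dropped if the term were added to the search.
    reduction = current_hits - hits_with_term
    return current_hits >= large_list_floor and reduction <= max_small_reduction


# A term that removes only 50 of 140 million hits is disregarded.
print(makes_too_little_difference(140_000_000, 139_999_950))  # True
# A term that narrows 140 million hits to 1,115 stays selectable.
print(makes_too_little_difference(140_000_000, 1_115))  # False
# The rule does not apply to short hit lists such as 614 hits.
print(makes_too_little_difference(614, 600))  # False
```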
If the decisions at steps 208, 210 or 214 are Yes or if the decision at step 212 is No, the term is not made selectable, step 206, and another word is examined beginning at 204. The order of executing steps 208-214 may be changed to accommodate efficient searching. For example, all stop words may be identified and set aside, step 208, before execution of steps 210-214.
If the term is not disregarded for any other reason (No at step 214), the server 110 marks the term as a selectable term. The function and the appearance of the cursor are changed so that the term is selectable and so that the user can tell that it is selectable, step 216. In the current implementation, the shape of the cursor is changed from a straight line to an icon of a hand with a pointing index finger.
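The four decisions of FIG. 2 can be sketched as a single pass over the terms in a hit list. This is a minimal illustration under stated assumptions, not the implementation used by the invention: the tokenizer, the short stop-word list, the function names and the pluggable `disregard` callback (standing in for step 214) are all hypothetical, and the search string is assumed to hold single words.

```python
import re
from collections import Counter
from typing import Callable

# Illustrative stop-word list only; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def selectable_terms(
    hits: list[str],
    search_string: list[str],
    disregard: Callable[[str, int, int], bool] = lambda term, count, total: False,
) -> dict[str, int]:
    """Return {term: hit_count} for every term that passes steps 208-214."""
    total = len(hits)
    # Hit count of a term = number of hits that contain it at least once.
    counts = Counter(term for hit in hits for term in set(tokenize(hit)))
    current = {word.lower() for word in search_string}
    result: dict[str, int] = {}
    for term, count in counts.items():
        if term in STOP_WORDS:              # step 208: stop word
            continue
        if term in current:                 # step 210: already in the search string
            continue
        if count == total:                  # step 212: no effect on the hit list
            continue
        if disregard(term, count, total):   # step 214: any other reason
            continue
        result[term] = count                # step 216: mark as selectable
    return result


hits = [
    "Bass is a leading provider of casual footwear",
    "Steinberger bass guitar for sale",
    "Bass fishing in the lake",
]
print(selectable_terms(hits, ["bass"]))
```

Computing all hit counts once before the loop reflects the remark above that the order of the steps may be changed for efficiency.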
At step 218, a display is generated on the client's display device 125. The user can move the cursor over the text and pause on the terms one at a time, step 220. If a selectable term is selected (Yes at step 222), the selected term is added to the search string and the search is updated to reflect the additional term, step 224. This is preferably done by having the server computer 110 search the hit list using the additional term that was selected. The number of hits is computed and a new search results page is sent to the client computer 120, step 226. The process can be repeated from Start, step 202.
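The refinement performed at steps 224 and 226 amounts to searching within the current hit list. The following sketch assumes that hits are plain text snippets and that a hit matches when it contains the selected term as a whole word; the function names are hypothetical.

```python
import re


def contains_term(hit: str, term: str) -> bool:
    """True if the hit contains the term as a whole word, ignoring case."""
    return term.lower() in re.findall(r"[a-z0-9']+", hit.lower())


def refine(
    hits: list[str], search_string: list[str], selected: str
) -> tuple[list[str], list[str], int]:
    """Steps 224-226: add the selected term and search within the current hit list."""
    new_search_string = search_string + [selected]
    new_hits = [hit for hit in hits if contains_term(hit, selected)]
    return new_search_string, new_hits, len(new_hits)


hits = [
    "Bass is a leading provider of casual footwear",
    "Steinberger bass guitar for sale",
    "Vintage bass guitar lessons",
]
search_string, hits, hit_count = refine(hits, ["bass"], "guitar")
print(search_string, hit_count)  # ['bass', 'guitar'] 2
```

Because the function returns the new search string and the reduced hit list, its output can be fed straight back in for the next selection, mirroring the return to Start at step 202.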
FIGS. 3-6 further illustrate the invention. In FIG. 3, four hits from a hit list are displayed. They were generated by searching a large, general purpose Internet search engine using the keyword “bass.” In this example, approximately 140 million hits were found. The four hits shown in the figure are typical. Stop words occur in every hit that is shown. The stop words are identified in the first full line of the first hit in FIG. 3 as elements 311-316. The remaining words in that line (i.e., leading, provider, casual, dress, footwear, men, women, children) are selectable search terms with the exception of the word “bass.” “Bass” is already a search term and is disqualified as a new search term according to the decision rule at step 210 of FIG. 2. As further illustrated in FIG. 3, the user has placed the cursor over the word “Steinberger” 320. The shape of the cursor has been changed but the word has not been underlined, indicating the term is a selectable term. The anchor tag 321 indicates that there are 1,115 hits that contain that term in the currently selected set of 140 million hits.
Hit lists ordinarily display only a small amount of text about each hit. However, the brief descriptions are sufficient information for the user to make decisions about how to proceed. For example, the user who enters “bass” as a search term on a general purpose search engine, FIG. 3, learns that there are a large number of hits about bass fish, bass shoes, and bass guitars. With this invention, the user can click on “guitar” and consolidate the list to a few million hits. The user also learns from the text that there are names of people and places and technical information associated with bass guitars and that those terms can be used to reduce the set of hits.
For example, as illustrated, the user can select “Steinberger” 320. The user will know in advance that “Steinberger” gives them access to 1,115 hits that contain the search words “bass” and “Steinberger.”
FIG. 4 shows another example of information generated as a list of hits. It comes from a specialized database listing about 300,000 library books. The hits show book titles and sub-titles, authors, publication dates and library call numbers. The site also provides sub-topic lists that can reduce the number of hits generated from keyword searches. The four hits in the figure were generated by searching on the keywords “business intelligence” and then selecting “Non-fiction” as a sub-topic to focus the list of hits. Six hundred fourteen hits were found. The four hits that are shown in FIG. 4 are typical of the whole list.
These four hits from the library search show selectable words that can be used to reduce the search further if the user wants to focus the hit list and have fewer hits than the current count of 614. One search term, “Trade,” is shown proximate to the cursor 420. The anchor tag 421 indicates that there will be 58 hits if the user clicks on “Trade.” Several other available search terms that could be applied to reduce the 614 hits in the example are shown at 431-437. They include: American, cultural, people, value, global, work, and growth.
It is important to recognize that users will not ordinarily think to enter most of the words that are identified as selectable search terms. Users benefit from seeing selectable search terms such as “people” 433, “global” 435, and “growth” 437 displayed in the hit list. Users think of how to narrow their search when they see search terms displayed.
Also, seeing terms in context often provides more meaning than seeing lists of terms that have been extracted and put in a separate list. For example, in FIG. 4, the selectable term “people” 433 appears in two places. It is used in the second hit in the context of “people skills for global business” and in the fourth hit in the context of “people from other cultures.” These two phrases give the user subtle information about what kinds of information they will probably see if they add the term “people” to their search. If the user puts the cursor over “people” and learns that there are only a few hits out of 614 containing the word “people,” they will probably conclude that each of the hits is likely to be about people in groups, not people by name. If what they are looking for is people by name, the user will probably select some other term. Perhaps farther down the list, they will see the word “executives” or “CEOs” and choose one of those terms hoping they will find people by name, if that is their goal. Seeing terms in the context of the original text allows users to include word tense, word position, word relationships and other subtle meanings of phrase and sentence structure in their decisions. Context gives more information about the author's use of a term than lists of sub-topics that are taken out of context and located separately on the page as sub-topic choices.
The present invention can be used by itself, as illustrated in FIG. 3, or with sub-topics, as illustrated in FIG. 4. This invention fills a gap in search technology that cannot be filled by existing technologies. This invention allows a user to find and select any term that is selectable, no matter how often or how seldom it appears. That includes the ability to find unusual and obscure search terms that, because of space limitations, will not be placed in a list of sub-topics. Users benefit from being able to click on any selectable term when they see it in the context of the hit list and conduct a search that includes that term.
Taken together, the examples in FIG. 3 and FIG. 4 also illustrate that this invention can be successfully applied on sites that give only a small amount of information about each hit in a display of hits. The user can generate meaningful search terms from brief descriptions no matter how large or small the data set.
FIG. 5 illustrates the operation of the invention on web sites that are not provided primarily as search sites. In FIG. 5, part of a page from an online newspaper is shown. Selectable search terms are identified in news copy just as they are on the search sites shown in FIGS. 3 and 4. One such search term, “White,” is shown in FIG. 5 at 520. An anchor tag 521 shows that the hit count for “White” will be 47.
In this example of news information, as a design choice of the newspaper, terms that are not selectable have been expanded according to step 214 in FIG. 2 to include adjectives and adverbs, as shown at 530 and 532. Excluding adjectives and adverbs tends to promote the selection of news and exclude the selection of editorial comment.
When a site is not set up as a search site, the scope of the search needs to be defined. In the example shown, it may be an amount of time such as the previous 24 hour day, the past week, the past month or some other amount of time. Other limits, such as certain sections of the paper may also be specified.
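A scope of this kind can be expressed as a filter applied to each article before its terms are counted. The sketch below is only an assumption about how such a limit might be represented: the function name, the parameter names and the section labels are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def in_search_scope(
    published: datetime,
    section: str,
    now: datetime,
    window: timedelta = timedelta(days=1),
    sections: Optional[set[str]] = None,
) -> bool:
    """True if an article falls inside the time window and, if given, the allowed sections."""
    recent_enough = now - window <= published <= now
    section_allowed = sections is None or section in sections
    return recent_enough and section_allowed


now = datetime(2007, 1, 10, 12, 0)
article_time = datetime(2007, 1, 8, 9, 30)
# Outside the default 24-hour window, inside a one-week window.
print(in_search_scope(article_time, "World", now))  # False
print(in_search_scope(article_time, "World", now, window=timedelta(weeks=1)))  # True
```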
The ability to preserve phrases and multi-part names is one of the features of this invention. Its value becomes apparent when news copy is being searched in FIG. 5. For example, selecting “White” 520 always selects the phrase “White House.” Also, as a result of the decision rule at step 212 of FIG. 2, if “House” does not appear in any other context, that is, if “White” and “House” have the same hit count, “House” will be changed into a non-selectable word, step 212. The phrases “war strategy” 540, “Golf Coast” 542, and “Consumer Electronics Show” 544, and the names “Saddam Hussein” 546 and “David H. Petraeus” 548 will also be preserved throughout the search. This feature of the current invention provides much more access to data in context than other technologies.
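The effect of step 212 on a phrase such as “White House” can be checked with a small worked example. The three headlines below are invented for illustration and do not reproduce FIG. 5; the helper names are hypothetical.

```python
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def hit_counts(hits: list[str]) -> Counter:
    """Number of hits in which each term appears at least once."""
    return Counter(term for hit in hits for term in set(words(hit)))


# Invented headlines in which "House" only ever follows "White".
articles = [
    "White House outlines war strategy",
    "White House comments on Consumer Electronics Show",
    "Storm damage reported along the coast",
]

# The user selects "White": only the first two articles remain.
remaining = [a for a in articles if "white" in words(a)]
counts = hit_counts(remaining)

# "house" is in every remaining hit, so step 212 makes it non-selectable.
print(counts["house"] == len(remaining))  # True
# "strategy" is in only one remaining hit, so it stays selectable.
print(counts["strategy"] == len(remaining))  # False
```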
FIG. 6 shows directions that can be presented to users who are seeing this new technology for the first time. Directions will appear in a separate, smaller browser window the first time a user passes their cursor over a selectable word. A user can easily turn it off.
The foregoing description and drawings should be considered as illustrative only of the principles of the invention. The invention may be configured in a variety of ways and is not intended to be limited by the preferred embodiment. Numerous applications of the invention will readily occur to those skilled in the art. Therefore, it is not desired to limit the invention to the specific examples disclosed or the exact construction and operation shown and described. Rather, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.