BACKGROUND OF THE INVENTION1. Field of the Invention[0001]
This invention generally relates to the field of computer based search systems, and more particularly relates to a system and method for visually searching knowledge bases.[0002]
2. Description of Related Art[0003]
Knowledge repositories are collections of information in computer system for capturing expertise in some strategic area. Their purpose is to retain and disseminate expert information to a user, such as within a corporation. An example of a knowledge repository in a computer software consulting company would be a set of documents describing what strategies have been used effectively in the past to win business away from each of the company's chief competitors.[0004]
The elements of the knowledge repository under consideration here are in electronic form and available for distribution. The size of the repository could outnumber several million separate pieces of information. Typically, the vastness of the knowledge repository causes users to rely primarily on search engines to retrieve information.[0005]
A typical search engine possesses a user interface with a search window where the user enters an alphanumeric search expression or keywords. The search engine sifts through a database of information for the user's search terms, and returns the search results in the form of HTML pages. Each search result includes a list of individual entries that have been identified by the search engine as satisfying the user's search expression. Each entry or “hit” includes a hyperlink that points to a location within the database. In addition to the hyperlink, certain search engine result pages include a summary or abstract that describes the content of the document. Other information may also be returned as part of a search result in response to a user's request.[0006]
Although search engines have made tremendous strides at improvement in recent years, the user interface can still be very complicated to use. Primarily, these interfaces place too many burdens on the inexperienced user to know which words to enter to retrieve only the best documents. As a further complication, the form of the result pages may not adequately indicate which documents truly contain the most relevant information for the user. Then, the user is forced to manually sort through a long list of results to try and determine what is most important to him. Additionally, for complex searches, users may even be called upon to employ Boolean operators, all of which serve to further confuse users.[0007]
Therefore a need exists to overcome the problems with the prior art as discussed above, and particularly for a user interface that visually displays corresponding search results and guides the user in exploring a knowledge repository.[0008]
SUMMARY OF THE INVENTIONAccording to a preferred embodiment of the present invention, a computing system and method provides a visual display for exploring a knowledge repository. This preferred method assumes that a knowledge expert has created a set of multiple partitionings of the knowledge repository according to the content of the knowledge elements, as well as a separate dictionary of terms for each partitioning. The method comprises the steps of accepting a natural language query, calculating the distance between the query string and a plurality of categories in partitions of a knowledge repository; and displaying, via a radial graph, at least one category that comprises a nearest distance to the query. Further, in response to the user selecting a category, the method visually displays all matching elements in that category along with its nearest neighbor categories in a scatter plot. This preferred method advantageously guides the user to better refine search queries.[0009]
BRIEF DESCRIPTION OF THE DRAWINGSFIG. 1 is a block diagram illustrating a visual search system in accordance with a preferred embodiment of the present invention.[0010]
FIG. 2 is a more detailed block diagram showing a database server in the system of FIG. 1, according to a preferred embodiment of the present invention.[0011]
FIG. 3 is a more detailed block diagram showing a computer system in the system of FIG. 1, according to a preferred embodiment of the present invention.[0012]
FIG. 4 is a more detailed block diagram of a visual search application in the system of FIG. 3, according to a preferred embodiment of the present invention.[0013]
FIG. 5 is an exemplary visual user interface illustrating a radial graph representation of a search query response according to a preferred embodiment of the present invention.[0014]
FIG. 6 is an exemplary visual user interface illustrating a scatter plot representation of a search query response according to a preferred embodiment of the present invention.[0015]
FIG. 7 is an operational flow diagram illustrating an exemplary operational sequence for the system of FIG. 1, according to a preferred embodiment of the present invention.[0016]
DESCRIPTION OF THE PREFERRED EMBODIMENTSThe present invention, according to a preferred embodiment, overcomes problems with the prior art by guiding the user through a visual representation of all elements contained within a knowledge repository. The user may have little expertise in the area of interest to him. The user begins the search by entering a keyword or words of interest. For example, assuming the user was interested in wireless paging applications, the user would enter the word “wireless”. The program could then initiate a search, in a previously partitioned database located either locally to the computer system, on a server located within the company Intranet, or even a remote Internet site. The elements may be categorized in a multitude of manners, such as by subject, country of origin, author, etc . . . much as a card catalog in a library system. Each category constitutes a partitioning, and each partitioning contains its own unique dictionary of terms.[0017]
As a result of this search, the user would be presented with a visual representation of the most relevant categories, and conceptual alternatives, in the database, in which the word “wireless” appears. The user then provides additional information, gradually, to winnow down the number of possible matching elements. Finally, the invention highlights specific areas of retrieved elements, which would be of greatest relevance to the user's previous input.[0018]
A significant aspect of the preferred embodiment is the approach of overlaying concept maps on top of a knowledge set, helping the user to understand the knowledge space and better refine his queries. The program then retrieves not simply element titles, but also the most relevant phrases within the most relevant elements. Such focused retrieval is important to prevent “knowledge retrieval overload”.[0019]
FIGS. 1, 2, and[0020]3 illustrate an exemplary visual search system according to a preferred embodiment of the present invention. Thevisual search system100 includes acomputer system102, having avisual search application318.Computer system102 may be communicatively coupled with adatabase system106, via a localarea network interface104. The localarea network interface104 may be a wired communication link or a wireless communication link. Additionally,computer system102 may also be communicatively coupled with awide area network108 such as the Internet, a wired, wireless, or combination of wired and wireless communication links via a wide areanetwork communication link110.
[0021]Computer system102 includes controller/processor320 (shown in FIG. 3) which processes instructions, performs calculations, and manages the flow of information through thecomputer system102. Additionally, controller/processor320 is communicatively coupled withprogram memory312. Included withinprogram memory312 are a visual search application318 (which will be discussed in later in greater detail),operating system platform314, andglue software316. Theoperating system platform314 manages resources, such as the data stored indata memory328, the scheduling of tasks, and processes the operation of thevisual search application318 in theprogram memory312. Theoperating system platform314 also manages a graphical display interface (not shown), a user input interface (not shown) that receives inputs from thekeyboard306 and themouse308, and communication network interfaces (not shown) for communicating with thenetwork links104,110 respectively. Additionally, theoperating system platform314 also manages many other basic tasks of thecomputer system102 in a manner well known to those of ordinary skill in the art.
[0022]Glue software316 may include drivers, stacks, and low level application programming interfaces (API's) and provides basic functional components for use by theoperating system platform314 and by compatible applications that run on theoperating system platform314 for managing communications with resources and processes in thecomputing system102.
Network interfaces (not shown) communicatively couple the[0023]visual search application318 in thecomputer system102 with a look-upsystem202. The look-upsystem202, according to one alternative embodiment, comprises aknowledge database204; aclassification record210, communicatively coupled to theknowledge database204; adictionary space206 communicatively coupled to theknowledge database204 and to theclassification record210; acentroid record212, communicatively coupled to theknowledge database204, theclassification record210, and thedictionary space206; and avector space208 communicatively coupled to theknowledge database204, theclassification record210, thecentroid record212, and thedictionary space206. Note that as shown thedatabase server system106 is shown on a local area network. According to a second embodiment, a look-up system may comprise a remote search server system (not shown). Alternatively, the look-upsystem202 may be located in thecomputer system102 resident inprogram memory312 and managing data in a database memory (not shown) as part of thedata memory328.
The[0024]knowledge database204 comprises multiple partitionings, defined according to different aspects of their content. There are many well-known methods for partitioning knowledge databases based on content in an automated fashion. One possibility is that a series of keyword queries (one for each category of the partition) may be fashioned and used to identify which elements belong in each category. Another possibility is to use text-clustering techniques to create a set of element clusters. A third possibility is to use a combination of these two techniques combined with manual intervention by a human expert, as taught by U.S. patent application Ser. No. 09/429650, “System and Method for Interactive Classification and Analysis of Data” filed on Oct. 29, 1999, the entire teachings of which are hereby incorporated by reference, to determine, on a case by case basis, which elements fall into which category. Theclassification record210 contains a listing of which elements are contained in which partitioning.
The[0025]dictionary space206 comprises a set of separate dictionaries of terms; one dictionary for each partitioning of theknowledge database204. These dictionaries may be created by first finding the most frequently occurring words in theknowledge database204, then pruning out those words that have little or no relevance for purposes of the classification. Automated counting methods, as taught by U.S. patent application Ser. No. 09/629831, “Method for Generation of an N-Word Phrase Dictionary from a Text” filed on Jul. 31, 2000, the entire teachings of which collectively are hereby incorporated by reference, can help determine word relevance by counting co-occurrence of dictionary terms and classes.
The[0026]vector space208 comprises a set of records, one for each partitioning, which contain a “vector” of equal length to the size of that partitioning's corresponding dictionary. All elements within a partitioning are mapped to each term within that dictionary; with non-zero values equal to the number of times each dictionary term occurs in the knowledge element.
The[0027]centroid record212 contains a listing of the average word occurrence of each term in each partitioning of theknowledge database204.
FIG. 4 illustrates the exemplary[0028]visual search application318 according to a preferred embodiment of the present invention. The visual user interface/event manager404 is structured to receive all visual user interface/event manager404 events, such as mouse movements, keyboard inputs, drag and drop actions, user selections, and updates to thedisplay304. Visual user interface/event manager404 is also structured to receive match records, from the result setmanager410, which will be discussed subsequently, representing the result set326 for a user initiated search request. These results are then displayed to the user via thedisplay304. The visual user interface/event manager404 is communicatively coupled withquery handler406,application programming interface412, and result setmanager410.
After the user has requested a keyword search, the[0029]query handler406 is invoked from the visual user interface/event manager404 to initiate the user's search request. Input to thequery handler406 can preferably be a text based search query. The preferably text based search query is parsed to determine which dictionary terms occur within it, then sent to the Application Programming Interface412 (API) designated for communication with the look-upsystem202, and then to the look-upsystem202 which has been discussed previously.
The[0030]vector calculator408 then evaluates the relative nearness between the query string vector and each category centroid of each partition, then sends the result set326 to the result setmanager410. The M nearest categories are sent to the visual user interface/event manager404 for displaying to thescreen304.
FIG. 5 is an exemplary visual user interface illustrating a[0031]radial graph representation322 of a search query response according to a preferred embodiment of the present invention. Theradial graph representation322 comprises akeyword hub502; a set ofweighted spokes504, radially coupled to thekeyword hub502; and a set ofrelated concept boxes506, eachrelated concept box506 being coupled to opposite end of oneweighted spoke504. Therelated concept box506 comprising atop portion508 and abottom portion510, thetop portion508 containing the partition name and thebottom portion510 containing the category name. The relevance of eachconcept box506 is indicated by the appearance of the weighted spoke504 coupling theconcept box506 to thekeyword hub502. For example, a yellow line504 (indicated in FIG. 5 as a double line) would indicate a greater relative similarity to the query than ablack line522. Thick spokelines504, followed by thin524, followed bydotted lines520 indicate decreasing similarity within a color. In the example shown in FIG. 5, the query “wireless” is most nearly related to the categories “Broadband”514 and “Voice Recognition Technology”516 within the partitioning “Buzzwords”518.
FIG. 6 is an exemplary visual user interface illustrating a[0032]scatter plot representation324 of a search query response according to a preferred embodiment of the present invention. The knowledge elements of a selected category, along with its nearest neighbor categories, are displayed aspoints608 in a scatter plot fashion. Text from a corresponding element is displayed in a pop-upwindow606 in response to the user moving amouse308 over apoint608 in the plot.
FIG. 7 is an operational flow diagram illustrating an exemplary operational sequence for the system of FIG. 1. The system enters the sequence at
[0033]step700, wherein a user is communicating via a
user interface402 with the
computer system102. The user operates the user interface, such as the
mouse308 or
keyboard306 to enter a search query string. A
query handler406 parses the query string, at
step702, and the
vector calculator408 evaluates the relative nearness between the query string and the categories of each partition. A numeric “nearness” value for the query string to all categories in all partitionings is calculated by finding the average word occurrence (centroid) within a partition of elements and comparing this to the query string vector using the cosine distance metric to determine the distance (d) between vector (X) and vector (Y):
The M nearest categories are selected for display to the user as a[0034]radial graph322.
The user then provides additional information, at[0035]step704, by “clicking” on theconcept box506 of interest. This additional information helps to narrow the search further, without the user needing to provide additional search terms.
A preferred embodiment of the invention then presents the matching elements and nearest neighbor categories, at[0036]step706, in avisual display324 as illustrated in FIG. 6. The neighboring categories are chosen by measuring the relative distance of the mean of the selected class to the mean of every other class using the distance metric described above. Each point in the plot represents a knowledge element and its relative position in the dictionary space. The high dimensional dictionary space is reduced to two dimensions by use of a visualization mechanism described in U.S. Pat. No. 6,100,901, “Method and Apparatus for Cluster Exploration and Visualization”, filed Jun. 22, 1998, the entire teachings of which are hereby incorporated by reference. Alternatively, the search results could be presented as a list of elements with a corresponding distance calculation displayed.
The preferred embodiment could also calculate the terms within the selected partitioning that have the highest occurrence correlation with the query. The user, at[0037]step716, could then choose to add these suggested terms to the search query string and begin a new search. This further refines the search request to more suit the user's needs.
The user, at[0038]step708, then moves his mouse over points in the plot. Text from the corresponding knowledge element is displayed, atstep710, in a pop-upwindow606. The text is found by searching the original knowledge element until the first occurrence of a user query term is found. This term, plus the next N characters are included in the display string, adding enough characters to insure that the last word is not truncated.
Then, at[0039]step712, the user clicks on apoint608 in the visualization to select a knowledge element. The entire text of the original knowledge element is displayed, atstep714, with the query words highlighted. The user may repeatsteps700,704,708, or712 at any time.
The present invention can be realized in hardware, software, or a combination of hardware and software. A system according to a preferred embodiment of the present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system—or other apparatus adapted for carrying out the methods described herein—is suited. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.[0040]
The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which—when loaded in a computer system—is able to carry out these methods. Computer program means or computer program in the present context mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following a) conversion to another language, code or, notation; and b) reproduction in a different material form.[0041]
Each computer system may include, inter alia, one or more computers and at least a computer readable medium allowing a computer to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium. The computer readable medium may include non-volatile memory, such as ROM, Flash memory, Disk drive memory, CD-ROM, and other permanent storage. Additionally, a computer medium may include, for example, volatile storage such as RAM, buffers, cache memory, and network circuits. Furthermore, the computer readable medium may comprise computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network, that allow a computer to read such computer readable information.[0042]
Although specific embodiments of the invention have been disclosed, those having ordinary skill in the art will understand that changes can be made to the specific embodiments without departing from the spirit and scope of the invention. The scope of the invention is not to be restricted, therefore, to the specific embodiments, and it is intended that the appended claims cover any and all such applications, modifications, and embodiments within the scope of the present invention.[0043]