RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/250,557 filed on Oct. 11, 2009 by the inventor of the present invention, the entire contents of which are incorporated herein by reference.
FIELD OF THE INVENTION
The present invention relates to the field of document research, and more particularly to methods and systems for locating relevant classifications.
BACKGROUND
Document research involves finding relevant subject matter within a set of documents as may be found in a document repository. Search engines, for example, use “key” words or phrases as search arguments to locate text passages containing those words or phrases. Classification systems provide another means for assessing context. In a classification system, documents with common threads are grouped together in classes. A field of context, therefore, can be narrowed by selecting relevant classes. Patents and patent-related documentation databases are examples of database repositories that implement classification systems. The most commonly used classification system for patents and published patent applications, at least in the U.S., is the USPTO (United States Patent and Trademark Office) Patent Classification System. Two other classification systems in common usage on the international scene include: the “IPC” (International Patent Classification) and the “ECLA” (European Classification).
Document classification systems provide a means for improving the productivity of a document researcher. However, in many large-scale databases the classification system itself may be complex. Patents and patent-related documentation databases provide examples of such large-scale classified database systems with corresponding complex classification systems. The USPTO classification system currently comprises at least 984 classes and numerous digests (collections of certain subjects) within each class. Each class is broken into subclasses; each subclass may be further broken into subclasses and so on. Patents are thus grouped into categories, which are broken down into sub-categories, and sub-categories into further sub-categories, as required. The USPTO examiners decide the class/subclass in which to file a particular invention. To add further complexity, any one invention can be filed in more than one class/subclass, and most are filed in several classes/subclasses.
The challenge of performing document research in such a large-scale document repository, therefore, is to develop an experienced understanding of the classification system. Existing classification analysis tools provide some assistance in navigating classification. See for example U.S. Pat. No. 7,333,984 to Oosta. A counting and sorting technique is shown in FIG. 8. However, the analysis is broad, and does not show a researcher where he or she needs to search, which is important because a patent search involves many iterations of polling a database, and with each iteration, the researcher should progressively narrow the size of the field of interest.
U.S. Patent Application 20020022974 to Lindh shows a method for display of patent information that involves applying statistical analysis to groups of references containing classifications. Lindh does not show additional cross referencing to a search history in order to locate unsearched classifications, which again is important in progressively narrowing and focusing a search.
U.S. Patent Application 20090313221 to Chen shows a patent technology association classification method. While Chen has shown the method of removing classifications and counting frequency, Chen fails to show the additional function of comparing classification frequencies to search histories, nor does Chen show additional broad and narrow reporting schemes for use at different stages of a patent search.
U.S. Patent Application 20080228724 to Huang et al. seeks to assist a researcher in performing classification-based research. Huang shows a technical classification method for searching patents, which includes generating counts from a group of references. The method shows the researcher a quality of a search, but falls short in that Huang does not assist the researcher in locating additional classification areas to search in a next iteration.
U.S. Patent Application 20020073095 to Ohga shows a patent classification displaying method and apparatus having some similarities to the present invention. As seen in FIG. 4, the apparatus provides a classification counting system, wherein the most frequently occurring codes are sorted to the top of the list. Other systems, such as Thomson Delphion, have reporting features like this. Several critical components are, however, missing when viewed next to the present invention. First, the classification codes on the report should be cross referenced against a running tally of codes kept by the researcher in a given search project. With this additional function, the researcher sees not only relevant classifications, but also classifications that have not been searched yet. In addition, Ohga fails to show additional modes of class counting and weighting that are used at different stages of a patent research project, such that the researcher can use broad analysis in the beginning and narrow analysis during the iterative part of the patent search.
U.S. Patent Application 20010027452 to Tropper shows a system and method to identify documents in a database which relate to a given document by using recursive searching and no keywords. While Tropper realizes the benefits of using latest search results to form new searches, he fails to teach the accumulation of classification codes, weighting the codes, ranking of the codes and then comparing the rankings to the researcher's search history.
A need thus exists for an improved classification analysis system, not only for the less-experienced document researcher, but also for the efficiency of those with established skill and experience with a particular classification system. Embodiments of the present invention address many of the shortfalls in the prior art while presenting, what will hereinafter become apparent to be, a pioneering document analysis technology.
BRIEF SUMMARY OF THE PRESENT INVENTION
It is a first object of the present invention to provide a classification analysis system that equips a researcher with broad scope reporting for the initial phase of a search project. It is a second object to enable the researcher to progressively narrow the scope of the search project. Yet another object of the present invention is to enable the researcher to track a classification search history such that duplication is avoided. Still another object of the present invention is to provide a system of narrow classification analysis cross referenced against the classification search history. Yet another object of the present invention is to enable the researcher to effectively cycle through the narrow phase of a search project. Still another object of the present invention is to provide a system that permits the researcher to confidently end a classification based search project.
The present invention provides a system and method for efficiently and accurately identifying relevant document classifications. The system receives one or more classified reference documents in a document set along with a relevancy indicator for each document. The system retrieves all document classifications from the document set, and arranges a classification analysis interface. The researcher has four modes for the interface, which are called: Main, Parents, Subclass, and Primary mode—wherein Main is the broadest and Primary is the narrowest. The researcher is provided GUI tools to select classification codes from the classification analysis interface, and add them to a classification search history which is stored along with the document set in a project file.
In use, the researcher uses the Main and the Parents mode during the first hour of the search project, and the Subclass mode for the remaining 3-4 hours. In the Main mode, the researcher is shown occurrence of main classes in the document set, which provides a broad base for class/text searching. In Parents mode, the researcher is shown common occurrence of parent sub-classifications of the document classifications, while the document classifications are not shown. With this information, the researcher can inspect child classifications of the parents in a classification schedule. For the bulk of the search project, the researcher uses the Subclass mode. In the Subclass mode, the document classifications are collected, counted, scored, and sorted, providing the researcher quick viewing of potentially relevant classifications. Once the researcher locates potentially relevant classifications, he or she executes searches in the newly located classifications, and then adds documents along with relevancy indicators to the expanding document set. The researcher then re-executes Subclass Mode classification analysis on the document set. The classification analysis module scores classification codes and then cross references against the classification search history. The resulting classification analysis interface is displayed along with various sensory indicators (e.g. a color) that show the researcher relevant classifications that are 1) un-searched, 2) partially searched, or 3) fully searched. In this manner the researcher may quickly determine where a next iteration in the search project should be directed. The researcher may continuously iterate through the process of locating new classification areas, searching the new classification areas, augmenting the document set with new documents, and then using the classification analysis tool to locate additional unsearched classification areas. The researcher is encouraged to add many (i.e. 50-100) documents to the project file using a document management interface to tag even moderately relevant documents for the purpose of utilizing many hundreds of classification codes in the scoring. The process continues until the top 5-10 classifications presented by the classification analysis interface are indicated as fully searched, at which point the search project can be brought to a close. With the present invention, important classification areas are very difficult to overlook, regardless of the experience level of the researcher.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a block diagram illustrating a document research system in accordance with an exemplary embodiment of the invention.
FIG. 1B is a sample of a document.
FIG. 2A is an interface diagram in accordance with an exemplary embodiment of the invention.
FIG. 2B is an interface diagram in accordance with an exemplary embodiment of the invention.
FIG. 2C is an interface diagram in accordance with an exemplary embodiment of the invention.
FIG. 2D is an interface diagram in accordance with an exemplary embodiment of the invention.
FIG. 2E is an interface diagram in accordance with an exemplary embodiment of the invention.
FIG. 2F is an interface diagram in accordance with an exemplary embodiment of the invention.
FIG. 2G is an interface diagram in accordance with an exemplary embodiment of the invention.
FIG. 2H is a diagram of a project file created and used by the present invention.
FIG. 2I is an interface diagram in accordance with an exemplary embodiment of the invention.
FIG. 2J is an interface diagram in accordance with an exemplary embodiment of the invention.
FIG. 3A is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.
FIG. 3B is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.
FIG. 3C is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.
FIG. 3D is a search indicator to sensory indicator color scheme table.
FIG. 4 is a block diagram illustrating a document analysis system in accordance with another exemplary embodiment of the invention.
FIG. 5 is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.
DETAILED DESCRIPTION
Reference will now be made in detail to the present exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings.
Referring to FIG. 1A, a block diagram is shown illustrating a document analysis system 1000, or search system, in accordance with an exemplary embodiment of the invention. The document analysis system 1000 comprises a client device 1010, which may be a computer. The client device 1010 includes a classification analysis module 1012, an interface module 1014 and a user Input/Output (I/O) interface 1018. By way of example, the client device 1010 may be a computing device having a processor such as a personal computer, a phone, a mobile phone, or a personal digital assistant. The document analysis system 1000 may also comprise a document provider 1030, a classification data provider 1040 and a network 1020. The document provider 1030 is configured to deliver one or more documents, labeled generally as 1032. By way of example, the documents 1032 may be electronic files containing patent data or any type of electronic file that contains textual data. See FIG. 1B for an example of a document 1032. As seen, the document 1032 has multiple document classifications 135 that are further divided into a class 136 and a subclass 137. In addition, notice the body of the document is composed of multiple sections (e.g. abstract, description, claims), and that the sections are further divided into paragraphs 138. The textual data of each document 1032 includes content data and one or more classification codes 135. The document provider 1030 may be a remote server and may also include a search engine 1034 for retrieving the one or more documents 1032 from a document data repository (not shown) based on a search query. By way of example, the search engine may be that provided by the United States Patent and Trademark Office (USPTO), FreePatentsOnLine, Micropatent®, Delphion®, PatentCafe®, Thomson Innovation or Google®. The document provider 1030 may retrieve the document data from a local repository or from one or more remote document repositories. Examples of such a document repository include patent databases including those provided by EP (European patents), WO (PCT publications), JP (Japan abstracts) and DWPI (Derwent World Patent Index for patent families). Moreover, the document provider 1030 may be a cloud-based bulk storage system, such as Amazon Simple Storage Service.
The classification data provider 1040 is configured to provide access to a classification data repository 1042. The classification data repository 1042 may be a database or file storage element that stores hierarchical classification data entries 1044. Each classification data entry 1044 includes a classification code. Each classification data entry 1044 may also include a classification code description field. The classification data provider 1040 may be a remote server provided by the United States Patent and Trademark Office (USPTO). The classification data may be representative of a document classification system such as the Manual of Classification issued by the USPTO. The classification data provider may retrieve the document data from a local repository or from one or more remote document repositories. It is noted that while shown as separate components, the document provider and classification data provider may be co-located on a single remote server.
The interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030, and to retrieve classification data 1044 from the classification data provider 1040 by way of network 1020. By way of example, the network may be the Internet. The interface module 1014 may alternatively be configured to receive the documents 1032 or classification data 1044 through the user I/O interface 1018. In such an embodiment, the documents 1032 may be stored on a portable storage device (not shown) such as a CD, DVD or solid state device and the user I/O interface 1018 may include a communications interface such as a wireless interface, a CD/DVD drive or a USB drive for retrieving data from the portable storage device. The documents 1032 may alternatively be paper-based documents and may be provided to the interface module 1014 by use of a scanner (not shown) that is configured with the I/O interface 1018. The client device 1010 may also include a data storage element 1016, which may be at least one of a computer readable medium and a memory. The interface module 1014 may also be configured to receive a set of one or more concepts from a researcher by way of the I/O interface 1018. The I/O interface 1018 may also include at least one input device such as a keyboard, mouse, microphone or a touch screen for receiving the concepts from the researcher. Each concept is comprised of one or more text-based keywords or sets of text-based keywords which are used to determine the relevancy of each of the documents 1032. The client device 1010 may alternatively include a document analysis module that generates statistical data based on the user-defined concepts and the documents 1032. The statistical data may be used by the researcher to quickly assess the relevancy of each document 1032 to each of the user-defined concepts. The document analysis module may transmit the statistical data to the interface module 1014 which presents the data to the researcher by way of the I/O interface 1018. The I/O interface 1018 may also include a display such as an LCD or CRT monitor configured to display a graphical user interface (GUI) for presenting information such as the statistical data to the researcher. The GUI will now be discussed in greater detail.
Referring now to FIG. 2A, FIG. 2B, FIG. 2C, FIG. 2D, FIG. 2E, FIG. 2F and FIG. 2G, diagrams are shown illustrating a document analysis GUI 200 in accordance with an exemplary embodiment of the invention. FIG. 3A, which illustrates an exemplary method 1300 for performing classification-based document analysis, will also be discussed. At a first step labeled as 1310, the interface module 1014 may receive concept data from the researcher. The interface module 1014 first generates a document analysis GUI 200 and displays the GUI 200 to the researcher by way of the display device included with user I/O interface 1018. As shown in FIG. 2A, the document analysis GUI 200 includes a document relevance interface 220, a document management interface 250, and a document image window 254. As seen in FIG. 2F, the researcher may start a research project by entering one or more concepts 272. Each concept 272 may have one or more words or word groups associated therewith. As shown in FIG. 2B, the document analysis GUI 200 includes a keyword entry interface 210. The keyword entry interface 210 comprises multiple rows of alphanumeric entry fields 212. One or more keywords 213 may be entered by a researcher into each entry field 212, wherein each keyword 213 is conceptually related such that each line represents a keyword group 214. The researcher is also provided with a user thesaurus 211 and web thesaurus 219. The user thesaurus 211 can be edited and stored in the data storage element 1016, and the web thesaurus 219 may be accessed through the network 1020 by the interface module 1014. Five alphanumeric entry fields 212 are shown to be filled in FIG. 2B. Each concept 272 and corresponding keyword group 214 may be determined manually by the researcher or may be received from an external source. By way of example, the concepts may be reduced to a manageable number of concepts (e.g. 4-5 concepts). Keywords 213 may then be chosen for each of the concepts and entered into one of the alphanumeric fields 212 to form the keyword group 214. After entering each of the desired concepts, the researcher may then exit the keyword entry interface 210 and proceed to analysis of a set of documents based on the user-defined concepts.
At a next step labeled as 1320, the interface module 1014 will receive one or more reference documents 1032. As discussed, the interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030 by way of network 1020. The interface module 1014 may be configured to allow the researcher to request a predetermined set of documents 1032. By way of example, the researcher may initiate a request for a specific set of patent documents or a set of patent documents that fall within a specific category or classification. The researcher may also initiate a search of a remote document repository through a search interface window 230 (shown in FIG. 2D) provided by the document analysis GUI 200. The search may be initiated by entering a set of search parameters, such as keywords, into one or more search fields 232 located on the search interface window 230. Boolean operators, wildcards and proximity indicators may be used to link the keywords together in logic sets. The search interface window 230 may also provide a search assistance window 234 that allows the previously defined keywords 213 to be added to the set of search parameters in response to a single user action (e.g. a mouse click). The search assistance window 234 thereby facilitates the loading of search parameters into the one or more search fields 232. In addition, the researcher is provided with a classification search history 290, which contains a table for documenting the search project strategy (discussed in detail later). The researcher may pick classification codes from the classification search history 290. As discussed, the interface module 1014 may alternatively be configured to receive one or more documents 1032 through the user I/O interface 1018. In such an embodiment, the documents 1032 may be stored on a portable storage device (not shown) such as a CD, DVD or solid state device and the user I/O interface 1018 may include a communications interface such as a wireless interface, a CD/DVD drive or a USB drive for retrieving data from the portable storage device. Upon receiving the one or more reference documents, the interface module 1014 will populate a document management table 252 located on a document management interface 250 (shown in FIG. 2E) with selectable rows 253 each having information descriptive of one of the received documents 1032. By way of example, each row may include a reference document number 255 and document title 256.
At a next step labeled as 1330, the interface module 1014 receives and stores data from the researcher that indicates relevancy of a currently selected document 1032 to the one or more user-defined concepts. As discussed, the interface module 1014 will populate the document management table 252 (shown in FIG. 2E) with selectable rows 253 each having information descriptive of one of the received reference documents. In the exemplary embodiment, the document management table 252 also includes one or more additional columns for allowing the researcher to indicate (by way of a mouse-click or similar navigation event) the relevance of the currently selected document. Each row of the document management table 252 may have a relevancy column 257 that contains an input field for indicating an overall relevance of the associated reference document. By way of example, the interface module 1014 may provide the researcher with the ability to select an indicium (e.g. using a drop-down menu list) such as “A” for highest relevance, “B” for suspected relevance, and “C” for uncertain relevance. Irrelevant documents may be marked with an “I” to place a marker in a project file 205 (FIG. 2H) indicating that a reference document was reviewed. Each row of the document management table 252 may also have one or more additional columns labeled generally as 258 that contain an input field for indicating whether a specific concept has been verified to appear in the currently selected reference document. The interface module 1014 may provide the researcher with the ability to toggle a field (one such field is labeled as 259) corresponding to a specific concept “on” or “off” (e.g. by a mouse-click) when indicating whether a particular concept 272 does or does not exist inside the selected document. A column may be provided for each of the previously discussed concepts 272. As discussed, the interface module 1014 may provide the researcher with a concept management window 270 (see FIG. 2F) for allowing the researcher to define different concepts 272 which the additional columns 258 may be derived from. In this manner, the researcher is able to track higher-level or more abstract concepts than were initially defined and may also provide more user-friendly naming of the concepts 272 (useful for example for report generation). The interface module 1014 may also store the previously discussed relevancy indicators in the project file 205, which is located in the data storage element 1016 in FIG. 1A. By storing each of the indicators, the interface module 1014 is able to provide information to the classification analysis module 1012. The classification analysis module 1012 will now be discussed in greater detail.
At a next step labeled as 1340, classification analysis begins with the interface module 1014 first displaying a classification analysis interface 280, which is shown in FIG. 2G. The classification analysis interface 280 can include a classification search history 290, which is retrieved by the interface module 1014 from the project file 205. The classification search history 290 shows a previously identified classification code 291 and a corresponding previously identified classification title 292. Each previously identified classification code 291 also has a search extent indicator 294 and a search status indicator 293, both of which can be manipulated by the researcher to various states. By way of example, if the researcher has already searched or plans to search previously identified classification code 291 in its entirety, he or she may indicate this with the word “Yes” in the search extent indicator 294. In addition, the researcher may keep record of which previously identified classification codes 291 have been properly addressed with either text limited searching or full searching by similarly indicating in the search status indicator 293. The classification analysis interface 280 may include a document selection field 281 and a classification analysis mode selection field 282. The document selection field 281 provides one or more options to the researcher for selecting a set of documents which the classification analysis will be performed on. By way of example, the researcher may select all documents in the project file 205 that have previously been indicated to be relevant to any of the concepts 272 (i.e. all documents selected in any of columns 258), all documents relevant to a specific concept (i.e. all documents selected in one of columns 258) or documents that have been indicated to have a specific overall relevance (e.g. all documents having a relevancy of “A” from relevancy column 257).
The classification analysis interface 280 also has a class weighting 286 option and a relevancy weighting 287 option. The class weighting 286 instructs the classification analysis module 1012 to account for total size of a classification, which balances the effect of large classifications overshadowing smaller classifications in un-weighted frequency counts. The relevancy weighting 287 allows the researcher to assign greater weight in the scoring to documents 1032 of higher relevance recorded in the relevancy column 257. The classification analysis mode selection field 282 provides one or more options to the researcher for selecting the mode of classification analysis to be performed. The most common mode is the Subclass mode, which is discussed in the next step. (Detailed discussions of all four modes are found immediately following.)
Step 1340 may proceed after the researcher confirms the previously described classification analysis options. The interface module 1014 then instructs the classification analysis module 1012 to perform classification analysis on the selected set of documents. Referring back to FIG. 1B, documents 1032 have one or more document classifications 135 associated therewith, which can be further divided into a class 136 and a subclass 137. The classification analysis module 1012 will retrieve the document classifications 135 from each document and then generate a count of instances of each document classification 135 over the entire selected set. The classification analysis module 1012 will then send each document classification 135 and its corresponding count or score to the interface module 1014 to be displayed (step 1350) via the classification analysis interface 280 where each unique code will be displayed in a separate row. The unique codes may be displayed in a classification code column 284 while the corresponding score will be displayed in a classification score column 283. The rows may be sorted based on the score of each unique code. In an alternative embodiment discussed later, the score for each code may be multiplied by a weighting factor that accounts for the size of each subclass (i.e. the number of documents in the subclass) or by a weighting factor that accounts for the document relevance. The interface module 1014 may also retrieve a classification description for each unique code from the classification data provider 1040, using each unique classification code to look up the corresponding classification data entry 1044. The classification description may also be displayed in a classification title column 285 of the classification analysis interface 280. The classification analysis module 1012 will use a search indicator to sensory indicator table 241, as seen in FIG. 3D, to determine a sensory indicator (e.g. a color) for each unique classification code that appears in the classification analysis interface. The classification analysis module 1012 determines the sensory indicator by first determining whether the corresponding classification code has been previously searched and to what extent. If a code appears in the classification code column 284, and does not appear as a previously identified classification code 291 in the classification search history 290, then the code is assumed to be unsearched. If a code appears in the classification code column 284, and also appears as a previously identified classification code 291, and the corresponding search status indicator 293 shows “No”, then the code is assumed to be partially searched. If a code appears in the classification code column 284, and appears as a previously identified classification code 291, and the corresponding search status indicator 293 shows “Yes”, then the code is assumed to be fully searched. The sensory indicator may be a green highlighting if the code is unsearched, a yellow highlighting if it has been partially searched, or a red highlighting if the code has been fully searched. The classification analysis interface 280 thus allows the researcher to quickly determine (e.g. by visual inspection) which classification codes have been cited most frequently as well as which classification codes have not yet been searched. In this manner the researcher may very quickly determine where a next iteration of a search project should be directed.
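The counting and color coding described in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the list-of-lists layout of the document set, the dictionary layout of the search history, and the sample classification codes are all hypothetical. Only the “Yes”/“No” status values and the green/yellow/red scheme are taken from the description.

```python
from collections import Counter


def count_classifications(documents):
    """Count occurrences of each classification code over a document set.

    `documents` is assumed to be a list in which each item is the list of
    classification codes found on one document (hypothetical layout).
    """
    counts = Counter()
    for codes in documents:
        counts.update(codes)
    return counts


def sensory_indicator(code, search_history):
    """Pick a highlight color for a code from the search history.

    `search_history` is assumed to map each previously identified code to
    its search status indicator, "Yes" or "No".
    """
    if code not in search_history:
        return "green"   # unsearched
    if search_history[code] == "Yes":
        return "red"     # fully searched
    return "yellow"      # partially searched


def build_report(documents, search_history):
    """Return (code, score, color) rows sorted by descending score."""
    counts = count_classifications(documents)
    rows = [
        (code, score, sensory_indicator(code, search_history))
        for code, score in counts.items()
    ]
    # Highest score first; ties are broken by code so the order is stable.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Made-up sample data for illustration only.
docs = [["707/3", "707/5"], ["707/3", "715/200"], ["707/3", "707/5"]]
history = {"707/3": "Yes", "707/5": "No"}
report = build_report(docs, history)
```

With this sample data the most frequent code is already fully searched, while the least frequent code has never been searched and would be flagged green as a candidate for the next iteration.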
At step 1350 the researcher will determine whether to add a new classification code to the search project. The researcher is provided the ability to quickly add entries to the classification search history 290 directly from the classification code column 284 using a mouse click. In doing so, the process will return to step 1320, as indicated by dashed arrow 1360, at which point the interface module 1014 provides a new search inquiry to the document provider 1030 and a new set of reference documents 1032 will be received. Each of steps 1330 through 1350 is repeated to determine the relevancy of the new set of reference documents to the user-defined concepts and whether the search should be expanded to a new classification. Steps 1320 through 1350 may be repeated until the researcher is satisfied that the most relevant classes have been searched. By way of example, the researcher may make this determination when a threshold number of the most frequently occurring classifications are highlighted in red, which indicates that all are present on the classification search history 290, and all are indicated as complete by the search status indicator 293. By way of example, the threshold may be at least ten red highlighted classifications in the classification analysis interface 280.
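One possible reading of this stopping test is sketched below. It assumes, hypothetically, that the analysis results are available as (code, score, color) rows already sorted by descending score, and it interprets the threshold as "the top N rows are all red". The function name and row layout are illustrative only; in the description the determination is made by the researcher by visual inspection.

```python
def search_complete(rows, threshold=10):
    """Return True when the `threshold` highest-scoring codes are all red.

    `rows` is assumed to be a list of (code, score, color) tuples sorted
    by descending score; "red" marks a fully searched classification.
    """
    top = rows[:threshold]
    # Require a full set of `threshold` rows, every one of them fully searched.
    return len(top) == threshold and all(color == "red" for _, _, color in top)
```

A default of ten follows the example threshold given above; a caller could pass a smaller value early in a project when fewer classifications have been collected.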
Modes of Operation: As discussed, the classification analysis performed by the classification analysis module 1012 may be performed by first specifying a mode using the classification analysis mode selection field 282. By way of example, the classification analysis modes may include: a Main Classes mode, a Subclass Parents mode, a Subclass mode and a Primary Subclass mode. Referring to FIG. 3B, all four modes are shown, and will now be discussed in detail. In addition, FIG. 3C shows the process of FIG. 3B along with actual numbers. Steps 701-706 are run in all modes, and will be discussed first.
As seen at step 701, the classification analysis module 1012 retrieves the documents 1032 from the project file 205. The documents are then filtered according to the preference of the researcher using document selection field 281. As an example, the researcher may run just “B” tagged documents or just documents having a specific element tagged in the document management table 252. Next, at step 702, the classification analysis module 1012 compiles all document classifications 135 into a 2D-Array 750 containing document classification 135, relevancy, score, and primary (see for example array 750 in FIG. 3C). The relevancy is originally set by the researcher in relevancy column 257 as A, B, C, D, or E. Score is initially set to 1. Primary is an indication as to whether the document classification 135 is the first listed. Next, at step 703, if the class weighting 286 is turned on, then move to step 704. At step 704, the interface module 1014 requests the classification size (i.e., the total number of documents currently classified therein) for each classification in the 2D-Array 750 from the classification data provider 1040. Next, the classification analysis module 1012 divides the score in 2D-Array 750 by the classification size, which effectively weights each classification inversely according to classification size. Next, at step 705, if the relevancy weighting 287 is turned on, then move to step 706. At step 706, the classification analysis module 1012 multiplies the score in 2D-Array 750 by a relevancy factor according to the relevancy listed in 2D-Array 750. Current relevancy factors are A=1.5, B=1, C=0.75, D=0.5, E=0.5.
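By way of illustration only, steps 702 through 706 might be sketched as follows. The row type, the function name, the input representation, and the example classification codes and sizes are assumptions made for this sketch. The document filtering of step 701 is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Relevancy factors as listed in the description.
RELEVANCY_FACTORS = {"A": 1.5, "B": 1.0, "C": 0.75, "D": 0.5, "E": 0.5}


@dataclass
class Row:
    """One row of the 2D-Array 750 (field names are illustrative)."""
    classification: str
    relevancy: str
    score: float
    primary: bool


def build_rows(documents: List[Tuple[str, List[str]]],
               class_sizes: Optional[Dict[str, int]] = None,
               relevancy_weighting: bool = False) -> List[Row]:
    """Compile rows (step 702) and apply the optional weightings.

    documents: (relevancy, ordered classification codes) per document.
    class_sizes: if given, class weighting is on (steps 703-704).
    relevancy_weighting: if True, relevancy weighting is on (steps 705-706).
    """
    rows: List[Row] = []
    for relevancy, codes in documents:
        for position, code in enumerate(codes):
            # Score starts at 1; primary marks the first listed code.
            rows.append(Row(code, relevancy, 1.0, position == 0))
    if class_sizes is not None:
        for row in rows:
            row.score /= class_sizes[row.classification]
    if relevancy_weighting:
        for row in rows:
            row.score *= RELEVANCY_FACTORS[row.relevancy]
    return rows


# Hypothetical documents and classification sizes.
docs = [("A", ["705/14", "707/3"]), ("C", ["705/14"])]
sizes = {"705/14": 200, "707/3": 50}
rows = build_rows(docs, class_sizes=sizes, relevancy_weighting=True)
```

With both weightings on, a code cited by an “A” document in a small classification receives a higher score than the same relevancy in a large classification, which is the inverse size weighting described above.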
Main Classes Mode: If classification analysis mode selection field 282 is set to “Main” then proceed through step 717 to step 718. At step 718, the document classifications 135 in the 2D-Array 750 are rewritten to show only the classes 136. Next, at step 718, the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats. The 2D-Array 750 is then sorted high to low according to score, and the class description is added for step 720, which is the display in interface 280. See FIG. 2i for an example of the interface 280 after a run in Main Classes mode.
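By way of illustration only, the Main Classes mode might be sketched as follows. The function name, the assumption that a classification code takes the form "class/subclass", and the placeholder class descriptions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def main_classes(rows: List[Tuple[str, float]],
                 class_descriptions: Dict[str, str]
                 ) -> List[Tuple[str, float, str]]:
    """Reduce (classification, score) rows to ranked main classes.

    Each code is assumed to look like "class/subclass"; only the part
    before the slash is kept. Scores of repeated classes are summed,
    and the result is sorted from high to low score.
    """
    totals: Dict[str, float] = defaultdict(float)
    for code, score in rows:
        totals[code.split("/")[0]] += score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(cls, score, class_descriptions.get(cls, ""))
            for cls, score in ranked]


# Hypothetical rows and placeholder descriptions.
rows = [("705/14", 1.0), ("707/3", 1.0), ("705/26", 1.0)]
descriptions = {"705": "description of class 705"}
table = main_classes(rows, descriptions)
```

In this example class 705 is listed first with a summed score of 2.0, followed by class 707 with a score of 1.0 and an empty description.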
SubClass Parents Mode: If classification analysis mode selection field 282 is set to “Subclass Parents” then proceed through step 714 and on to step 715. Next, the classification analysis module 1012 requests all ancestors of the document classifications 135 in the 2D-Array 750 from the classification data provider 1040 via the interface module 1014. The ancestors are then inserted into the 2D-Array 750, and simultaneously the original document classifications are deleted from the 2D-Array 750. Next, at step 716, the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats. The resulting table is displayed in the classification analysis interface 280. See FIG. 2J for an example of the interface 280 after a run in SubClass Parents mode.
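By way of illustration only, the SubClass Parents mode might be sketched as follows. The ancestor lookup table stands in for the response of the classification data provider 1040. Its contents, the function name, and the assumption that each ancestor inherits the full score of the original classification are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def subclass_parents(rows: List[Tuple[str, float]],
                     ancestors: Dict[str, List[str]]) -> Dict[str, float]:
    """Replace each classification with its ancestors and sum repeats.

    rows: (classification, score) pairs from the array.
    ancestors: maps a classification to all of its ancestor codes.
    The original classifications do not appear in the result.
    """
    totals: Dict[str, float] = defaultdict(float)
    for code, score in rows:
        for parent in ancestors.get(code, []):
            totals[parent] += score  # each ancestor inherits the score
    return dict(totals)


# Hypothetical codes and ancestor relationships.
rows = [("705/14.1", 1.0), ("705/14.2", 1.0), ("707/3", 0.5)]
ancestors = {
    "705/14.1": ["705/14", "705/1"],
    "705/14.2": ["705/14", "705/1"],
    "707/3": ["707/1"],
}
parents = subclass_parents(rows, ancestors)
```

Two sibling subclasses thus reinforce their shared ancestors, which surfaces parent classifications whose other children may be worth inspecting.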
SubClass Mode: If classification analysis mode selection field 282 is set to “Subclass” then proceed through step 710 and on to step 711. Next, the classification analysis module 1012 rearranges the previously generated 2D-Array 750 by summing the scores and eliminating repeats. The resulting 2D-Array 750 is sorted according to score from high to low. Next, at step 712, the classification analysis module 1012 compares all rows in 2D-Array 750 to all rows of the classification search history 290, and assigns colors according to the following scheme (see also FIG. 3D for the scheme): 1) if a classification is in 2D-Array 750 and is not in the classification search history 290 then assign green; 2) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “No” then assign light yellow; 3) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “Yes” then assign bright yellow; 4) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “Yes” then assign red. At step 720, the resulting table is displayed along with the color scheme in the classification analysis interface 280. See FIG. 2G for an example of the interface 280 after a run in SubClass mode.
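By way of illustration only, steps 711 and 712 of the SubClass mode might be sketched as follows. The function names, the tuple representation of the search status 293 and search extent 294, and the example codes are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_color(code: str, history: Dict[str, Tuple[str, str]]) -> str:
    """Apply the four-color scheme to one classification.

    history maps a classification to (search status, search extent),
    each being "Yes" or "No".
    """
    if code not in history:
        return "green"
    status, extent = history[code]
    if status == "Yes":
        return "red"
    return "bright yellow" if extent == "Yes" else "light yellow"


def subclass_mode(rows: List[Tuple[str, float]],
                  history: Dict[str, Tuple[str, str]]
                  ) -> List[Tuple[str, float, str]]:
    """Sum repeated classifications, sort high to low, and add colors."""
    totals: Dict[str, float] = defaultdict(float)
    for code, score in rows:
        totals[code] += score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(code, score, assign_color(code, history))
            for code, score in ranked]


# Hypothetical rows and search history.
rows = [("705/14", 1.0), ("707/3", 1.0), ("705/14", 1.0),
        ("715/200", 0.5), ("709/1", 0.25)]
history = {"705/14": ("Yes", "Yes"),
           "707/3": ("No", "Yes"),
           "715/200": ("No", "No")}
table = subclass_mode(rows, history)
```

The resulting table lists the most frequently cited classification first, with one color per row drawn from the scheme above.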
Primary Mode: If classification analysis mode selection field 282 is set to “Primary” then proceed through step 707 and on to step 708. Next, the classification analysis module 1012 sorts through 2D-Array 750 and removes all but the entries labeled as primary. At step 720, the resulting table is displayed in the classification analysis interface 280.
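By way of illustration only, the filtering performed in Primary mode might be sketched as follows. The function name, the tuple layout of each row, and the example codes are assumptions made for this sketch.

```python
from typing import List, Tuple

# Each row is (classification, score, primary flag); layout is illustrative.
ArrayRow = Tuple[str, float, bool]


def primary_only(rows: List[ArrayRow]) -> List[ArrayRow]:
    """Keep only the rows whose classification was listed first."""
    return [row for row in rows if row[2]]


# Hypothetical rows: two documents, each with one primary classification.
rows = [("705/14", 1.0, True), ("707/3", 1.0, False), ("715/200", 1.0, True)]
primary_rows = primary_only(rows)
```

Only the first-listed classification of each document survives the filter, so secondary cross-references do not influence the displayed table.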
Referring to FIG. 4, a block diagram is shown illustrating a document analysis system 800 in accordance with another exemplary embodiment of the invention. The document analysis system 800 is similar to the document analysis system of FIG. 1A but provides a client-server architecture. Accordingly, document analysis system 800 includes a client device 810 and a server device 880. The server device 880 may be a computing device having a processor, such as a personal computer, or may be implemented on a high performance server, such as an HP, IBM or Sun computer using an operating system such as, but not limited to, Windows, Solaris or UNIX. The server device 880 includes a classification analysis module similar in function to the classification analysis module 1012 of the embodiment of FIG. 1A.
Thus, a document analysis system having the benefits of allowing for efficient and accurate identification of potentially relevant classifications is contemplated. Referring now to FIG. 5, an exemplary method 2100 of performing a patent search using multiple modes of the present invention comprises the following steps:
Step 2101: Synthesizing a proposition into one or more key concepts 272;
Step 2102: Developing one or more keyword groups 214 based on the key concepts 272;
Step 2103: Conducting a text search with text search inquiry over a database of documents having text, images and one or more document classifications 135 therein using the keyword groups 214;
Step 2104: Compiling a search file of documents 1032 from the text search inquiry;
Step 2105: Selecting a first set of documents from the file of documents 1032 and creating a project file 205;
Step 2106: Tagging documents 1032 in the project file 205 using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
Step 2107: Instructing a classification analysis module 1012 to run in Main Classes mode to locate a set of classes 136 by counting and ranking according to frequency;
Step 2108: Conducting a first class & text search over the database using the top-ranked classes 136 combined with text from the keyword groups 214;
Step 2109: Compiling a second search file of documents 1032 from the classification & text search;
Step 2110: Selecting a second set of 4-5 documents 1032 and appending the set to the project file 205;
Step 2111: Tagging untagged documents in the project file 205 as appropriate, and particularly the second set of documents, using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
Step 2112: Instructing the classification analysis module 1012 to run in Subclass Parents mode to locate a second set of document classifications 135 by counting and ranking according to frequency;
Step 2113: Inspecting a classification schedule to locate potentially relevant child classifications of the second set located in step 2112 and adding said classifications to the classification search history 290;
Step 2114: Conducting a third classification & text search over the database using the classifications from step 2113 combined with text from the keyword groups 214;
Step 2115: Compiling a third search file of documents 1032 from the third classification & text search;
Step 2116: Selecting a third set of 4-5 documents 1032 and appending the set to the project file 205;
Step 2117: Tagging untagged documents in the project file 205 as appropriate, and particularly the third set of documents, using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
Step 2118: Instructing the classification analysis module 1012 to run in Subclass mode by counting and ranking document classifications 135 according to frequency and cross-referencing results against the classification search history 290 to locate an nth document classification 135 to add to the classification search history 290;
Step 2119: Conducting an nth search over the database using the nth classification from step 2118, either combined with text from the keyword groups 214 or by inspecting the nth classification in its entirety;
Step 2120: Compiling an nth search file of documents 1032 from the nth classification & text search;
Step 2121: Selecting all relevant documents 1032 and appending the set to the project file 205;
Step 2122: Tagging untagged documents in the project file 205 as appropriate, and particularly the nth set of documents, using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
Step 2123: Inspecting the classification search history 290 for a minimum of ten document classification codes and optionally repeating steps 2118 to 2123;
Step 2124: Conducting a forward and backward citation search (not shown) on the selected high-relevance documents from the project file 205 and adding relevant documents to the project file;
Step 2125: End.
While the foregoing invention has been described with reference to the above-described embodiments, various modifications and changes can be made without departing from the spirit of the invention. Accordingly, all such modifications and changes are considered to be within the scope of the appended claims.