RELATED APPLICATION This application claims priority from U.S. Provisional Application Ser. No. 60/091,348, filed Jul. 1, 1998.
TECHNICAL FIELD This invention relates generally to the field of information management, and more particularly to a method and system for confidentially tracking and reporting information available on global computer networks.
BACKGROUND The Internet has experienced exponential growth and the number of interconnected computers is quickly approaching one billion worldwide. As such, the Internet provides unprecedented access to massive volumes of information and resources. An entity resource, such as a company, organization, periodical, etc., presents information to the Internet by uploading the information to a server that is connected to one of the interconnected networks and has a registered Internet Protocol (IP) address. Often, an entity organizes its information on the server as a hierarchy of pages composed with hypertext markup language (HTML). Along with general information, each page may contain links to other informative items including graphics, documents or even links to other websites. Users can easily access an entity's information using a graphical software program referred to as a browser. Because the Internet is essentially a vast web of interconnected computers, databases, systems and networks, an entity's information is often referred to as its “website”. For this reason, the Internet and its interconnected websites are often referred to collectively as the World Wide Web. Finding relevant information on the Internet, including the millions of websites and the billions of individual web pages, is a difficult task that has been inadequately addressed.
Many companies have developed search engines in an attempt to ease the location and retrieval of information from the Internet. Examples of current search systems include the AltaVista™ search engine developed by Digital Equipment Corp., Lycos™, Infoseek™, Excite™ and Yahoo™. Most conventional search systems consist of two components. First, a data gathering component, known as a webcrawler or robot, systematically traverses the Internet and retrieves information from various websites. Often, the webcrawler moves from website to website traversing every link found. As the individual websites are accessed, each page of information is retrieved, analyzed and stored for subsequent searching and retrieval. After retrieving and examining each page of a website, the webcrawler moves on to another site on the Internet. While the webcrawler is traversing various websites and retrieving the pages of information, the webcrawler indexes the information presented by each page and stores a link to each page and the corresponding index information in a repository such as a database.
The second component of conventional search systems is the search engine. The search engine provides an interface for selecting the links stored in the repository in order to identify web pages with desired content. For example, the above mentioned search engines allow a user to enter various search criteria. The search engine probes the stored index information generated by the webcrawler according to the search criteria. The search controller presents to the user any stored links having corresponding index information that satisfies the entered search criteria. The user is able to view the actual page located on the original website by following the link to the actual website.
SUMMARY The present invention is directed to a method and system for systematically tracking a defined set of network resources on a global computing network. The method and system can be arranged to deterministically guarantee that any information from the sites is relevant and current. The method and system also can be arranged to increase the confidentiality of search parameters and the identities of parties seeking information.
In one embodiment, the present invention provides a computer-implemented method for gathering information from network resources on a global computer network, the method comprising assigning search times to the network resources, the search times designating times at which the network resources are to be searched within a monitoring period, categorizing the network resources into industry groups, generating search items, each of the search items defining a search for particular information and designating one or more of the industry groups, identifying, at a given search time, the network resources that have been assigned the given search time and categorized into industry groups designated by one or more of the search items, retrieving and storing information from the identified network resources, and performing the searches defined by one or more of the search items on the stored information.
In another embodiment, the present invention provides a method for gathering information from network resources on a global computer network, the method comprising assigning search times to the network resources, the search times designating times at which the network resources are to be searched within a monitoring period, generating search items, each of the search items defining a search for particular information and designating one or more of the network resources, identifying, at a given one of the search times, the network resources that have been assigned the given search time and which are designated by one or more of the search items, retrieving and storing information from the identified network resources, whereby information from the network resources that have not been assigned the given search time or are not designated by one or more of the search items is not retrieved and stored, and performing the searches defined by one or more of the search items on the stored information.
In a further embodiment, the present invention provides a method for gathering information from network resources on a global computer network, the method comprising generating a set of search items, each of the search items defining a search for particular information and designating one or more of the network resources, retrieving and storing information from the network resources designated by one or more of the search items, performing the searches defined by one or more of the search items on the stored information, and presenting results of the searches.
In an added embodiment, the present invention provides a method for gathering information from network resources on a global computer network, the method comprising categorizing the network resources into industry groups, generating a set of search items, each of the search items defining a search for particular information and designating one or more of the industry groups, retrieving and storing information from the network resources associated with the industry groups designated by one or more of the search items, performing the searches defined by the search items on the stored information, and presenting results of the searches.
In another embodiment, the present invention provides a method for gathering information from network resources on a global computer network, the method comprising selecting a set of network resources residing on the global computer network, assigning a search time to each of the network resources, the search time indicating a time within a monitoring period in which the network resource is to be searched, generating a set of search items, each of the search items defining parameters for a search and designating one or more of the network resources to be searched, determining, at approximately the search time for each of the network resources, whether the respective network resource is designated for searching by at least one of the search items, retrieving and storing information from the network resources designated by at least one of the search items, performing the searches defined by the search items on the stored information, and presenting results of the searches to users.
In a further embodiment, the present invention provides a software system for monitoring network resources residing on a global computer network over a time interval, the system comprising a database storing resource identifiers that correspond to particular network resources, and search items that define a search for information and specify one or more of the network resources, a system executive that constructs a set of the resource identifiers scheduled to be searched, and a set of the search items specifying at least one of the network resources corresponding to one of the resource identifiers of the constructed resource identifier set, a collection controller, for each of the resource identifiers of the constructed set of resource identifiers, the collection controller retrieving information presented by the networked resource corresponding to the resource identifier, a search controller for receiving the information retrieved by each of the collection controllers, and a search instance, for each search item of the search item list, wherein the search controller instantiates each search instance to perform the search defined by the respective search item on the information received from the collection controllers for the network resource specified by the respective search item.
In an added embodiment, the present invention provides a method for monitoring information presented by at least one of a plurality of networked computers comprising storing a plurality of identifiers, wherein each identifier corresponds to one of the plurality of networked computers, storing a plurality of search items, wherein each search item includes search criteria and at least one networked computer to be monitored, generating a set of identifiers to be searched, generating a set of search items monitoring at least one of the networked computers corresponding to one of the identifiers of the identifier set, retrieving information presented by each of the networked computers corresponding to an identifier of the identifier set, and searching the retrieved information according to the search criteria of each search item of the search item set monitoring the networked computer corresponding to the retrieved information.
Other advantages, features, and embodiments of the present invention will become apparent from the following detailed description and claims.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram of a software system for accessing and reporting network resources on global computer networks in accordance with the present invention;
FIG. 2 is a flow chart illustrating a high-level operation of a system executive in order to control the various software components of the software system;
FIG. 3 is a flow chart illustrating one mode of operation in which the system executive controls the software system to access and search the network resources that are due to be searched and currently targeted by a search item;
FIG. 4 is a flow chart illustrating one mode of operation of a collection controller responsible for traversing a single network resource;
FIG. 5 is a flow chart illustrating one mode of operation of a web crawler responsible for retrieving a single informative item and extracting any links to other informative items;
FIG. 6 is a flow chart illustrating one mode of operation of a search controller responsible for managing the analysis of each informative item retrieved by the collection controllers;
FIG. 7 is a flow chart illustrating one mode of operation in which the software system restarts the monitoring cycle and balances the retrieval and searching of the network resources across a plurality of computing devices according to the actual number of pages previously retrieved from each network resource;
FIG. 8 is one example of a report generated by the software system for reporting matching information to a client;
FIG. 9 is a block diagram of a computing system having a plurality of computing devices suitable for executing the software system in a distributed manner; and
FIG. 10 is a block diagram of one embodiment of a global networked environment in which a service center executes a software system in accordance with the present invention.
DETAILED DESCRIPTION In the following detailed description, references are made to the accompanying drawings which illustrate specific embodiments in which the invention may be practiced. Electrical, mechanical and programmatic changes may be made to the embodiments without departing from the spirit and scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense and the scope of the present invention is defined by the appended claims and their equivalents.
Conventional search systems are deficient in many ways. For example, due to the vast information and myriad of sites residing on the Internet, conventional search systems produce excess, irrelevant information. A rather narrowly defined search on many of the conventional systems may easily produce thousands of references. Because the webcrawler traverses each and every site that it finds on the Internet, valuable information is often lost among thousands of references to irrelevant sites. Furthermore, conventional systems are, in a sense, non-deterministic. The matching links presented to the user by the search controller often no longer exist. Furthermore, the index information stored in the repository for a particular page is often incorrect and does not contain recently released information. In addition, conventional search engines require huge resources to store the index information and links for subsequent analysis.
Conventional search engines are also incredibly labor intensive. In order to search for specific information on the Internet, a user is forced to access one or more publicly available search systems, enter the search criteria and manually parse the results. This process is tedious and time consuming. For example, the user is forced to periodically repeat the process in order to determine if any new information has been released. In order to identify any new information, however, the user is forced to parse through the previous information already examined.
Conventional search systems are also non-confidential. For example, in order to reduce the numerous irrelevant references produced by conventional systems as described above, a user must narrowly define the search criteria. Often, the user is forced to provide a fairly comprehensive description of the desired information before the number of matches approaches a manageable number. This, however, is problematic in that it forces the user to divulge the idea being researched. For this reason, there is currently no feasible mechanism to search the Internet without divulging trade secrets or other intellectual property. The inability to confidentially retrieve information from the Internet manifests itself in other areas besides the use of conventional search engines. For example, many web sites provide a local search mechanism to assist in finding information within the web site. A user is able to access a web page and find all relevant information simply by engaging the search mechanism. This, however, forces the user to describe the desired information in detail and disclose the information to the website. Thus, the user is unknowingly revealing the details regarding the desired information. Furthermore, because the IP address of a user is readily available to the host site, not only is the information revealed, but the user is easily identified.
FIG. 1 is a block diagram of a software system 10 for confidentially accessing and reporting information present on global computer networks, such as the Internet, in accordance with the present invention. Software system 10 includes system executive 20, one or more collection controllers 30, one or more web crawlers 40, search controller 50, one or more customer search instances 60, report generator 70, database manager 80 and user interface 90.
System executive 20 is responsible for overall control and management of software system 10. FIG. 2 is a flow chart illustrating one mode of operation of system executive 20. Upon initial execution of software system 10, system executive 20 starts execution in step 100, immediately proceeds to step 102 and instantiates database manager 80 for managing all accesses to a database (not shown). In one embodiment, database manager 80 has its own thread of execution. Preferably, database manager 80 has a client/server interface whereby other components of software system 10 initiate a remote procedure call in order to access the data of the database. In this manner, all accesses of the database are synchronized and inherently thread safe. Upon instantiating database manager 80, system executive 20 commands database manager 80 to retrieve configuration data from the database. Typical configuration data includes a maximum number of collection controllers 30 that may be instantiated concurrently, a maximum number of concurrent web crawlers 40 and a maximum number of concurrent customer search instances 60.
System executive 20 proceeds from step 102 to step 104 and waits for a control message. Control messages can be issued to system executive 20 in two ways. First, user interface 90 presents a graphical interface by which an operator controls software system 10. After receiving input from the operator, user interface 90 communicates a control message to system executive 20. Second, software system 10 includes a timer thread (not shown) that awakens at user-configurable times and sends control messages to system executive 20, thereby triggering automatic execution of software system 10. Referring again to FIG. 2, system executive 20 receives control messages in step 104 and sequentially executes steps 106 through 114 to determine the nature of the received control message.
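By way of illustration only, the wait-and-dispatch behavior of steps 104 through 114 might be sketched in Python as follows. The function `run_executive`, the `handlers` mapping and the `None` shutdown sentinel are hypothetical names chosen for this sketch and are not taken from the specification. Both the user interface and the timer thread would simply place message names, such as StartTracking, on the same queue.

```python
import queue


def run_executive(messages, handlers):
    """Process control messages until a None sentinel arrives.

    messages: a queue.Queue fed by the user interface or a timer thread.
    handlers: hypothetical mapping of message name -> callable.
    Returns the names of the messages that were handled, in order.
    """
    handled = []
    while True:
        message = messages.get()          # blocks while waiting, as in step 104
        if message is None:               # assumed shutdown sentinel
            break
        handler = handlers.get(message)   # determine the nature of the message
        if handler is not None:
            handler()
            handled.append(message)
    return handled
```

Unrecognized messages are silently ignored in this sketch; a real system would likely log them.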
If a StartTracking control message is received, system executive 20 proceeds from step 106 to step 116 and analyzes information present on network resources in accordance with the present invention. FIG. 3 illustrates one mode of operation of system executive 20 for analyzing network resources that are due to be tracked. In step 116, system executive 20 proceeds to step 128 and retrieves information on the daily resources that are due to be analyzed. More specifically, the database of software system 10 stores a plurality of resource identifiers, each identifier corresponding to a resource residing on the global computer network. In one embodiment, the database stores a plurality of domains for monitoring. Each domain identifies a website of a company, government body or other organization. Each resource identifier is categorized into one of a plurality of industry groups. Each resource includes a search date that indicates when the resource is to be searched within the monitoring period. As discussed below, software system 10 deterministically monitors the resources over a configurable period such as one week, one month or even one year. In other embodiments, the database may store a plurality of domains that identify web-based databases, such as trademark, domain name, or toll free telephone number databases, for monitoring of competitive activity or availability of such assets. Thus, the databases can be analyzed in a systematic fashion to maintain a “watch” for activity with respect to such assets.
In addition to a plurality of resource identifiers, the database contains a plurality of search items. Each search item includes general information, such as a type which may be patent, trademark, etc., an abstract and search criteria. Furthermore, each search item designates one or more network resources or industry groups to be monitored. In step 130, system executive 20 instructs database manager 80 to retrieve: (1) a set of the stored search items, and (2) a set of pending network resources that are due to be searched and are designated by at least one of the search items. In this manner, software system 10 need not waste computing resources in order to analyze network resources that are not being tracked.
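The selection of pending network resources described above may be sketched as follows. This is a minimal illustration under stated assumptions: the `Resource` and `SearchItem` classes and the `pending_resources` function are hypothetical, and a resource is treated as "due" when its search date is on or before the current date, which the specification does not state explicitly.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Resource:
    domain: str            # resource identifier, e.g. a domain
    industry_group: str    # the single industry group it is categorized into
    search_date: date      # when it is to be searched within the monitoring period


@dataclass
class SearchItem:
    criteria: str
    resources: set = field(default_factory=set)         # domains designated directly
    industry_groups: set = field(default_factory=set)   # industry groups designated


def pending_resources(resources, search_items, today):
    """Return resources that are due and designated by at least one search item."""
    designated_domains = set()
    designated_groups = set()
    for item in search_items:
        designated_domains |= item.resources
        designated_groups |= item.industry_groups
    return [
        r for r in resources
        if r.search_date <= today   # assumption: "due" means on or before today
        and (r.domain in designated_domains
             or r.industry_group in designated_groups)
    ]
```

Resources that no search item designates are never returned, so they are never traversed.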
Upon receiving the daily tracking information from database manager 80 in step 128 (FIG. 3), system executive 20 proceeds to step 130 and instantiates a corresponding collection controller 30 for each pending network resource, subject to the user-configured maximum number of concurrently executing collection controllers 30. Each collection controller 30 is responsible for analyzing the website of its corresponding resource. In one embodiment, each collection controller 30 has its own thread of execution and receives an address, known as the base address, of the network resource to be analyzed. For example, the base address may be “www.netshadow.com”.
After spawning the maximum number of collection controllers 30, system executive 20 proceeds to step 132 and waits for one of the executing collection controllers 30 to finish traversing the corresponding network resource and retrieving its contents. When a collection controller 30 signals completion, system executive 20 proceeds to step 134 and instructs database manager 80 to update the schedule data for the network resource traversed by the finished collection controller 30. In this manner, database manager 80 updates the database such that the traversed network resource will not be traversed again until the next monitoring period. After updating the database, system executive 20 proceeds to step 136 and determines whether there are more network resources scheduled to be traversed and analyzed. If so, system executive 20 jumps back to step 130 and spawns another collection controller 30. If not, system executive 20 proceeds to step 138 and determines whether one or more collection controllers 30 are currently traversing network resources. If so, system executive 20 jumps back to step 132 and waits for another collection controller 30 to finish. When all the collection controllers 30 have finished traversing the pending network resources, system executive 20 returns to step 104 of FIG. 2.
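The bounded spawning and waiting of steps 130 through 138 could be approximated with a thread pool, as sketched below. The thread pool is a substitute technique chosen for brevity rather than the explicit spawn-and-wait loop of FIG. 3, and `run_tracking`, `traverse` and `mark_done` are hypothetical names assumed for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tracking(pending, traverse, mark_done, max_controllers):
    """Traverse each pending resource with at most max_controllers at once.

    traverse(base_address): hypothetical stand-in for a collection controller.
    mark_done(base_address): hypothetical stand-in for the schedule update
    performed when a controller signals completion.
    Returns base addresses in the order their traversals finished.
    """
    finished = []
    with ThreadPoolExecutor(max_workers=max_controllers) as pool:
        futures = {pool.submit(traverse, base): base for base in pending}
        for future in as_completed(futures):   # wait for a controller to finish
            base = futures[future]
            future.result()                    # re-raise any traversal error
            mark_done(base)                    # update schedule data
            finished.append(base)
    return finished
```

Because `mark_done` runs in the calling thread, schedule updates are serialized without additional locking.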
FIG. 4 is a flow-chart illustrating in detail one mode of operation of an executing collection controller 30. Upon creation by system executive 20, collection controller 30 begins execution in step 139 and immediately proceeds to step 140. In step 140, collection controller 30 creates a “pending link list” for holding links to be followed. Initially, collection controller 30 inserts the base address received from system executive 20. After initializing the pending link list, collection controller 30 proceeds to step 142 and instantiates a web crawler 40 for each link stored in the pending link list, subject to the user-configured maximum concurrent web crawlers. Each web crawler 40 is responsible for retrieving the content of the informative item pointed to by its link. For example, the web crawler 40 may download and store an entire HTML page, a file published using Adobe Acrobat, a graphic file, etc. As described in more detail below, each web crawler 40 also retrieves any links to other informative items the item contains.
When first executing step 142, collection controller 30 creates a single web crawler 40 for retrieving the item pointed to by the base address. In step 144, collection controller 30 waits for a web crawler 40 to finish. When a web crawler 40 has finished retrieving the content of the informative item pointed to by its link, collection controller 30 proceeds to step 146 and receives any links the finished web crawler may have found. Collection controller 30 scans the pending link list and inserts any newly found links that: (1) are not already on the pending link list and (2) that have not already been followed. In step 148, collection controller 30 creates a token (data structure) that describes the information retrieved by finished web crawler 40 and adds the token to token queue 55. In step 148, collection controller 30 deletes the instantiation of the finished web crawler 40, proceeds to step 150 and determines whether any links are pending. If so, collection controller 30 returns to step 140 and spawns another web crawler 40. If no links are pending, collection controller 30 proceeds to step 152 and determines whether any web crawlers 40 are currently executing. If so, collection controller returns to step 144 and waits for one of the executing web crawlers 40 to finish. If no web crawlers 40 are currently executing, collection controller proceeds from step 152 to step 154 and signals system executive 20 that the network resource has successfully been traversed. After signaling system executive 20, collection controller 30 proceeds to step 156 and terminates.
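The pending link list logic described above may be illustrated by the following single-threaded sketch. It omits the concurrent web crawlers for clarity, and the `fetch` callable, the `traverse_resource` function and the tuple form of the token are assumptions made for the example only.

```python
from collections import deque


def traverse_resource(base_address, fetch):
    """Follow every link reachable from base_address exactly once.

    fetch(link) is a hypothetical callable standing in for a web crawler; it
    returns (local_filename, found_links).
    Returns tokens as (link, local_filename) tuples in retrieval order.
    """
    pending = deque([base_address])   # the "pending link list"
    followed = set()                  # links that have already been followed
    tokens = []                       # stands in for the token queue
    while pending:
        link = pending.popleft()
        followed.add(link)
        filename, found = fetch(link)
        tokens.append((link, filename))
        for new_link in found:
            # Insert only links that are neither pending nor already followed.
            if new_link not in followed and new_link not in pending:
                pending.append(new_link)
    return tokens
```

The two membership checks are what prevent cyclic links from causing an item to be retrieved more than once.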
In one embodiment, collection controller 30 maintains and stores a list of successfully crawled links as it traverses the network resource. This embodiment is useful in the event that software system 10 terminates before collection controller 30 is able to completely traverse the network resource. In this case, the next time collection controller 30 attempts to traverse the same network resource it loads the archived list of successfully crawled links. In this manner, collection controller 30 continues to traverse the network resource without retrieving previously retrieved informative items.
In yet another embodiment, collection controller 30 waits a configured delay time before spawning each web crawler 40. In this manner, collection controller 30 ensures a reasonable loading on the network resource being traversed. This aspect is also advantageous in giving the appearance of manually traversing the network resource. For example, in another embodiment, collection controller 30 waits a random delay time, within a range of possible delay times, between the spawning of web crawlers 40, thereby giving the appearance of manually traversing a network resource.
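A random delay of this kind might be sketched as follows. The function name `polite_delay` and its parameters are hypothetical; the random source and the sleep function are injectable only so that the sketch can be exercised without actually waiting.

```python
import random
import time


def polite_delay(min_seconds, max_seconds, rng=random, sleep=time.sleep):
    """Wait a random time within [min_seconds, max_seconds]; return the delay."""
    delay = rng.uniform(min_seconds, max_seconds)   # pick a delay in the range
    sleep(delay)                                    # pause before the next crawler
    return delay
```

Calling `polite_delay(1.0, 5.0)` before spawning each web crawler would space requests between one and five seconds apart.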
FIG. 5 is a flow-chart illustrating one mode of operation of web crawler 40. When web crawler 40 is instantiated by collection controller 30, it receives a link to an informative item such as an HTML page, a graphic, an Acrobat file, etc. Web crawler 40 begins execution at step 160, immediately proceeds to step 162 and opens an HTTP connection with the network resource pointed to by the link. Once an HTTP connection is established, web crawler 40 proceeds to step 164 and creates a local file to hold the retrieved informative item. In step 166, web crawler 40 downloads the informative item into the local file. After downloading the item, web crawler 40 proceeds to step 168 and scans the local file for any links to other items. Upon scanning the file, web crawler 40 proceeds to step 170 and signals collection controller 30. After communicating the name of the local file and any newly found links to collection controller 30, web crawler 40 proceeds to step 172 and terminates.
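The link-scanning portion of the web crawler (step 168) could be sketched with the Python standard library as shown below. The download steps are omitted, and `LinkScanner` and `scan_links` are hypothetical names. Treating both `href` and `src` attributes as links is an assumption made here so that graphics and documents are also discovered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkScanner(HTMLParser):
    """Collects absolute URLs found in href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the address of the page.
                self.links.append(urljoin(self.base_url, value))


def scan_links(html_text, base_url):
    """Return the links found in html_text, in document order."""
    scanner = LinkScanner(base_url)
    scanner.feed(html_text)
    return scanner.links
```

A collection controller would typically also discard links that point outside the network resource being traversed; that filtering is not shown.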
Search controller 50 receives tokens from collection controllers 30 via token queue 55 and is responsible for determining whether a retrieved item satisfies the search criteria of one or more of the search items stored in the database. Each token includes a filename of a local file holding an informative item for searching as well as a type field indicating the file type.
FIG. 6 is a flow-chart illustrating one mode of operation of search controller 50. Upon creation by system executive 20, search controller 50 begins execution in step 180 and proceeds to step 181 where it receives a set of search items from system executive 20. Next, search controller 50 proceeds to step 182 and waits for tokens to be placed in the token queue 55 by collection controller 30. When a token is received, search controller 50 proceeds to step 184. In step 184, search controller 50 retrieves the filename and file type from the token, opens the local file indicated by the filename and generates a hash table and a checksum based on the content of the local file.
After generating the hash table and the checksum, search controller 50 proceeds to step 185 and queries database manager 80 to determine whether an informative item having the same link address and checksum has already matched a search. If so, search controller 50 jumps to step 200, deletes the token, returns to step 182 and waits for the next token. In this fashion, search controller 50 conserves computing resources by not searching documents or files that have already matched search criteria and have remained unchanged.
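The test of step 185 may be illustrated with the following sketch. The specification does not name a checksum algorithm, so the use of SHA-256 here is an assumption, as are the `MatchRegistry` class and its in-memory set, which stand in for the database query.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Checksum of an item's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


class MatchRegistry:
    """Remembers (link address, checksum) pairs of items that matched a search."""

    def __init__(self):
        self._matched = set()

    def record_match(self, link, content):
        self._matched.add((link, checksum(content)))

    def already_matched(self, link, content):
        # True only if the same link was matched with identical content.
        return (link, checksum(content)) in self._matched
```

An item whose content has changed since it last matched produces a different checksum and is therefore searched again.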
If the test in step 185 fails, search controller 50 advances to step 186 and instantiates a search instance 60 for each search item received from system executive 20, subject to the user-configured maximum concurrent search instances 60. Each search instance 60 is responsible for testing the hash table with the search criteria of the corresponding search item. For example, each search item has one or more search strings similar to the following:
(semicond!*wafer)+(fabric!*chip!)+(memory w/2 module)
where ‘*’ signifies boolean AND, ‘+’ signifies boolean OR, ! is an expansion operator and ‘w/x’ means within X words.
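One possible evaluator for search strings of this form is sketched below. It is an illustration under several assumptions not stated in the specification: AND binds more tightly than OR, the expansion operator `!` is treated as a prefix match, "within X words" means the two word positions differ by at most X, and proximity operands are single words. It scans a word list rather than the hash table used by the search instances, and performs no error handling for malformed search strings. The names `matches` and `positions` are hypothetical.

```python
import re

# Tokens: parentheses, '*', '+', proximity operator 'w/N', or a word with optional '!'.
TOKEN = re.compile(r"\(|\)|\*|\+|w/\d+|[a-z0-9]+!?")


def positions(term, words):
    """Indexes of words matching term; a trailing '!' matches any word prefix."""
    if term.endswith("!"):
        return [i for i, w in enumerate(words) if w.startswith(term[:-1])]
    return [i for i, w in enumerate(words) if w == term]


def matches(query, text):
    """Return True if text satisfies the search string query."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    tokens = TOKEN.findall(query.lower())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                      # OR level: term ('+' term)*
        result = term()
        while peek() == "+":
            take()
            rhs = term()             # always parse the right-hand side
            result = result or rhs
        return result

    def term():                      # AND level: factor ('*' factor)*
        result = factor()
        while peek() == "*":
            take()
            rhs = factor()
            result = result and rhs
        return result

    def factor():                    # '(' expr ')' | word | word w/N word
        if peek() == "(":
            take()
            result = expr()
            take()                   # consume the closing ')'
            return result
        left = positions(take(), words)
        nxt = peek()
        if nxt is not None and nxt.startswith("w/"):
            distance = int(take()[2:])
            right = positions(take(), words)
            return any(abs(i - j) <= distance for i in left for j in right)
        return bool(left)

    return expr()
```

With the example search string above, a text containing "semiconductor wafer" matches the first group, while "memory" and "module" must appear within two words of each other to satisfy the third.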
After spawning a maximum number of search instances in step 186, search controller 50 proceeds to step 188 and waits for a search instance 60 to finish. When a search instance 60 has finished testing the hash table with the search criteria, search controller 50 proceeds to step 190 and queries the finished search instance 60 whether the hash table satisfied the search criteria. If a match did not occur, search controller 50 jumps ahead to step 194. If a match occurred, search controller 50 moves the temporary local file to a more permanent location and stores the new location, the link address of the original informative item and the checksum in the database.
In step 194, search controller 50 deletes the instantiation of the finished search instance 60, proceeds to step 196 and determines whether any search items still remain for testing against the hash table. If so, search controller 50 returns to step 186 and spawns another search instance 60. If no search items remain, search controller 50 proceeds to step 198 and determines whether any search instances 60 are still examining the hash table. If so, search controller 50 returns to step 188 and waits for one of the executing search instances 60 to finish. If no search instances 60 are currently executing, search controller 50 proceeds from step 198 to step 200 and deletes the token that was popped from the token queue and the corresponding temporary file containing the informative item. Thus, unlike conventional search systems that store retrieved information to be used to satisfy future searches, software system 10 deletes all information that does not match current search criteria. In this manner, software system 10 conserves system resources and deterministically guarantees that each search item is tested with current information.
After deleting the token, search controller 50 proceeds to step 182 and waits for the next token. In this manner, software system 10 deterministically monitors a plurality of network resources over a configurable period. In addition, software system 10 conserves resources by not searching pages that have already satisfied search criteria and have not been changed.
Referring again to FIG. 2, if a RestartSearchCycle control message is received, system executive 20 proceeds from step 108 to step 118 and restarts the monitoring period by invoking a sophisticated load balancing technique. FIG. 7 is a flow-chart illustrating in detail one mode of operation of software system 10 for restarting the monitoring period in step 118. In step 200, system executive 20 sets local variable D equal to the total days of the monitoring period as configured by the operator and stored in the database. This allows the operator to completely control the period in which the set of network resources are completely monitored. Next, system executive 20 sets a local variable CD equal to the current date. System executive 20 instructs database manager 80 to set the starting date of the current search cycle to the current date. Next, system executive 20 commands database manager 80 to set the ending date of the search cycle to the current date plus the number of days in the monitoring period.
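The date arithmetic for restarting the search cycle is simple enough to state directly. In this sketch, `restart_cycle` is a hypothetical function name, and its parameters correspond to the variables CD and D described above.

```python
from datetime import date, timedelta


def restart_cycle(current_date, period_days):
    """Return (start, end) of a search cycle beginning on current_date.

    current_date corresponds to CD and period_days to D.
    """
    start = current_date
    end = current_date + timedelta(days=period_days)
    return start, end
```

For example, a 30-day monitoring period restarted on Jul. 1, 1999 would end on Jul. 31, 1999.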
After setting the starting and ending dates in the database, system executive 20 proceeds to step 202. As discussed in detail below, software system 10 may be distributed over a number of computers. In step 202, system executive 20 queries database manager 80 for a list of all of the computers in the distributed system that traverse network resources by executing collection controllers 30. Based on this list, system executive 20 sets a local variable (TC) to the total number of computers in the distributed system that operate as such. Next, system executive 20 instructs database manager 80 to access each network resource identifier stored in the database and retrieve a number of known pages (RKP) for each resource. This value is set whenever a collection controller 30 successfully traverses an entire network resource and indicates the total number of pages retrieved from the resource. As described in detail below, system executive 20 balances the tracked network resources across the number of computers in the distributed system according to the previous number of pages retrieved for the network resources, thereby more accurately load balancing the system. As database manager 80 accesses each network resource identifier stored in the database, a running total of the number of pages (TP) is maintained.
System executive 20 proceeds from step 202 to step 204 and calculates the average daily pages (ADP) by dividing the total pages (TP) by the days in the current monitoring period (D). System executive 20 further calculates the average pages per computer (APC) by dividing the average daily pages by the total number of computers in the distributed system (TC). This value, APC, reflects the average number of pages (informative items) each computer should retrieve per day for the system to be optimally balanced. In step 206, system executive 20 clears a local variable current computer pages (CCP) and sets another variable, current computer (CCR), to the first computer in the list of computers that execute collection controllers 30. After initializing these variables, system executive 20 proceeds to step 208 and begins the load balancing process.
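By way of illustration only, the two averages calculated in step 204 may be sketched as follows. The function name `cycle_averages`, the `CycleAverages` container and the sample figures are hypothetical and are not part of the described system; only the two divisions are taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class CycleAverages:
    adp: float  # average daily pages (ADP)
    apc: float  # average pages per computer per day (APC)


def cycle_averages(total_pages: int, period_days: int, total_computers: int) -> CycleAverages:
    """Compute ADP = TP / D and APC = ADP / TC as described for step 204."""
    if period_days <= 0 or total_computers <= 0:
        raise ValueError("period_days and total_computers must be positive")
    adp = total_pages / period_days   # pages that must be retrieved per day
    apc = adp / total_computers       # daily share of each collection computer
    return CycleAverages(adp=adp, apc=apc)


if __name__ == "__main__":
    # Hypothetical figures: 90,000 known pages, a 30-day period, 3 computers.
    print(cycle_averages(90_000, 30, 3))
```

With these hypothetical figures, each of the three computers would need to retrieve about 1,000 pages per day to cover all known pages within the period.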
In step 208, system executive 20 commands database manager 80 to once again access each network resource identifier stored in the database. For each network resource identifier, system executive 20 repeats steps 210, 212 and 214. In step 210, system executive 20 commands database manager 80 to set the network resource identifier's next search date to the date stored in the local variable CD. Initially, this value will be the current date. In addition, system executive 20 commands database manager 80 to set the identifier's search computer to the computer stored in the local variable CCR. System executive 20 adds the number of known pages (RKP) for the resource to the variable CCP, thereby keeping track of the total number of pages assigned to the current computer.
System executive 20 proceeds from step 210 to step 212 and checks whether the number of pages assigned to the current computer has exceeded the average (APC) as calculated above. If not, system executive 20 jumps back to step 208 and continues through the network resource identifiers. If the number of pages assigned to the current computer has exceeded the average, system executive 20 proceeds from step 212 to step 214 and sets the local variable CCR to the next computer in the list received from database manager 80. If the list has been exhausted, CCR is set to the first computer in the list. Next, system executive 20 resets the variable CCP and jumps back to step 208. When all of the network resource entries in the database have been updated, system executive 20 jumps from step 208 to step 104 (FIG. 2) and waits for another control message. In this manner, system executive 20 sets the next search date and search computer for each network resource. Furthermore, the network resources are evenly balanced throughout the monitoring period and across the computers of the distributed system. This balancing is improved by using stored information on the number of pages previously retrieved from each network resource. Furthermore, the search cycle can be restarted manually by the operator or by the alarm thread when the current search cycle has completed. In this manner, software system 10 balances the tracking of the network resources upon the completion of each monitoring period.
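The assignment loop of steps 208 through 214 may be sketched, purely as an illustration, along the following lines. The `Resource` class, the `balance` function and the in-memory list standing in for the database are hypothetical. The description does not state when the local variable CD is advanced; this sketch assumes it moves forward one day each time the computer list wraps around, which is one reading of the statement that CD is only initially the current date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Resource:
    identifier: str
    known_pages: int                        # RKP from the last full traversal
    next_search_date: Optional[date] = None
    search_computer: Optional[str] = None


def balance(resources: List[Resource], computers: List[str],
            start: date, period_days: int) -> None:
    """Assign a next search date and search computer to every resource."""
    if not computers or period_days <= 0:
        raise ValueError("need at least one computer and a positive period")
    tp = sum(r.known_pages for r in resources)   # total pages (TP)
    apc = tp / period_days / len(computers)      # average pages per computer (APC)
    cd = start                                   # current date (CD)
    ccr = 0                                      # index of current computer (CCR)
    ccp = 0                                      # pages given to current computer (CCP)
    for r in resources:
        # Step 210: record the date and computer, then accumulate pages.
        r.next_search_date = cd
        r.search_computer = computers[ccr]
        ccp += r.known_pages
        # Step 212: has the current computer exceeded its daily share?
        if ccp > apc:
            # Step 214: move to the next computer, wrapping at the end.
            ccr += 1
            if ccr == len(computers):
                ccr = 0
                # ASSUMPTION (not stated in the description): advance the
                # date when every computer has received its share for the day.
                cd += timedelta(days=1)
            ccp = 0


if __name__ == "__main__":
    sites = [Resource(f"site{i}", 10) for i in range(6)]
    balance(sites, ["a", "b"], date(2000, 1, 1), period_days=2)
    for s in sites:
        print(s.identifier, s.search_computer, s.next_search_date)
```

Because each filled slot holds more than APC pages, the number of filled slots stays below D times TC, so under the stated assumption the assigned dates remain within the monitoring period.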
Referring again to FIG. 2, if a GenerateReports control message is received, system executive 20 proceeds from step 110 to step 120 and commands report generator 110 (FIG. 1) to generate client reports. To create a client report, report generator 110 instructs database manager 80 to retrieve all of the link addresses and permanent file locations recently stored by search controller 50 for informative items that satisfied one or more of the client's search items. The reports can be generated in a variety of forms.
In one embodiment, report generator 110 constructs a hierarchy of HTML files that comprise the client's report and may be viewed by a conventional browser. A main HTML file contains a list of each search item for the client. When one of the search items is selected, the browser displays a second HTML file that more fully describes the search item and its corresponding search criteria. In addition, the second HTML file includes a list of each informative item that satisfied the selected search item's criteria. When one of the informative items is selected, the browser displays the selected informative item with any text that satisfied the search criteria highlighted. In this embodiment, the hierarchy of HTML files includes an HTML file for each informative item. In order to communicate the report to the client, the entire hierarchy of files is placed on a diskette, or other suitable media such as a CD-ROM, and mailed to the corresponding client. Alternatively, the files may be communicated via electronic mail to the client. Preferably, the electronic communication is encrypted to maximize confidentiality.
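As a non-limiting illustration, the three-level hierarchy of this embodiment (main file, one file per search item, one file per informative item) may be sketched as follows. The class names, file naming scheme, use of bold tags for highlighting and the treatment of search criteria as plain terms are all assumptions made for the sketch; the description does not specify them.

```python
import html
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformativeItem:
    link_address: str
    text: str


@dataclass
class SearchItem:
    name: str
    criteria: List[str]                       # assumed: plain search terms
    matches: List[InformativeItem] = field(default_factory=list)


def highlight(text: str, terms: List[str]) -> str:
    """Escape text for HTML and wrap each matching term in <b> tags."""
    terms = [t for t in terms if t]
    if not terms:
        return html.escape(text)
    # Longest terms first so overlapping terms prefer the longer match.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    parts, pos = [], 0
    for m in pattern.finditer(text):
        parts.append(html.escape(text[pos:m.start()]))
        parts.append("<b>" + html.escape(m.group(0)) + "</b>")
        pos = m.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


def build_report(items: List[SearchItem]) -> Dict[str, str]:
    """Return a mapping of file name to HTML content for one client report."""
    files: Dict[str, str] = {}
    main_links = []
    for i, item in enumerate(items):
        item_file = f"item_{i}.html"
        name = html.escape(item.name)
        main_links.append(f'<li><a href="{item_file}">{name}</a></li>')
        doc_links = []
        for j, doc in enumerate(item.matches):
            doc_file = f"item_{i}_doc_{j}.html"
            label = html.escape(doc.link_address)
            doc_links.append(f'<li><a href="{doc_file}">{label}</a></li>')
            body = highlight(doc.text, item.criteria)
            files[doc_file] = f"<html><body><pre>{body}</pre></body></html>"
        criteria = html.escape(", ".join(item.criteria))
        listing = "".join(doc_links)
        files[item_file] = (f"<html><body><h1>{name}</h1>"
                            f"<p>Criteria: {criteria}</p><ul>{listing}</ul></body></html>")
    files["index.html"] = "<html><body><ul>" + "".join(main_links) + "</ul></body></html>"
    return files
```

The returned mapping could then be written to a diskette image or attached to an encrypted electronic mail message, as the embodiment describes.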
In another embodiment, report generator 110 constructs an HTML file for each search item. The HTML file fully describes the search item and its corresponding search criteria. In addition, the HTML file includes a list of informative items that satisfied the selected search item's criteria. Unlike the embodiment described above, in this embodiment, a client report does not actually include the informative items. The HTML file is constructed such that when one of the informative items is selected, the browser follows the link address to the actual network resource containing the item, retrieves the item and displays the item. As in the previous embodiment, each HTML file may be placed on a diskette or electronically mailed to the client.
In yet another embodiment, report generator 110 retrieves the base address for each network resource that satisfied one or more of a client's search items. Unlike the previous embodiments, report generator 110 does not construct a report based on the network resource's matching informative items but traverses the entire network resource in order to construct a hierarchy of HTML files that form a comprehensive site index. More specifically, report generator 110 formulates a list of every word disclosed by the informative items of the network resource. Based on this list, report generator 110 constructs the index that provides a link to each usage. FIG. 8 illustrates one portion of a sample index. When a particular usage is selected, the browser displays the informative item with the usage highlighted.
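The word list underlying such a site index may be sketched, for illustration only, as a mapping from each word to the places where it is used. The function name `build_site_index`, the tokenization rule (runs of letters and digits, case-folded) and the representation of a usage as a page address plus character offset are assumptions; the description does not specify how words are delimited or how usages are recorded.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Usage = Tuple[str, int]  # (page address, character offset of the word)


def build_site_index(pages: Dict[str, str]) -> Dict[str, List[Usage]]:
    """Map every word found on the pages of a resource to its usages."""
    index: Dict[str, List[Usage]] = defaultdict(list)
    for address, text in pages.items():
        # ASSUMPTION: a word is a run of ASCII letters or digits.
        for match in re.finditer(r"[A-Za-z0-9]+", text):
            index[match.group(0).lower()].append((address, match.start()))
    # Alphabetical order, as a reader would expect of a printed index.
    return dict(sorted(index.items()))


if __name__ == "__main__":
    sample = {"a.html": "Laser cuts steel", "b.html": "steel laser"}
    for word, usages in build_site_index(sample).items():
        print(word, usages)
```

Each recorded usage carries enough information to link to the page and to highlight the word at the stored offset when the page is displayed.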
Referring again to FIG. 2, if a Shutdown control message is received, system executive 20 proceeds from step 114 to step 124 and deletes database manager 80, search controller 50, token queue 55 and report generator 110. After successful deletion of the various components, system executive 20 and software system 10 terminate.
The present invention described above is suitable for executing on a single computer having a storage device and network interface such as a network card, an ISDN terminal adapter or a high-speed modem. The present invention, however, may readily be distributed across a system having multiple computers in order to efficiently monitor large numbers of network resources.
FIG. 9 is a block diagram of a distributed computing system 300 for executing software system 10 (FIG. 1) to confidentially access information present on global computer networks, such as the Internet, in accordance with the present invention. Computing system 300 comprises a plurality of computing devices, including collection nodes 310, search nodes 320, database server 330 and user interface device 340, that are communicatively coupled via network 345. As explained in detail below, each of these computing devices executes one copy of software system 10. Upon execution on each computing device, system executive 20 of software system 10 determines the type of computing device and operates accordingly.
First, system executive 20 determines whether the particular computing device is database server 330. If so, system executive 20 instantiates database manager 80 as a server that directly controls access to the database. If not, system executive 20 instantiates database manager 80 as a client that handles access requests by making remote procedure calls (RPC) to the database manager 80 of database server 330. In addition, system executive 20 determines whether the particular computing device is a collection node 310, a search node 320 or user interface device 340.
Next, for collection nodes 310, system executive 20 instantiates token queue 55 as an RPC client. For search nodes 320, system executive 20 instantiates token queue 55 as a server that receives tokens over network 345 via RPC calls. Each system executive 20 of collection nodes 310 spawns one or more collection controllers 30 in order to traverse the network resources that are due and are assigned to the corresponding collection node 310. Collection nodes 310 access Internet 360 via router 350. The retrieved informative items are passed to the token queue client, which communicates pertinent information, such as the link address and local file location, to a token queue server of one of the search nodes 320. Each system executive 20 of search nodes 320 spawns search controller 50 to accept tokens from token queue 55 and search any received token as illustrated in FIG. 6 described above. In this manner, the informative items retrieved by collection nodes 310 are distributed evenly to search nodes 320, thereby allowing efficient monitoring of vast numbers of network resources.
In one embodiment, network 345 of computing system 300 allows remote access by authorized clients. For example, in one embodiment, user interface device 340 executes Windows NT and handles remote clients using Remote Access Server (RAS). In another embodiment, network 345 supports a virtual private dial network. In this embodiment, clients are able to view their corresponding search items, and recently retrieved informative items that matched their search criteria, without communicating confidential information over Internet 360. Thus, unlike conventional search engines, the present invention allows clients to automatically monitor a plurality of network resources over a configured monitoring period without ever communicating the confidential search criteria over an insecure network.
In order to allow an operator to control and configure distributed computing system 300, system executive 20 instantiates user interface 90 upon determining that the computing device is user interface device 340. For example, when various computing devices are added to or removed from computing system 300, user interface 90 allows an operator to update the database via database server 330.
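The role-dependent start-up described for FIG. 9 may be summarized, as an illustrative sketch only, by a function that maps the type of computing device to the components system executive 20 instantiates. The names `NodeType`, `Components` and `configure` are hypothetical. The description does not say whether a token queue is instantiated on the database server or the user interface device; the sketch assumes it is not.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NodeType(Enum):
    COLLECTION = "collection node 310"
    SEARCH = "search node 320"
    DATABASE_SERVER = "database server 330"
    USER_INTERFACE = "user interface device 340"


@dataclass(frozen=True)
class Components:
    database_manager: str            # "server" or "rpc_client"
    token_queue: Optional[str]       # "rpc_client", "rpc_server" or None
    collection_controllers: bool     # spawn collection controllers 30
    search_controller: bool          # spawn search controller 50
    user_interface: bool             # instantiate user interface 90


def configure(node_type: NodeType) -> Components:
    """Choose which components to instantiate for a given device type."""
    # Only the database server controls the database directly; every
    # other device reaches it through remote procedure calls.
    db_mode = "server" if node_type is NodeType.DATABASE_SERVER else "rpc_client"
    if node_type is NodeType.COLLECTION:
        queue: Optional[str] = "rpc_client"   # sends tokens to search nodes
    elif node_type is NodeType.SEARCH:
        queue = "rpc_server"                  # receives tokens over the network
    else:
        queue = None                          # ASSUMPTION: no token queue here
    return Components(
        database_manager=db_mode,
        token_queue=queue,
        collection_controllers=node_type is NodeType.COLLECTION,
        search_controller=node_type is NodeType.SEARCH,
        user_interface=node_type is NodeType.USER_INTERFACE,
    )


if __name__ == "__main__":
    for kind in NodeType:
        print(kind.value, configure(kind))
```

Keeping the decision in one place reflects the point that every device runs the same copy of the software and differs only in what it instantiates at start-up.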
FIG. 10 is a block diagram of one embodiment of a global networked environment 400 in which service center 405 executes software system 10 (FIG. 1) in accordance with the present invention. In one embodiment, service center 405 includes distributed computing system 300 (FIG. 9) and executes software system 10 as described above. In addition to confidentially monitoring information as described above, service center 405 integrates advertising and processes electronic orders for patents, file wrappers and technical disclosures (described below). Individual users 415 communicate with service center 405 over a global computer network, such as the Internet, in order to view secure accounts that contain their corresponding search items and any informative items found by service center 405 that satisfy the search criteria. In one embodiment, all communications between users 415 and service center 405 are encrypted, digitally signed and authenticated, thereby ensuring confidentiality.
In one aspect, service center 405 is configured to communicate with intellectual property (IP) management software 410 executing within organization 420, which may be any entity such as a corporation, legal firm, etc. In one embodiment, all communications between service center 405 and organization 420 are encrypted, digitally signed and authenticated, thereby ensuring confidentiality. IP management software 410 is any software suitable for presenting information and status regarding the intellectual property of organization 420. For example, IP management software 410 integrates docketing information, guidelines, templates and existing confidential disclosure agreements.
One beneficial feature of the present invention is that as organization 420 gains new intellectual property, information is automatically (and confidentially) communicated from IP management software 410 to an account within service center 405. The information regarding the new intellectual property is received by service center 405 and added, as a search item with appropriate search criteria, to the account of organization 420. Once the information is received, service center 405 begins monitoring global computer networks for any information regarding the new intellectual property. Thus, the present invention eliminates the need for organization 420 to manually upload information regarding new intellectual property, such as patents and trademarks. Service center 405 sends an alert, such as an email, to organization 420 and users 415 when relevant informative items are added to their accounts.
From time to time inventors use technical disclosure services to publish information they want in the public domain but have decided not to pursue via patent or product. This service, however, is quite expensive and may cost up to $300 per page. Conventional services publish the disclosures anonymously in many countries. Such a service is basically a defensive measure by which the inventors prevent others from patenting the idea.
The present invention contemplates a technical publication service that anonymously publishes information on global computer networks. More specifically, users log into a website and submit technical disclosures. Preferably, the disclosures are in text, Acrobat (pdf), Microsoft Word or any other commonly used format. When a user submits a disclosure, he or she also submits an abstract and perhaps identifies key terms that best describe the disclosure. According to the present invention, after receipt of the disclosure the network service automatically:
- 1. Adds the submitted electronic disclosure to a collection of other stored publications. In one embodiment the collection of electronic publications is maintained in a jukebox of recordable CDs.
- 2. Updates a publicly available database, thereby making the publication immediately available to anyone who can access the global computer network.
- 3. Transmits the stored location of the received electronic disclosure, as well as the abstract and key terms, to a plurality of major search engines, thereby making the new disclosure immediately accessible and locatable.
- 4. Accesses one of the search engines and exercises the engine to look for any documents that satisfy the key terms of the received disclosure. While accessing the search engine, the service records the results for future proof of publication.
- 5. Communicates the results to the user via email or paper so that the user can offer the results as evidence that the disclosure was indeed published and available to the public.
- 6. Maintains the received publication in the database for a fixed period of time, thereby allowing the public to retrieve and view the document.
Various embodiments of a method and system for confidentially accessing and reporting information present on global computer networks have been described. This application is intended to cover any adaptations or variations of the present invention. It is manifestly intended that this invention be limited only by the claims and equivalents thereof.