CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit and priority of U.S. Provisional Patent Application entitled “Automatic Display of Web Content to Smaller Display Devices: Improved Summarization and Navigation” filed May 10, 2001 and of U.S. patent application Ser. No. 10/076,786 entitled “System and Method for Modifying a Document Format” filed Feb. 14, 2002, the disclosures of which are hereby incorporated by reference in their respective entireties.
TECHNICAL FIELD
[0002] The present invention relates to a system and method for modifying a document format.
BACKGROUND
[0003] Handheld devices, including Personal Digital Assistants (PDAs) and cellular telephones, offer connectivity to the Internet and permit access to documents available over the Internet. Wireless Application Protocol (WAP) is a standard for providing cellular phones, PDAs, pagers and other handheld devices with secure access to web pages. WAP features the Wireless Markup Language (WML), which generally serves as a medium for translating web-based HTML content into a format that accommodates small form factor displays and key sets found on conventional handheld devices. WML also allows handheld device manufacturers to include microbrowsers in their products that accept WML input from a WAP-based system across vast regions of the world.
[0004] The proliferation of wireless PDAs has also created a popular means for handheld Internet access. However, presenting IP-based content, and other content developed for display on large form factor devices (e.g., PC monitors), on small form factor screens of handheld devices has, in the past, been problematic. Two primary methods of presenting such content to handheld devices have been employed.
[0005] The first such method can be termed “fixed mapping.” Fixed mapping typically involves rewriting an existing document, such as an HTML-based web page, to conform to a specific standard, such as WAP, J-PHONE, or i-Mode, or to a small display device. A web server must then maintain the rewritten web site as a separate site with its own URL in addition to the original document. As new content is added to the original document, a web site operator must manually trim, edit, and condense the new content by rewriting the new content into a format that will accommodate the interface parameters of handheld devices. This method is limited in that considerable time and expense are typically required to maintain the two web sites in parallel. Further, the manual editing of the rewritten web site can be time-consuming, burdensome, and expensive.
[0006] The second method may be termed “transcoding.” Transcoding typically involves the use of software that takes the entire content of a web site as input and converts the entire content into a format of a specific handheld wireless standard for transmission to handheld devices. The entire content, as formatted according to a handheld wireless standard, is then transmitted to the handheld device. This conversion may be performed “on-the-fly” (i.e., automatically in real time) or may be performed manually.
[0007] Transcoding has the advantage of reducing the investment to reach wireless markets since it leverages existing web sites. From a user standpoint, transcoding is desirable in that it preserves all the text-based information from the originating site. For large volumes of text, however, using this approach may overwhelm the handheld device user with large volumes of text to be viewed on a small form factor display. Further, the unorganized transcoded content makes changes or modifications to the wirelessly enabled web site more difficult for the web site operator.
[0008] In addition, many wireless handheld devices have limited bandwidth. Thus, downloading an entire web page designed for viewing on a large form factor device at data rates common to handheld wireless devices may require large download times. These large download times may be burdensome to the user who must wait while the entire web page downloads, even though the user may only desire to view a portion of the web page. Further, these large download times may be expensive for users who pay for wireless service based on the amount of time or the number of packets downloaded. For example, service plans are time-based or packet-based. These service plans charge on either the time connected or number of packets received, respectively. Thus, large downloads under these service plans will be more expensive than smaller downloads.
[0009] Additional background details are disclosed in U.S. Pat. No. 6,336,124, the disclosure of which is hereby incorporated by reference.
SUMMARY
[0010] Accordingly, a need exists to provide a system and method for presenting content developed for display on large form factor devices (e.g., PC monitors) on small form factor screens of handheld devices. In particular, a need exists for a system and method for permitting a handheld device user to easily navigate material available over a network, such as an Internet web site.
[0011] Pursuant to one embodiment of the present invention, a method of ranking entries in a table of contents for display at a client device includes transmitting a first document from an application server over a network, such as the Internet, to the client device. The first document includes text and at least one link. The application server then receives a request for a second document associated with the link from the client device. Next, the application server divides the second document into subdocuments and assigns a label to each of a plurality of the subdocuments. The application server also performs a comparison of the text of the first document with the text of each of the plurality of subdocuments to generate a document-document value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments. After performing this comparison, the application server ranks the plurality of subdocuments based, at least in part, on the document-document values.
[0012] In another embodiment, the application server performs a comparison of the text of the link with the text of each of the subdocuments to generate a link-text value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments. After performing this comparison, the application server ranks the plurality of subdocuments based, at least in part, on the link-text values.
[0013] In yet another embodiment, the application server performs a comparison of the text of the link with the label assigned to each of the plurality of subdocuments to generate a link-label value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments. After performing this comparison, the application server ranks the plurality of subdocuments based, at least in part, on the link-label values.
[0014] In still another embodiment, the application server generates, for each of the plurality of subdocuments, a size value indicative of an amount of text in that subdocument. After generating the size values, the application server ranks the plurality of subdocuments based, at least in part, on the size values.
[0015] In this manner, subdocuments likely to be relevant to the first document, the selected link, or both, are listed at or near the top of a table of contents to facilitate user selection of the same. Hence, users may easily follow a text that spans multiple documents, because the table of contents of a requested page lists the subdocuments containing continuing portions of the text at or near the top of the table of contents.
[0016] Additional details regarding the present system and method may be understood by reference to the following detailed description when read in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] FIG. 1 is a block diagram of a document delivery system in accordance with one embodiment of the present invention.
[0018] FIG. 2 is a block diagram of the formatter of FIG. 1 in accordance with one embodiment of the present invention.
[0019] FIG. 3 is a block diagram of the mapper of FIG. 2 in accordance with one embodiment of the present invention.
[0020] FIG. 4 illustrates a tree data structure in accordance with one embodiment of the present invention.
[0021] FIG. 5 is a block diagram of the control module of FIG. 2 in accordance with one embodiment of the present invention.
[0022] FIG. 6 is a flowchart illustrating a method in accordance with one embodiment of the present invention.
[0023] FIG. 7 is a flowchart illustrating a method in accordance with another embodiment of the present invention.
[0024] FIG. 8 illustrates a progression of material displayed at the display of the client device of FIG. 1 in accordance with an embodiment of the present invention.
[0025] Common reference numerals are used throughout the drawings and detailed description to indicate like elements.
DETAILED DESCRIPTION
[0026] FIG. 1 illustrates a document delivery system 100 in accordance with one embodiment of the present invention. The document delivery system 100 permits a client 102 to access content of documents (not shown) stored at server 104, server 106, or other servers 108 over a network 110, such as the Internet, and over a network 111, such as an intranet.
[0027] In one embodiment, the client 102 comprises a handheld device, such as a PDA (Personal Digital Assistant), a mobile telephone, or the like, having a small form factor display 112. The client 102 also includes a web browser 114. The web browser 114 may comprise a microbrowser designed for small display screens on web-enabled cellular telephones, PDAs and other handheld devices, including wireless handheld devices.
[0028] The client 102 may exchange data with the network 110 in a wireless fashion via a wireless station 120 and a gateway 122 in accordance with WAP (Wireless Application Protocol), i-Mode, or other suitable protocol or service. Optionally, the client 102 may exchange data with the network 110 via a wired connection (not shown).
[0029] The client 102 may also exchange data with the network 111 in a wireless fashion via a wireless station 121 and a gateway 123 in accordance with WAP (Wireless Application Protocol), i-Mode, or other suitable protocol or service for delivery of web content to small display devices. Optionally, the client 102 may exchange data with the network 111 via a wired connection (not shown).
[0030] In one embodiment, the gateways 122, 123 are network devices that connect a wireless network with a wired network, such as the networks 110, 111. Access between the client 102 and application server 124 may also pass through one or more other firewalls (not shown), other gateway devices (not shown), or the like.
[0031] Pursuant to one embodiment, the client 102 transmits requests for documents stored on one or more of the servers 104, 106, 108 to the application server 124. The request for content may comprise an HTTP request or other suitable type of request. Moreover, the application server 124 may alternatively receive the request for a document from the client 102 from any network (e.g., 110, 111). The application server 124, among other functionality, functions as a proxy server and receives requests for documents from client devices, such as the client 102, over the networks 110, 111 and provides associated content in response to such requests by transmitting the associated content over at least one of the networks 110, 111.
[0032] In response to a request for a document from the client 102, the application server 124 requests the document identified by the request from one or more of the servers 104, 106, 108. Upon receipt of the document identified by the request, the application server 124 modifies the format of the document identified by the request for content using a formatter 126.
[0033] In one embodiment, the document identified by the request is an HTML or XML web page, although other document types, such as PDF (Portable Document Format), may also be requested. The application server 124 then transmits at least a portion of the reformatted content of the document identified by the request to the client 102 in a format compatible with the browser 114 for display at the display 112 of the client 102.
[0034] The formatter 126 includes a database (see FIG. 5) that may be configured from a client admin computer 140 via a database modifier 128. The database modifier 128 may comprise a JavaScript module that permits a user at the client admin computer 140 to visually modify a data structure of a document into a desired format. The modification may be performed by, for example, adding labels, re-ordering, moving, deleting, or otherwise changing portions of the data structure. The changed, or modified, version of the data structure is then stored in the database.
[0035] In particular, the client admin computer 140 includes a web browser 142, such as Internet Explorer™ by Microsoft Corporation or other suitable web browser for permitting a user at the client admin computer 140 to view pages at the database modifier 128 hosted at the application server 124. The pages at the database modifier 128 of the application server 124 permit user configuration of the FIG. 5 database, as discussed in more detail below.
[0036] In general, the formatter 126 receives the document identified by the request from one of the servers 104, 106, 108, divides the document into multiple blocks, and assigns labels to individual blocks. The formatter 126 then generates a list containing the content of the various blocks. If a data structure associated with the document is stored in the database, the formatter 126 then uses the data structure to generate output files from the generated list of content. The output file may contain a Table of Contents (TOC) page and subdocuments. The TOC page lists labels associated with the subdocuments and may contain links to the subdocuments. The formatter 126 then transmits the TOC page, a headline, an image, or other content specified by a database at the application server 124 to the client 102 over at least one of the networks 110, 111. Details of the operation of the formatter 126 are discussed in more detail below.
[0037] FIG. 2 illustrates details of the formatter 126 of FIG. 1 according to one embodiment of the invention. As shown, the formatter 126 includes a mapper 202 and a control module 206, which may comprise software written in C++ or other suitable programming language. The mapper 202 receives the requested document and reformats the document as a list of document content 204. The control module 206 then generates an output file using the list of document content 204. Additional details regarding the mapper 202, the list of document content 204, and the control module 206 are discussed below.
[0038] FIG. 3 illustrates details of the mapper 202 of FIG. 2 according to one embodiment of the invention. The mapper 202 includes a number of software modules stored in a computer readable medium. In particular, the mapper 202 includes a network interface 302, a parser 304, a label engine 306, a data structure converter 308, and a ranking engine 310. The network interface 302 receives the document requested from the network. As mentioned above, the document requested may comprise a web page, such as an HTML document, an XML document, or the like.
[0039] The parser 304 parses and decomposes the document into a tree data structure. FIG. 4 illustrates an example tree data structure 400, which may comprise a structural representation of a document, such as an HTML web page. As shown, the tree data structure 400 includes a root node 402 associated with the document. The parser 304 (FIG. 3) divides the document into multiple blocks and represents each block of the document as a table node 404 in the tree data structure 400. Each table node 404 has at least one row node 406 as a child node. Individual row nodes 406 each have at least one column node 408 as a child node. The column nodes 408 may then have additional table nodes as children. At this point, the tree data structure 400 may be recursive.
[0040] Thus, the document is divided into blocks, which may be defined by the structure of the document. The primary content for each of the blocks, or tables, is stored in the column nodes 408 and the remaining structure of the various blocks is represented in the other portions of the tree data structure 400.
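By way of illustration only, the recursive table, row, and column arrangement described above may be sketched as follows. The class names, field names, and the `count_columns` helper are hypothetical and are not taken from the figures; the sketch merely shows that column nodes hold content and may themselves contain further table nodes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnNode:
    # Column nodes hold the primary content of a block (assumed here to be text).
    content: str = ""
    # A column may contain nested tables, which makes the tree recursive.
    tables: List["TableNode"] = field(default_factory=list)


@dataclass
class RowNode:
    columns: List[ColumnNode] = field(default_factory=list)


@dataclass
class TableNode:
    rows: List[RowNode] = field(default_factory=list)


@dataclass
class RootNode:
    tables: List[TableNode] = field(default_factory=list)


def count_columns(tables: List[TableNode]) -> int:
    """Count every column node, descending into nested tables."""
    total = 0
    for table in tables:
        for row in table.rows:
            for column in row.columns:
                total += 1 + count_columns(column.tables)
    return total


# A small example document: one block with two columns, the second of which
# contains a nested table.
inner = TableNode(rows=[RowNode(columns=[ColumnNode("Nested story text")])])
doc = RootNode(tables=[
    TableNode(rows=[RowNode(columns=[
        ColumnNode("Home | News | Sports"),
        ColumnNode("Top story", tables=[inner]),
    ])]),
])
```

In this example, `count_columns(doc.tables)` visits the two top-level columns and the one nested column.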
[0041] Referring again to FIG. 3, the label engine 306 then assigns labels to individual blocks and may assign a classification to each block according to the contents of the block. In one embodiment, the label engine 306 assigns a classification to each block based on the block contents. For example, if the document is a web page, the web page may include links, text, forms, and pictures, as well as other classes of content.
[0042] The label engine 306 optionally analyzes individual blocks and assigns a classification to the block indicating the type, or class, of content in the block. Hence, a block that contains primarily links may be assigned a “navigation” classification, a block that contains primarily text may be assigned a “story” classification, a block that contains primarily pictures may be assigned an “image” classification, and a block that contains form information like an address may be assigned a “form” classification. The label engine 306 inserts a classifier associated with the assigned classification for each block into the table node of each block.
[0043] After classifying the blocks, the label engine 306 optionally merges, or combines, column nodes of each block that have the same classification. For example, if a given block has multiple column nodes having the classification of “story,” the label engine 306 may merge, or combine, the content of these column nodes. Likewise, if a given block has multiple columns having the classification of “navigation,” the label engine 306 may merge, or combine, the content of these column nodes.
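By way of illustration only, one possible way of deciding which kind of content a block “primarily” contains is sketched below. The `Block` fields, the rule that any form field yields a “form” classification, and the scaling of text length against link and image counts are all assumptions made for this sketch; the description does not specify how the predominant content type is determined.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text_chars: int    # number of characters of plain text in the block
    link_count: int    # number of links in the block
    image_count: int   # number of pictures in the block
    form_fields: int   # number of form inputs in the block


def classify(block: Block) -> str:
    """Return one of "form", "story", "navigation", or "image"."""
    # Assumption: any form field marks the block as a form.
    if block.form_fields > 0:
        return "form"
    scores = {
        # Assumption: roughly 100 characters of text count as much as one link
        # or one picture.
        "story": block.text_chars / 100.0,
        "navigation": float(block.link_count),
        "image": float(block.image_count),
    }
    # Pick the class with the highest score (ties go to the first listed).
    return max(scores, key=scores.get)
```

For instance, a block with 1,200 characters of text and two links would be classified as “story” under these assumed weights, while a block with 50 characters of text and eight links would be classified as “navigation.”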
[0044] In one embodiment, the label engine 306 may merge, or combine, column nodes in accordance with predetermined merging rules stored at the label engine 306. An example merging rule is that a large “story” node is not merged with another large “story” node. Another example merging rule is that a small “story” node may get merged with a “navigation” node. Thus, according to these rules, a large story, which is likely to be substantial enough to be viewed in isolation, will not be combined with another large story. However, a small story would not be isolated. Rather, the user experience may be improved by merging the small story with other nodes, such as a small “navigation” node or a small “story” node. The specifics of these merging rules may vary and may be customized according to particular applications. The classifying and merging are optional according to some embodiments of the invention.
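By way of illustration only, the example merging rules may be sketched as a predicate applied to adjacent nodes. The 500-character threshold for a “large” story, the restriction to adjacent nodes, and the choice to label a merged story/navigation node as “story” are assumptions of this sketch and are not specified in the description.

```python
from dataclasses import dataclass
from typing import List

LARGE_STORY_CHARS = 500  # assumed threshold for a "large" story


@dataclass
class Node:
    classification: str
    content: str


def is_large_story(node: Node) -> bool:
    return node.classification == "story" and len(node.content) >= LARGE_STORY_CHARS


def can_merge(a: Node, b: Node) -> bool:
    # Example rule: a large story is not merged with another large story.
    if is_large_story(a) and is_large_story(b):
        return False
    # Nodes with the same classification may be merged.
    if a.classification == b.classification:
        return True
    # Example rule: a small story may be merged with a navigation node.
    if {a.classification, b.classification} == {"story", "navigation"}:
        story = a if a.classification == "story" else b
        return not is_large_story(story)
    return False


def merge_adjacent(nodes: List[Node]) -> List[Node]:
    """Merge each node into its predecessor whenever the rules allow it."""
    merged: List[Node] = []
    for node in nodes:
        if merged and can_merge(merged[-1], node):
            prev = merged[-1]
            kinds = (prev.classification, node.classification)
            classification = "story" if "story" in kinds else prev.classification
            merged[-1] = Node(classification, prev.content + "\n" + node.content)
        else:
            merged.append(node)
    return merged


nodes = [
    Node("story", "a" * 600),
    Node("story", "b" * 600),
    Node("navigation", "Home"),
    Node("story", "brief"),
]
result = merge_adjacent(nodes)
```

In this example, the two large stories remain separate, while the small story is combined with the adjacent navigation node.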
[0045] The label engine 306 also assigns a label to each block according to the block contents. In one embodiment, the label engine 306 uses the first several words of text of a block including text as the label for that block. In another embodiment, the label engine 306 assigns a label to a block based on the classification of the block. The label engine 306 then adds the assigned label to the table node of the associated block.
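By way of illustration only, the two labeling embodiments may be sketched together as follows. The function name `make_label`, the default of five words, the trailing ellipsis, and the “Untitled” fallback are assumptions of this sketch.

```python
def make_label(text: str, classification: str = "", max_words: int = 5) -> str:
    """Label a block by its first few words, else by its classification."""
    words = text.split()
    if words:
        label = " ".join(words[:max_words])
        # Mark truncated labels so the user can tell more text follows.
        return label + "..." if len(words) > max_words else label
    # No text in the block: fall back to the classification (assumed behavior).
    return classification.capitalize() or "Untitled"
```

For example, a block beginning “Stocks rallied on Tuesday after the central bank held rates” would be labeled with its first five words, and a block with no text and an “image” classification would be labeled “Image.”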
[0046] With continued reference to FIG. 3, a data structure converter 308 of the mapper 202 next “flattens” the tree data structure by converting the tree data structure into a linear, one-dimensional list containing the content of the column nodes 408. The table nodes 404 and the row nodes 406 are not included in the one-dimensional list. Individual entries in the one-dimensional list include the content of an associated column node 408.
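By way of illustration only, the flattening step may be sketched as a depth-first walk that keeps only column content. The class names and the depth-first ordering are assumptions of this sketch; the description states only that table and row nodes are omitted from the resulting list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    content: str
    tables: List["Table"] = field(default_factory=list)


@dataclass
class Row:
    columns: List[Column]


@dataclass
class Table:
    rows: List[Row]


def flatten(tables: List[Table]) -> List[str]:
    """Return column contents as a one-dimensional list, dropping table/row nodes."""
    entries: List[str] = []
    for table in tables:
        for row in table.rows:
            for column in row.columns:
                if column.content:
                    entries.append(column.content)
                # Content of nested tables follows the column that contains them.
                entries.extend(flatten(column.tables))
    return entries


nested = Table([Row([Column("C")])])
tree = [Table([Row([Column("A"), Column("B", [nested])])])]
flat = flatten(tree)
```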
[0047] A ranking engine 310 then ranks the entries in the one-dimensional list according to the content of the individual entries. In one embodiment, the ranking engine 310 analyzes characteristics of each entry and assigns a “weight” value to each entry. The weight assigned to each entry may be based on a variety of parameters. These parameters may include, for example, the size of the font used in the entry, whether the text in the entry is boldface, the color of the text, whether the text is flashing, whether the text is underlined, and the position of the item in the document.
[0048] The ranking engine 310 may also generate a size value indicative of an amount of text in each of the plurality of subdocuments. Pursuant to this embodiment, the size value may be larger for subdocuments comprising large amounts of text and the size value may be smaller for subdocuments comprising smaller amounts of text. Ranking the entries in the table of contents, at least in part, according to the size value tends to make entries associated with larger amounts of text appear higher on the list of entries in the table of contents (i.e., as more important or more relevant).
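By way of illustration only, a size value may be sketched as a simple word count. Using the word count as the measure of “amount of text,” and ranking on the size value alone, are assumptions of this sketch.

```python
from typing import Dict, List


def size_value(text: str) -> int:
    """Assumed measure of the amount of text: the number of words."""
    return len(text.split())


def rank_by_size(subdocs: Dict[str, str]) -> List[str]:
    """Return subdocument labels ordered from most text to least text."""
    return sorted(subdocs, key=lambda label: size_value(subdocs[label]), reverse=True)


subdocs = {
    "Nav": "Home News",
    "Story": "word " * 50,
    "Ad": "Buy now today",
}
order = rank_by_size(subdocs)
```

Here the 50-word story is listed first, followed by the three-word and two-word entries.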
[0049] In one embodiment, the weight assigned to each entry may also depend on the content of the link leading to the document, the text of the previous document, the text of the subdocument associated with the entry, the text of the label associated with the entry, or a combination of these. Additional details regarding this embodiment are described below with reference to FIGS. 7 and 8. Based on parameters such as these, the ranking engine 310 assigns a weight to individual entries in the one-dimensional list and then re-orders the one-dimensional list according to the weighted rankings.
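By way of illustration only, the comparisons of the previous document, the link text, and the label against each subdocument may be sketched with a word-overlap measure. The description does not specify how the degree of association is computed; the Jaccard word overlap used here, the equal weighting of the three values, and all function names are assumptions of this sketch.

```python
import re
from typing import Dict, List


def words(text: str) -> set:
    """Lower-cased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard overlap of the word sets of two texts (assumed similarity measure)."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def rank_subdocuments(prev_doc: str, link_text: str, subdocs: Dict[str, str]) -> List[str]:
    """Order subdocument labels by the sum of three association values."""
    def score(label: str) -> float:
        text = subdocs[label]
        document_document = overlap(prev_doc, text)  # previous document vs. subdocument
        link_text_value = overlap(link_text, text)   # link text vs. subdocument text
        link_label = overlap(link_text, label)       # link text vs. subdocument label
        return document_document + link_text_value + link_label
    return sorted(subdocs, key=score, reverse=True)


previous = "Merger talks between the two carriers resumed"
link = "Full story: merger talks"
subdocs = {
    "Weather": "Sunny skies expected this weekend",
    "Merger talks resumed": "Merger talks between the two carriers resumed on Monday",
}
ranked = rank_subdocuments(previous, link, subdocs)
```

In this example, the subdocument continuing the merger story shares words with the previous document and with the link text, and is therefore ranked ahead of the unrelated weather subdocument.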
[0050] In one embodiment, the ranking engine 310 reorders the list in an order of decreasing weight values such that the first entry in the re-ordered list is the entry having the largest weight value and the last entry in the list is the entry having the smallest weight value. The re-ordered list is then stored as the list of document content 204 (FIG. 2). Thus, in some embodiments, entries having large or bold text may be ranked before entries having smaller or plain text. Also, entries having a graphic may be ranked higher than entries having primarily links.
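By way of illustration only, weighting entries by formatting parameters and re-ordering the list by decreasing weight may be sketched as follows. The particular parameters used, the numeric bonuses for boldface and underlining, and the small penalty for later positions are assumptions of this sketch and are not values given in the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    label: str
    font_size: int = 12
    bold: bool = False
    underlined: bool = False
    position: int = 0  # order of appearance in the source document


def weight(entry: Entry) -> float:
    """Assumed weighting: larger, bolder, earlier entries weigh more."""
    value = float(entry.font_size)
    if entry.bold:
        value += 4.0
    if entry.underlined:
        value += 2.0
    value -= 0.1 * entry.position
    return value


def rank(entries: List[Entry]) -> List[Entry]:
    """Re-order entries so the largest weight comes first."""
    return sorted(entries, key=weight, reverse=True)


entries = [
    Entry("Footer links", 10, position=2),
    Entry("Headline", 24, bold=True, position=1),
    Entry("Body", 12, position=0),
]
ordered_labels = [entry.label for entry in rank(entries)]
```

Under these assumed weights, the large boldface headline is listed first, followed by the body text and then the footer links.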
[0051] FIG. 5 illustrates details of the control module 206 of FIG. 2 in accordance with one embodiment of the present invention. In general, the control module 206 receives the list of document content 204 and creates a new document structure according to a navigation rules database 502 and the list of document content 204.
[0052] The navigation rules database 502 contains a tree data structure for one or more documents. In one embodiment, contents of the navigation rules database 502 may be modified by accessing the formatter 126 (FIG. 1) from a client computer, such as the client admin computer 140 (FIG. 1). The database modifier 128 may modify the contents of the navigation rules database 502 described above.
[0053] In particular, the client admin computer 140 includes browser 142 and permits a user to access the database modifier 128 and to modify the contents of the navigation rules database 502. To modify the contents of the navigation rules database 502, a user at the client admin computer 140 directs the browser 142 to the database modifier 128. The database modifier 128 then presents the user with a GUI (Graphical User Interface) via the browser 142 that permits the user to view a default tree data structure, as constructed by the mapper 202, for a given document, such as an HTML or XML web page document. The default tree structure may be the structure of the document at issue as determined by parsing the document.
[0054] The user may then delete entries in the tree data structure. The user may alternatively move tree data structure entries from one location to another within the tree data structure. Further, the user may change the label or classification assigned to given nodes within the tree data structure. After the user has thus modified, or customized, the tree data structure, the control module 206 stores the modified tree data structure as an entry in the navigation rules database 502 associated with the document.
[0055] The control module 206 also includes a URL (Uniform Resource Locator) checker 504. The URL checker 504 receives the list of document content 204 from the mapper 202 and determines whether the navigation rules database 502 includes a tree data structure associated with the list of document content 204. In one embodiment, the URL checker 504 determines whether the URL associated with the list of document content 204 matches a URL associated with an entry in the navigation rules database 502. If such a match exists, an output file generator 506 retrieves the tree data structure in the navigation rules database 502 associated with the list of document content 204. The output file generator 506 then creates one or more output files 508 based on the retrieved tree data structure using the content of the list of document content 204.
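By way of illustration only, the URL match may be sketched as a dictionary lookup keyed by a normalized URL. The `NavigationRules` class, the in-memory dictionary, and the normalization (lower-casing the host, dropping the scheme, query string, and any trailing slash) are assumptions of this sketch; the description states only that the URLs are checked for a match.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Assumed normalization: lower-case host plus path without a trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return f"{parts.netloc.lower()}{path}"


class NavigationRules:
    """Hypothetical in-memory stand-in for a navigation rules database."""

    def __init__(self) -> None:
        self._rules: Dict[str, Any] = {}

    def store(self, url: str, tree: Any) -> None:
        self._rules[normalize(url)] = tree

    def lookup(self, url: str) -> Optional[Any]:
        # Returns the stored tree data structure, or None when no entry matches.
        return self._rules.get(normalize(url))


rules = NavigationRules()
rules.store("http://Example.com/news/", {"labels": ["Headlines", "Markets"]})
hit = rules.lookup("http://example.com/news")
miss = rules.lookup("http://example.com/sports")
```

When the lookup returns a stored structure, output may be generated from that structure; when it returns `None`, a default table of contents may be generated instead.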
[0056] The output files 508, in one embodiment, include a table of contents (TOC) page that lists the labels of the document. The output files 508 also contain one or more subdocuments. Individual subdocuments are associated with individual entries in the TOC. One or more of the labels, or entries, of the TOC may include links to associated subdocuments.
[0057] If the URL checker 504 determines that the navigation rules database 502 does not include a tree data structure associated with the list of document content 204, then the output file generator 506 generates output files 508 that include a TOC page that lists the labels of the document. One or more of the labels, or entries, of the TOC may include links to associated subdocuments.
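By way of illustration only, a TOC page in which each label links to its subdocument may be sketched as follows. The HTML list markup and the `sub1.html`, `sub2.html` file naming are assumptions of this sketch; an actual embodiment may instead emit WML or another device-specific format.

```python
from html import escape
from typing import List


def build_toc(labels: List[str]) -> str:
    """Build a list of links, one per label, in ranked order."""
    items = [
        f'<li><a href="sub{index}.html">{escape(label)}</a></li>'
        for index, label in enumerate(labels, 1)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


toc = build_toc(["Top story", "Markets & rates"])
```

Because the labels are taken from document text, they are escaped before being placed in the markup.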
[0058] The formatter 126 then transmits the TOC page over at least one of the networks 110, 111 to the client 102. Upon receipt of the TOC page at the client 102, the client 102 displays the TOC page at the display 112 of the client 102. The user may then select a link associated with one of the entries of the TOC, which requests an associated subdocument from the output files 508. In response to a request for a subdocument in the output files 508, the formatter transmits the requested subdocument to the client 102 over at least one of the networks 110, 111 for display at the display 112 of the client 102.
[0059] FIG. 6 illustrates a flowchart 600, which depicts a method according to one embodiment of the present invention. The method commences at block 602 where the application server 124 receives a request for a document from the client 102 (FIG. 1), the requested document residing on at least one of the servers 104, 106, 108. The request may be directed to the application server 124 directly. Alternatively, the request may be directed directly to one of the servers 104, 106, 108, which, in turn, redirects the request to the application server 124. The request may comprise an HTTP request or other suitable request. Moreover, the requested document may comprise a document in HTML, XML, PDF, or other suitable format.
[0060] Next, at block 604, the application server 124 retrieves the requested document from one or more of the servers 104, 106, 108 on which the document resides. This retrieval may be accomplished by the application server 124 transmitting an HTTP request to the server 104, 106, 108 at which the requested document is stored. For example, if the requested document resides at the server 104, the application server 124 requests the document from the server 104 over the network 110 and receives the requested document over the network 110.
[0061] Then, at block 606, the formatter 126 of the application server 124 extracts a structure of the retrieved document. In one embodiment, a parser 304 (FIG. 3) parses the retrieved document and generates a tree data structure representing the structure of the retrieved document. An example of such a tree data structure is illustrated in FIG. 4 and is described above.
[0062] For individual nodes of the tree data structure that include document content, the formatter 126 next analyzes the content of the nodes and assigns one of a set of predefined classifiers to each of the nodes based on the content of the nodes, pursuant to block 608. As discussed above, for a node having content comprising primarily text, the label engine 306 of the formatter 126 may assign a “story” classifier to the node. The classifier may comprise a text string or other identifier added to the node.
[0063] At block 610, the label engine 306 of the formatter 126 assigns labels to individual nodes of the tree data structure that include document content. The label engine 306 may assign a label based on the content of the node, the assigned classification of the node, or both. In one embodiment, the label engine 306 uses the first several words of nodes having text content as the label for the associated node. The label may indicate the content of the node being labeled.
[0064] At block 612, the label engine 306 merges nodes having content according to their classification. For example, if a pair of nodes having content both have the classification “navigation,” then the label engine 306 merges the content of these nodes to form a single node that includes the content of the merged nodes. Block 612 may alternatively, or additionally, be performed after block 616. In one embodiment, the merging is performed before and after ranking.
[0065] At block 614, the data structure converter 308 of the mapper 202 converts the tree data structure to a list. The data structure converter 308 extracts the nodes of the tree data structure that include content and generates a list comprising the nodes of the tree data structure that include content, without the other associated nodes, such as table and row nodes, which do not include content.
[0066] Next, at block 616, the ranking engine 310 (FIG. 3) of the mapper 202 reorders the entries of the list generated at block 614. In one embodiment, the ranking engine 310 assigns a weight value to each of the entries in the list according to certain parameters of the content of the entries, the classification of the list entry, or a combination thereof. Then, the ranking engine 310 reorders the list according to the weight value of the list entries. For example, the ranking engine 310 may order the list entries in order of decreasing weight value. The ranking engine 310 then stores the re-ordered list as the list of document content 204 (FIG. 2).
[0067] The control module 206 (FIG. 5) then determines whether the navigation rules database 502 includes an entry associated with the list of document content 204, pursuant to block 618. In one embodiment, the URL checker 504 of the control module 206 determines whether a URL associated with the list of document content 204 matches a URL associated with an entry in the navigation rules database 502. If such a match exists, the URL checker 504 determines that the navigation rules database 502 contains an entry associated with the list of document content 204 and execution proceeds to block 620; otherwise, execution proceeds to block 622.
[0068] At block 620, the output file generator 506 creates a new data tree structure using the list of document content 204 and the associated entry of the navigation rules database 502. The entry of the navigation rules database 502 may specify labels to be assigned to the various nodes, the location of the various nodes within the new data tree structure, and whether certain nodes are included in the new data tree structure. The output file generator 506 then creates a new data tree structure according to the entry in the navigation rules database 502 and inserts the associated content from the list of document content 204 to form a new data tree, which may be stored as the output files 508.
[0069] At block 622, the output file generator 506 stores the new data tree structure as the output files 508 if the navigation rules database 502 contains an entry associated with the list of document content 204. Otherwise, the output file generator 506 stores the list of document content as the output files 508 or processes the list of document content from memory. Moreover, the output file generator 506 may generate device-specific output.
The output files 508 include a table of contents (TOC) page that lists the labels of the nodes having content, and subdocuments that include the content of blocks associated with the labels. Each of the subdocuments is associated with one of the links so that a user at the client 102 may request a subdocument by selecting the link associated therewith.[0070]
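The relationship between the TOC page and the subdocuments might be sketched as follows. The file names, the plain-text output, and the `label -> file` link notation are assumptions of this sketch; an actual embodiment may emit device-specific markup instead.

```python
def make_output_files(entries: list) -> dict:
    """Map file name -> text: one subdocument per entry plus a TOC page linking to each."""
    files = {}
    toc_lines = []
    for index, (label, content) in enumerate(entries, start=1):
        name = f"sub{index}.txt"  # hypothetical subdocument file name
        files[name] = content
        toc_lines.append(f"{label} -> {name}")  # each label acts as a link
    files["toc.txt"] = "\n".join(toc_lines)
    return files
```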
Lastly, pursuant to block 624, the formatter 126 transmits the TOC page to the client 102.[0071]
FIGS. 7 and 8 illustrate details of one embodiment of the operation of the ranking engine 310 described above and illustrated in FIG. 3. Since, according to some embodiments, each document is analyzed individually and independently, when a body of text is followed from one document to another, tracking the body of text is a consideration for the ease of reading the body of text and navigating a set of documents. Indeed, it is common for a story to begin on a first document and extend to a second document. Hence, in some applications, it may be desirable to facilitate identification of the continuing portion of the story within the second document, which may be divided into multiple subdocuments.[0072]
With reference to FIG. 8, the display 112 of the client device 102 (FIG. 1) displays a subdocument 802 containing text 806 and one or more links 804. The link 804 is a selectable connection (e.g., a hyperlink) from a word, a set of words, or other information object, to another. One implementation of the link 804 is a highlighted set of words, or text, that can be selected by a user, such as with a mouse or by touch-screen control, resulting in the immediate delivery and view of another file. The highlighted text may be referred to as an anchor.[0073]
FIG. 7 is a flowchart illustrating a method in accordance with one embodiment of the present invention. FIG. 8 illustrates an example sequence of material displayed at the display 112 (see FIG. 1). In general, user selection of the link 804 causes the client 102 to transmit a request for an associated file, such as a document, from the application server 124. As discussed above with reference to FIG. 6, when a document is thus requested, the application server 124 generates a table of contents page 810, including a list of labels, with each label being associated with a subdocument.[0074]
It is desirable in some applications that the label associated with the selected link 804 be at or near the top of the list of labels in the table of contents page 810 to facilitate navigation and to permit the user to easily locate the label associated with the selected link. Thus, it is desirable that the label 812 of the table of contents page 810 be associated with the selected link to permit the user to quickly and easily identify the subdocument associated with the selected link 804. The user may then select the label 812, which comprises a link to the subdocument 820 containing the text 822.[0075]
Referring to FIGS. 7 and 8, the user at a client 102 (FIG. 1) views a subdocument 802 at a display 112 of the client 102. As shown in FIG. 8, the subdocument 802 includes text 806 and one or more links 804. Pursuant to block 701 of FIG. 7, the user selects one of the links 804 of the subdocument 802.[0076]
The user selection of the link 804 pursuant to block 701 causes the client 102 to transmit a request for a document associated with the link 804 selected by the user. Pursuant to block 702, the application server 124 receives the request for the document from the client 102 (FIG. 1), the requested document residing on at least one of the servers 104, 106, 108. The request may be directed to the application server 124 directly or to one of the servers 104, 106, 108, which, in turn, redirects the request to the application server 124.[0077]
Next, at block 704, the application server 124 retrieves the requested document from one or more of the servers 104, 106, 108 on which the document resides. This retrieval may be accomplished as described above. At block 706, the formatter 126 of the application server 124 extracts a structure of the retrieved document as described above.[0078]
For individual nodes of the tree data structure that include document content, the formatter 126 next analyzes the content of the nodes and assigns one of a set of predefined classifiers to each of the nodes based on the content of the nodes, pursuant to block 708 as discussed above. At block 710, the label engine 306 of the formatter 126 assigns labels to individual nodes of the tree data structure that include document content as discussed above. At block 712, the label engine 306 merges nodes having content according to their classification and, at block 714, the data structure converter 308 of the mapper 202 converts the tree data structure to a list, as discussed above.[0079]
At block 716, the ranking engine 310 (FIG. 3) compares the text 806 of the previous subdocument 802 to each of the subdocuments of the requested document using conventional document, or text, matching techniques to determine the extent to which the previous subdocument is associated with each of the subdocuments of the requested document. The ranking engine 310 may employ an n-dimensional vector matching technique for comparing the text of the previous subdocument 802 to each of the subdocuments of the requested document. Modern Information Retrieval, by R. Baeza-Yates et al., published by Addison-Wesley Pub Co; 1999, ISBN: 020139829X, discloses related techniques and is incorporated herein by reference.[0080]
In comparing the text 806 of the previous subdocument 802 to each of the subdocuments of the requested document, the ranking engine 310 generates a document/document value for each of the subdocuments of the requested document. The document/document value indicates the degree to which there is an association between the text 806 of the previous subdocument 802 and each of the subdocuments of the requested document. For example, if the text 806 of the subdocument 802 included terms such as “XYZ,” “merger,” “corporate,” “shareholders” and the like, the ranking engine 310 would assign a higher degree of association, and thus either a higher or lower document/document value, to subdocuments in the requested page that include the same or similar terms.[0081]
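As one possible instance of an n-dimensional vector matching technique, the document/document value might be computed as the cosine similarity of term-frequency vectors. The choice of cosine similarity, the simple tokenizer, and the convention that a higher value means a stronger association are assumptions of this sketch; the specification does not fix any of them.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Represent text as a sparse vector of lowercase term frequencies."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_document_values(previous_text: str, subdocuments: list) -> list:
    """One value per subdocument of the requested document."""
    previous = term_vector(previous_text)
    return [cosine(previous, term_vector(sub)) for sub in subdocuments]
```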
At block 718, the ranking engine 310 (FIG. 3) compares the text of the selected link 804 to each of the subdocuments of the requested document. For example, if the selected link 804 comprised the text “XYZ merger,” the ranking engine 310 would determine the degree to which the text “XYZ merger” is present in each of the subdocuments of the requested document. The ranking engine 310 generates a link/document value for each of the subdocuments of the requested document. The link/document value indicates the degree to which the text of the selected link is present in each of the subdocuments of the requested document.[0082]
At block 720, the ranking engine 310 compares the text of the selected link 804 to each of the labels of the requested document. For example, if the selected link 804 comprised the text “XYZ merger,” the ranking engine 310 would determine the degree to which the text “XYZ merger” is present in each of the labels assigned to the requested document and would generate a link/label value for each of the subdocuments of the requested document. The link/label value indicates the degree to which the text of the selected link and the subdocuments of the requested document are related.[0083]
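The link/document and link/label comparisons might both be sketched with a single helper that measures what fraction of the link's terms appear in a target text, the target being a subdocument's text for the link/document value or its label for the link/label value. The term-overlap measure and the name `link_presence` are assumptions of this sketch, not the measure required by the specification.

```python
import re


def terms(text: str) -> set:
    """Lowercase alphanumeric terms appearing in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def link_presence(link_text: str, target_text: str) -> float:
    """Fraction of the link's terms that appear in the target text (0.0 to 1.0)."""
    link_terms = terms(link_text)
    if not link_terms:
        return 0.0
    return len(link_terms & terms(target_text)) / len(link_terms)
```

For example, `link_presence("XYZ merger", label)` yields a link/label value, while passing the subdocument text instead yields a link/document value.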
The ranking engine 310 may also use additional factors in reordering the list entries. For example, the ranking engine 310 may generate a size value indicative of an amount of text in each of the plurality of subdocuments. Pursuant to this embodiment, the size value may be larger for subdocuments comprising large amounts of text and the size value may be smaller for subdocuments comprising smaller amounts of text. Ranking the entries in the table of contents, at least in part, according to the size value tends to make entries associated with larger amounts of text appear higher on the list of entries in the table of contents.[0084]
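A size value of this kind might be sketched as a word count scaled by the largest subdocument. Counting words rather than characters, and normalizing to the range 0.0 to 1.0, are assumptions of this sketch.

```python
def size_values(subdocuments: list) -> list:
    """Word count of each subdocument divided by the largest word count."""
    counts = [len(sub.split()) for sub in subdocuments]
    largest = max(counts, default=0)
    return [count / largest if largest else 0.0 for count in counts]
```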
At block 722, the ranking engine 310 reorders the list entries according to the document/document value, the link/document value, the link/label value or a combination of these values. In one example embodiment, the ranking engine 310 assigns a weight to each of the document/document, link/document, and link/label values and then combines the weighted values to determine the reordering of the list entries. Pursuant to another embodiment, the ranking engine 310 reorders the list entries according to one or more of the document/document, link/document, and link/label values and other factors, including, for example, amount of content in the subdocument, the size of the font used in the subdocument, whether the text in the subdocument is boldface, the color of the text in the subdocument, whether the text of the subdocument is flashing, and the position of the item in the document.[0085]
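The weighted combination of block 722 might be sketched as a linear sum followed by a descending sort. The particular weights in `WEIGHTS`, the linear form of the combination, and the convention that higher scores rank first are all hypothetical; the specification leaves these choices open.

```python
# Hypothetical weights for the three values (not specified in the source).
WEIGHTS = {"doc_doc": 0.5, "link_doc": 0.3, "link_label": 0.2}


def combined_score(values: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the available values; a missing value counts as 0.0."""
    return sum(weight * values.get(name, 0.0) for name, weight in weights.items())


def reorder_entries(entries: list, weights: dict = WEIGHTS) -> list:
    """entries is a list of (label, values) pairs; highest combined score first."""
    return sorted(entries, key=lambda entry: combined_score(entry[1], weights), reverse=True)
```

Other factors named above, such as amount of content or font size, could be added as further keys in the weights table without changing the structure of the sketch.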
After the ranking engine 310 has reordered the list entries, execution returns to block 618 of the flowchart of FIG. 6 as described above. Performing one or more of the blocks 716, 718, 720 together with the block 722 improves user navigation. In particular, this functionality increases the probability that the label listed at or near the top of the table of contents 810 will be associated with the selected link, the text of the subdocument including the selected link, or both.[0086]
The above-described embodiments of the present invention are meant to be merely illustrative and not limiting. Thus, those skilled in the art will appreciate that various changes and modifications may be made without departing from this invention in its broader aspects. Therefore, the appended claims encompass such changes and modifications as fall within the scope of this invention.[0087]