Method and system for automatically locating the main content of web pages on mobile communication terminals
Technical field:
The present invention relates to a processing method and system for automatically locating the main content of a web page, and in particular to such a method and system for use with mobile communication terminals.
Background technology:
With the development of Internet web technology, web pages have become increasingly rich and varied, and the information they present has become increasingly diverse. For example, for commercial and other reasons, an ordinary page often carries a large number of advertisements in addition to its main content. Information unrelated to the main content, however, is not what a viewer wants to see first.
For mobile phone users, the screen is small and can show only a limited portion of a page. If the page contains a large amount of information unrelated to its main content (for example, navigation bars and advertisement columns), the user has to keep scrolling to find the main content.
If the browser could move the screen position directly to the main content after loading the page, the user experience would improve considerably. The key problem is finding the main content of a web page automatically. Conventional approaches based on statistics or artificial intelligence are complex and do not perform well.
Summary of the invention:
The object of the present invention is to provide a method for automatically locating the main content of a web page, in order to overcome the existing problem that a mobile terminal cannot jump to the main content in a single step while browsing a web page.
The invention provides a processing method for automatically locating the main content of a web page on a mobile communication terminal, comprising:
obtaining target web page data according to the URL of the target web page, and parsing it into first data having a preset data structure;
according to the URL of the target web page, finding in a server-side database a previously stored URL of a similar web page, that URL being similar but not identical to the URL of the target web page, and obtaining similar web page data according to the URL of the similar web page;
parsing the obtained similar web page data into second data having the preset data structure;
comparing the first data with the second data, determining the part that differs between them to be the data of the main content of the target web page, and returning it to the mobile communication terminal.
The processing method further comprises: marking the data of the main content, and returning the marked target web page data to the browser;
the browser identifying the mark in the target web page data, displaying the content of the target web page, and moving the screen position to the main content of the target web page that corresponds to the mark.
In the step of obtaining the similar web page data, when no URL of a similar web page has been stored in the database in advance, the target web page data obtained in the step of obtaining the target web page data is returned directly to the browser without being marked, and the target web page is displayed in the browser.
In the step of obtaining the similar web page data, when no URL of a similar web page has been stored in the database in advance, the URL of the target web page is stored in the database, the target web page data obtained in the step of obtaining the target web page data is returned directly to the browser without being marked, and the target web page is displayed in the browser.
The present invention also provides a processing system for automatically locating the main content of a web page on a mobile communication terminal, comprising:
a target web page data acquisition module, configured to obtain target web page data according to the URL of the target web page and parse it into first data having a preset data structure;
a similar web page data acquisition module, configured to find in a server-side database, according to the URL of the target web page, a previously stored URL of a similar web page, that URL being similar but not identical to the URL of the target web page, and to obtain similar web page data according to the URL of the similar web page;
a similar web page data parsing module, configured to parse the obtained similar web page data into second data having the preset data structure;
a main content data identification module, configured to compare the first data with the second data, determine the part that differs between them to be the data of the main content of the target web page, and return it to the mobile communication terminal.
The processing system further comprises:
a main content data marking module, configured to mark the data of the main content and return the marked target web page data to the browser;
a main content locating and display module, configured to display the content of the target web page when the browser identifies the mark in the target web page data, and to move the screen position to the main content of the target web page that corresponds to the mark.
When no URL of a similar web page has been stored in the database in advance, the target web page data obtained by the target web page data acquisition module is returned directly to the browser without being marked, and the target web page is displayed in the browser.
When no URL of a similar web page has been stored in the database in advance, the URL of the target web page is stored in the database, the target web page data obtained by the target web page data acquisition module is returned directly to the browser without being marked, and the target web page is displayed in the browser.
By implementing the present invention, the browser can, once the page has finished loading, automatically move the screen position to the main content of the web page, which makes the page convenient for the user to read. Moreover, compared with conventional techniques that find the main content of a web page using statistics or artificial intelligence, the present invention is easy to implement and effective.
Description of drawings:
Fig. 1 is a block diagram of the processing system for automatically locating the main content of a web page;
Fig. 2 is a flowchart of the processing method for automatically locating the main content of a web page;
Fig. 3 is another flowchart of the processing method for automatically locating the main content of a web page;
Fig. 4 is another flowchart of the processing method for automatically locating the main content of a web page;
Fig. 5 is another flowchart of the processing method for automatically locating the main content of a web page.
Embodiment:
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Internet sites make heavy use of templates. On a given site, pages with similar URLs (Uniform Resource Locators) usually belong to the same column of the site and use the same page template. Therefore, to determine the main content of a web page, it is first necessary to obtain the URL of another web page (for example, http://www.uc.cn/a/news/2010/0404/1041.html) that has the highest similarity to, but is not identical to, the URL of the target web page, that is, the page the user wants to browse (for example, http://www.uc.cn/a/news/2010/0524/1370.html). The server then obtains the data of both pages from these two URLs, parses each into a DOM tree, and compares the two DOM trees to determine the parts that differ between them. The differing part is the main content of the web page the user wants to browse.
The individual embodiments are set out below with reference to Figs. 1 to 5.
Fig. 1 shows a block diagram of the processing system for automatically locating the main content of a web page. As shown in Fig. 1, the processing system comprises a target web page data acquisition module 11, a similar web page data acquisition module 12, a similar web page data parsing module 13, a main content data identification module 14, a main content data marking module 15, and a main content locating and display module 16. When the user enters the URL of the target web page in the browser of the client (the mobile communication terminal), the target web page data acquisition module 11 obtains the target web page data according to that URL and parses it into data (the first data) having a preset data structure (for example, a DOM tree structure). The similar web page data acquisition module 12 then uses the URL of the target web page to find, in the server-side database, the previously stored URL of another web page that is most similar but not identical to it, that is, the URL of the similar web page, and obtains the page data of the similar web page according to that URL. The similar web page data parsing module 13 parses the page data that the similar web page data acquisition module 12 obtained from the Internet, which is similar to the target web page, into further data (the second data) having the preset data structure (for example, a DOM tree structure). The main content data identification module 14 then compares the two sets of data having the preset data structure and determines the part that differs between them; this differing part is the data of the main content of the target web page. The main content data marking module 15 marks the data of the main content and returns the marked target web page data to the browser of the client (the mobile communication terminal). The main content locating and display module 16 then identifies the mark in the target web page data, displays the content of the target web page in the client browser, and moves the screen position to the main content of the target web page that corresponds to the mark.
Fig. 2 shows the flow of automatically locating the main content of a web page. As shown in Fig. 2, in step S11, when the user wants to browse a web page of a certain website, the user enters the URL of the target web page in the browser of the client (the mobile communication terminal), and the target web page data acquisition module 11 receives that URL from the client. The module then obtains the corresponding web page data from the website according to the URL (step S12) and parses the obtained web page data into data having a preset data structure, for example a DOM tree structure (step S13).
In step S14, the similar web page data acquisition module 12 uses the URL of the target web page to find, in the database located on the server side, the URL of another web page that is most similar but not identical to it (that is, the URL of the similar web page). The URL of this other web page has been collected in advance and stored in the database.
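As a non-limiting illustration, the lookup of step S14 might be sketched as follows. The specification does not say how URL similarity is measured, so the use of `difflib.SequenceMatcher` and the restriction to URLs on the same host are assumptions of this sketch, and the function name `find_similar_url` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional
from urllib.parse import urlsplit


def find_similar_url(target_url: str, stored_urls: Iterable[str]) -> Optional[str]:
    """Return the stored URL most similar to, but not identical to, target_url."""
    target_host = urlsplit(target_url).netloc
    best_url: Optional[str] = None
    best_score = 0.0
    for candidate in stored_urls:
        if candidate == target_url:
            continue  # the similar URL must not be identical to the target URL
        if urlsplit(candidate).netloc != target_host:
            continue  # assumption: only pages of the same site share a template
        # Assumption: character-level similarity ratio in the range [0, 1].
        score = SequenceMatcher(None, target_url, candidate).ratio()
        if score > best_score:
            best_url, best_score = candidate, score
    return best_url  # None when the database holds no usable URL
```

With the two example URLs given earlier in this description, the function would pick the `0404/1041.html` page as the counterpart of the `0524/1370.html` page, provided the former is among the stored URLs.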
In step S15, the similar web page data parsing module 13 obtains from the Internet, according to the URL of the other web page, the corresponding page data, which is similar to the target web page, and parses it into further data having the preset data structure (for example, a DOM tree structure).
In step S16, the main content data identification module 14 compares the two sets of data having the preset data structure that were obtained by parsing the respective page data, and determines the part that differs between them. This differing part is the data of the main content of the target web page.
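Steps S13, S15 and S16 can be illustrated, purely by way of example, with the sketch below. It uses Python's standard `html.parser` to build a simplified tree rather than a full DOM, and it treats two nodes as different when their tag, their direct text or their number of children differ. The `Node` class, the comparison rule and all function names are assumptions of this sketch and are not prescribed by the specification.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Dict, List

# Elements that never have an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""  # text found directly inside this element
    children: List["Node"] = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree (a stand-in for a DOM tree)."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, {k: v or "" for k, v in attrs})
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; never pop the root.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].text += data.strip()


def parse_tree(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def differing_nodes(first: Node, second: Node) -> List[Node]:
    """Return the nodes of `first` that differ from their counterparts in `second`."""
    if (first.tag != second.tag or first.text != second.text
            or len(first.children) != len(second.children)):
        return [first]  # the whole subtree counts as different
    found: List[Node] = []
    for a, b in zip(first.children, second.children):
        found.extend(differing_nodes(a, b))
    return found
```

Because the two pages share a template, shared blocks such as the navigation bar compare as equal and are skipped, while the template slots filled with page-specific text are reported. Attribute values are deliberately ignored in this sketch.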
In step S17, the main content data marking module 15 marks the data of the main content of the target web page and returns the marked target web page data to the client browser. That is, the target web page data returned to the client browser is page data that carries the main content mark. After the server returns the marked target web page data to the client browser, the browser applies further processing to the data so that it is suitable for display in the browser of a mobile terminal. The present invention places no particular limitation on how the web page data is processed for display in the browser of a mobile terminal; any processing that allows the target web page data to be displayed in the browser of the mobile communication terminal may be used.
In step S18, the main content locating and display module 16 receives the target web page data, identifies the marked page data, displays the content of the web page, and moves the screen to the page content that corresponds to the marked page data. This completes the automatic locating of the main content of the web page.
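The form of the mark used in steps S17 and S18 is left open by the specification. One possible form, shown here only as an assumption, is an extra attribute added to the start tag of the element that holds the main content, which the browser could then look for and scroll to. In the sketch below, the attribute name `data-main-content`, the function `mark_main_content`, and the convention of identifying the element by its document-order index among start tags are all hypothetical.

```python
from html.parser import HTMLParser

MARKER_ATTR = "data-main-content"  # hypothetical marker name


class _StartTagLocator(HTMLParser):
    """Records where the start tag with a given document-order index begins."""

    def __init__(self, wanted_index: int) -> None:
        super().__init__()
        self.wanted_index = wanted_index
        self.seen = 0
        self.position = None  # (line, column) of the "<" of the wanted tag
        self.tag = None

    def handle_starttag(self, tag, attrs):
        if self.seen == self.wanted_index:
            self.position = self.getpos()
            self.tag = tag
        self.seen += 1


def mark_main_content(html: str, element_index: int) -> str:
    """Insert MARKER_ATTR into the start tag numbered element_index (0-based)."""
    locator = _StartTagLocator(element_index)
    locator.feed(html)
    locator.close()
    if locator.position is None:
        return html  # no such element: return the page unmarked
    line, column = locator.position
    # Convert the (line, column) position into a character offset.
    preceding = html.split("\n")[: line - 1]
    offset = sum(len(text) + 1 for text in preceding) + column
    insert_at = offset + 1 + len(locator.tag)  # just after "<tag"
    return html[:insert_at] + f' {MARKER_ATTR}="1"' + html[insert_at:]
```

On the client side, a browser following this convention would search the received page for the element carrying the marker attribute and place the top of the screen at that element.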
In the embodiment described above, when the similar web page data acquisition module 12 cannot find, in step S14, the URL of another web page that is similar but not identical to the URL of the target web page, the URL of a different target web page needs to be entered. As shown in Fig. 3, steps S21 to S23 are identical to steps S11 to S13 respectively, and steps S26 to S29 are identical to steps S15 to S18 respectively. In step S24, the similar web page data acquisition module 12 queries, according to the obtained URL of the target web page, whether the server database holds a previously stored URL of another web page that is similar but not identical to the URL of the target web page. If the query finds that such a URL exists in the server database (step S25: Yes), the flow proceeds to step S26. If the query finds that no such URL exists in the server database (step S25: No), the flow returns to step S21, where the URL of a target web page different from the one entered last time is entered.
In a further variant of the embodiment described above, as shown in Fig. 4, steps S31 to S34 and S36 to S39 are identical to steps S21 to S24 and S26 to S29 respectively. In step S35, if the query finds that the URL of the other web page exists in the server database (step S35: Yes), the flow proceeds to step S36. If the query finds that no such URL exists in the server database (step S35: No), the target web page data obtained in step S32 is returned directly to the client browser and displayed (step S310).
The present invention has yet another embodiment. As shown in Fig. 5, steps S41 to S44 and S46 to S49 are identical to steps S31 to S34 and S36 to S39 respectively. In step S45, if the query finds that the URL of the other web page exists in the server database (step S45: Yes), the flow proceeds to step S46. If the query finds that no such URL exists in the server database (step S45: No), the URL of the target web page entered in step S41 is stored in the server database (step S410), and at the same time the target web page data obtained in step S42 is returned directly to the client browser and displayed (step S411).
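The branching of Figs. 4 and 5 can be summarised, again only as an illustrative sketch, by the function below. The page fetcher, the similar-URL lookup and the compare-and-mark routine are passed in as parameters because the specification does not fix their implementation. The function name `serve_page` and the flag `remember_unknown_url`, which switches between the behaviour of Fig. 4 and that of Fig. 5, are hypothetical.

```python
from typing import Callable, Optional, Set


def serve_page(
    target_url: str,
    stored_urls: Set[str],
    fetch: Callable[[str], str],
    find_similar: Callable[[str, Set[str]], Optional[str]],
    compare_and_mark: Callable[[str, str], str],
    remember_unknown_url: bool = True,
) -> str:
    """Return the page data to send to the client browser."""
    target_html = fetch(target_url)                      # step S42
    similar_url = find_similar(target_url, stored_urls)  # step S44
    if similar_url is None:                              # step S45: No
        if remember_unknown_url:
            stored_urls.add(target_url)                  # step S410 (Fig. 5 only)
        return target_html                               # step S411 (S310 in Fig. 4)
    similar_html = fetch(similar_url)                    # step S46
    return compare_and_mark(target_html, similar_html)   # steps S47 and S48
```

Storing the unknown URL, as in Fig. 5, means that a later request for another page of the same column can find a counterpart, so the database fills up as pages are visited.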
The foregoing detailed description and drawings are given by way of explanation and illustration and are not intended to limit the scope of protection of the claims. Variations of the embodiments in this specification will be apparent to those of ordinary skill in the art and fall within the scope of protection of the claims and their equivalents.