CN101866362A - Method and system for automatically positioning main contents of webpages for mobile communication equipment terminal - Google Patents

Publication number
CN101866362A
Authority
CN
China
Prior art keywords
web page
data
target web
url
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010215031
Other languages
Chinese (zh)
Inventor
梁捷
周志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ucweb Inc
Original Assignee
Ucweb Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2010-07-01
Filing date
2010-07-01
Publication date
2010-10-20
Application filed by Ucweb Inc
Priority to CN 201010215031
Publication of CN101866362A
Legal status: Pending

Abstract

The invention discloses a processing method for automatically locating the main content of web pages for a mobile communication device terminal. According to the URL of a target web page entered at the client, a server obtains the data of the corresponding web page and also looks up, in a server database, the pre-stored URL of another web page that has the highest similarity to, but differs from, the URL of the target web page. After obtaining the data of the two web pages from their URLs, the server parses the data into DOM trees and compares the two trees; the part in which the two DOM trees differ is the main content of the web page the user wants to browse. The invention also provides a processing system for automatically locating the main content of web pages for a mobile communication device terminal.

Description

Method and system for automatically locating the main content of web pages for mobile communication device terminals
Technical field:
The present invention relates to a processing method and system for automatically locating the main content of web pages, and in particular to a processing method and system for automatically locating the main content of web pages on mobile communication device terminals.
Background art:
With the development of web technology, web pages have become increasingly rich and varied, and the information they present is increasingly diverse. For example, for commercial and other reasons, an ordinary page often contains a large amount of advertising in addition to its main content. However, information unrelated to the main content of a page is not what readers want to see first.
For mobile phone users, the screen is small and the amount of web content that can be displayed is limited. If a page contains a large amount of information unrelated to its main content (for example, navigation bars and advertising columns), the user has to keep scrolling to find the main content.
If the browser could scroll the screen directly to the main content once the page has loaded, the user experience would improve greatly. The key problem is finding the main content of a web page automatically; traditional approaches based on statistics or artificial intelligence are complex and do not work well.
Summary of the invention:
The object of the present invention is to provide a method for automatically locating the main content of web pages, in order to overcome the existing problem that a mobile terminal cannot quickly jump to the main content of a page while browsing.
The invention provides a processing method for automatically locating the main content of web pages for mobile communication device terminals, comprising:
obtaining target web page data according to the URL of the target web page, and parsing it into first data having a preset data structure;
according to the URL of the target web page, finding in a server-side database a pre-stored URL of a similar web page that is similar but not identical to the URL of the target web page, and obtaining similar web page data according to the URL of the similar web page;
parsing the obtained similar web page data into second data having the preset data structure;
comparing the first data with the second data, determining the portion that differs between them to be the data of the main content of the target web page, and returning it to the mobile communication device terminal.
The above processing method further comprises: marking the data of the main content, and returning the marked target web page data to the browser;
the browser recognizes the mark in the target web page data, displays the content of the target web page, and scrolls the screen to the position of the main content in the target web page that corresponds to the mark.
In the step of obtaining the similar web page data, when no URL of a similar web page is pre-stored in the database, the target web page data obtained in the step of obtaining the target web page data is returned directly to the browser without marking, and the target web page is displayed in the browser.
In the step of obtaining the similar web page data, when no URL of a similar web page is pre-stored in the database, the URL of the target web page is stored in the database, the target web page data obtained in the step of obtaining the target web page data is returned directly to the browser without marking, and the target web page is displayed in the browser.
The present invention also provides a processing system for automatically locating the main content of web pages for mobile communication device terminals, comprising:
a target web page data acquisition module, configured to obtain target web page data according to the URL of the target web page and to parse it into first data having a preset data structure;
a similar web page data acquisition module, configured to find in a server-side database, according to the URL of the target web page, a pre-stored URL of a similar web page that is similar but not identical to the URL of the target web page, and to obtain similar web page data according to the URL of the similar web page;
a similar web page data parsing module, configured to parse the obtained similar web page data into second data having the preset data structure;
a main content data identification module, configured to compare the first data with the second data, determine the portion that differs between them to be the data of the main content of the target web page, and return it to the mobile communication device terminal.
The above processing system further comprises:
a main content data marking module, configured to mark the data of the main content and return the marked target web page data to the browser;
a main content locating and display module, configured, when the browser recognizes the mark in the target web page data, to display the content of the target web page and scroll the screen to the position of the main content in the target web page that corresponds to the mark.
When no URL of a similar web page is pre-stored in the database, the target web page data obtained by the target web page data acquisition module is returned directly to the browser without marking, and the target web page is displayed in the browser.
When no URL of a similar web page is pre-stored in the database, the URL of the target web page is stored in the database, the target web page data obtained by the target web page data acquisition module is returned directly to the browser without marking, and the target web page is displayed in the browser.
By implementing the present invention, the browser can, once a page has finished loading, automatically scroll the screen to the main content of that page, making it easier for the user to read. Moreover, compared with traditional techniques that find the main content of a web page using statistics or artificial intelligence, the present invention is easy to implement and effective.
Description of drawings:
Fig. 1 is a block diagram of the processing system for automatically locating the main content of web pages;
Fig. 2 is a flowchart of the processing method for automatically locating the main content of web pages;
Fig. 3 is another flowchart of the processing method for automatically locating the main content of web pages;
Fig. 4 is another flowchart of the processing method for automatically locating the main content of web pages;
Fig. 5 is another flowchart of the processing method for automatically locating the main content of web pages.
Embodiments:
Specific embodiments of the present invention are described in detail below with reference to the drawings.
Because websites make heavy use of templates, pages on the same site with similar URLs (Uniform Resource Locators) usually belong to the same section of the site and use the same page template. Therefore, to determine the main content of a web page, it is first necessary to obtain in advance the URL (for example http://www.uc.cn/a/news/2010/0404/1041.html) of another web page whose URL has the highest similarity to, but is not identical to, the URL (for example http://www.uc.cn/a/news/2010/0524/1370.html) of the target web page (the page the user wants to browse). The server then fetches the data of both pages from these two URLs, parses each into a DOM tree, and compares the two DOM trees to determine the parts that differ. The differing part is the main content of the web page the user wants to browse.
The individual embodiments are described below with reference to the drawings.
Fig. 1 shows a block diagram of a processing system for automatically locating the main content of web pages. As shown in Fig. 1, the processing system comprises a target web page data acquisition module 11, a similar web page data acquisition module 12, a similar web page data parsing module 13, a main content data identification module 14, a main content data marking module 15, and a main content locating and display module 16. When the user enters the URL of the target web page in the browser of the client (the mobile communication device terminal), the target web page data acquisition module 11 obtains the target web page data according to that URL and parses it into data (the first data) having a preset data structure (for example, a DOM tree structure). The similar web page data acquisition module 12 then uses the URL of the target web page to find, in the server-side database, the pre-stored URL of another web page that is most similar but not identical to it, that is, the URL of the similar web page, and obtains the page data of the similar web page according to that URL. The similar web page data parsing module 13 parses the page data that module 12 obtained from the internet, which is similar to the target web page, into further data (the second data) having the preset data structure (for example, a DOM tree structure). The main content data identification module 14 compares the two sets of parsed data and determines the portion that differs between them; this differing portion is the data of the main content of the target web page. The main content data marking module 15 marks the data of the main content and returns the marked target web page data to the browser of the client (the mobile communication device terminal). The main content locating and display module 16 then recognizes the mark in the target web page data, displays the content of the target web page in the client browser, and scrolls the screen to the position of the main content in the target web page that corresponds to the mark.
Fig. 2 shows the flow of automatically locating the main content of a web page. As shown in Fig. 2, in step S11, when the user wants to browse a page of some website, the user enters the URL of the target web page in the browser of the client (the mobile communication device terminal). The target web page data acquisition module 11 obtains the URL of the target web page from the client input, fetches the corresponding web page data from the website according to that URL (step S12), and parses the fetched web page data into data having a preset data structure (for example, a DOM tree structure) (step S13).
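As a rough illustration of the parsing in step S13, the following Python sketch turns HTML into a simple tree using only the standard library. The names `Node`, `TreeBuilder`, `VOID_TAGS`, and `parse_html` are hypothetical and introduced here for illustration; the patent only requires some preset data structure such as a DOM tree and does not prescribe a parser.

```python
from html.parser import HTMLParser

# Elements that never have children; a simplifying assumption for this sketch.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal DOM-like node: tag name, attributes, text, and children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.text = ""
        self.children = []
        self.parent = parent


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from the parser's tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self._current)
        self._current.children.append(node)
        if tag not in VOID_TAGS:
            self._current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if any.
        node = self._current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self._current = node.parent

    def handle_data(self, data):
        self._current.text += data.strip()


def parse_html(html: str) -> Node:
    """Parse an HTML string and return the root of the resulting tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

A production server would more likely rely on a full HTML parser that handles malformed markup; this sketch only shows the shape of the resulting tree.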
In step S14, the similar web page data acquisition module 12 uses the URL of the target web page to find, in the database located on the server, the URL of another web page that is most similar but not identical to it (that is, the URL of the similar web page). The URL of this other web page has been collected in advance and stored in the database.
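The lookup in step S14 could be sketched as follows. The patent does not say how URL similarity is measured, so the character-level ratio from Python's `difflib.SequenceMatcher` is an assumption made here, and the function name `most_similar_url` is hypothetical. The stored URLs are passed in as a plain iterable standing in for the server-side database.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def most_similar_url(target: str, stored_urls: Iterable[str]) -> Optional[str]:
    """Return the stored URL most similar to, but not identical to, target.

    Returns None when no suitable URL is stored.
    """
    best_url, best_score = None, 0.0
    for candidate in stored_urls:
        if candidate == target:
            continue  # the similar page must not be the target itself
        # Assumed similarity measure: ratio of matching characters (0.0 to 1.0).
        score = SequenceMatcher(None, target, candidate).ratio()
        if score > best_score:
            best_url, best_score = candidate, score
    return best_url
```

With the two example URLs from the description, a stored `http://www.uc.cn/a/news/2010/0404/1041.html` would be preferred over a URL from a different section of the same site, because it shares a longer run of characters with the target.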
In step S15, the similar web page data parsing module 13 fetches from the internet, according to the URL of the other web page, the corresponding page data similar to the target web page, and parses it into further data having the preset data structure (for example, a DOM tree structure).
In step S16, the main content data identification module 14 compares the two sets of data having the preset data structure that were obtained by parsing the respective page data, and determines the portion that differs between them; this differing portion is the data of the main content of the target web page.
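One possible reading of the comparison in step S16 is a recursive walk over both trees in parallel. The sketch below assumes both pages are already available as well-formed markup that `xml.etree.ElementTree` can parse, and the function name `differing_nodes` is hypothetical. The patent does not specify the comparison algorithm, so this is a simplified illustration: it ignores attributes and tail text, and treats a subtree as wholly different when the tag or the number of children differs.

```python
import xml.etree.ElementTree as ET


def differing_nodes(first: ET.Element, second: ET.Element) -> list:
    """Return elements of `first` that differ from their counterpart in `second`."""
    if first.tag != second.tag or len(first) != len(second):
        return [first]  # structure differs: report the whole subtree
    diffs = []
    # Compare the element's own leading text, ignoring surrounding whitespace.
    if (first.text or "").strip() != (second.text or "").strip():
        diffs.append(first)
    # Compare children pairwise, in document order.
    for child_a, child_b in zip(first, second):
        diffs.extend(differing_nodes(child_a, child_b))
    return diffs
```

For two pages built from the same template, the navigation and advertising elements compare equal and drop out, leaving only the elements whose text differs between the two articles.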
In step S17, the main content data marking module 15 marks the data of the main content of the target web page and returns the marked target web page data to the client browser. That is, the target web page data returned to the client browser is page data that carries the main content mark. After the server returns the marked target web page data to the client browser, the browser performs some processing on it so that the page data is suitable for display in the browser of the mobile terminal. The present invention does not specifically limit how the page data is processed for display in the mobile terminal's browser, as long as the processed target web page data can be displayed in the browser of the mobile communication device terminal.
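The patent does not say what form the mark in step S17 takes. As one hedged possibility, the sketch below sets an attribute on the lowest common ancestor of the elements that were found to differ. The attribute name `data-main-content` and the function name `mark_main_content` are hypothetical, and the page is again assumed to be well-formed markup handled with `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

MARK_ATTR = "data-main-content"  # hypothetical marker name, not from the patent


def mark_main_content(root: ET.Element, differing: List[ET.Element]) -> Optional[ET.Element]:
    """Mark the lowest common ancestor of the differing elements and return it."""
    if not differing:
        return None
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def path_from_root(node):
        path = [node]
        while path[-1] in parent_of:
            path.append(parent_of[path[-1]])
        return path[::-1]

    paths = [path_from_root(node) for node in differing]
    common = None
    # Walk all paths level by level while they still share the same element.
    for level in zip(*paths):
        if all(node is level[0] for node in level):
            common = level[0]
        else:
            break
    if common is not None:
        common.set(MARK_ATTR, "true")
    return common
```

The marked tree can then be serialized with `ET.tostring(root, encoding="unicode")` and returned to the client, which would look for the marker when deciding where to scroll.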
In step S18, the main content locating and display module 16 receives the target web page data, recognizes the marked page data, displays the page content, and scrolls the screen to the page content that corresponds to the marked page data. This completes the automatic locating of the main content of the web page.
In the above embodiment, if in step S14 the similar web page data acquisition module 12 cannot find the URL of another web page that is similar but not identical to the URL of the target web page, the URL of a different target web page needs to be entered. As shown in Fig. 3, steps S21-S23 are identical to steps S11-S13, and steps S26-S29 are identical to steps S15-S18, respectively. In step S24, the similar web page data acquisition module 12 uses the obtained URL of the target web page to query whether the server database holds a pre-stored URL of another web page that is similar but not identical to the URL of the target web page. When the query finds that such a URL exists in the server database (step S25: yes), the flow proceeds to step S26. When the query finds that no such URL exists in the server database (step S25: no), the flow returns to step S21, and the URL of a target web page different from the one entered last time is entered.
In a variant of the above embodiment, as shown in Fig. 4, steps S31-S34 and S36-S39 are identical to steps S21-S24 and S26-S29, respectively. In step S35, when the query finds that the URL of such another web page exists in the server database (step S35: yes), the flow proceeds to step S36. When the query finds that no such URL exists in the server database (step S35: no), the target web page data obtained in step S32 is returned directly to the client browser and displayed (step S310).
The present invention also has another embodiment. As shown in Fig. 5, steps S41-S44 and S46-S49 are identical to steps S31-S34 and S36-S39, respectively. In step S45, when the query finds that the URL of such another web page exists in the server database (step S45: yes), the flow proceeds to step S46. When the query finds that no such URL exists in the server database (step S45: no), the URL of the target web page entered in step S41 is stored in the server database (step S410), and at the same time the target web page data obtained in step S42 is returned directly to the client browser and displayed (step S411).
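The branching of the Fig. 5 embodiment can be summarized in a short Python sketch. Everything here is a hypothetical arrangement for illustration: the function name `handle_request`, the use of a `set` as the URL database, and the three injected callables (`fetch`, `find_similar`, `locate_and_mark`) are assumptions, with step numbers from the description given in comments.

```python
from typing import Callable, Optional, Set


def handle_request(
    target_url: str,
    url_store: Set[str],
    fetch: Callable[[str], str],
    find_similar: Callable[[str, Set[str]], Optional[str]],
    locate_and_mark: Callable[[str, str], str],
) -> str:
    """Return the page data to send to the client for target_url."""
    target_html = fetch(target_url)                     # step S42
    similar_url = find_similar(target_url, url_store)   # steps S44 and S45
    if similar_url is None:
        # No similar URL is stored: remember this URL for future requests
        # and return the page unmarked.
        url_store.add(target_url)                       # step S410
        return target_html                              # step S411
    similar_html = fetch(similar_url)                   # step S46
    # Parse, compare, and mark the main content.
    return locate_and_mark(target_html, similar_html)   # steps S46 to S48
```

Storing the URL on a miss means the database fills itself as users browse, so a later request for a page from the same section of the site can find a similar URL to compare against.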
The foregoing detailed description and drawings are given by way of explanation and illustration and are not intended to limit the scope of protection of the claims. Variations of the embodiments in this specification will be apparent to those of ordinary skill in the art and fall within the scope of protection of the claims and their equivalents.

Claims (8)

CN 201010215031 | 2010-07-01 | 2010-07-01 | Method and system for automatically positioning main contents of webpages for mobile communication equipment terminal | Pending | CN101866362A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN 201010215031 (CN101866362A (en)) | 2010-07-01 | 2010-07-01 | Method and system for automatically positioning main contents of webpages for mobile communication equipment terminal

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN 201010215031 (CN101866362A (en)) | 2010-07-01 | 2010-07-01 | Method and system for automatically positioning main contents of webpages for mobile communication equipment terminal

Publications (1)

Publication Number | Publication Date
CN101866362A (en) | 2010-10-20

Family

ID=42958090

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN 201010215031 (Pending, CN101866362A (en)) | Method and system for automatically positioning main contents of webpages for mobile communication equipment terminal | 2010-07-01 | 2010-07-01

Country Status (1)

Country | Link
CN (1) | CN101866362A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102024028A (en) * | 2010-11-22 | 2011-04-20 | 百度在线网络技术(北京)有限公司 | Method and equipment for distinctly displaying main contents of webpage on mobile terminal
WO2013017009A1 (en) * | 2011-08-02 | 2013-02-07 | 百度在线网络技术(北京)有限公司 | Method for obtaining target page and equipment thereof
WO2014180227A1 (en) * | 2013-10-11 | 2014-11-13 | 中兴通讯股份有限公司 | Method, device, terminal and computer storage medium for realizing intelligent reading of a browser
US9679076B2 (en) | 2014-03-24 | 2017-06-13 | Xiaomi Inc. | Method and device for controlling page rollback
CN108600342A (en) * | 2018-03-30 | 2018-09-28 | 连尚(新昌)网络科技有限公司 | A kind of message display method, equipment and storage medium
CN114117181A (en) * | 2022-01-25 | 2022-03-01 | 北京金堤科技有限公司 | Website page turning logic acquisition method and device and website page turning control method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20050273706A1 (en) * | 2000-08-24 | 2005-12-08 | Yahoo! Inc. | Systems and methods for identifying and extracting data from HTML pages
CN1920815A (en) * | 2006-05-09 | 2007-02-28 | 上海态格文化传播有限公司 | Web page cleaning method based on web page content
CN101441662A (en) * | 2008-11-28 | 2009-05-27 | 北京交通大学 | Topic information acquisition method based on network topology

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102024028A (en) * | 2010-11-22 | 2011-04-20 | 百度在线网络技术(北京)有限公司 | Method and equipment for distinctly displaying main contents of webpage on mobile terminal
CN102024028B (en) * | 2010-11-22 | 2014-04-02 | 百度在线网络技术(北京)有限公司 | Method and equipment for distinctly displaying main contents of webpage on mobile terminal
WO2013017009A1 (en) * | 2011-08-02 | 2013-02-07 | 百度在线网络技术(北京)有限公司 | Method for obtaining target page and equipment thereof
WO2014180227A1 (en) * | 2013-10-11 | 2014-11-13 | 中兴通讯股份有限公司 | Method, device, terminal and computer storage medium for realizing intelligent reading of a browser
CN104572650A (en) * | 2013-10-11 | 2015-04-29 | 中兴通讯股份有限公司 | Method and device for realizing browser intelligent reading and terminal comprising device
US9892099B2 (en) | 2013-10-11 | 2018-02-13 | Zte Corporation | Intelligent reading for accessing multi-page data from a web browser
US9679076B2 (en) | 2014-03-24 | 2017-06-13 | Xiaomi Inc. | Method and device for controlling page rollback
CN108600342A (en) * | 2018-03-30 | 2018-09-28 | 连尚(新昌)网络科技有限公司 | A kind of message display method, equipment and storage medium
CN108600342B (en) * | 2018-03-30 | 2020-01-10 | 连尚(新昌)网络科技有限公司 | Message display method, device and storage medium
CN114117181A (en) * | 2022-01-25 | 2022-03-01 | 北京金堤科技有限公司 | Website page turning logic acquisition method and device and website page turning control method and device

Legal Events

Code | Title
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C12 | Rejection of a patent application after its publication
RJ01 | Rejection of invention patent application after publication

Application publication date: 20101020

