CN102436455B - Realize method, system and client browser that word browses - Google Patents

Realize method, system and client browser that word browses

Info

Publication number
CN102436455B
Authority
CN
China
Prior art keywords
webpage
text information
character
resource
browsing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201010501897.8A
Other languages
Chinese (zh)
Other versions
CN102436455A (en)
Inventor
严峻
李鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201010501897.8A
Publication of CN102436455A
Application granted
Publication of CN102436455B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a method, a system and a client browser for implementing text browsing. Text information and non-text information are distinguished according to resource attributes in a web page; the non-text information in the web page is shielded and the text information is displayed. The resource attribute in the web page may be a tag, a requested resource type, a node attribute of the DOM tree, or the like. With this scheme, web page resources other than text information resources are shielded and a text browsing mode is realized, which improves web page rendering speed and user browsing speed. In particular, when a user browses text information on the network, interference from other network resources is avoided, and web browsing is achieved quickly and simply.

Description

Method, system and client browser for realizing character browsing
Technical Field
The present invention relates to web browsing technologies, and in particular, to a method, a system, and a client browser for implementing text browsing.
Background
A browser has multiple browsing modes, each optimized for a different user requirement, so that users can browse web pages better. The browsing modes of existing browsers can be roughly classified as a full browsing mode, a safe browsing mode, a text browsing mode and a custom shielding mode, where:
in the full browsing mode, all web page content is downloaded and all scripts run according to the default rules, without any shielding; in the safe browsing mode, malicious plug-in downloads and JavaScript execution are selectively shielded in order to protect the local computer from threats posed by web pages; in the text browsing mode, pictures, videos, Flash and the like are not displayed, sound is not played, all resources other than the text the web page needs to display are shielded, and the web page content is displayed with simple formatting, so that the page becomes cleaner and browsing becomes faster; in the custom shielding mode, the user selects the desired behavior from the options the browser provides, for example: forbid (or allow) Flash download and playback, forbid (or allow) picture download and display, forbid (or allow) video download and playback, forbid (or allow) sound download and playback, forbid (or allow) running web scripts, forbid (or allow) running Java applets, and the like.
Besides text information, web page resources include many other kinds of content, such as pictures, sound, video and Flash. At present, most domestic browsers implement a safe browsing mode that intercepts advertisements and the like, so that malicious scripts, advertisements and plug-ins can be prevented from running. For simplicity, the browsing mode most browsers offer mainly shields JavaScript and plug-in execution directly, or identifies advertisements by matching specific script strings and shields their display.
Owing to the maturity of current browser technology, the diversity of web page presentation, the existence of various system bugs and other reasons, the control of plug-ins and scripts and the interception of advertisements are not intelligent enough, and their capability is very limited. For example, existing advertisement blocking relies on shielding pop-up windows, on the user's mouse operations on pop-up windows, or on a URL blacklist. For an advertisement that is neither in a pop-up window nor on the blacklist, the existing methods cannot correctly identify the content, so the advertisement the user wants to block is still displayed, and the download of useless resources cannot be thoroughly shielded.
At present, network speed is no longer the main problem; however, as network speed increases, web page content becomes richer and the amount of junk information injected into pages grows, such as the many web page advertisements that users do not care about. When a user only needs to browse text content, the large number of pictures, videos, audio and other resources loaded in the full browsing mode, advertisements in particular, slows down web page rendering and the user's browsing. In particular, when a user visits a website such as a novel site, the user cares only about the text of the novel, not about pictures, advertisements and other content; such content clutters the web page and interferes with the user's reading of the text.
For the text browsing mode, no specific implementation method is provided in the prior art.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a method, a system and a client browser for implementing text browsing, which can implement a text browsing mode and quickly and simply implement web browsing.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a method for realizing text browsing comprises the following steps:
distinguishing character information and non-character information according to the resource attribute in the webpage;
and shielding non-text information in the webpage and displaying the text information.
The distinguishing the text information and the non-text information according to the resource attribute in the webpage comprises the following steps:
and in the process of browsing the webpage, distinguishing the text information from the non-text information according to the analyzed label of the webpage file.
The method further comprises the following steps: presetting a level for each label, and setting, according to the label levels, the degree to which resources other than text are shielded in text browsing.
The shielding of the non-text information in the webpage comprises: and shielding the labels corresponding to the non-character information according to the distinguished labels, and browsing the modified webpage file again.
The distinguishing the text information and the non-text information according to the resource attribute in the webpage comprises the following steps:
in the process of browsing the webpage, when the network resources related to the labels in the webpage file are downloaded, whether the request is the text information resource or the non-text information resource is distinguished according to the identified type of the request resource.
The method further comprises the following steps: presetting a level for each resource type, and setting, according to the type levels, the degree to which resources other than text are shielded in text browsing.
The shielding of the non-text information in the webpage comprises: and shielding the request of the non-character resource information according to the distinguished type of the requested resource.
The shielding of the request for the non-literal resource information comprises: and selectively shielding the selected requests for other resources except the text resource information according to a preset strategy.
The distinguishing the text information and the non-text information according to the resource attribute in the webpage comprises the following steps: and distinguishing the text information and the non-text information according to the node attribute of the Document Object Model (DOM) tree.
The shielding of the request for the non-literal resource information comprises: deleting other resource nodes except the text nodes according to the distinguished nodes with different attributes; or,
and changing the attributes of the distinguished resource nodes except the character nodes to shield the resource nodes.
A character browsing system at least comprises a web server and a client browser, wherein,
the client browser is used for requesting a webpage file from the webpage server and distinguishing character information and non-character information according to the resource attribute in the webpage; shielding non-text information in the webpage and displaying the text information;
and the webpage server is used for providing a webpage file according to the request of the client browser.
The client browser comprises a request module, an analysis module and a display module, wherein,
the request module is used for requesting the webpage file from the webpage server and outputting the webpage file to the analysis module;
the analysis module is used for analyzing the webpage file from the request module, distinguishing the text information and the non-text information according to the resource attribute in the webpage and shielding the non-text information in the webpage;
and the display module is used for displaying the text information.
The analysis module is specifically used for distinguishing the text information and the non-text information according to the analyzed label and shielding the non-text information in the webpage; or,
the analysis module is specifically used for identifying the type of the requested resource, distinguishing whether the requested resource is a text information resource or a non-text information resource and shielding the request for the non-text information resource when the network resource related to the label is downloaded; or,
and the analysis module is specifically used for distinguishing the text information from the non-text information according to the structure and the attribute of the adjusted DOM tree and shielding the non-text information nodes in the webpage.
A client browser comprises a request module, an analysis module and a display module, wherein,
the request module is used for requesting the webpage file from the webpage server and outputting the webpage file to the analysis module;
the analysis module is used for analyzing the webpage file from the request module, distinguishing the text information and the non-text information according to the resource attribute in the webpage and shielding the non-text information in the webpage;
and the display module is used for displaying the text information.
The analysis module is specifically used for distinguishing the text information and the non-text information according to the analyzed label and shielding the non-text information in the webpage; or,
the analysis module is specifically used for identifying the type of the requested resource, distinguishing whether the requested resource is a text information resource or a non-text information resource and shielding the request for the non-text information resource when the network resource related to the label is downloaded; or,
and the analysis module is specifically used for distinguishing the text information from the non-text information according to the structure and the attribute of the adjusted DOM tree and shielding the non-text information nodes in the webpage.
According to the technical scheme provided by the invention, the text information and the non-text information are distinguished according to the resource attribute in the webpage, the non-text information in the webpage is shielded, and the text information is displayed. Wherein the resource attribute in the web page may be a tag, or a requested resource type, or a node attribute of the DOM tree, etc. By the scheme of the invention, other webpage resources except the text information resource are shielded, the text browsing mode is realized, the webpage rendering speed and the user browsing speed are improved, particularly when the user browses the network text information, the interference of other network resource information on the user browsing text information is avoided, and the webpage browsing is quickly and simply realized.
Drawings
FIG. 1 is a flow chart of a method for implementing text browsing in accordance with the present invention;
FIG. 2 is a schematic diagram of the structure of the system for browsing text according to the present invention;
FIG. 3 is a diagram of a DOM tree model of a typical HTML document.
Detailed Description
Web pages are primarily written in HyperText Markup Language (HTML). HTML text is descriptive text consisting of HTML commands that can specify text, graphics, sounds, tables, links, and the like. The HTML structure includes two major parts: a header (Head), which describes information required by the browser, and a body (Body), which contains the specific content to be presented. The browser parses the web page through the tags (Tags) contained in the HTML and presents the web page to the user. As one skilled in the art will appreciate, tags mark the beginning and end of each segment; for example, when a web page is opened and its source code is viewed, the content inside a pair of angle brackets, i.e., "< >", is a tag.
Besides HTML, a web page can use languages such as JavaScript and Cascading Style Sheets (CSS) to assist and optimize its display. JavaScript is a dynamically typed, object-oriented scripting language with prototype-based inheritance; it can increase the interactivity of the web page, dynamically update the content of the web page, and the like. CSS is a set of formatting rules that control the appearance of a web page.
The existing web browsing process roughly includes:
first, a user types a web address or clicks a link to request that a web page be opened; the client resolves the domain name to obtain the IP (Internet Protocol) address and, through a series of network interactions such as the HTTP (HyperText Transfer Protocol) protocol, finds the corresponding web server;
then, the web server processes the request for the web page and transmits the file written in HTML to the client browser; the client browser obtains the content from the web server and parses the HTML file.
A standard HTML file contains text information and references to other resources such as images and videos. Based on the result of parsing the HTML, the client browser sends further requests to the server to obtain these other resources. Meanwhile, the client browser renders the text information contained in the HTML file on the screen. Each time a resource such as an image or video is obtained, the client browser updates the relevant position of the screen with the downloaded content, and keeps requesting the resources that have not yet been downloaded until the web page has finished downloading or the user chooses to stop.
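The follow-up requests described above can be illustrated with a minimal Python sketch. It is not part of the patent: the tag-to-attribute table `SUBRESOURCE_ATTRS`, the class `SubresourceCollector` and the function `follow_up_requests` are hypothetical names chosen for illustration, and the sketch only lists the URLs a browser would request after parsing, without fetching anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose attribute points at a resource fetched after the HTML itself.
# This mapping is an illustrative assumption, not a list from the patent.
SUBRESOURCE_ATTRS = {"img": "src", "video": "src", "audio": "src",
                     "embed": "src", "script": "src"}


class SubresourceCollector(HTMLParser):
    """Collects the URLs a browser would request after parsing the page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = SUBRESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page address.
                self.urls.append(urljoin(self.base_url, value))


def follow_up_requests(html, base_url):
    """Return the sub-resource URLs referenced by the HTML text."""
    collector = SubresourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls
```

In the full browsing mode every URL in this list would be requested; the text browsing mode described below aims to avoid those requests.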
Fig. 1 is a flowchart of a method for implementing text browsing, as shown in fig. 1, including the following steps:
step 100: and distinguishing the text information and the non-text information according to the resource attribute in the webpage.
The concrete implementation of the step comprises the following three modes:
the first mode is as follows: in the process of browsing the web page, after the client browser obtains the web page file, such as a file written in HTML, the HTML file is parsed and analyzed, and the text information and the non-text information are distinguished according to the parsed tags. The tags corresponding to the non-text information may include but are not limited to: image-related tags: <img src="URL">, sound-related tags: <voc src="URL">, and the like.
Further, the method also comprises the following steps: customizing the level of each tag, and setting, according to the tag levels, the degree to which resources other than text are shielded in text browsing. For example, assume that the browser handles only three kinds of tags, for text, pictures and Flash, and that their priorities are set to levels 1, 2 and 3 respectively (assuming that level 3 is the highest, level 2 is intermediate, and level 1 is the lowest). If the level of the client browser is set to 2, the client browser displays only the content corresponding to tags with a level less than or equal to 2, namely text and pictures, and the Flash-related tags are shielded.
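The level comparison in the example above can be sketched in a few lines of Python. The table `TAG_LEVELS`, the concrete tag names in it, and the functions `visible_tags` and `is_shielded` are assumptions made for illustration; the text does not say which tags map to which level beyond text = 1, pictures = 2, Flash = 3.

```python
# Hypothetical tag-to-level table following the example in the text:
# text = 1, pictures = 2, Flash = 3. The tag names are assumptions.
TAG_LEVELS = {"p": 1, "img": 2, "embed": 3}


def visible_tags(browser_level, tag_levels=TAG_LEVELS):
    """Return the tags whose level does not exceed the browser's level."""
    return sorted(tag for tag, level in tag_levels.items()
                  if level <= browser_level)


def is_shielded(tag, browser_level, tag_levels=TAG_LEVELS):
    """Tags missing from the table are treated as level 1 (an assumption)."""
    return tag_levels.get(tag, 1) > browser_level
```

With the browser level set to 2, `visible_tags(2)` yields the text and picture tags, and `is_shielded("embed", 2)` is true, matching the behavior described in the example.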
The second mode is as follows: in the process of browsing a webpage, after a client browser obtains a file written in an HTML language, before or during rendering text information, the client browser downloads network resources related to a label, identifies the type of the requested resource, and distinguishes whether the requested resource is a text information resource or a non-text information resource according to the identified type, for example, the type of an image resource is: image/jpeg.
Further, the method also comprises the following steps: customizing a level for each resource type, and setting, according to the type levels, the degree to which resources other than text are shielded in text browsing.
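A type-based decision of this kind might look like the following Python sketch. It assumes the resource type is available as a MIME-style string such as image/jpeg; how the browser obtains that string is not specified in the text. The table `TYPE_LEVELS` and the functions `resource_level` and `should_shield` are hypothetical.

```python
# Hypothetical levels per top-level media type; "text" is level 1.
TYPE_LEVELS = {"text": 1, "image": 2, "audio": 3, "video": 3, "application": 3}


def resource_level(content_type):
    """Map a type string such as 'image/jpeg; q=0.8' to a level."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    top_level = media_type.split("/", 1)[0]
    # Unknown types get the highest level so that they are shielded first.
    return TYPE_LEVELS.get(top_level, max(TYPE_LEVELS.values()))


def should_shield(content_type, browser_level=1):
    """A request is shielded when its type level exceeds the browser level."""
    return resource_level(content_type) > browser_level
```

Note that this coarse table treats every text/* type as text information, including style sheets; a real policy would likely need finer distinctions.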
The third mode is as follows: and distinguishing the text information and the non-text information according to the node attribute of the DOM tree. Wherein,
the Document Object Model (DOM) is a programming interface for HTML and XML documents. A DOM is typically a tree structure that provides a structured representation of a document, through which the content and presentation of the document can be changed.
FIG. 3 is a diagram illustrating a DOM tree model of a typical HTML document. As shown in FIG. 3, each web page element (e.g., an HTML tag) corresponds to an object. Tags on a web page are nested layer by layer, with <html> as the outermost layer; the DOM is nested in the same way, but is generally understood as having the shape of a tree. The tree root (also called the root object) is the window or document object, which corresponds to what encloses the outermost tag, i.e., the entire document. Below the tree root are child objects, which may have child objects of their own; every object except the root object has a parent object, and the child objects of the same object are siblings.
In the DOM tree, everything in the HTML document is a node. The entire document is a document node, such as the "Document" node of FIG. 3. Each HTML element is an element node. Each piece of text inside an HTML element is a text node, such as the "Text: My link" node of FIG. 3. Each HTML attribute is an attribute node, such as the "Attribute: href" node of FIG. 3. Each comment is a comment node.
All text resides in text nodes. In addition, the DOM provides a number of flexible methods, including adding or deleting child nodes, obtaining or modifying attributes of each node, and so forth. In this step, the text information and the non-text information are distinguished by the node attribute of the DOM tree.
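The node kinds listed above can be observed with Python's standard `xml.dom.minidom` module, as in the sketch below. This is only an illustration: `minidom` is an XML parser, so the sketch assumes well-formed markup, and the helper `describe` and the table `NODE_KIND` are hypothetical names, not part of the patent.

```python
from xml.dom import Node, minidom

NODE_KIND = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
}


def describe(node, out=None):
    """Depth-first walk that records (kind, label) for every node."""
    if out is None:
        out = []
    kind = NODE_KIND.get(node.nodeType, "other")
    if node.nodeType in (Node.TEXT_NODE, Node.COMMENT_NODE):
        label = node.data.strip()   # text and comments carry their content
    else:
        label = node.nodeName       # e.g. "#document", "a", "body"
    out.append((kind, label))
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes hang off their element rather than being child nodes.
        for i in range(node.attributes.length):
            out.append(("attribute", node.attributes.item(i).name))
    for child in node.childNodes:
        describe(child, out)
    return out
```

For a document containing a link with an href attribute and the text "My link", the walk reports a document node, element nodes, an attribute node named href and a text node, mirroring the node types of FIG. 3; only the text nodes carry the text information.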
Step 101: and shielding non-text information in the webpage and displaying the text information.
The implementation corresponding to step 100 includes:
the first mode is as follows: and shielding (or deleting) the label corresponding to the non-character information according to the distinguished label, and then re-browsing the modified HTML file.
The second mode is as follows: the request for non-textual resource information is masked (or deleted) based on the distinguished type of resource requested.
The third mode is as follows: deleting other resource nodes except the text nodes according to the distinguished nodes with different attributes; or changing the attribute of other resource nodes except the literal node to shield the resource node.
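The two alternatives of the third mode (deleting the node, or changing one of its attributes) can be sketched with Python's `xml.dom.minidom`. The element set `NON_TEXT_ELEMENTS`, the function `shield_nodes`, and the choice of a `style="display:none"` attribute as the attribute change are all assumptions made for illustration; the text does not state which attribute is changed. The input is assumed to be well-formed markup.

```python
from xml.dom import minidom

# Element names treated as non-text resources (illustrative assumption).
NON_TEXT_ELEMENTS = {"img", "embed", "object", "video", "audio"}


def shield_nodes(document, delete=True):
    """Delete non-text element nodes, or hide them by changing an attribute.

    Returns the number of nodes that were shielded.
    """
    shielded = 0
    for name in NON_TEXT_ELEMENTS:
        # Copy the node list before mutating the tree.
        for node in list(document.getElementsByTagName(name)):
            if delete:
                node.parentNode.removeChild(node)
            else:
                node.setAttribute("style", "display:none")
            shielded += 1
    return shielded
```

Deleting the node also removes the reference that would trigger a download, whereas merely hiding it may still leave the resource to be fetched, so the first alternative appears closer to the stated goal of avoiding useless downloads.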
By the above method, the text information required by the text browsing mode is screened out, and the required text information is then displayed by the client browser. Many existing kernels can render web page content, such as the Trident kernel used by the IE browser, the Gecko kernel used by Firefox, the Presto kernel used by Opera, and the WebKit kernel used by Safari, Chrome, etc. By calling the rendering interfaces of these kernels, the display of the text browsing mode is realized, and its display effect can be adjusted in a personalized manner. The specific implementation of the display is conventional to those skilled in the art and will not be described further herein.
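As one illustration of the screening step, the first mode (shielding the tags of non-text information and then browsing the modified HTML file) can be sketched with Python's standard `html.parser`. The tag set `NON_TEXT_TAGS`, the class `TextOnlyRewriter` and the function `shield_non_text` are hypothetical; "voc" is copied from the example in the text and is not a standard HTML element. The sketch drops only the shielded tags themselves and, as a simplification, also drops comments and the doctype.

```python
from html.parser import HTMLParser

# Tags treated as non-text (illustrative assumption).
NON_TEXT_TAGS = {"img", "voc", "embed", "object", "video", "audio"}


class TextOnlyRewriter(HTMLParser):
    """Re-emits the HTML while leaving out the shielded tags."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in NON_TEXT_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag not in NON_TEXT_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in NON_TEXT_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def shield_non_text(html):
    """Return the HTML with the non-text tags removed."""
    rewriter = TextOnlyRewriter()
    rewriter.feed(html)
    rewriter.close()
    return "".join(rewriter.parts)
```

The returned string is what would be handed to the rendering kernel in place of the original file.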
Fig. 2 is a schematic structural diagram of a system for implementing text browsing according to the present invention, as shown in fig. 2, including a web server and a client browser, wherein,
the client browser is used for requesting a webpage file from the webpage server and distinguishing character information and non-character information according to the resource attribute in the webpage; shielding non-text information in the webpage and displaying the text information;
and the webpage server is used for providing a webpage file according to the request of the client browser.
The client browser comprises a request module, an analysis module and a display module, wherein,
and the request module is used for requesting the webpage file from the webpage server and outputting the webpage file to the analysis module.
And the analysis module is used for analyzing the webpage file from the request module, distinguishing the text information and the non-text information according to the resource attribute in the webpage and shielding the non-text information in the webpage.
And the display module is used for displaying the text information.
The analysis module is specifically used for distinguishing the text information and the non-text information according to the analyzed tag and shielding the non-text information in the webpage. Or,
and the analysis module is specifically used for identifying the type of the requested resource when the network resource related to the label is downloaded, distinguishing whether the request is a text information resource or a non-text information resource, and shielding the request for the non-text information resource. Or,
and the analysis module is specifically used for distinguishing the text information from the non-text information according to the structure and the attribute of the adjusted DOM tree and shielding the non-text information nodes in the webpage.
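The division into request, analysis and display modules can be sketched as three small Python classes wired together by a client browser object. Everything here is illustrative: the class names mirror the module names in the text, the `fetch` callable stands in for the network transport, and the analysis module uses a deliberately simplified regular expression instead of real parsing.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class RequestModule:
    fetch: Callable[[str], str]  # hypothetical transport: URL -> HTML text

    def request(self, url: str) -> str:
        return self.fetch(url)


class AnalysisModule:
    # Simplified stand-in for real parsing: drops <img ...> and <embed ...>.
    _non_text = re.compile(r"<(img|embed)\b[^>]*>", re.IGNORECASE)

    def shield(self, html: str) -> str:
        return self._non_text.sub("", html)


class DisplayModule:
    def __init__(self):
        self.screen = []  # stands in for the rendering surface

    def show(self, html: str) -> None:
        self.screen.append(html)


class ClientBrowser:
    def __init__(self, fetch: Callable[[str], str]):
        self.request_module = RequestModule(fetch)
        self.analysis_module = AnalysisModule()
        self.display_module = DisplayModule()

    def browse(self, url: str) -> str:
        page = self.request_module.request(url)          # request module
        text_only = self.analysis_module.shield(page)    # analysis module
        self.display_module.show(text_only)              # display module
        return text_only
```

Passing the transport in as a callable keeps the sketch runnable without a network; a dictionary lookup can play the role of the web server when trying it out.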
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements, etc. that are within the spirit and principle of the present invention should be included in the present invention.

Claims (5)

the analysis module is used for analyzing the webpage from the request module, distinguishing the text information and the non-text information according to the resource attribute in the webpage, and shielding or deleting the non-text information in the webpage, wherein the resource attribute in the webpage comprises a label corresponding to the non-text information; the label corresponding to the non-text information comprises: image-related label: <img src="URL">, sound-related label: <voc src="URL">; the distinguishing the text information and the non-text information according to the resource attribute in the webpage comprises the following steps: in the process of browsing the webpage, distinguishing character information and non-character information according to the label corresponding to the non-character information of the analyzed webpage file;
CN201010501897.8A | 2010-09-29 | 2010-09-29 | Realize method, system and client browser that word browses | Active | CN102436455B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201010501897.8A, CN102436455B (en) | 2010-09-29 | 2010-09-29 | Realize method, system and client browser that word browses

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201010501897.8A, CN102436455B (en) | 2010-09-29 | 2010-09-29 | Realize method, system and client browser that word browses

Publications (2)

Publication Number | Publication Date
CN102436455A (en) | 2012-05-02
CN102436455B (en) | 2016-12-07

Family

ID=45984521

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201010501897.8A (Active, CN102436455B (en)) | Realize method, system and client browser that word browses | 2010-09-29 | 2010-09-29

Country Status (1)

Country | Link
CN (1) | CN102436455B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102722563B (en) * | 2012-05-31 | 2014-12-03 | 优视科技有限公司 | Method and device for displaying page
CN102955852A (en) * | 2012-11-01 | 2013-03-06 | 北京小米科技有限责任公司 | Method, device and equipment for webpage resource processing
CN102999636B (en) * | 2012-12-19 | 2016-11-16 | 北京奇虎科技有限公司 | Method and browser for intercepting and processing pop-up windows in web pages
CN103034727A (en) * | 2012-12-19 | 2013-04-10 | 北京奇虎科技有限公司 | System for intercepting pop-up window in webpage
CN103605688B (en) * | 2013-11-01 | 2017-05-10 | 北京奇虎科技有限公司 | Intercept method and intercept device for homepage advertisements and browser
CN104133865A (en) * | 2014-07-17 | 2014-11-05 | 可牛网络技术(北京)有限公司 | Advertisement filtering method and device
CN106202072B (en) * | 2015-04-29 | 2019-12-03 | 阿里巴巴集团控股有限公司 | The method and apparatus of display content are provided
US10423711B2 (en) * | 2015-10-23 | 2019-09-24 | Oracle International Corporation | Generating style sheets during runtime
CN105335340B (en) * | 2015-11-13 | 2018-04-10 | 天脉聚源(北京)传媒科技有限公司 | A kind of font interpretation method and device
CN105956026B (en) * | 2016-04-22 | 2019-08-02 | 北京小米移动软件有限公司 | Webpage rendering method and device
CN108073647B (en) * | 2016-11-14 | 2020-06-30 | 腾讯科技(深圳)有限公司 | Webpage display method and device
CN106775971B (en) * | 2016-12-02 | 2020-01-31 | 杭州中天微系统有限公司 | Data processing apparatus
US20180217964A1 (en) * | 2017-02-02 | 2018-08-02 | Futurewei Technologies, Inc. | Content-aware energy savings for web browsing utilizing selective loading priority
CN108874393B (en) * | 2018-06-06 | 2021-07-02 | 腾讯科技(深圳)有限公司 | Rendering method, rendering device, storage medium and computer equipment
CN110177089A (en) * | 2019-05-20 | 2019-08-27 | 维沃移动通信有限公司 | A kind of page access method and terminal device
CN111274519A (en) * | 2020-01-20 | 2020-06-12 | 杭州熊猫智云企业服务有限公司 | Page loading speed-up method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101089856A (en) * | 2007-07-20 | 2007-12-19 | 李沫南 | Method for abstracting network data and web reptile system
CN101706796A (en) * | 2008-11-14 | 2010-05-12 | 北京搜狗科技发展有限公司 | Method and device for showing webpage resources

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
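Web page text information extraction and result evaluation (网页文本信息提取及结果评价); Zhang Heng (张恒) et al.; 《微计算机应用》; 2007-09-30; Vol. 28, No. 9; abstract, page 922 lines 3-4, page 922 lines 9-11 from the bottom *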

Also Published As

Publication number | Publication date
CN102436455A (en) | 2012-05-02

Legal Events

Date | Code | Title | Description
- | C06 | Publication | -
- | PB01 | Publication | -
- | C10 | Entry into substantive examination | -
- | SE01 | Entry into force of request for substantive examination | -
- | C14 | Grant of patent or utility model | -
- | GR01 | Patent grant | -
