CN102436455B

Info

Publication number: CN102436455B
Application number: CN201010501897.8A
Authority: CN
Inventors: 严峻; 李鹤
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2010-09-29
Filing date: 2010-09-29
Publication date: 2016-12-07
Anticipated expiration: 2030-09-29
Also published as: CN102436455A

Abstract

The invention provides a method, a system, and a client browser for implementing text browsing. Text information and non-text information are distinguished according to resource attributes in a webpage; the non-text information in the webpage is shielded and the text information is displayed. The resource attributes in the webpage may be tags, requested resource types, node attributes of the DOM tree, and the like. With this scheme, webpage resources other than text resources are shielded and a text browsing mode is realized, which improves the webpage rendering speed and the user's browsing speed. In particular, when the user browses text information on the network, interference from other network resources is avoided, so that web browsing is achieved quickly and simply.

Description

Method, system and client browser for realizing character browsing

Technical Field

The present invention relates to web browsing technologies, and in particular, to a method, a system, and a client browser for implementing text browsing.

Background

A browser has multiple browsing modes, each optimized for a different user requirement, so that users can browse webpages better. The browsing modes of existing browsers are roughly classified as a full browsing mode, a safe browsing mode, a text browsing mode, and a custom shielding mode.

In the full browsing mode, nothing is shielded: all webpage content is downloaded and all scripts run according to the default rules. In the safe browsing mode, the downloading of malicious plug-ins and the running of JavaScript are selectively shielded in order to protect the local computer from threats posed by a webpage. In the text browsing mode, pictures, videos, Flash and the like are not displayed, sound is not played, all resources other than the text the webpage needs to display are shielded, and the webpage content is displayed with simple formatting, so that the webpage becomes cleaner and browsing becomes faster. In the custom shielding mode, the user chooses the desired behavior from the options the browser provides, such as: inhibit (or allow) Flash download and play, inhibit (or allow) picture download and display, inhibit (or allow) video download and play, inhibit (or allow) sound download and play, inhibit (or allow) the running of web scripts, inhibit (or allow) the running of Java applets, and the like.

Besides text information, webpage resources include many other kinds of content, such as pictures, sound, video, Flash, and the like. At present, most domestic browsers implement a safe browsing mode that intercepts advertisements and the like, so that malicious scripts, advertisements, and plug-ins can be prevented from running. For simplicity, the browsing mode offered by most browsers mainly shields JavaScript and plug-in execution directly, or identifies advertisements by matching specific script strings and shields the related presentation.

Owing to the maturity of current browser technology, the diversity of webpage presentation, the existence of various system bugs, and other reasons, the handling of plug-ins and scripts and the interception of advertisements are not intelligent enough, and their capability is extremely limited. For example, existing advertisement blocking relies on shielding pop-up windows, on the user's mouse operations on those pop-up windows, or on a URL blacklist. For an advertisement that is neither in a pop-up window nor on the blacklist, the existing methods cannot correctly identify the related content, so the advertisement the user wants to block is still displayed, and the downloading of useless resources cannot be thoroughly shielded.

At present, network speed is no longer a problem, but as network speed increases, webpage content becomes richer and the amount of injected spam grows, such as the many webpage advertisements the user does not care about. When a user only needs to browse text content, the large amount of resources loaded in the full browsing mode, such as pictures, videos, and audio, and advertisements in particular, slows down webpage rendering and the user's browsing. In particular, when a user visits a website such as a novel site, the user cares only about the text of the novel and not about other content such as pictures and advertisements; such content clutters the webpage and interferes with the user's browsing of the text information.

For the text browsing mode, no specific implementation method is provided in the prior art.

Disclosure of Invention

In view of the above, the main objective of the present invention is to provide a method, a system and a client browser for implementing text browsing, which can implement a text browsing mode and quickly and simply implement web browsing.

In order to achieve the purpose, the technical scheme of the invention is realized as follows:

A method for implementing text browsing comprises the following steps:

distinguishing text information from non-text information according to the resource attribute in the webpage; and

shielding the non-text information in the webpage and displaying the text information.

Distinguishing the text information from the non-text information according to the resource attribute in the webpage comprises the following step:

in the process of browsing the webpage, distinguishing the text information from the non-text information according to the parsed tags of the webpage file.

The method further comprises: presetting the level of each tag, and setting, according to the tag level, the degree to which resources other than text are shielded in text browsing.

Shielding the non-text information in the webpage comprises: shielding the tags corresponding to the non-text information according to the distinguished tags, and browsing the modified webpage file again.

In the process of browsing the webpage, when the network resources related to the tags in the webpage file are downloaded, whether a request is for a text information resource or a non-text information resource is distinguished according to the identified type of the requested resource.

The method further comprises: presetting the level of each resource type, and setting, according to the level of the type, the degree to which resources other than text are shielded in text browsing.

Shielding the non-text information in the webpage comprises: shielding the requests for non-text resource information according to the distinguished type of the requested resource.

Shielding the requests for non-text resource information comprises: selectively shielding, according to a preset strategy, selected requests for resources other than the text resource information.

Distinguishing the text information from the non-text information according to the resource attribute in the webpage comprises: distinguishing the text information from the non-text information according to the node attributes of the Document Object Model (DOM) tree.

Shielding the non-text information comprises: deleting the resource nodes other than text nodes according to the distinguished nodes with different attributes; or,

changing the attributes of the distinguished resource nodes other than text nodes so as to shield those resource nodes.

A character browsing system at least comprises a web server and a client browser, wherein,

the client browser is used for requesting a webpage file from the webpage server and distinguishing character information and non-character information according to the resource attribute in the webpage; shielding non-text information in the webpage and displaying the text information;

and the webpage server is used for providing a webpage file according to the request of the client browser.

The client browser comprises a request module, an analysis module and a display module, wherein,

the request module is used for requesting the webpage file from the webpage server and outputting the webpage file to the analysis module;

the analysis module is used for analyzing the webpage file from the request module, distinguishing the text information and the non-text information according to the resource attribute in the webpage and shielding the non-text information in the webpage;

and the display module is used for displaying the text information.

The analysis module is specifically used for distinguishing the text information and the non-text information according to the analyzed label and shielding the non-text information in the webpage; or,

the analysis module is specifically used for identifying the type of the requested resource, distinguishing whether the requested resource is a text information resource or a non-text information resource and shielding the request for the non-text information resource when the network resource related to the label is downloaded; or,

and the analysis module is specifically used for distinguishing the text information from the non-text information according to the structure and the attribute of the adjusted DOM tree and shielding the non-text information nodes in the webpage.

A client browser comprises a request module, an analysis module and a display module, wherein,

and the display module is used for displaying the text information.

According to the technical scheme provided by the invention, the text information and the non-text information are distinguished according to the resource attribute in the webpage, the non-text information in the webpage is shielded, and the text information is displayed. Wherein the resource attribute in the web page may be a tag, or a requested resource type, or a node attribute of the DOM tree, etc. By the scheme of the invention, other webpage resources except the text information resource are shielded, the text browsing mode is realized, the webpage rendering speed and the user browsing speed are improved, particularly when the user browses the network text information, the interference of other network resource information on the user browsing text information is avoided, and the webpage browsing is quickly and simply realized.

Drawings

FIG. 1 is a flow chart of a method for implementing text browsing in accordance with the present invention;

FIG. 2 is a schematic diagram of the structure of the system for browsing text according to the present invention;

FIG. 3 is a diagram of a DOM tree model of a typical HTML document.

Detailed Description

Web pages are primarily written in HyperText Markup Language (HTML). HTML text is descriptive text consisting of HTML commands that can specify text, graphics, sounds, tables, links, and the like. The HTML structure includes two major parts: a header (Head), which describes information required by the browser, and a body (Body), which contains the specific content to be presented. The browser parses the web page through the tags contained in the HTML and presents the web page to the user. As one skilled in the art will appreciate, each segment is delimited by tags from beginning to end; for example, when a web page is opened and its source code is viewed, the content enclosed in a pair of angle brackets, i.e., "< >", is a tag.

Besides HTML, a web page can use languages such as JavaScript and Cascading Style Sheets (CSS) to assist and optimize its display. JavaScript is a dynamically typed, object-oriented scripting language with prototype-based inheritance; it can increase the interactivity of the web page, dynamically update its content, and the like. CSS is a set of formatting rules that control the appearance of a web page.

The existing web browsing process roughly includes:

First, a user types a website address or clicks a link to request that a webpage be opened. The client resolves the domain name to obtain the IP (Internet Protocol) address and performs a series of network interactions, such as an HTTP (HyperText Transfer Protocol) request, to reach the corresponding web server.

Then, the web server processes the request and transmits the file written in HTML to the client browser. The client browser obtains the content from the web server and parses the HTML file.

A standard HTML file comprises text information and references to other resources such as images and videos. According to the result of parsing the HTML, the client browser sends further requests to the server to obtain these other resources. Meanwhile, the client browser renders the text information contained in the HTML file on the screen. Each time a resource such as an image or a video has been obtained, the client browser updates the relevant position on the screen with the downloaded content and continues to request the resources not yet downloaded, until the downloading of the webpage is completed or the user chooses to stop downloading.
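As a non-limiting illustration of the conventional process just described, the following sketch lists the sub-resources a browser would go on to request after parsing an HTML file. It uses Python's standard `html.parser` module; the class name `SubresourceCollector`, the function `subresources`, and the choice of the `src` attribute as the only indicator of a sub-resource are assumptions made for this example and are not taken from the invention.

```python
from html.parser import HTMLParser


class SubresourceCollector(HTMLParser):
    """Records (tag, url) pairs for every start tag carrying a src attribute."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "src" and value:
                self.urls.append((tag, value))


def subresources(html):
    """Return the resources referenced by the page, in document order."""
    collector = SubresourceCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


if __name__ == "__main__":
    page = '<p>Text</p><img src="a.png"><video src="v.mp4"></video>'
    for tag, url in subresources(page):
        print(tag, url)
```

In the full browsing mode every entry of this list would be requested in turn; the text browsing mode described below aims to avoid exactly these follow-up requests.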

Fig. 1 is a flowchart of the method for implementing text browsing according to the present invention. As shown in Fig. 1, the method includes the following steps:

Step 100: distinguish the text information from the non-text information according to the resource attribute in the webpage.

This step can be implemented in the following three modes:

The first mode: in the process of browsing the web page, after the client browser obtains the web page file, such as a file written in HTML, the HTML file is parsed and analyzed, and the text information and the non-text information are distinguished according to the parsed tags. Tags corresponding to non-text information may include, but are not limited to, the image-related tag <img src="URL"> and the sound-related tag <voc src="URL">.

Further, the method also includes: customizing the level of each tag, and setting, according to the tag level, the degree to which resources other than text are shielded in text browsing. For example, assume that the browser handles only three kinds of tags (text, picture, and Flash) and that their priorities are set to levels 1, 2, and 3 respectively (level 3 being the highest, level 2 intermediate, and level 1 the lowest). If the level of the client browser is set to 2, the client browser displays only the content corresponding to tags whose level is less than or equal to 2, namely text and pictures, and the Flash-related tags are shielded.
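The level comparison in the example above can be sketched as follows. This is only an illustrative sketch: the table `TAG_LEVELS`, the mapping of Flash to the `embed` and `object` tags, and the names `is_displayed` and `classify` are assumptions of this example; the description itself only fixes the three levels (text 1, picture 2, Flash 3).

```python
from html.parser import HTMLParser

# Assumed level table: any tag not listed is treated as a text tag (level 1).
TEXT_LEVEL = 1
TAG_LEVELS = {"img": 2, "embed": 3, "object": 3}


def is_displayed(tag, browser_level):
    """A tag is displayed when its level does not exceed the browser level."""
    return TAG_LEVELS.get(tag, TEXT_LEVEL) <= browser_level


class LevelClassifier(HTMLParser):
    """Sorts the start tags of a page into displayed and shielded tags."""

    def __init__(self, browser_level):
        super().__init__()
        self.browser_level = browser_level
        self.shown = []
        self.shielded = []

    def handle_starttag(self, tag, attrs):
        if is_displayed(tag, self.browser_level):
            self.shown.append(tag)
        else:
            self.shielded.append(tag)


def classify(html, browser_level):
    parser = LevelClassifier(browser_level)
    parser.feed(html)
    parser.close()
    return parser.shown, parser.shielded


if __name__ == "__main__":
    page = '<p>Hi</p><img src="a.png"><embed src="m.swf">'
    print(classify(page, 2))  # browser level 2: text and pictures are shown
    print(classify(page, 1))  # browser level 1: text only
```

With the browser level set to 2, the `embed` tag is the only one shielded, which matches the example in which text and pictures are displayed and Flash is not.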

The second mode: in the process of browsing a webpage, after the client browser obtains a file written in HTML, and before or while rendering the text information, the client browser identifies the type of each resource requested when downloading the network resources related to the tags, and distinguishes, according to the identified type, whether the requested resource is a text information resource or a non-text information resource. For example, the type of an image resource is image/jpeg.

Further, the method also includes: customizing the level of each resource type, and setting, according to the level of the type, the degree to which resources other than text are shielded in text browsing.
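A minimal sketch of the second mode follows, assuming that the resource type is available as a MIME type string such as image/jpeg. The level table `TYPE_LEVELS` and the function names are hypothetical; the description does not prescribe particular levels for resource types, and treating unknown types as the highest level is a choice made for this example.

```python
# Assumed levels per top-level MIME type; 1 means text.
TYPE_LEVELS = {"text": 1, "image": 2, "audio": 3, "video": 3, "application": 3}


def resource_level(content_type):
    """Map a MIME type such as 'image/jpeg; q=0.8' to its assumed level."""
    # Drop any parameters after ';' and keep the part before '/'.
    main_type = content_type.split(";", 1)[0].strip().lower().split("/", 1)[0]
    # Unknown types are conservatively given the highest level.
    return TYPE_LEVELS.get(main_type, max(TYPE_LEVELS.values()))


def is_text_resource(content_type):
    return resource_level(content_type) == 1


def is_shielded(content_type, browser_level):
    """A resource is shielded when its level exceeds the browser level."""
    return resource_level(content_type) > browser_level


if __name__ == "__main__":
    for ctype in ("text/html; charset=utf-8", "image/jpeg", "video/mp4"):
        print(ctype, is_text_resource(ctype), is_shielded(ctype, 1))
```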

The third mode: the text information and the non-text information are distinguished according to the node attributes of the DOM tree.

The Document Object Model (DOM) is a programming interface for HTML and XML documents. A DOM is typically a tree structure that provides a structured representation of a document, through which the content and presentation of the document can be changed.

FIG. 3 is a diagram illustrating a DOM tree model of a typical HTML document. As shown in FIG. 3, each web page element (e.g., an HTML tag) corresponds to an object. Tags on a web page are nested layer by layer, with <html> as the outermost layer; the DOM is nested in the same way, but is generally understood as having the shape of a tree. The tree root (also called the root object) is a window or document object, which corresponds to what encloses the outermost tag, i.e., the entire document. Below the tree root are child objects, and child objects in turn have their own child objects. Except for the root object, every object has a parent object, and the child objects of the same object are siblings.

In the DOM tree, everything in the HTML document is a node. The entire document is a document node, such as the "Document" node of FIG. 3. Each HTML element is an element node. The text within an HTML element is a text node, such as the "Text: My link" node of FIG. 3. Each HTML attribute is an attribute node, such as the "Attribute: href" node of FIG. 3. Each comment is a comment node.

All text resides in text nodes. In addition, the DOM tree provides a number of flexible methods, including adding or deleting child nodes, obtaining or modifying the attributes of each node, and so forth. In this step, the text information and the non-text information are distinguished by the node attributes of the DOM tree.
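As a non-limiting illustration of distinguishing nodes by type, the sketch below walks a small DOM tree and collects its text nodes. It uses Python's standard `xml.dom.minidom`, which requires well-formed markup, as a stand-in for a browser's DOM; the sample document only loosely mirrors FIG. 3 (apart from the "My link" text and the href attribute, its contents are assumed), and `collect_text` is a name invented for this example.

```python
from xml.dom import Node, minidom

# Sample document; only "My link" and "href" are named in the description.
SAMPLE = (
    "<html><head><title>My title</title></head>"
    '<body><a href="page.html">My link</a><h1>My header</h1></body></html>'
)


def collect_text(node, found=None):
    """Depth-first walk that keeps the content of non-empty text nodes."""
    if found is None:
        found = []
    if node.nodeType == Node.TEXT_NODE:
        text = node.data.strip()
        if text:
            found.append(text)
    # Attribute nodes are not children, so only elements, text and
    # comments are visited here.
    for child in node.childNodes:
        collect_text(child, found)
    return found


if __name__ == "__main__":
    doc = minidom.parseString(SAMPLE)
    link = doc.getElementsByTagName("a")[0]
    print(doc.nodeType == Node.DOCUMENT_NODE)
    print(link.nodeType == Node.ELEMENT_NODE)
    print(link.getAttributeNode("href").nodeType == Node.ATTRIBUTE_NODE)
    print(collect_text(doc))
```

The node type constants (`DOCUMENT_NODE`, `ELEMENT_NODE`, `TEXT_NODE`, `ATTRIBUTE_NODE`) correspond to the document, element, text, and attribute nodes described above.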

Step 101: shield the non-text information in the webpage and display the text information.

The implementation corresponding to step 100 includes:

The first mode: the tags corresponding to the non-text information are shielded (or deleted) according to the distinguished tags, and the modified HTML file is then browsed again.
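One possible way to produce the modified HTML file of the first mode is sketched below: the page is re-emitted while the tags regarded as non-text, together with anything nested inside them, are dropped. The tag set `NON_TEXT_TAGS`, the set `VOID_TAGS`, and the names `TextOnlyRewriter` and `shield_tags` are assumptions of this example and are not part of the description.

```python
from html.parser import HTMLParser

# Assumed set of non-text tags; img and embed have no closing tag.
NON_TEXT_TAGS = {"img", "audio", "video", "embed", "object"}
VOID_TAGS = {"img", "embed"}


class TextOnlyRewriter(HTMLParser):
    """Re-emits the page, omitting non-text tags and their contents."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a shielded element

    def handle_starttag(self, tag, attrs):
        if tag in NON_TEXT_TAGS:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img ... />.
        if tag not in NON_TEXT_TAGS and self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in NON_TEXT_TAGS:
            if tag not in VOID_TAGS and self.skip_depth > 0:
                self.skip_depth -= 1
        elif self.skip_depth == 0:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append("&#%s;" % name)


def shield_tags(html):
    rewriter = TextOnlyRewriter()
    rewriter.feed(html)
    rewriter.close()
    return "".join(rewriter.out)


if __name__ == "__main__":
    print(shield_tags('<p>Hello <img src="a.png">world</p>'))
```

Because the shielded tags no longer appear in the rewritten file, a browser rendering that file has no reason to request the corresponding resources.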

The second mode: the requests for non-text resource information are shielded (or deleted) according to the distinguished type of the requested resource.
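The selective shielding of requests under a preset strategy might look like the following sketch. It assumes that each pending request is already paired with a MIME type, for instance one inferred from the referring tag or the file extension; how the type is obtained is not modelled here. `ShieldPolicy` and `filter_requests` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


def _default_blocked():
    # Assumed default strategy: everything that is not text is blocked.
    return {"image", "audio", "video", "application"}


@dataclass
class ShieldPolicy:
    """Preset strategy listing the top-level MIME types to be shielded."""

    blocked_main_types: set = field(default_factory=_default_blocked)

    def allows(self, content_type):
        main_type = content_type.split(";", 1)[0].strip().lower().split("/", 1)[0]
        return main_type not in self.blocked_main_types


def filter_requests(pending, policy):
    """Keep only the URLs whose resource type the policy allows."""
    return [url for url, content_type in pending if policy.allows(content_type)]


if __name__ == "__main__":
    pending = [
        ("/index.html", "text/html"),
        ("/banner.jpg", "image/jpeg"),
        ("/style.css", "text/css"),
        ("/movie.mp4", "video/mp4"),
    ]
    print(filter_requests(pending, ShieldPolicy()))
    # A custom strategy that still allows pictures:
    print(filter_requests(pending, ShieldPolicy({"audio", "video", "application"})))
```

Passing a different set of blocked types changes which requests are shielded without touching the filtering logic.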

The third mode: the resource nodes other than text nodes are deleted according to the distinguished nodes with different attributes; or the attributes of the resource nodes other than text nodes are changed so as to shield those resource nodes.
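Both variants of the third mode can be sketched on a DOM tree as follows, again using Python's `xml.dom.minidom` as a stand-in for a browser DOM. The element set `NON_TEXT_ELEMENTS` and the use of a `hidden` attribute as the attribute change are assumptions of this example; the description does not state which attribute is changed.

```python
from xml.dom import Node, minidom

# Assumed set of element names regarded as non-text resource nodes.
NON_TEXT_ELEMENTS = {"img", "embed", "object", "audio", "video"}


def shield_nodes(node, delete=True):
    """Delete non-text resource nodes, or mark them hidden when delete=False."""
    # Iterate over a copy because children may be removed during the walk.
    for child in list(node.childNodes):
        is_resource = (
            child.nodeType == Node.ELEMENT_NODE
            and child.tagName.lower() in NON_TEXT_ELEMENTS
        )
        if is_resource and delete:
            node.removeChild(child)
            child.unlink()  # release the detached subtree
        elif is_resource:
            child.setAttribute("hidden", "hidden")
        else:
            shield_nodes(child, delete)


if __name__ == "__main__":
    source = '<body><p>Text <img src="a.png"/>more</p></body>'

    deleted = minidom.parseString(source)
    shield_nodes(deleted, delete=True)
    print(deleted.documentElement.toxml())

    hidden = minidom.parseString(source)
    shield_nodes(hidden, delete=False)
    print(hidden.getElementsByTagName("img")[0].getAttribute("hidden"))
```

Deleting the node removes the resource from the tree altogether, whereas changing an attribute leaves the node in place; whether a hidden node still triggers a download depends on the rendering kernel, so the deletion variant is the more direct way to avoid the request.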

By the above method, the text information required by the text browsing mode is screened out, and the required text information is then displayed by the client browser. Many existing kernels can render webpage content, such as the Trident kernel used by the IE browser, the Gecko kernel used by Firefox, the Presto kernel used by Opera, and the WebKit kernel used by Safari, Chrome, etc. By calling the rendering interfaces of these kernels, the display of the text browsing mode is realized, and the display effect of the text browsing mode can be adjusted in a personalized manner. The specific implementation of the display is conventional to those skilled in the art and is not described further herein.

Fig. 2 is a schematic structural diagram of a system for implementing text browsing according to the present invention. As shown in Fig. 2, the system includes a web server and a client browser, and the client browser comprises a request module, an analysis module, and a display module, wherein:

The request module is used for requesting the webpage file from the web server and outputting the webpage file to the analysis module.

The analysis module is used for parsing the webpage file from the request module, distinguishing the text information from the non-text information according to the resource attribute in the webpage, and shielding the non-text information in the webpage.

The display module is used for displaying the text information.
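The division into a request module, an analysis module, and a display module can be illustrated with the following sketch. All class and method names are hypothetical. Network access is replaced by an injected `fetch` function so that the example is self-contained, and the regular expressions are a deliberately crude stand-in for real parsing and rendering.

```python
import re
from typing import Callable


class RequestModule:
    """Requests the webpage file and hands it to the analysis module."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch  # injected so that no real network is needed

    def request(self, url: str) -> str:
        return self._fetch(url)


class AnalysisModule:
    """Distinguishes non-text tags and shields them (crude regex stand-in)."""

    _NON_TEXT = re.compile(r"<(?:img|embed)\b[^>]*>", re.IGNORECASE)

    def analyse(self, html: str) -> str:
        return self._NON_TEXT.sub("", html)


class DisplayModule:
    """Displays the text information; here it simply strips remaining tags."""

    _ANY_TAG = re.compile(r"<[^>]+>")

    def render(self, html: str) -> str:
        return self._ANY_TAG.sub("", html).strip()


class ClientBrowser:
    def __init__(self, fetch: Callable[[str], str]):
        self.request_module = RequestModule(fetch)
        self.analysis_module = AnalysisModule()
        self.display_module = DisplayModule()

    def browse(self, url: str) -> str:
        page = self.request_module.request(url)
        text_only = self.analysis_module.analyse(page)
        return self.display_module.render(text_only)


if __name__ == "__main__":
    def fake_server(url: str) -> str:
        # Stands in for the web server of Fig. 2.
        return '<p>Chapter 1 <img src="ad.png"></p>'

    print(ClientBrowser(fake_server).browse("http://example.com/novel"))
```

In a real browser the analysis module could be realized by any of the three modes described for step 100 and step 101, and the display module would call the rendering interface of the kernel in use.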

The analysis module is specifically used for distinguishing the text information from the non-text information according to the parsed tags and shielding the non-text information in the webpage. Or,

the analysis module is specifically used for identifying the type of the requested resource when the network resources related to the tags are downloaded, distinguishing whether the request is for a text information resource or a non-text information resource, and shielding the requests for non-text information resources. Or,

the analysis module is specifically used for distinguishing the text information from the non-text information according to the structure and attributes of the DOM tree and shielding the non-text information nodes in the webpage.

The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements, etc. that are within the spirit and principle of the present invention should be included in the present invention.

Claims

1. A method for implementing text browsing is characterized by comprising the following steps:

under a character browsing mode of a browser, requesting a webpage server to obtain the webpage, wherein pictures, videos and Flash are not displayed under the character browsing mode, sound is not played, and all other resources except characters needing to be displayed on the webpage are shielded;

distinguishing character information and non-character information according to resource attributes in a webpage, wherein the resource attributes in the webpage comprise tags corresponding to the non-character information; wherein, the label corresponding to the non-character information comprises: image-related label: <img src="URL">, sound-related label: <voc src="URL">; the distinguishing the text information and the non-text information according to the resource attribute in the webpage comprises the following steps: in the process of browsing the webpage, distinguishing character information and non-character information according to the label corresponding to the non-character information of the analyzed webpage file;

and shielding or deleting the non-text information in the webpage and displaying the text information.

2. The method of claim 1, further comprising: and presetting the level of each label, and setting the shielding degree of other resources browsed by characters according to the level of the label.

3. The method of claim 1 or 2, wherein the blocking of non-textual information in the web page comprises: and shielding the labels corresponding to the non-character information according to the distinguished labels, and browsing the modified webpage file again.

4. A character browsing system is characterized in that the system at least comprises a web server and a client browser, wherein,

the client browser is used for requesting the webpage server to obtain a webpage in a character browsing mode of the browser, and distinguishing character information and non-character information according to resource attributes in the webpage; shielding or deleting non-text information in the webpage and displaying the text information;

wherein: in the character browsing mode, pictures, videos and Flash are not displayed, sound is not played, and all other resources except the characters required to be displayed on the webpage are shielded; the resource attribute in the webpage comprises a label corresponding to non-character information, or a requested resource type, or a node attribute of a DOM tree; the label corresponding to the non-text information comprises: image-related label: <img src="URL">, sound-related label: <voc src="URL">;

the distinguishing the text information and the non-text information according to the resource attribute in the webpage comprises the following steps: in the process of browsing the webpage, distinguishing character information and non-character information according to the label corresponding to the non-character information of the analyzed webpage file;

5. A client browser is characterized by comprising a request module, an analysis module and a display module, wherein,

the request module is used for requesting a webpage server to obtain a webpage in a character browsing mode of the browser and outputting the webpage to the analysis module; the method comprises the following steps that in the character browsing mode, pictures, videos and Flash are not displayed, sound is not played, and all other resources except characters needing to be displayed on the webpage are shielded;

the analysis module is used for analyzing the webpage from the request module, distinguishing the text information and the non-text information according to the resource attribute in the webpage, and shielding or deleting the non-text information in the webpage, wherein the resource attribute in the webpage comprises a label corresponding to the non-text information; the label corresponding to the non-text information comprises: image-related label: <img src="URL">, sound-related label: <voc src="URL">; the distinguishing the text information and the non-text information according to the resource attribute in the webpage comprises the following steps: in the process of browsing the webpage, distinguishing character information and non-character information according to the label corresponding to the non-character information of the analyzed webpage file;

and the display module is used for displaying the text information.