Disclosure of Invention
In order to solve the above problems, the present invention provides a method for processing web page data based on a database, which can quickly process data in a web page, preferentially display content with the highest relevance through analysis and processing, and ensure high readability of the web page data for a user.
In order to achieve the above object, the present invention comprises the steps of:
S1: setting industry-related words for the website in a web keyword collection tool, and mining keywords based on a search engine;
S2: importing the keywords into a cloud database in the same environment as the web page, and filtering, classifying and labeling the keywords;
S3: constructing a page template layout and the corresponding module elements required for web page data processing;
S4: the web page loads the keywords through the page template, processes the module elements and the content data, and generates the web page data display.
Further, in S1, industry-related words for the website are added to the web keyword collection tool, and industry keywords are mined from the search engine by simulated searches.
Further, S2 further includes:
S21: importing the mined keywords into a temporary data table of the database;
S22: filtering out keywords irrelevant to the website, and taking the remaining keywords as the keywords to be classified; the filtering extracts the characters of each keyword and deletes from the database those keywords that are irrelevant to the industry;
S23: matching keyword classifications, and determining the classification and presentation form of each keyword; the classification and the content display form of a keyword are determined by identifying the core semantics in the keyword;
S24: labeling keyword attributes, wherein the attributes of each keyword at every level are labeled by matching against a pre-constructed database dictionary; specifically, the keyword is matched step by step against the contents of the database dictionary in order from the largest word-meaning range to the smallest;
S25: moving the keywords that have been filtered, classified and labeled from the temporary table of the database to the formal table, where they serve as core keywords.
Further, in S3, constructing the page template layout comprises setting the information level of each page layout area, and setting a content presentation form for the module elements in each page layout area.
Further, in S4, the web page reads from the database table the keywords together with their stored characteristics, such as classification, form and semantics; at the same time it loads the page template and sends the keywords to the page template through an interface; the page template identifies the classification of each keyword and dynamically matches and loads the module element forms in the page layout areas; the content of each module element follows the information class of the layout area in which the module element is located, and the related content with the highest matching degree is displayed in combination with the corresponding keyword attributes.
Compared with the prior art, the invention has the beneficial effects that:
1. The method mines, based on a search engine, the keywords that users care about, and uses them as the core keywords for web page data processing, so that the data to be displayed by the web page is processed and displayed around them, thereby improving the readability of the web page data for the user.
2. When processing the keywords of the web page, the method analyzes the keywords in three respects, namely classification, form and content, and mines the semantics of the keywords down to the smallest semantic range, which improves the identification and labeling of the keywords and gives the web page data processing more dimensions and higher accuracy.
3. The web page uses dynamic page template technology when processing data, which makes the data processing more flexible and accurate than the traditional single fixed template.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
As shown in Fig. 1, a method for processing web page data based on a database according to an embodiment of the present invention includes the following steps:
S1: setting industry-related words for the website in a collection tool, and mining keywords based on a search engine;
First, according to the industry to which the website belongs, industry-related words are set in the collection tool and industry keywords are mined. To ensure the accuracy and relevance of the mined keywords, industry long-tail words are added to the set industry-related words as seed words for mining. An industry long-tail word is a combination formed from a related word under a particular semantic context. For example, "automobile price" is a standard related word of the automobile industry, and "lowest automobile price query", formed under question semantics and a regional condition, can serve as an industry long-tail word. During mining, simulated searches for the set industry-related words are run in the search engine, and the words recommended with the search results are extracted as the keywords to be screened. The collection tool is a web keyword collection tool that can be loaded on a web page; it is a mature technology in the prior art and is not described further here.
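By way of non-limiting illustration, the seed-word construction and mining of S1 may be sketched as follows. The function names, the modifier lists and the `fetch_suggestions` callable are assumptions introduced only for this sketch; `fetch_suggestions` stands in for whatever collection tool returns the words recommended by the search engine, and no real search engine interface is implied.

```python
from itertools import product


def build_seed_words(base_words, question_modifiers, regions):
    """Combine standard related words with question and region modifiers.

    The base words themselves are kept, and one long-tail word is added
    for every (base word, question modifier, region) combination.
    """
    seeds = list(base_words)
    for base, question, region in product(base_words, question_modifiers, regions):
        seeds.append(f"{region} {base} {question}")
    return seeds


def mine_keywords(seeds, fetch_suggestions):
    """Run a simulated search per seed and collect the recommended words.

    fetch_suggestions is a caller-supplied callable (hypothetical) that
    takes a seed word and returns an iterable of recommended words.
    Results are normalized to lower case and de-duplicated in order.
    """
    seen = set()
    mined = []
    for seed in seeds:
        for suggestion in fetch_suggestions(seed):
            key = suggestion.strip().lower()
            if key and key not in seen:
                seen.add(key)
                mined.append(key)
    return mined
```

The mined list corresponds to the keywords to be screened, which S2 then imports into the temporary table.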
S2: importing the keywords into a cloud database in the same environment as the web page, and filtering, classifying and labeling the keywords;
S2 is further subdivided into S21 to S25:
S21: to keep the database table data clean before and after keyword screening, the keywords to be screened are first imported into a temporary table of the database;
S22: filtering out keywords irrelevant to the website, and taking the remaining keywords as the keywords to be classified. The filtering extracts the characters of each keyword and deletes from the database those keywords that are irrelevant to the industry. Because the industry relevance of the keywords directly determines the readability of the aggregated module elements and content data, the keywords to be screened are compared with an industry database dictionary. The industry database dictionary can be built by recording the words used on vertical websites of the industry, and the number of times each keyword to be screened matches the database dictionary is recorded. When the number of out-of-industry matches is greater than the number of in-industry matches, the keyword is judged to be a non-industry keyword. Non-industry keywords are filtered out and deleted, and industry keywords are kept in the database.
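By way of non-limiting illustration, the match-count comparison of S22 may be sketched as follows. The whitespace tokenization and the separate set of out-of-industry terms are assumptions made for this sketch; the embodiment only requires that in-industry and out-of-industry match counts be compared.

```python
def is_industry_keyword(keyword, industry_terms, other_terms):
    """Return False only when out-of-industry matches exceed in-industry ones."""
    tokens = keyword.lower().split()  # assumed tokenization: split on whitespace
    inside = sum(1 for token in tokens if token in industry_terms)
    outside = sum(1 for token in tokens if token in other_terms)
    return not (outside > inside)


def filter_keywords(candidates, industry_terms, other_terms):
    """Keep the keywords judged to belong to the industry, in input order."""
    return [
        keyword
        for keyword in candidates
        if is_industry_keyword(keyword, industry_terms, other_terms)
    ]
```

Under the rule as stated, a keyword whose two counts are equal is kept, since only a strictly larger out-of-industry count marks it as a non-industry keyword.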
S23: matching keyword classifications, and determining the classification and display form of each keyword; the classification and the content display form of a keyword are determined by identifying the core semantics in the keyword.
It should be noted that general keyword classifications, such as persons, products, buildings, events and brands, are used, and differentiated keyword classifications are added according to the characteristics of website building. Core semantic recognition is performed on the keyword to determine its tag set, the number of tags matching each classification is counted, and the classification with the largest number of matches is determined as the keyword classification. Further, the content form tag in the keyword is extracted. If a content form tag, such as text, video or picture, can be extracted directly from the keyword, it is used directly. If no content form tag can be extracted, the content display form is determined from the implicit meaning of the keyword: for example, the playback of a certain event can be determined as video form, and a brief interview write-up of a certain player can be determined as text form.
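By way of non-limiting illustration, the classification and display-form determination of S23 may be sketched as follows. The three lookup tables, the whitespace tokenization and the `default_form` fallback are assumptions for this sketch; how core semantics are actually recognized is not fixed by the embodiment.

```python
from collections import Counter


def classify_keyword(keyword, tag_to_category, form_labels, implied_forms,
                     default_form="text"):
    """Return (classification, display form) for one keyword.

    tag_to_category: token -> classification (hypothetical lookup table)
    form_labels:     token -> form, for forms named directly in the keyword
    implied_forms:   token -> form, for forms implied by the keyword meaning
    """
    tokens = keyword.lower().split()

    # Count the tags per classification and take the one matched most often.
    counts = Counter(tag_to_category[t] for t in tokens if t in tag_to_category)
    category = counts.most_common(1)[0][0] if counts else None

    # Prefer a form label stated in the keyword, then an implied form.
    form = next((form_labels[t] for t in tokens if t in form_labels), None)
    if form is None:
        form = next((implied_forms[t] for t in tokens if t in implied_forms),
                    default_form)
    return category, form
```

A keyword with no recognized tag returns `None` as its classification, which leaves it to the caller to decide whether such a keyword is dropped or reviewed.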
S24: labeling keyword attributes, wherein the attributes of each keyword at every level are labeled by matching against a pre-constructed database dictionary; specifically, the keyword is matched step by step against the contents of the database dictionary in order from the largest word-meaning range to the smallest.
A keyword may have more than one attribute, and the semantic ranges covered by its attributes differ greatly, from large to small. Therefore, to present accurate and effective content, the keywords are labeled by deep matching. The matching requires a semantic dictionary base, whose data is drawn from the professional terms of each field and each scenario in each industry. The semantic words in the semantic dictionary base are divided according to the size of their semantic range, and the amount of content data for each semantic word across the whole web is used as the weight of that semantic word. The keyword is matched against the data in the semantic dictionary base in order of semantic range from large to small. When the keyword matches one attribute, that attribute is the first-level attribute of the keyword. When the keyword matches two attributes, the one with the smaller range is the first-level attribute and the one with the larger range is the second-level attribute. When the keyword matches three or more attributes: if the number of attributes is odd, the attribute in the middle of the range order is the first-level attribute, the attribute with the smallest range is the second-level attribute, the attribute with the largest range is the third-level attribute, and the remaining attributes are fourth-level attributes; if the number of attributes is even, the weights of the two semantic words in the middle are compared and the one with the higher weight is the first-level attribute, the attribute with the smallest range is the second-level attribute, the attribute with the largest range is the third-level attribute, and the remaining attributes are fourth-level attributes.
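By way of non-limiting illustration, the level assignment of S24 may be sketched as follows. The function name and the input shape are assumptions for this sketch, as is the tie-break that prefers the larger-range word when the two middle weights are equal; attribute words are assumed to be unique within one keyword.

```python
def assign_attribute_levels(matched):
    """Assign attribute levels 1 to 4 to the matched semantic words.

    matched: list of (word, weight) pairs, ordered from the largest
    semantic range to the smallest. Returns a dict of word -> level.
    """
    n = len(matched)
    words = [word for word, _ in matched]
    if n == 0:
        return {}
    if n == 1:
        return {words[0]: 1}
    if n == 2:
        # Smaller range is first-level, larger range is second-level.
        return {words[1]: 1, words[0]: 2}

    if n % 2 == 1:
        first = n // 2  # the single middle attribute
    else:
        left, right = n // 2 - 1, n // 2
        # Of the two middle words, the higher weight wins (tie: assumed left).
        first = left if matched[left][1] >= matched[right][1] else right

    levels = {}
    for index, word in enumerate(words):
        if index == first:
            levels[word] = 1
        elif index == n - 1:
            levels[word] = 2  # smallest range
        elif index == 0:
            levels[word] = 3  # largest range
        else:
            levels[word] = 4  # everything else
    return levels
```

For instance, with illustrative weights, `assign_attribute_levels([("football", 50), ("event", 40), ("team", 30), ("player", 20)])` yields event as level 1, player as level 2, football as level 3 and team as level 4.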
S25: moving the filtered, classified and labeled keywords from the temporary database table to the formal database table, where they serve as core keywords.
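By way of non-limiting illustration, the move from the temporary table to the formal table in S25 may be sketched as follows, using an SQLite database as a stand-in for the cloud database. The table names, the column names and the `labeled` flag are assumptions introduced for this sketch.

```python
import sqlite3


def create_tables(conn):
    """Create the temporary and formal keyword tables (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS temp_keywords (
            keyword  TEXT PRIMARY KEY,
            category TEXT,
            form     TEXT,
            labeled  INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS core_keywords (
            keyword  TEXT PRIMARY KEY,
            category TEXT,
            form     TEXT
        );
    """)


def promote_keywords(conn):
    """Move fully processed rows to the formal table in one transaction.

    Returns the number of rows removed from the temporary table.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO core_keywords (keyword, category, form) "
            "SELECT keyword, category, form FROM temp_keywords WHERE labeled = 1"
        )
        moved = conn.execute(
            "DELETE FROM temp_keywords WHERE labeled = 1"
        ).rowcount
    return moved
```

Doing the copy and the delete in a single transaction keeps a keyword from appearing in both tables, or in neither, if the move is interrupted.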
S3: constructing a page template layout and the corresponding module elements required for web page data processing;
Constructing the page template layout comprises setting the information level of each page layout area and setting a content presentation form for the module elements in each area. When the layout of the page template is constructed, the information classes of the page layout areas are set according to the browsing habits of users, and the page layout is divided into a first-class information area, a second-class information area, a third-class information area and a fourth-class information area. The first-class information area preferentially displays information related to the first-level attribute of the keyword, the second-class information area preferentially displays information related to the second-level attribute, the third-class information area preferentially displays information related to the third-level attribute, and the fourth-class information area preferentially displays information related to the remaining fourth-level attributes. It should be noted that when the keyword has fewer attribute levels than the four classes set in the page layout, the areas are filled by substituting and supplementing in sequence according to the order of the keyword attributes from large to small. Once the information classes are determined, the module elements are selected and the content display form is set in each area according to its class.
As shown in Fig. 3, the first-class information area is set in the central area of the page template, has the largest area and the highest attention, and can be viewed without scrolling the mouse wheel or dragging the vertical scroll bar; the second-class information area is set in the top area of the page template, has the second-highest attention, and can likewise be viewed without scrolling the mouse wheel or dragging the vertical scroll bar; the third-class information area is set in the right area of the page template, is not fully displayed, has low attention, and can be viewed in full only by dragging the horizontal scroll bar; the fourth-class information area is set in the lower right area of the page template, is not fully displayed, and can be viewed in full only by scrolling the mouse wheel or dragging the vertical or horizontal scroll bar.
S4: the web page loads the keywords through the page template, processes the module elements and the content data, and generates the web page data display;
Specifically, the web page reads from the database table the keywords together with their stored characteristics, such as classification, form and semantics; at the same time it loads the page template and sends the keywords to the page template through an interface. The page template identifies the classification of each keyword and dynamically matches and loads the module element forms in the page layout areas. The content of each module element follows the information class of the layout area in which the module element is located, and the related content with the highest matching degree is displayed in combination with the corresponding keyword attributes. Take as an example a keyword about a certain player of a certain football team taking part in a certain event. When the keyword is labeled by the processing method above, matching in order of semantic range from large to small gives football -> event -> team -> player. Football, having the largest range, is the third-level attribute and is placed in the third-class information area, where football-related content is displayed. The player, having the smallest range, is the second-level attribute and is placed in the second-class information area, where player-related content is displayed. Of the two middle semantic words, event and team, the event has the higher semantic word weight, so the event is the first-level attribute and is placed in the first-class information area, where event-related content is displayed. The team is the fourth-level attribute and is placed in the fourth-class information area, where team-related content is displayed.
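By way of non-limiting illustration, the placement of labeled attributes into the layout areas in S4 may be sketched as follows, traced on the football example. The area identifiers, the dictionary shape of the plan and the function name are assumptions for this sketch; the substitution rule for keywords with fewer than four attribute levels is not modeled.

```python
# Attribute level -> (area identifier, position in the page template).
# Positions follow the layout described for the four information areas.
AREA_BY_LEVEL = {
    1: ("class-1", "center"),
    2: ("class-2", "top"),
    3: ("class-3", "right"),
    4: ("class-4", "lower right"),
}


def build_page_plan(levels, content_form):
    """Group attribute words by the information area that displays them.

    levels:       dict of attribute word -> level (1 to 4)
    content_form: display form determined for the keyword, e.g. "video"
    """
    plan = {}
    for word, level in levels.items():
        area, position = AREA_BY_LEVEL[level]
        entry = plan.setdefault(
            area, {"position": position, "form": content_form, "attributes": []}
        )
        entry["attributes"].append(word)
    return plan


example_levels = {"football": 3, "event": 1, "team": 4, "player": 2}
example_plan = build_page_plan(example_levels, "video")
```

In `example_plan`, the event lands in the central first-class area, the player in the top second-class area, football in the right third-class area and the team in the lower right fourth-class area, matching the placement described above.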
The above embodiments merely illustrate the technical solutions of the present invention and do not limit it. It should be noted that those skilled in the art may make modifications or equivalent substitutions to the specific embodiments of the present invention without departing from its technical principles, and all such modifications or equivalent substitutions fall within the scope of the claims of the present invention.