CN101290633A


Info

Publication number: CN101290633A
Application number: CNA2008101142748A
Authority: CN
Inventors: 刘建明; 魏晓菁; 王继业; 崔丙锋; 范鹏展; 陈德胜
Original assignee: State Grid Information and Telecommunication Group Co Ltd
Current assignee: State Grid Information and Telecommunication Group Co Ltd
Priority date: 2008-06-02
Filing date: 2008-06-02
Publication date: 2008-10-22

Abstract

The invention discloses a content management integration method and system. They solve a problem in the prior art: captured website content does not change when the source website is updated. The method includes:

- clipping web page content from an information source website and analyzing the page structure;
- saving the result of the page structure analysis;
- selecting saved page structure analysis results and integrating them;
- when the integrated result is accessed, fetching web page content from the corresponding information source website according to the page structure analysis result, and displaying it.

By analyzing the web page structure, the invention obtains:

- the URL of the information source website;
- the channel in which the clipped content is located;
- the start and end positions of the displayed content.

Based on these results, the relevant information can be fetched from the information source website in real time, so customized content is updated in real time as the website changes. Moreover, when the channels, styles, etc. of the information source website change, the required content can still be obtained correctly from the page structure analysis result.

Description

Translated from Chinese

Content Management Integration Method and System

Technical Field

The invention relates to the field of network technology, and in particular to a content management integration method and system.

Background

With the growth and diversification of network applications, many websites cannot keep pace with the rapid proliferation of information and with changes in business models. They must spend considerable time, manpower, and material resources on updating and maintaining information.

When a website is expanded, integrating the internal and external networks and the branch websites becomes even more complex, and the website may even have to be rebuilt. Users thus remain trapped in a high-cost, low-efficiency cycle of upgrading and integration.

Website builders and information publishers care most about a system's ease of use and the completeness of its functions. This places high demands on website construction and network information management tools. Professional content management systems (Content Management System, CMS) emerged to address the common problems and needs users face when building websites and publishing information.

Content management is the unified back-end platform through which a website publishes content. It is a software system located between the Web front end (the Web server) and back-end office systems or processes (content creation and editing). Content creators, editors, and publishers use the content management system to submit, modify, approve, and publish content. "Content" here means any information intended for publication on the website: files, tables, pictures, database data, and even video.

In other words, website content is not edited directly as static pages and uploaded to the access server. It is first edited and reviewed on the content management platform, and the platform then publishes it to the access server automatically. For example, a website administrator does not upload news pages to the access server one by one. Instead, the administrator edits and reviews the news on the content management platform and then uses the platform to publish the news pages to the access server.

For example, a group company's website must take in news from the websites of its subsidiary units every day. Traditionally, website editors visit these websites daily, download their content, and enter it manually into the content management platform. With a large volume of news, entering items one by one places a very heavy workload on the editors.

The prior art addresses this problem with simple web page clipping, which grabs and clips the content at a specified location on a website. For example:

1. The content at a designated location on each subsidiary unit's website is grabbed and clipped.
2. The clipped web page content is downloaded in full to the local system.
3. The content is displayed on the group company's website through the content management platform.

The drawback of this prior art is that only the content at a specified location can be grabbed and clipped, so the clipped content is a static snapshot that does not change when the website is updated. Moreover, after the channels, styles, etc. of the website are adjusted, clipping often fails and the clipping target must be set again.

Summary of the Invention

In view of this, an object of the present invention is to provide a content management integration method and system that solve the prior-art problem that captured website content cannot follow updates to the website.

To achieve this object, the present invention provides the following solutions:

A content management integration method, comprising:

clipping web page content from an information source website and analyzing the page structure;

saving the result of the page structure analysis;

selecting saved page structure analysis results and integrating them; and, when the integrated result is accessed, obtaining web page content from the corresponding information source website according to the page structure analysis result, and displaying it.

The page structure analysis result includes:

- the URL of the information source website;
- the channel in which the clipped web page content is located;
- the start and end positions of the displayed content.

Obtaining web page content from the corresponding information source website according to the analysis result then comprises:

1. locating, from the URL of the information source website, the website on which the clipped content resides;
2. obtaining the web page content from the corresponding position on that website, according to the channel, start position, and end position of the clipped content.
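The two retrieval steps above can be sketched in a few lines. This is a minimal illustration, not the patented implementation, and it rests on three assumptions:

- The start and end positions are stored as literal marker strings, for example HTML comments.
- The channel is a path relative to the site URL.
- `fetch` is a hypothetical callable that returns a page's HTML.

The names `PageStructure` and `fetch_clipped_content` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urljoin


@dataclass(frozen=True)
class PageStructure:
    """Hypothetical record of one page structure analysis result."""
    site_url: str      # URL of the information source website
    channel: str       # assumed: channel path relative to site_url
    start_marker: str  # assumed: literal text marking the start position
    end_marker: str    # assumed: literal text marking the end position


def fetch_clipped_content(structure: PageStructure,
                          fetch: Callable[[str], str]) -> str:
    """Locate the site, open the channel page, and cut out the clipped block."""
    # Step 1: resolve the channel page on the source website from the stored URL.
    channel_url = urljoin(structure.site_url, structure.channel)
    html = fetch(channel_url)
    # Step 2: return the text between the start and end positions.
    start = html.find(structure.start_marker)
    if start == -1:
        raise ValueError("start marker not found on the channel page")
    start += len(structure.start_marker)
    end = html.find(structure.end_marker, start)
    if end == -1:
        raise ValueError("end marker not found on the channel page")
    return html[start:end]
```

Passing `fetch` in as a parameter keeps the sketch testable offline. A real deployment would supply an HTTP client instead.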

Saving the page structure analysis results comprises storing the analysis results as entries and defining an identifier name for each entry. Each entry corresponds to one web page clipping object.

Preferably, selecting and integrating the saved page structure analysis results comprises:

1. reading the stored entries and selecting the entries to be integrated according to their identifier names;
2. encapsulating the selected entries to generate portlets;
3. saving the generated portlets.

Displaying the integration result comprises:

1. customizing the portlets to be displayed;
2. publishing the selected portlets to the access server for display.

A content management integration system, comprising:

a web page crawling and analysis unit, configured to clip web page content from an information source website and analyze the page structure;

a storage unit, configured to save the page structure analysis result;

an integration and display unit, configured to select and integrate saved page structure analysis results and, when the integrated result is accessed, to obtain web page content from the corresponding information source website according to the page structure analysis result and display it.

The page structure analysis result includes the URL of the information source website, the channel in which the clipped web page content is located, and the start and end positions of the displayed content. The integration and display unit obtains web page content from the corresponding information source website according to the analysis result as follows:

1. It locates, from the URL of the information source website, the website on which the clipped content resides.
2. It obtains the web page content from the corresponding position on that website, according to the channel, start position, and end position of the clipped content.

The storage unit saves the page structure analysis results by storing them as entries and defining an identifier name for each entry. Each entry corresponds to one web page clipping object.

Preferably, the integration and display unit is implemented with portlet encapsulation and specifically comprises:

- a portlet generation unit, configured to select saved page structure analysis results and encapsulate them into portlets;
- a portlet library, configured to store the portlets;
- a portlet framework, configured to obtain web page content from the corresponding information source website when a customized portlet is accessed, according to the page structure analysis result, and to publish it to the access server for display.

The portlet generation unit generates portlets by configuration.
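"Generated by configuration" can be read as portlets being declared in a configuration file rather than written as code. The sketch below assumes a hypothetical JSON configuration in which each portlet names the repository entry it wraps; the format, field names, and `generate_portlets` helper are illustrative only, since the source does not specify a configuration format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PortletDefinition:
    portlet_name: str  # name listed in the portlet library
    entry_name: str    # identifier name of the repository entry it wraps
    title: str         # title shown above the portlet content


# Hypothetical configuration format (an assumption, not taken from the patent).
EXAMPLE_CONFIG = """
{
  "portlets": [
    {"portlet_name": "beijing-weather", "entry": "Beijing weather", "title": "Weather"},
    {"portlet_name": "unit-a-news", "entry": "Unit A political-work news"}
  ]
}
"""


def generate_portlets(config_text, known_entries):
    """Turn configuration items into portlet definitions, checking each entry exists."""
    config = json.loads(config_text)
    definitions = []
    for item in config["portlets"]:
        entry = item["entry"]
        if entry not in known_entries:
            raise KeyError(f"no repository entry named {entry!r}")
        definitions.append(PortletDefinition(
            portlet_name=item["portlet_name"],
            entry_name=entry,
            title=item.get("title", entry),  # default the title to the entry name
        ))
    return definitions
```

Validating each entry name against the repository when the configuration is loaded catches typos early. Otherwise they would surface only when the portlet is displayed.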

According to the specific embodiments provided, the invention achieves the following technical effects:

First, analyzing the structure of captured web pages yields the URL of the information source website, the channel in which the clipped content is located, and the start and end positions of the displayed content. Based on these results, relevant information is fetched from the information source website in real time, so customized content is updated in real time as the website changes.

Second, when the channels, styles, etc. of the information source website change, the required content can still be obtained correctly from the page structure analysis result.

Third, content from any website can be subscribed to flexibly. Building a repository of diverse content enables personalized customization of the integrated content.

Finally, page crawling is unified: a single crawling server connects to the information source websites and crawls their pages in one place.

Description of Drawings

Fig. 1 is a flowchart of the content management integration method provided by an embodiment of the present invention;

Fig. 2 is a flowchart of selecting and integrating saved page structure analysis results in the method of Fig. 1;

Fig. 3 is a flowchart of the method provided by a preferred embodiment of the present invention;

Fig. 4 is a structural diagram of the content management integration system provided by an embodiment of the present invention.

Detailed Description of the Embodiments

The present invention provides a content management integration method, which is described in detail below with reference to the accompanying drawings.

Embodiment. Referring to Fig. 1, the content management integration method provided by the embodiment of the present invention includes the following steps:

S101: clip web page content from the information source website and analyze the page structure.

This embodiment is driven by practical needs. When users customize web page content, or when a group company customizes content from its subsidiary units' web pages, the customization usually targets a particular section. For example, a user may customize the weather forecast section of a portal website, and a group company may customize the political-work news section of a subsidiary unit.

On a mature portal website, the set of sections on a page and the position of each section are usually fixed. Different sections belong to different channels, and the channel of each section usually does not change.

Clipping web page content in this step therefore means clipping a particular section of the information source website. The page structure of the clipped section is then analyzed to determine:

- the URL of the information source website;
- the channel in which the section is located;
- the start position, end position, style, and similar properties of the section as displayed on the source website.

With this information, the content of the clipped section can be fetched in real time from the corresponding position in its channel, using the URL of the information source website.

The user can choose which web page content to clip. The embodiment can use page clipping technology: the system provides a page clipping tool that can be integrated into the content customization website (for example, the group company's website). The workflow is:

1. The user opens the tool on that website and enters the address of the information source website.
2. Once the page is displayed, the user selects the content block to clip with the mouse.
3. The system analyzes the structure of the clipped page.
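The source does not say how a mouse selection is turned into a structure description. One possible sketch assumes the clipping tool reports the `id` attribute of the selected element. It then uses Python's standard `html.parser` to record that element's opening tag (as a start position), its closing tag, and a text preview. The id-based mapping and the `analyze_selection` helper are assumptions for illustration.

```python
from html.parser import HTMLParser


class BlockLocator(HTMLParser):
    """Walks a page and records the boundaries of the element the user selected."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.tag = None        # tag name of the selected element, once found
        self.start_tag = None  # literal opening tag, used as the start position
        self.depth = 0         # nesting depth of same-named tags inside the block
        self.done = False
        self.text = []         # visible text inside the block, for a preview

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.tag is None:
            if dict(attrs).get("id") == self.target_id:
                self.tag = tag
                self.start_tag = self.get_starttag_text()
                self.depth = 1
        elif tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.tag is not None and not self.done and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # the selected block has closed

    def handle_data(self, data):
        if self.tag is not None and not self.done:
            self.text.append(data)


def analyze_selection(html, target_id):
    """Return a hypothetical structure description for the selected block."""
    locator = BlockLocator(target_id)
    locator.feed(html)
    locator.close()
    if locator.start_tag is None:
        raise ValueError(f"no element with id {target_id!r}")
    return {
        "start_marker": locator.start_tag,
        "end_tag": f"</{locator.tag}>",
        "preview": "".join(locator.text).strip(),
    }
```

Counting nested tags of the same name ensures that the block ends at the selected element's own closing tag, not at the first inner closing tag.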

S102: save the result of the page structure analysis.

As noted above, the page structure analysis result includes the URL of the information source website, the channel of the clipped content block, and the start position, end position, style, etc. of the displayed content. The embodiment can provide a content repository in which this page structure information is formatted and stored as entries. Each record corresponds to one web page crawling object on a website, such as the Beijing weather or the sports news on a particular site.

The content repository holds many entries, and users usually integrate only a subset of the clipped content rather than all of it. To make entries easy to recognize, users are therefore allowed to define an identifier name for each entry when it is stored. A record contains:

- the record identifier name;
- the URL;
- the channel;
- the content start position;
- the content end position;
- and so on.

For example, suppose a group company needs to customize the political-work news of a subsidiary unit. The record then consists of:

- "political-work news of the website of subsidiary unit X";
- the URL of that unit's website;
- the start position of the political-work news;
- the end position of the political-work news.

Here, "political-work news of the website of subsidiary unit X" is the user-defined identifier name of the entry.
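A content repository with the record fields listed above could be a single database table keyed by the identifier name. The following is a minimal sketch using Python's standard `sqlite3` module. The table and column names are hypothetical, and the start and end positions are assumed to be stored as marker strings.

```python
import sqlite3


def open_repository(path=":memory:"):
    """Open (or create) the hypothetical content repository table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               name      TEXT PRIMARY KEY,  -- user-defined identifier name
               site_url  TEXT NOT NULL,     -- URL of the information source website
               channel   TEXT NOT NULL,     -- channel holding the clipped section
               start_pos TEXT NOT NULL,     -- start position (assumed: a marker string)
               end_pos   TEXT NOT NULL      -- end position (assumed: a marker string)
           )"""
    )
    return conn


def save_entry(conn, name, site_url, channel, start_pos, end_pos):
    """Insert or overwrite one entry; `with conn` commits the transaction."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?, ?)",
                     (name, site_url, channel, start_pos, end_pos))


def list_entry_names(conn):
    """Identifier names the user browses when choosing what to integrate."""
    return [row[0] for row in conn.execute("SELECT name FROM entries ORDER BY name")]


def load_entry(conn, name):
    """Return (site_url, channel, start_pos, end_pos), or None if absent."""
    return conn.execute(
        "SELECT site_url, channel, start_pos, end_pos FROM entries WHERE name = ?",
        (name,)).fetchone()
```

Only the structure description is stored, never the clipped content itself. The content is fetched live when it is displayed.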

S103: select and integrate the saved page structure analysis results, and display them on the content customization website.

This step reassembles the clipped page content on the content customization website. As noted above, not all clipped content is necessarily reassembled. The user chooses which content to assemble according to actual needs, and the system of the invention then performs the assembly.

At this point, the content is displayed on the content customization website in the form of sections. The user obtains the detailed information in each section by accessing it.

S104: when the user accesses the integrated result, obtain web page content from the corresponding information source website according to the page structure analysis result, and display it.

When the user selects the customized content to display, the system:

1. locates the website on which the clipped content resides, using the URL of the information source website;
2. obtains the web page content from the corresponding position on that website, according to the channel, start position, end position, and other information about the clipped content.

As shown above, the invention does not simply grab clipped web pages directly. Instead, it analyzes the page structure to obtain the URL of the information source website, the channel in which the clipped content is located, and the start and end positions of the displayed content. It then uses these results to fetch the relevant information from the information source website in real time, so customized content is updated in real time as the website changes.

Moreover, the channel or style of the information source website may change, for example the page colors or the relative position within a section. Even then, the required content can still be obtained correctly from the source website using the page structure analysis result, which keeps the customized content up to date.

The step of selecting and integrating the saved page structure analysis results can be implemented with portlet technology, as described in detail below. Referring to Fig. 2, this step includes the following sub-steps:

S201: generate portlets from the saved entries, that is, encapsulate the entries stored in the content repository into portlets by configuration. The encapsulated portlets can be displayed within the portlet framework of the content customization website.

To support personalized customization, portlet generation can proceed as follows:

1. The entries in the content repository are read.
2. The user selects the entries to be integrated.
3. These entries are encapsulated into portlets, producing data that conforms to the portlet specification.
4. The data is saved to the portlet database of the content customization website, which completes portlet registration.
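The source refers only to "data that conforms to the portlet specification" and does not name a specification. As one possible reading, the sketch below assumes the JSR 168 (Portlet 1.0) deployment descriptor format, `portlet.xml`. It wraps each user-chosen entry as a `<portlet>` element whose `init-param` carries the entry's identifier name. The portlet class name, the `entryName` parameter, and the `clipN` naming scheme are hypothetical. Saving the resulting tree to the portlet database is not shown.

```python
import xml.etree.ElementTree as ET

# Namespace of the JSR 168 (Portlet 1.0) deployment descriptor.
PORTLET_NS = "http://java.sun.com/xml/ns/portlet/portlet-app_1_0.xsd"


def register_portlets(all_entries, chosen,
                      portlet_class="com.example.ClippedContentPortlet"):
    """Wrap the user-chosen entries as portlet definitions in a portlet.xml-style tree.

    all_entries: identifier names read from the content repository.
    chosen: the subset the user picked for integration.
    portlet_class: hypothetical class that renders one clipped entry.
    """
    unknown = set(chosen) - set(all_entries)
    if unknown:
        raise KeyError(f"unknown entries: {sorted(unknown)}")
    app = ET.Element("portlet-app", {"xmlns": PORTLET_NS, "version": "1.0"})
    for i, name in enumerate(chosen, start=1):
        portlet = ET.SubElement(app, "portlet")
        ET.SubElement(portlet, "portlet-name").text = f"clip{i}"
        ET.SubElement(portlet, "portlet-class").text = portlet_class
        param = ET.SubElement(portlet, "init-param")   # which entry to render
        ET.SubElement(param, "name").text = "entryName"
        ET.SubElement(param, "value").text = name
        supports = ET.SubElement(portlet, "supports")
        ET.SubElement(supports, "mime-type").text = "text/html"
        ET.SubElement(supports, "portlet-mode").text = "VIEW"
        info = ET.SubElement(portlet, "portlet-info")
        ET.SubElement(info, "title").text = name
    return app
```

`ET.tostring(app)` serializes the tree for storage.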

S202: save the generated portlets, that is, store the encapsulated portlets in the portlet database, and display them. From the user's point of view, the customized web page content appears on the website. When the user wants to display the content of a particular page, the user selects the corresponding portlet, and the selected portlet is displayed on the user's content customization website.

The portlet framework can be regarded as a container that holds all portlets. Portlets are the basic building blocks of pages and the core components of portal management. The framework is responsible for creating, modifying, deleting, and sharing portlets, and for managing their properties. Put simply, a portlet is a concrete functional module, and the portlet framework is the platform that hosts these modules.
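The container role just described can be sketched as a small registry that creates, modifies, deletes, and shares portlets and manages their properties. This is an illustrative in-memory model under assumed names (`PortletFramework`, `visible_to`), not an actual portal product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Portlet:
    name: str
    owner: str
    properties: dict = field(default_factory=dict)  # e.g. title, height
    shared_with: set = field(default_factory=set)   # users it is shared with


class PortletFramework:
    """Hypothetical container managing the portlet lifecycle."""

    def __init__(self):
        self._portlets = {}

    def create(self, name, owner, **properties):
        if name in self._portlets:
            raise ValueError(f"portlet {name!r} already exists")
        self._portlets[name] = Portlet(name, owner, dict(properties))
        return self._portlets[name]

    def modify(self, name, **properties):
        self._portlets[name].properties.update(properties)

    def delete(self, name):
        del self._portlets[name]

    def share(self, name, user):
        self._portlets[name].shared_with.add(user)

    def visible_to(self, user):
        """Portlets a user may place on a page: their own plus those shared with them."""
        return sorted(p.name for p in self._portlets.values()
                      if p.owner == user or user in p.shared_with)
```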

To aid understanding of the technical solution, the method is described below with reference to a practical application.

Referring to Fig. 3, the method provided by the preferred embodiment of the present invention includes the following steps:

S301: the user logs in to the portal system (that is, the content customization website);

S302: the user clips web page content from the information source website with the clipping tool, and the system assists in analyzing how the information in that content is structured;

S303: the analyzed content is stored in the content repository as entries;

S304: the portlet generation module reads the entries in the content repository;

S305: the user selects the content entries to be integrated;

S306: the portlet generation module encapsulates the content entries to generate portlets;

S307: the generated portlets are stored in the portlet library;

the generated portlets are then presented in the user's portal system;

S308: the user selects the encapsulated portlets to be displayed;

S309: the information corresponding to the selected portlets is displayed in the portal system.

After the user selects the encapsulated portlets to display, the system fetches the information corresponding to each portlet from the information source website, according to the saved page structure analysis results. It then displays that information in the user's portal system.
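Steps S302 to S309 can be condensed into one toy portal object, which shows that only structure descriptions are stored and the content is fetched again on every display. The sketch makes the following assumptions:

- Start and end positions are marker strings.
- `fetch` is an injected, hypothetical `url -> html` callable.
- The class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str   # identifier name chosen by the user (S303)
    url: str    # channel page on the information source website
    start: str  # assumed start-position marker
    end: str    # assumed end-position marker


class Portal:
    def __init__(self, fetch):
        self.fetch = fetch         # url -> html; injected so tests can fake it
        self.repository = {}       # content repository (S303)
        self.portlet_library = {}  # generated portlets (S307)

    def clip(self, name, url, start, end):
        """S302-S303: store the analyzed structure as a named entry."""
        self.repository[name] = Entry(name, url, start, end)

    def generate(self, chosen):
        """S304-S307: wrap the user-chosen entries as portlets."""
        for name in chosen:
            self.portlet_library[name] = self.repository[name]

    def display(self, selected):
        """S308-S309: fetch live content for each selected portlet."""
        blocks = []
        for name in selected:
            e = self.portlet_library[name]
            html = self.fetch(e.url)  # fetched anew on every display
            i = html.index(e.start) + len(e.start)
            blocks.append(f"[{name}] {html[i:html.index(e.end, i)]}")
        return "\n".join(blocks)
```

If the source page is updated between two calls to `display`, the second call shows the new text without the entry being clipped again.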

In summary, after the user customizes content, the system fetches the information from the corresponding websites in real time on every visit. The content is therefore always current, and the content management of the invention is dynamic.

Moreover, using page clipping, this content management integration method can capture the content of specified channels on any website reachable over the network, analyze the captured content, and integrate it with an existing portal system. This extends the information sources available to content management and enables personalized customization of the integrated content.

Page crawling is also unified: a single crawling server connects to the information source websites and crawls their pages in one place.
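The unified crawling step can be illustrated with a single fetch service that every portlet goes through. The sketch below is an assumption about how such a service might behave: within one page render, it fetches each distinct URL only once and fetches distinct URLs in parallel using the standard `concurrent.futures` thread pool. `UnifiedFetcher` and the injected `fetch` callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class UnifiedFetcher:
    """Hypothetical single crawl service that all portlets request pages from."""

    def __init__(self, fetch, max_workers=4):
        self.fetch = fetch  # url -> html; a real service would use an HTTP client
        self.max_workers = max_workers
        self.requests_made = 0

    def fetch_all(self, urls):
        """Fetch each distinct URL once, in parallel, and return {url: html}."""
        unique = list(dict.fromkeys(urls))  # de-duplicate, keep first-seen order
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            pages = list(pool.map(self.fetch, unique))
        self.requests_made += len(unique)
        return dict(zip(unique, pages))
```

Two portlets clipped from the same channel page then cost one request instead of two.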

Corresponding to the above method, the invention also provides an embodiment of a content management integration system. Fig. 4 is a structural diagram of the content management integration system provided by the embodiment. The system mainly includes:

a web page crawling and analysis unit U401, configured to clip web page content from the information source website and analyze the page structure;

a storage unit U402, configured to save the page structure analysis result;

an integration and display unit U403, configured to select and integrate saved page structure analysis results and, when the integrated result is accessed, to obtain web page content from the corresponding information source website according to the page structure analysis result and display it.

The page structure analysis result includes the URL of the information source website, the channel in which the clipped web page content is located, and the start and end positions of the displayed content. The integration and display unit U403 obtains web page content from the corresponding information source website according to the analysis result as follows:

locating, from the URL of the information source website, the website on which the clipped content resides;

obtaining the web page content from the corresponding position on that website, according to the channel, start position, and end position of the clipped content.

The storage unit U402 is the content repository described above. It saves the page structure analysis results as follows:

storing the analysis results as entries and defining an identifier name for each entry, where each entry corresponds to one web page clipping object.

Preferably, the integration and display unit U403 uses portlet encapsulation, which wraps the entries in storage unit U402 into portlets by configuration. The encapsulated portlets can be displayed and personalized within the portal's portlet framework.

The portlet encapsulation specifically includes:

a portlet generation unit U4031, configured to select saved page structure analysis results and encapsulate them to generate portlets, the encapsulation being configuration-driven;

a portlet library U4032, configured to store the portlets;

a portlet framework U4033, configured to obtain web page content from the corresponding information source website when the user accesses a customized portlet, according to the page structure analysis result, and to publish it to the access server for display. The access server is the server of the content customization website described above. As the display framework for encapsulated portlets, the portlet framework lets users personalize which eligible encapsulated portlets appear. This determines the content shown on their personal portal after login.
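The personalization role of the portlet framework U4033 can be sketched as a per-user layout. Each user picks portlets from the library, and the page built after login renders each chosen portlet through a live `render` callback. `PersonalPortal`, its methods, and the `<section class="portlet">` markup are hypothetical choices for illustration.

```python
class PersonalPortal:
    """Hypothetical per-user layout kept by the portlet framework (U4033)."""

    def __init__(self, library, render):
        self.library = set(library)  # portlet names in the portlet library (U4032)
        self.render = render         # name -> html, assumed to fetch live content
        self.layouts = {}            # user -> ordered list of chosen portlet names

    def customize(self, user, portlet_names):
        """Record which library portlets this user wants on the personal portal."""
        missing = [n for n in portlet_names if n not in self.library]
        if missing:
            raise KeyError(f"not in portlet library: {missing}")
        self.layouts[user] = list(portlet_names)

    def page_for(self, user):
        """Build the personal portal page shown after login."""
        parts = [f'<section class="portlet">{self.render(n)}</section>'
                 for n in self.layouts.get(user, [])]
        return "<main>" + "".join(parts) + "</main>"
```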

For parts of the system in Fig. 4 not detailed here, refer to the relevant parts of the methods shown in Figs. 1, 2, and 3; they are not repeated for brevity.

The content management integration method and system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and its core idea. Those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. Accordingly, this specification should not be construed as limiting the present invention.