WO2025087150A1 - Network information generation method and system, electronic device, and storage medium - Google Patents


Publication number
WO2025087150A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
preset
network data
network
prompt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/125611
Other languages
French (fr)
Chinese (zh)
Inventor
沈慧
魏昱丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Alibaba International Internet Industry Co Ltd
Original Assignee
Hangzhou Alibaba International Internet Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Alibaba International Internet Industry Co Ltd
Publication of WO2025087150A1

Abstract

Embodiments of the present disclosure provide a network information generation method and system, an electronic device, and a medium. The method comprises: respectively acquiring page information of preset network data sources to obtain a first information acquisition result corresponding to each network data source; carrying out summarization on the first information acquisition result to obtain a second information acquisition result; and on the basis of a preset prompt template and the second information acquisition result, calling a preset large-scale language model to generate network information corresponding to the preset network data sources. According to the present method, by combining page information automatic progressive acquisition and an AIGC content processing capability, the full-process automation of information acquisition and generation is achieved without manual summarization and refinement, and the time for responding to a network information generation request is shortened to a minute level, so that the timeliness and efficiency of network information generation are improved; information processing rules are configured in the form of the prompt template, network information extraction is carried out as required, and the quality of generating network information can be improved.

Description

Translated from Chinese
Network information generation method, system, electronic device, and storage medium

This disclosure claims priority to Chinese patent application No. 202311408987.6, filed with the Chinese Patent Office on October 27, 2023 and entitled "Network Information Generation Method, System, Electronic Device, Storage Medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a network information generation method, a network information generation system, an electronic device, and a storage medium.

Background Art

With the development of Internet technology, the range of network applications has expanded and the volume of network information has surged, making it difficult for users to extract valuable information from it. Taking AI (Artificial Intelligence) as an example, the volume of information in the field has grown so quickly that many people struggle to keep up with the latest industry developments. Although some existing tools can output network information in response to user queries, they are not specialized enough to track a vertical field quickly: users must spend considerable effort searching multiple sources, integrating the multi-source content, and then summarizing and refining it into network information. Existing methods of generating network information are therefore time-consuming, labor-intensive, and slow to deliver output.

In summary, the network information generation methods in the prior art still need improvement.

Summary of the Invention

Embodiments of the present disclosure provide a network information generation method that can improve the efficiency and timeliness of network information generation.

Correspondingly, embodiments of the present disclosure also provide a network information generation system, an electronic device, and a storage medium to ensure the implementation and application of the above method.

To solve the above problems, an embodiment of the present disclosure discloses a network information generation method, the method comprising:

collecting page information of preset network data sources respectively, to obtain a first information collection result corresponding to each of the network data sources;

summarizing the first information collection results to obtain a second information collection result; and

based on a preset prompt template and the second information collection result, calling a preset large-scale language model to generate network information corresponding to the preset network data sources.
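
The three steps above can be read as a small pipeline. The following is a minimal sketch under stated assumptions, not the implementation described in the disclosure: the function name, the `collect` callable, the `{items}` placeholder in the template, the `title` and `url` keys, and the `llm` callable standing in for the preset large-scale language model are all hypothetical. The summarization step is reduced here to a plain merge of the per-source lists.

```python
from typing import Callable, Dict, List

def generate_network_information(
    sources: List[str],
    collect: Callable[[str], List[dict]],  # hypothetical per-source collector
    prompt_template: str,                  # assumed to contain an "{items}" placeholder
    llm: Callable[[str], str],             # hypothetical stand-in for the preset model
) -> str:
    # Step 1: collect page information per source (first collection results).
    first_results: Dict[str, List[dict]] = {src: collect(src) for src in sources}

    # Step 2: merge the per-source results into one second collection result.
    second_result: List[dict] = [
        item for items in first_results.values() for item in items
    ]

    # Step 3: fill the prompt template and call the model.
    items_text = "\n".join(
        f"- {item['title']} ({item['url']})" for item in second_result
    )
    return llm(prompt_template.format(items=items_text))
```

Passing the collector and the model in as callables keeps the sketch independent of any particular crawler or model API.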

An embodiment of the present disclosure further discloses a network information generation method, the method comprising:

in response to an HTTP request, obtaining the network data sources carried in the HTTP request and the prompt template corresponding to a preset information processing operation;

collecting page information of the network data sources respectively, to obtain a first information collection result corresponding to each of the network data sources;

summarizing the first information collection results to obtain a second information collection result;

based on the prompt template and the second information collection result, calling a preset large-scale language model to generate network information corresponding to the network data sources; and

outputting the network information in response to the HTTP request.

An embodiment of the present disclosure further discloses a network information generation system, the system comprising a client and a server, wherein:

the client is configured to obtain the network data sources configured by a user and the prompt template corresponding to a preset information processing operation;

the server is configured to collect page information of the network data sources respectively, to obtain a first information collection result corresponding to each of the network data sources, and to summarize the first information collection results to obtain a second information collection result;

the server is further configured to call a preset large-scale language model, based on the prompt template and the second information collection result, to generate network information corresponding to the network data sources; and

the server is further configured to push the network information to an information display interface corresponding to the user.

An embodiment of the present disclosure further discloses an electronic device, comprising a processor and a memory communicatively connected to the processor; the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the method described in the embodiments of the present disclosure.

An embodiment of the present disclosure further discloses a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method described in the embodiments of the present disclosure.

An embodiment of the present disclosure further discloses a computer program product comprising a computer program which, when executed by a processor, implements the method described in the embodiments of the present disclosure.

Compared with the prior art, the embodiments of the present disclosure have the following advantages:

By combining automatic progressive collection of page information with AIGC (Artificial Intelligence Generated Content) processing capabilities, the entire process of information collection and information generation is automated, and manual summarization and refinement are no longer needed. The time needed to respond to a network information generation request is reduced to the order of minutes, which greatly improves the timeliness of network information generation. Users can configure multiple kinds of network data sources, so the generated network information is more comprehensive. Users can also configure information processing rules in the form of prompt templates, for example information filtering rules, information extraction rules, summarization rules, and format conversion rules, so that network information is extracted on demand. In addition, configuring rules such as filtering and deduplication rules in the form of prompt templates further improves the quality of the generated network information.

Brief Description of the Drawings

FIG. 1 is a flowchart of the steps of one embodiment of a network information generation method disclosed in the present disclosure;

FIG. 2 is a schematic diagram of an implementation architecture of the network information generation method disclosed in the present disclosure;

FIG. 3 is a schematic diagram of the display effect of network information generated by a network information generation method disclosed in the present disclosure;

FIG. 4 is a flowchart of the steps of another embodiment of a network information generation method disclosed in the present disclosure;

FIG. 5 is a schematic structural diagram of a network information generation system disclosed in the present disclosure;

FIG. 6 is a schematic structural diagram of an exemplary apparatus provided by an embodiment of the present disclosure.

Detailed Description

To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, the present disclosure is described in further detail below with reference to the accompanying drawings and specific embodiments.

The surge in network information makes it difficult for many people to keep up with the latest industry developments. Although some existing tools can search for industry or general information, they are not specialized enough to track a vertical field quickly: users must spend considerable effort searching multiple sources, integrating the multi-source content, and summarizing and refining it into network information, which is time-consuming, labor-intensive, and not timely. The present disclosure aims to solve the problem of "information overload" in a specialized field and to provide users with specialized, comprehensive, and timely information about that field. For example, in the field of artificial intelligence, the latest information from multiple channels in the field (such as academic papers, tools, applications, blogs, and forums) is processed automatically, summarized and refined, and output as network information. To improve the timeliness and efficiency of obtaining network information, the network information generation method proposed in the embodiments of the present disclosure combines automatic progressive collection of page information with the summarization capability of AIGC (Artificial Intelligence Generated Content), so that the latest theoretical research, tools, applications, and news in a professional field are processed, ranked, and integrated to generate network information.

The network information generation method disclosed in the embodiments of the present disclosure can be implemented on the software architecture shown in FIG. 2. The network data sources and the prompt template library can be configured dynamically according to network information requirements; the embodiments of the present disclosure place no restriction on the number or type of network data sources, nor on the number, format, or content of the prompt template libraries. Components of the tool set and the AIGC model set can be flexibly added or retired, which gives the architecture strong scalability and compatibility. The main program implements the network information generation method of the present disclosure on the basis of these inputs and this environment. With the software architecture shown in FIG. 2, the method is not limited to a specific information field: it can be migrated and reused across information fields and applied to other specialized fields at very low cost. The method combines, for the first time, automatic progressive collection of page information with AIGC capabilities, achieving fully automated content generation with strong timeliness and high content quality.
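
To make the dynamically configurable inputs of this architecture concrete, the sketch below shows one possible shape for the configuration. The disclosure does not prescribe a format, so every class name, field name, source name, URL, and template text here is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SourceConfig:
    name: str
    url: str
    category: str  # e.g. "news", "paper", "tool" (illustrative categories)

@dataclass
class PipelineConfig:
    sources: List[SourceConfig] = field(default_factory=list)
    prompt_templates: Dict[str, str] = field(default_factory=dict)

    def template_for(self, operation: str) -> str:
        # Look up the prompt template for a named processing operation.
        return self.prompt_templates[operation]

# Hypothetical configuration with one source and one template.
config = PipelineConfig(
    sources=[SourceConfig("example-news", "https://news.example.com", "news")],
    prompt_templates={"summarize": "Summarize each item in one sentence:\n{items}"},
)
```

Because sources and templates are plain data, moving the pipeline to another field would mainly mean swapping this configuration.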

The network information generation method disclosed in the embodiments of the present disclosure can effectively avoid network information overload and improve both the timeliness of obtaining network information and the efficiency of generating it. Specific implementations of the method are described further below.

Referring to FIG. 1, in an optional embodiment, the network information generation method disclosed in the present disclosure comprises steps 102 to 106.

The specific implementation of each step is described below.

Step 102: collect page information of preset network data sources respectively, to obtain a first information collection result corresponding to each network data source.

The preset network data sources mentioned in the embodiments of the present disclosure can be configured as needed.

In some optional embodiments, the preset network data sources include, but are not limited to, pages of theoretical research websites, microblogs, forums, applications or tools, and social platforms. Optionally, the preset network data sources can be configured by the user through a client interface or a configuration file. For example, the list of network data sources to be collected can be compiled manually after surveying the target field. Specifically, the dimensions of the information that needs to be obtained efficiently can be determined first, for example three information categories: AI (Artificial Intelligence) news, AI academic papers, and AI tools and applications. Information in these categories is then searched through mainstream and authoritative channels (such as mainstream search engines, popular public accounts, and technical forums), and popular websites, applications, tools, microblogs, forums, and other data sources in the search results that are updated frequently and offer comprehensive information are selected as the preset network data sources.

In the embodiments of the present disclosure, data collection and information extraction are first performed on each data source separately, yielding a list of information content for each source.

In the embodiments of the present disclosure, collecting page information of the preset network data sources respectively to obtain the first information collection result corresponding to each network data source includes: using an automatic progressive page information collection technology to collect the page information of the preset network data sources respectively, to obtain the first information collection result corresponding to each network data source.

Automatic progressive page information collection is a network technology that automatically and comprehensively collects World Wide Web page information layer by layer according to certain rules; it can be implemented as an information collection program or script. For example, it can collect the information in a target page item by item according to the structure of that page, and automatically enter the next-level pages through the entry points on the target page to continue collecting page information. In the prior art, commonly used automatic progressive collection of web page information can be implemented on the basis of HTTP (HyperText Transfer Protocol) client technology or on the basis of headless browser technology.
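
The layer-by-layer behaviour described above can be sketched as a breadth-first traversal with a depth limit. This is a minimal illustration, not the collection program of the disclosure: it assumes that the "entry points" on a page are ordinary `<a href>` links, and the `fetch` callable, the `crawl` function, and the `max_depth` parameter are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url: str, fetch: Callable[[str], str], max_depth: int = 1) -> Dict[str, str]:
    """Collect pages level by level, up to max_depth levels below start_url."""
    pages: Dict[str, str] = {}
    frontier = [start_url]
    for _ in range(max_depth + 1):
        next_frontier: List[str] = []
        for url in frontier:
            if url in pages:  # skip pages that were already collected
                continue
            html = fetch(url)
            pages[url] = html
            parser = LinkExtractor()
            parser.feed(html)
            # Resolve relative links against the current page URL.
            next_frontier.extend(urljoin(url, link) for link in parser.links)
        frontier = next_frontier
    return pages
```

A real collector would also restrict which links are followed, for example to article links on a listing page.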

HTTP (Hypertext Transfer Protocol) is one of the most widely used and most important protocols on the World Wide Web, and more and more applications need to access network resources directly over HTTP. The HttpClient client toolkit implements all HTTP methods and can send requests and receive response data. Using HttpClient to issue HTTP requests and collect the page information of a specified target page has the advantages of low system overhead and high collection efficiency, but it does not work for every page. For example, some page content is loaded asynchronously by JS (JavaScript), and some page content is encrypted; in these cases the HttpClient client toolkit cannot collect the page information.

Headless browser technology can simulate the manual operations a person performs in a regular (headed) browser. It is flexible, applies to a wide range of pages, and has a high success rate in obtaining information, but its system overhead is high and its collection efficiency is comparatively low.

In the embodiments of the present disclosure, HTTP client technology and headless browser technology are combined to collect the page information of every page of each preset network data source, from which information content is then extracted.

In some optional embodiments, collecting page information of the preset network data sources respectively to obtain the first information collection result corresponding to each network data source includes: using an HTTP client toolkit to collect the page information of each page of each preset network data source; in response to a failure to collect the page information of a target page among those pages, using headless browser technology to collect the page information of the target page; and, for each network data source, parsing and extracting the page information collected from the pages of the network data source, to obtain the first information collection result corresponding to the network data source. The first information collection result includes one or more pieces of information content from each page of the corresponding network data source.

For example, for a given preset network data source, the HTTP client toolkit is used first to automatically and progressively collect the page information of each page of that source. If there are pages whose information cannot be collected this way (for example, no response data is obtained when the page is requested with the HTTP client toolkit), headless browser technology is then used to automatically and progressively collect the page information of those pages, thereby completing the page information of the preset network data source.
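
The "HTTP client first, headless browser on failure" strategy can be sketched as follows. This is an illustrative sketch only: both fetchers are passed in as callables, so no particular HTTP library or browser driver is assumed, and the function name and the rule that an exception or empty body counts as a failed collection are assumptions.

```python
from typing import Callable, Optional

def fetch_with_fallback(
    url: str,
    http_fetch: Callable[[str], Optional[str]],      # cheap path, e.g. a plain HTTP client
    headless_fetch: Callable[[str], Optional[str]],  # costly path, e.g. a headless browser
) -> Optional[str]:
    """Try the lightweight HTTP client first; fall back to the headless browser."""
    try:
        body = http_fetch(url)
    except Exception:
        body = None  # treat any client error as a failed collection
    if body:  # non-empty response: the cheap path succeeded
        return body
    return headless_fetch(url)
```

Ordering the two fetchers this way pays the headless browser's overhead only for pages the plain client cannot handle, which is the trade-off the text describes.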

In some optional embodiments, the page information includes the page information of the first N screens of the corresponding page, where N is a natural number greater than or equal to 3. For example, when N equals 3, the first 3 screens of a page are collected each time, which captures the latest information while avoiding the collection of excessive redundant data.

As a concrete example, take website A as the preset network data source. First, a method of the HTTP client library is used to send an HTTP request to the network address of website A and to wait for the server's response data. Once the response data from website A's server has been received, it is parsed, the required data is extracted as needed, and the data is processed and saved. If no response data is received from website A's server, headless browser technology is used to open the network address of website A and wait until the page has loaded completely, so that all dynamic content is present; the data in the page is then obtained, the page information of website A is parsed, the required data is extracted as needed, and the data is processed and saved.

For specific implementations of using an HTTP client toolkit to automatically and progressively collect the page information of each page of each preset network data source, and of using headless browser technology to automatically and progressively collect the page information of a target page, refer to the prior art; they are not repeated in the embodiments of the present disclosure.

Collecting the page information of each page of each preset network data source with the HTTP client toolkit first, and using headless browser technology only for the pages whose collection failed, preserves collection efficiency and keeps system overhead low while still ensuring that data collection is comprehensive.

In some embodiments of the present disclosure, the comprehensiveness of the collected data can be further ensured by setting an appropriate collection depth and by verifying data integrity. For example, an appropriate collection depth can be set according to the data collection requirements and the page structure of the target website, such as setting the number of pages to 3 or driving the browser to scroll down twice to load waterfall-style (infinite-scroll) data. Obtaining the latest several screens of data ensures that all recently published information is captured. As another example, the collected data can first be passed through a verification script that checks information integrity, including but not limited to checking the data format and whether required fields (title, publication time) are empty. If the page information is incomplete, a retry strategy is started, and a maximum number of retries can be set.
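
The integrity check and retry strategy described above can be sketched like this. It is an illustrative sketch under assumptions: the field names `title` and `published_at`, the function names, and the choice to return `None` once retries are exhausted are hypothetical and not taken from the disclosure.

```python
from typing import Callable, List, Optional

REQUIRED_FIELDS = ("title", "published_at")  # hypothetical names for title / publication time

def is_complete(record: dict) -> bool:
    """A record is complete when every required field is present and non-empty."""
    return all(str(record.get(name) or "").strip() for name in REQUIRED_FIELDS)

def collect_with_retry(
    collect: Callable[[], List[dict]],
    max_retries: int = 3,
) -> Optional[List[dict]]:
    """Re-run collection until every record validates or retries run out."""
    for _ in range(max_retries + 1):  # one initial attempt plus max_retries retries
        records = collect()
        if records and all(is_complete(r) for r in records):
            return records
    return None  # the caller decides how to handle a source that never validates
```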

After the page information of each page of each preset network data source has been collected, the data collected from the pages of each preset network data source is further parsed and its information extracted, yielding a list of the information content of that source, which serves as the first information collection result.

In some optional embodiments, parsing and extracting, for each network data source, the page information collected from the pages of the network data source to obtain the first information collection result corresponding to the network data source includes: for each network data source, parsing and extracting the page information collected from the pages of the network data source according to the page data parsing rule corresponding to that network data source, to obtain the first information collection result corresponding to the network data source.

For example, for each kind of network data source, a page information extraction technology for parsing the page information of that source can be configured in advance, together with a page data parsing rule for that source. The page data parsing rule includes, but is not limited to, a parsing method and a parsing strategy described by a regular expression.
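
A per-source parsing rule made of a parsing method plus a regular-expression strategy might be organized as in the sketch below. The rule table, the source name, the HTML shape the pattern expects, and the function names are all hypothetical; only the regex method is implemented here.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ParseRule:
    method: str   # parsing method, e.g. "regex"
    pattern: str  # parsing strategy expressed as a regular expression

# Hypothetical rule table: one entry per configured network data source.
RULES: Dict[str, ParseRule] = {
    "example-news": ParseRule(
        method="regex",
        pattern=r'<a class="headline" href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>',
    ),
}

def parse_page(source: str, page: str) -> List[dict]:
    """Apply the rule configured for this source to one page of markup."""
    rule = RULES[source]
    if rule.method != "regex":
        raise ValueError(f"unsupported parsing method: {rule.method}")
    # Each match becomes one record keyed by the named groups of the pattern.
    return [m.groupdict() for m in re.finditer(rule.pattern, page)]
```

Keeping rules in a table keyed by source means a new source can be supported by adding an entry rather than new parsing code.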

In some optional embodiments, one or more of the following page information extraction technologies can be used to parse the page information of each page: XPath (XML Path Language), Jsoup (a Java HTML parser), Regex (regular expressions), JSONPath (JSON Path Language), XML (eXtensible Markup Language), and so on.

XPath is a language for locating information in XML documents and is also widely applied to HTML (HyperText Markup Language) documents. It traverses the elements and attributes of a document, selecting nodes along paths through the page hierarchy.

Jsoup is an HTML parser that can parse HTML document content flexibly and efficiently and is tolerant of non-standard HTML documents.

Regex (regular expressions) is a powerful text matching tool that can be used to extract data matching a specific pattern from a string.

JSONPath is a language for querying and extracting specific data from JSON data.

XML is a markup language for describing and transmitting data; XPath can be used to query and extract data from XML documents.
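
As a small illustration of path-based extraction from XML, the sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The feed structure, element names, and URLs are invented for the example and do not come from the disclosure.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical feed; element names and URLs are illustrative only.
FEED = """
<feed>
  <item><title>Model release</title><link>https://news.example.com/1</link></item>
  <item><title>New paper</title><link>https://news.example.com/2</link></item>
</feed>
"""

def extract_items(xml_text: str) -> List[dict]:
    root = ET.fromstring(xml_text)
    # ".//item" is an XPath-style path selecting every <item> descendant.
    return [
        {"title": item.findtext("title"), "url": item.findtext("link")}
        for item in root.findall(".//item")
    ]
```

For full XPath support, or for HTML that is not well formed, a dedicated parsing library would be needed instead.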

在一些可选的实施例中,可以预先采用上述任意一种技术实现的页面信息解析接口,并通过调用页面信息解析接口,对每个页面的数据分别进行解析。In some optional embodiments, a page information parsing interface implemented by any of the above-mentioned technologies may be used in advance, and the data of each page may be parsed separately by calling the page information parsing interface.

在一些可选的实施例中,对从网络数据源的页面采集的页面信息进行解析和提取,得到网络数据源对应的第一信息采集结果,包括:对从预设网络数据源的页面采集的页面信息进行解析和提取,得到每个文章页面对应的包括预设信息内容的至少一条信息内容;将信息内容的列表,作为网络数据源的对应的第一信息采集结果。其中,预设信息内容包括以下一个或多个字段内容:文章标题、发布时间、文章链接地址(即URL,Uniform Resource Locato,统一资源定位符)、作者、文本内容、主图URL,以及个性化属性。In some optional embodiments, parsing and extracting the page information collected from the page of the network data source to obtain the first information collection result corresponding to the network data source includes: parsing and extracting the page information collected from the page of the preset network data source to obtain at least one information content corresponding to each article page including the preset information content; using the list of information content as the first information collection result corresponding to the network data source. The preset information content includes one or more of the following field contents: article title, publishing time, article link address (i.e., URL, Uniform Resource Locato), author, text content, main image URL, and personalized attributes.

Here, personalized attributes are non-universal attributes specific to different types of web pages, and can be used to measure information quality and/or popularity. The personalized attributes of different types of pages can be configured based on expert experience. For example, during article information extraction, it is found that news websites, tool websites, paper websites, and other websites each have non-universal, site-specific attributes. For a news website, the number of comments and page views of an article page can be used to measure the quality and/or popularity of the article on that page, so these can serve as the personalized attributes of article pages on news websites. For a tool website, the number of downloads and favorites of a tool on a page can be used to measure the content quality and popularity of the tool, so these can serve as the personalized attributes of the tool page. For a paper website, the number of searches and citations of an article page can be used to measure the content quality and popularity of the paper, so these can serve as the personalized attributes of article pages on paper websites. In the embodiments of the present disclosure, the personalized attributes of the article page are parsed and recorded in the parsing result, to serve as a reference when the parsing results are later summarized and sorted.

By executing this step, a list of information content is obtained for each preset network data source. The list includes the information content of one or more article pages, and each piece of information content in turn includes the content of multiple fields.
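The shape of the first information collection result described above can be sketched as a simple record type. The class name InfoItem, the field names, and the example source key and URL are assumptions made for illustration; the disclosure only names the fields in prose.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InfoItem:
    """One piece of information content extracted from an article page."""
    title: str
    url: str
    publish_time: Optional[str] = None
    author: Optional[str] = None
    text: str = ""
    main_image_url: Optional[str] = None
    # Site-specific attributes, e.g. comments/views for a news site.
    personalized: Dict[str, int] = field(default_factory=dict)


# First information collection result: one list of items per data source.
first_result: Dict[str, List[InfoItem]] = {
    "news-site": [
        InfoItem(
            title="Example article",
            url="https://example.com/a",
            publish_time="2024-10-18",
            personalized={"comments": 12, "views": 340},
        )
    ],
}
```

Keeping the personalized attributes in a free-form mapping lets each type of site carry its own quality or popularity signals without changing the common fields.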

Step 104: summarize the first information collection results to obtain a second information collection result.

Next, the first information collection results corresponding to all preset network data sources are summarized to obtain a second information collection result, which includes the information content collected from the pages of all preset network data sources.

In some optional embodiments, summarizing the first information collection results to obtain the second information collection result includes: based on the information content of the preset network data sources collected within a preset historical time, performing incremental deduplication and summarization on the information content in the first information collection results to obtain the second information collection result.

In the embodiments of the present disclosure, data from the network data sources is collected incrementally to reduce system resource consumption. That is, the information content already collected from the network data sources within a period of time is retained, and newly collected information content is merged with the retained information content to enrich the collected information content. The retained information content can also be updated periodically on a first-in, first-out basis. For example, only the information content collected in the most recent period (such as the past 30 days) is kept, and data from before that period is cleared to save storage resources. As another example, a size threshold can be set for the stored data, and recently collected data is retained first, in order from newest to oldest.
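The retention policy described above can be sketched as follows. This is an illustration under stated assumptions: items are plain dictionaries with a hypothetical "collected_at" timestamp, and the size threshold is approximated by a maximum item count (max_items) rather than a byte size.

```python
from datetime import datetime, timedelta


def prune_history(items, now, max_age_days=30, max_items=None):
    """Keep only recent items, newest first, optionally capped in number."""
    cutoff = now - timedelta(days=max_age_days)
    # Drop everything collected before the retention window.
    kept = [it for it in items if it["collected_at"] >= cutoff]
    # Newest first, so that a cap discards the oldest items.
    kept.sort(key=lambda it: it["collected_at"], reverse=True)
    if max_items is not None:
        kept = kept[:max_items]
    return kept
```

Calling prune_history on each collection run gives the first-in, first-out behavior: the oldest entries leave the retained set as new ones arrive.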

When summarizing the information content collected from the pages of the network data sources, repeatedly collected information content also needs to be deduplicated, that is, duplicate information content collected at different times or from different pages is filtered out. For repeatedly collected information content, only one collection result is retained, which reduces the amount of data processed in subsequent steps and avoids outputting the same network information more than once.

In some optional embodiments, performing incremental deduplication and summarization on the information content in the first information collection results, based on the information content of the preset network data sources collected within the preset historical time, to obtain the second information collection result includes: obtaining the historical collection result of the network data sources, where the historical collection result is obtained by summarizing the information content of the network data sources collected within a specified historical time; deduplicating the repeatedly collected information content in the first information collection results based on the article titles and/or article link addresses included in the information content of the historical collection result, to obtain incremental information content; and using the incremental information content together with the information content in the historical collection result as the second information collection result for the preset network data sources. The specified historical time is a time before the current time.

For example, the historically collected information content of each network data source is taken as the baseline. If a newly collected piece of information content is determined, from its article title and article link address, to duplicate historically collected information content, the newly collected piece is discarded. If a newly collected piece is determined not to duplicate the historically collected information content and not to duplicate any other newly collected content, it is taken as one piece of incremental information content. If multiple newly collected pieces are determined to duplicate one another but not the historically collected information content, one of them is selected as a piece of incremental information content and the others are discarded. All the incremental information content obtained and the historically collected information content of each network data source are then taken together as the second information collection result.

In some optional embodiments, deduplicating the repeatedly collected information content in the first information collection results based on the article titles and/or article link addresses included in the information content of the historical collection result, to obtain incremental information content, includes: loading the information content of the historical collection result into a cache; and taking each piece of information content in the first information collection results in turn as the current piece and performing the following comparison: comparing the article title and/or article link address of the current piece with the article title and/or article link address of each piece of information content in the cache; in response to the article title and article link address of the current piece and of any piece of information content in the cache satisfying a first preset, continuing to traverse the information content in the first information collection results that has not yet been compared; and in response to the article title and article link address of the current piece and of the information content in the cache satisfying a second preset, marking the current piece as incremental information content, writing the current piece incrementally into the cache, and continuing to traverse the information content in the first information collection results that has not yet been compared.

In some optional embodiments, the first preset includes: the article title of the current piece of information content is the same as the article title of any piece of information content in the cache, or the article link address of the current piece is the same as the article link address of any piece of information content in the cache. The second preset includes: the article title of the current piece differs from the article title of every piece of information content in the cache, and the article link address of the current piece differs from the article link address of every piece of information content in the cache.

For example, the information content of the historical collection result is first loaded into the cache. A piece of information content is then selected from the first information collection results as the current piece, and its article title and article link address are compared in turn with those of each piece of information content in the cache to determine whether the current piece duplicates information content in the cache. If it does, the comparison moves on to the next uncompared piece in the first information collection results. If it does not, the current piece is marked as incremental information content and added to the cache, after which another uncompared piece is selected from the first information collection results as the current piece and compared with the information content in the cache. This continues until all information content in the first information collection results has been compared.

When comparing the current piece of information content with the information content in the cache, whether they are duplicates can be determined from the comparison of article titles and/or the comparison of article link addresses. For example, the current piece may be considered a duplicate of information content in the cache when its article title is the same as the article title of any piece of information content in the cache, or when its article link address is the same as the article link address of any piece of information content in the cache; otherwise, the current piece is considered not to duplicate the information content in the cache.
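The comparison loop described above can be sketched as follows. This is a minimal illustration, assuming items are dictionaries with hypothetical "title" and "url" keys; two in-memory sets stand in for the cache, and exact string equality stands in for the title and link comparison.

```python
def incremental_dedup(history, collected):
    """Return (incremental items, second information collection result)."""
    # "Cache" of titles and link addresses already seen, loaded from history.
    seen_titles = {it["title"] for it in history}
    seen_urls = {it["url"] for it in history}
    increments = []
    for item in collected:
        # First preset: same title OR same link address means duplicate.
        if item["title"] in seen_titles or item["url"] in seen_urls:
            continue
        # Second preset: both differ, so this is incremental content.
        increments.append(item)
        # Write it into the cache so later duplicates among the newly
        # collected items are also discarded.
        seen_titles.add(item["title"])
        seen_urls.add(item["url"])
    return increments, history + increments
```

Because each accepted item is written back into the cache immediately, duplicates among the newly collected items are resolved by keeping the first one encountered.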

In some embodiments of the present disclosure, the historically collected information content of the network data sources can be stored in a Redis (Remote Dictionary Server) cache, indexed by article title and article link address, for use in the comparison.

Step 106: based on a preset prompt template and the second information collection result, call a preset large-scale language model to generate network information corresponding to the preset network data sources.

Next, network information can be generated automatically using AIGC technology, based on the information content collected from the multiple network data sources.

In some optional embodiments, the preset information processing operations needed to generate network information from the information content in the second information collection result can be set in advance, together with the order in which they are executed. A corresponding prompt template is set accordingly. The prompt template describes the format and content of the prompt words that are input when the preset large-scale language model is called to perform the above preset information processing operations in sequence. The prompt word content includes, but is not limited to: the execution order of the preset information processing operations and the information processing rules corresponding to each preset information processing operation. For example, the prompt template can be configured to include the following content: Template name = "General prompt words.First information processing operation prompt words.Second information processing operation prompt words.Input content source prompt words.Output content format prompt words". The input content source prompt words are represented by a placeholder.

When network information needs to be generated, the placeholder for the input content source prompt words is replaced with the second information collection result, which formats the prompt template and produces the prompt words. The preset large-scale language model is then called with the generated prompt words to generate the network information corresponding to the preset network data sources.
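Formatting a composite prompt template of this kind can be sketched as follows. The segment wording, the placeholder name "{{inputContent}}", and the function name format_prompt are assumptions made for illustration; the disclosure specifies only that segments are concatenated and that the input source is a placeholder.

```python
import json

# Hypothetical template: segments joined with "." as in the template name
# pattern; "{{inputContent}}" is an assumed placeholder name.
PROMPT_TEMPLATE = ".".join([
    "You are an editor producing a news digest",
    "First, drop items older than 24 hours",
    "Second, sort the remaining items by publication time",
    "The input items are: {{inputContent}}",
    "Return the result as a JSON array",
])


def format_prompt(template, second_result):
    """Replace the placeholder with the serialized collection result."""
    payload = json.dumps(second_result, ensure_ascii=False)
    return template.replace("{{inputContent}}", payload)


second_result = [{"title": "A", "url": "https://example.com/a"}]
prompt = format_prompt(PROMPT_TEMPLATE, second_result)
```

The resulting string is what would be passed to the large-scale language model; the call itself depends on the model service in use and is omitted here.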

In some optional embodiments, the preset information processing operations needed to generate network information from the information content in the second information collection result can be set in advance, together with the order in which they are executed. Accordingly, a corresponding prompt template is set for each preset information processing operation. The prompt template describes the format and content of the prompt words that are input when the preset large-scale language model is called to perform the corresponding preset information processing operation. The prompt word content includes, but is not limited to, information processing rules. For example, the prompt word content includes: general prompt words for the corresponding information processing operation, information processing rule prompt words, input content format prompt words, generated content format prompt words, and so on.

In some optional embodiments, the information processing operations include, but are not limited to: information filtering, deduplication, information extraction, sorting, format change, and summarization. Accordingly, the information processing rules include, but are not limited to: filtering rules, deduplication rules, information extraction rules, sorting rules, format change rules, and summarization rules. Accordingly, the prompt templates include, but are not limited to: a filtering rule prompt template, a deduplication rule prompt template, an information extraction rule prompt template, a sorting rule prompt template, a format change prompt template, and a summary rule prompt template.

In some optional embodiments, the prompt template is derived from test results. For example, in the process of summarizing and outputting the information content collected from the network data sources, the text processing requirements for the information content are summarized, including but not limited to one or more of the following: general prompt words, information processing rules, input content format, and generated content format. Prompt words are created from the summarized requirements, and are then adjusted according to the network information that the preset large-scale language model generates from them. Through repeated refinement, the prompt template is obtained.

In some optional embodiments, the preset prompt templates can be stored in a configuration file, which makes them easy to edit and maintain. The prompt words in the preset prompt templates can be iteratively upgraded through configuration.

The prompt words of the different kinds of prompt templates are used to prompt the preset large-scale language model to perform the corresponding preset information processing operations. The use of each of the above prompt templates is described below.

1. Filtering rule prompt template

The filtering rule prompt template describes the format and content of the prompt words that prompt the preset large-scale language model to filter the input content. The prompt word content includes: general prompt words, filtering rule prompt words, output result format prompt words, and so on. The filtering rule prompt words describe the filtering rules applied to the input content. For example, they can describe: filtering out input content older than 24 hours, filtering input content with high relevance, and so on. As a specific example, the filtering rule prompt template can be configured to include the following content: Template name = "General prompt words.Filtering rule prompt words.Output result format prompt words".

2. Deduplication rule prompt template

The deduplication rule prompt template describes the format and content of the prompt words that prompt the preset large-scale language model to deduplicate the input content. The prompt word content includes: general prompt words, deduplication rule prompt words, output result format prompt words, and so on.

The deduplication rules in the prompt words are used to perform fine-grained deduplication of the information content through the preset large-scale language model. For example, the information content in the second information collection result, obtained after the preliminary deduplication in step 104, is further deduplicated based on title similarity, content similarity, and so on.

In some optional embodiments, the deduplication rule prompt template can be configured to include the following content: Template name = "General prompt words.Deduplication rule prompt words.Output result format prompt words".

As a specific example, the deduplication rule prompt template can be configured as follows:

prompt_template_deduplication = '''According to the following deduplication rules to deduplicate the initial resultList, then return the deduplicated resultList in the format of a JSONArray formatted string.
Rules are as follows:
rule1: remove entries with title similarity >= 80% and keep only the latest entry based on publication time.
rule2: remove entries with text content similarity >= 70% and keep only the latest entry based on publication time.
The initial resultList is as follows:\n\n{{resultList}}'''

Here, {{resultList}} is the placeholder for the input content and the output content.

Furthermore, output result format prompt words can also be configured.

When deduplication needs to be performed, the corresponding placeholder is replaced with the information content to be deduplicated, which formats the deduplication rule prompt template and yields the prompt words used to call the preset large-scale language model and prompt it to deduplicate the information content. By using the prompt template together with the natural language processing capability of the preset large-scale language model, consistent content across multiple network data sources is deduplicated, which ensures that only incremental information is extracted. For example, the prompt template can specify that, among pieces of information content whose title similarity is greater than or equal to 80%, only the latest one is retained.

In some optional embodiments, the following prompt instruction can be used, for example, to splice the information content to be deduplicated with the deduplication rule prompt words according to the deduplication rule prompt template, yielding the prompt words Prompt1 used to call the preset large-scale language model.

Prompt1 = prompt_template_deduplication.replace("{{resultList}}", str(context["resultList"]))
print(Prompt1)

Here, str(context["resultList"]) represents the information content to be deduplicated, and "{{resultList}}" represents the placeholder in the deduplication rule prompt template.

The preset large-scale language model is then called with the spliced prompt words. The model deduplicates the information content to be processed contained in the prompt words according to the deduplication rules described there, and generates the deduplicated information content in the output format described in the prompt words.
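The call and the handling of the model's JSONArray output can be sketched as follows. This is an illustration only: fake_llm is a stand-in that drops exact title duplicates and is not a real model client, the shortened template text is invented for the example, and json.dumps is used instead of str(...) so that the embedded list is valid JSON. Falling back to the input when the reply cannot be parsed is an assumed design choice, not something the disclosure specifies.

```python
import json

# Shortened, hypothetical template; only the placeholder matches the text above.
DEDUP_TEMPLATE = (
    "Deduplicate the initial resultList and return a JSONArray string.\n"
    "The initial resultList is as follows:\n\n{{resultList}}"
)


def fake_llm(prompt: str) -> str:
    """Stand-in for the preset model: removes exact title duplicates."""
    items = json.loads(prompt.split("\n\n", 1)[1])
    seen, out = set(), []
    for it in items:
        if it["title"] not in seen:
            seen.add(it["title"])
            out.append(it)
    return json.dumps(out)


def deduplicate_with_llm(result_list, call_llm):
    """Format the template, call the model, and parse its JSON array reply."""
    prompt = DEDUP_TEMPLATE.replace(
        "{{resultList}}", json.dumps(result_list, ensure_ascii=False)
    )
    raw = call_llm(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return result_list  # assumed fallback: keep the input unchanged
    return parsed if isinstance(parsed, list) else result_list
```

Passing the model call in as a parameter keeps the formatting and parsing logic testable without access to a model service.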

The content of the above deduplication rule prompt template is only one possible set of prompt words. Those skilled in the art can configure deduplication rule prompt templates with other formats and content according to the specific application scenario. The embodiments of the present disclosure do not limit the specific format of the deduplication rule prompt template or the composition of the deduplication rule prompt words.

3. Information extraction rule prompt template

The information extraction rule prompt template describes the format and content of the prompt words that prompt the preset large-scale language model to extract information from the input content. The prompt word content includes: general prompt words, information extraction rule prompt words, output result format prompt words, and so on. The information extraction rule prompt words describe the information extraction rules. For example, they can specify structured extraction by field name, a limit of no more than 20 characters on each extracted piece of information, and so on.

4. Sorting rule prompt template

The sorting rule prompt template describes the format and content of the prompt words that prompt the preset large-scale language model to sort the input content. The prompt word content includes: general prompt words, sorting rule prompt words, output result format prompt words, and so on. The sorting rule prompt words describe the rules for sorting the information content. For example, the sorting rule prompt words can specify sorting by collection or generation time, sorting by popularity, sorting by the priority of the information content source, and so on.

5. Format change prompt template

The format change prompt template describes the format and content of the prompt words that prompt the preset large-scale language model to change the format of the input content. The prompt word content includes: general prompt words, format change prompt words, output result format prompt words, and so on. The format change prompt words describe the text format of the network information generated by the preset large-scale language model, so that the generated network information can be delivered to a designated information receiving end. As a specific example, when the network information is to be delivered to a mailbox, the format change prompt words in the prompt words input to the preset large-scale language model can be configured as the preset prompt words for the RichText format (i.e., rich text format); when the network information is to be delivered to the instant messaging software DingTalk, the format change prompt words can be configured for the MarkDown format (i.e., a text format that DingTalk groups can recognize and display). Optionally, the format change prompt words can be configured according to the text format accepted by the application scenario of the network information to be generated, to meet the need for diverse output text formats.

In use, the information content to be processed and the configured format change prompt words are spliced according to the format change prompt template to obtain the prompt words for the preset large-scale language model. The model is then called with these prompt words, converts the information content to be processed into the text format indicated by the format change prompt words, and outputs it.
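Selecting the format change prompt words by receiving end can be sketched as a small configuration lookup. The channel keys, the prompt wording, the template layout, and the placeholder names "{{formatPrompt}}" and "{{content}}" are all assumptions made for illustration.

```python
# Hypothetical per-channel format change prompt words.
FORMAT_PROMPTS = {
    "email": "Rewrite the content below as rich text.",
    "dingtalk": "Rewrite the content below as Markdown.",
}

# Hypothetical format change prompt template with two placeholders.
FORMAT_TEMPLATE = "{{formatPrompt}}\n\nContent:\n{{content}}"


def build_format_prompt(channel, content):
    """Splice the channel's format prompt words with the content."""
    if channel not in FORMAT_PROMPTS:
        raise ValueError(f"no format prompt configured for channel: {channel}")
    return (
        FORMAT_TEMPLATE
        .replace("{{formatPrompt}}", FORMAT_PROMPTS[channel])
        .replace("{{content}}", content)
    )
```

Adding a new receiving end then only requires a new entry in the configuration mapping, in line with the configurable approach described for the prompt templates.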

6. Summary rule prompt template

The summary rule prompt template describes the format and content of the prompt words that prompt the preset large-scale language model to summarize the input content. The prompt word content includes: general prompt words, summary strategy prompt words, output result format prompt words, and so on. The summary strategies include, but are not limited to: summarizing by module, summarizing by title, and so on.

The above only gives examples of some of the kinds of prompt templates included in the prompt template library. In other optional embodiments, other kinds of prompt templates can also be configured in the prompt template library, and the format and content of the prompt templates are not limited to the examples above.

In some optional embodiments, calling the preset large-scale language model based on the preset prompt template and the second information collection result to generate the network information corresponding to the preset network data sources includes: based on the second information collection result and the prompt templates, calling the preset large-scale language model in a chain, following the preset execution order of the preset information processing operations, to generate the network information corresponding to the preset network data sources.

递进多次调用预设大规模语言模型,每次执行一种预设信息处理操作,相比于一次调用执行多种操作可以提升生成的网络资讯的质量。By progressively calling the preset large-scale language model multiple times and performing a preset information processing operation each time, the quality of the generated network information can be improved compared to performing multiple operations in one call.

在一些可选的实施例中,基于第二信息采集结果和提示模板,按照预设信息处理操作的预设执行顺序,链式调用预设大规模语言模型,生成预设的网络数据源对应的网络资讯,包括:基于第二信息采集结果和提示模板,按照预设信息处理操作的预设执行顺序,链式执行预设信息处理操作,直至生成预设的网络数据源对应的网络资讯,其中,预设信息处理操作包括以下一种或两种操作:第一种,基于第二信息采集结果,格式化当前预设信息处理操作对应的提示模板,生成第一提示词;以及,基于第一提示词调用预设大规模语言模型,得到生成内容;第二种,基于执行前一个预设信息处理操作得到的生成内容,格式化当前预设信息处理操作对应的提示模板,生成第二提示词;以及,基于第二提示词调用预设大规模语言模型,得到生成内容。In some optional embodiments, based on the second information collection result and the prompt template, in accordance with the preset execution order of the preset information processing operation, the preset large-scale language model is chain-called to generate network information corresponding to the preset network data source, including: based on the second information collection result and the prompt template, in accordance with the preset execution order of the preset information processing operation, the preset information processing operation is chain-executed until the network information corresponding to the preset network data source is generated, wherein the preset information processing operation includes one or two of the following operations: first, based on the second information collection result, formatting the prompt template corresponding to the current preset information processing operation to generate a first prompt word; and, based on the first prompt word, calling the preset large-scale language model to obtain generated content; second, based on the generated content obtained by executing the previous preset information processing operation, formatting the prompt template corresponding to the current preset information processing operation to generate a second prompt word; and, based on the second prompt word, calling the preset large-scale language model to obtain generated content.

在一些可选的实施例中,当执行第一个预设信息处理操作时,以第二信息采集结果中的信息内容填充到当前预设信息处理操作(即第一个预设信息处理)对应的提示模板中信息内容对应的占位符位置,以格式化当前预设信息处理操作对应的提示模板,生成第一提示词。之后,基于第一提示词调用预设大规模语言模型,预设大规模语言模型将按照第一提示词给出的提示执行当前预设信息处理操作,并输出对第二信息采集结果中的信息内容执行当前预设信息处理操作(即第一个预设信息处理)后得到的生成内容。In some optional embodiments, when the first preset information processing operation is executed, the information content in the second information collection result is filled into the placeholder position corresponding to the information content in the prompt template corresponding to the current preset information processing operation (i.e., the first preset information processing) to format the prompt template corresponding to the current preset information processing operation and generate a first prompt word. Afterwards, the preset large-scale language model is called based on the first prompt word, and the preset large-scale language model will execute the current preset information processing operation according to the prompt given by the first prompt word, and output the generated content obtained after executing the current preset information processing operation (i.e., the first preset information processing) on the information content in the second information collection result.

相应的,当执行第M(M大于1)个预设信息处理操作时,以第M-1个执行的预设信息处理操作的生成内容填充到当前预设信息处理操作(即第M个执行的预设信息处理操作)对应的提示模板中信息内容对应的占位符位置,以格式化当前预设信息处理操作对应的提示模板,生成第二提示词。之后,基于第二提示词调用预设大规模语言模型,预设大规模语言模型将按照第二提示词给出的提示执行当前预设信息处理操作,并输出对前一个预设信息处理操作得到的处理结果执行当前预设信息处理操作(即第M个预设信息处理操作)后得到的生成内容。Accordingly, when the Mth (M greater than 1) preset information processing operation is executed, the generated content of the (M-1)th executed preset information processing operation is filled into the placeholder position for information content in the prompt template corresponding to the current preset information processing operation (i.e., the Mth executed preset information processing operation), so as to format that prompt template and generate a second prompt word. After that, the preset large-scale language model is called based on the second prompt word; the model performs the current preset information processing operation as prompted by the second prompt word and outputs the generated content obtained by applying the current operation (i.e., the Mth preset information processing operation) to the processing result of the previous preset information processing operation.

当第M个预设信息处理操作为最后一个操作时,执行第M个预设信息处理操作后得到的生成内容即为预设的网络数据源对应的网络资讯。When the Mth preset information processing operation is the last operation, the generated content obtained after executing the Mth preset information processing operation is the network information corresponding to the preset network data source.
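The chained execution described above can be sketched as follows. This is a minimal illustration rather than the implementation from the disclosure: the function name `run_chain`, the `{content}` placeholder syntax, and the stand-in model `fake_llm` are all assumptions introduced here, and a real deployment would pass in a callable that wraps an actual large-scale language model.

```python
from typing import Callable, Sequence


def run_chain(collected: str, templates: Sequence[str], llm: Callable[[str], str]) -> str:
    """Run the preset operations in order, with one model call per operation."""
    if not templates:
        raise ValueError("at least one prompt template is required")
    # The first operation consumes the second information collection result.
    content = collected
    for template in templates:
        # Format the template: fill the placeholder with the current content.
        prompt = template.replace("{content}", content)
        # The generated content becomes the input of the next operation.
        content = llm(prompt)
    # The output of the last operation is the generated network information.
    return content


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: tags the payload with the instruction."""
    instruction, _, payload = prompt.partition(": ")
    return f"[{instruction}] {payload}"


templates = ["filter: {content}", "summarize: {content}"]
result = run_chain("a, b, a", templates, fake_llm)
```

Keeping the model behind a plain callable makes the execution order a matter of configuration: reordering or adding operations only changes the `templates` list.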

在一些可选的实施例中,可以根据对待生成的网络资讯的需求,预先配置需要顺序执行的预设信息处理操作,并配置与每个预设信息处理操作对应的提示模板。In some optional embodiments, preset information processing operations that need to be executed sequentially can be pre-configured according to the requirements for the network information to be generated, and a prompt template corresponding to each preset information processing operation can be configured.

以生成网络资讯时需要顺序执行信息过滤、去重处理、信息提取、排序、格式变化、总结这六个信息处理操作为例,各信息处理操作对应的提示模板依次为:过滤规则提示模板、去重规则提示模板、信息提取规则提示模板、排序规则提示模板、变化格式提示模板、总结规则提示模板,基于第二信息采集结果和提示模板,按照预设信息处理操作的预设执行顺序,链式执行预设信息处理操作,直至生成预设的网络数据源对应的网络资讯,包括以下六个步骤。Take as an example the case where generating network information requires six information processing operations to be executed in sequence: information filtering, deduplication, information extraction, sorting, format change, and summarization. The prompt templates corresponding to these operations are, in order: the filtering rule prompt template, the deduplication rule prompt template, the information extraction rule prompt template, the sorting rule prompt template, the format change prompt template, and the summary rule prompt template. In this case, chain-executing the preset information processing operations in their preset execution order, based on the second information collection result and the prompt templates, until the network information corresponding to the preset network data source is generated, includes the following six steps.

步骤1,基于第二信息采集结果,格式化过滤规则提示模板,生成提示词Prompt1,之后,基于提示词Prompt1,调用预设大规模语言模型对第二信息采集结果执行信息过滤操作,得到生成内容,即过滤后的信息内容。Step 1, based on the second information collection result, format the filtering rule prompt template to generate the prompt word Prompt1, then, based on the prompt word Prompt1, call the preset large-scale language model to perform information filtering operation on the second information collection result to obtain generated content, that is, filtered information content.

步骤2,以过滤后的信息内容,格式化去重规则提示模板,生成提示词Prompt2,之后,基于提示词Prompt2,调用预设大规模语言模型对过滤后的信息内容执行去重操作,得到生成内容,即去重后的信息内容。Step 2, formatting the deduplication rule prompt template with the filtered information content to generate the prompt word Prompt2, and then, based on the prompt word Prompt2, calling the preset large-scale language model to perform a deduplication operation on the filtered information content to obtain the generated content, that is, the deduplicated information content.

步骤3,以去重后的信息内容,格式化信息提取规则提示模板,生成提示词Prompt3,之后,基于提示词Prompt3,调用预设大规模语言模型对去重后的信息内容执行信息提取操作,得到生成内容,即提取的信息内容。Step 3, format the information extraction rule prompt template with the deduplicated information content to generate the prompt word Prompt3, and then, based on the prompt word Prompt3, call the preset large-scale language model to perform information extraction operations on the deduplicated information content to obtain the generated content, that is, the extracted information content.

步骤4,以提取的信息内容,格式化排序规则提示模板,生成提示词Prompt4,之后,基于提示词Prompt4,调用预设大规模语言模型对提取的信息内容执行排序操作,得到生成内容,即排序后的信息内容。Step 4, formatting the sorting rule prompt template with the extracted information content to generate the prompt word Prompt4, and then, based on the prompt word Prompt4, calling the preset large-scale language model to perform a sorting operation on the extracted information content to obtain the generated content, that is, the sorted information content.

步骤5,以排序后的信息内容,格式化变化格式提示模板,生成提示词Prompt5,之后,基于提示词Prompt5,调用预设大规模语言模型对排序后的信息内容执行格式变化操作,得到生成内容,即格式变化后的信息内容。Step 5, formatting the format change prompt template with the sorted information content to generate the prompt word Prompt5, and then, based on the prompt word Prompt5, calling the preset large-scale language model to perform the format change operation on the sorted information content to obtain the generated content, that is, the information content after the format change.

步骤6,以格式变化后的信息内容,格式化总结规则提示模板,生成提示词Prompt6,之后,基于提示词Prompt6,调用预设大规模语言模型对格式变化后的信息内容执行内容总结操作,得到生成内容,即对应前述预设的网络数据源且按照预设配置执行了信息处理的网络资讯。Step 6, formatting the summary rule prompt template with the information content after the format change to generate the prompt word Prompt6, and then, based on the prompt word Prompt6, calling the preset large-scale language model to perform content summarization operation on the information content after the format change to obtain the generated content, that is, the network information corresponding to the aforementioned preset network data source and having performed information processing according to the preset configuration.

在总结规则提示模板中,可以配置对生成的每条网络资讯的分类规则等信息,以通过人工智能自动对生成的网络资讯进行分类。例如,当配置了总结规则提示模板,并在总结规则提示模板中设置了生成的网络资讯的分类规则为“按照理论研究、应用或工具、最新资讯分为三个类别”时,预设大规模语言模型将根据配置的分类规则的提示,对生成的网络资讯进行分类,生成如图3所示的内容。In the summary rule prompt template, information such as the classification rule for each piece of generated network information can be configured, so that the generated network information is classified automatically by artificial intelligence. For example, when the summary rule prompt template is configured and the classification rule for the generated network information is set in it as "divide into three categories: theoretical research, applications or tools, and latest news", the preset large-scale language model will classify the generated network information as prompted by the configured classification rule and generate the content shown in Figure 3.
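As an illustration of how a classification rule might be carried inside a summary rule prompt template, the sketch below assembles the parts the description lists (general prompt, summary strategy, output format) plus a classification rule into one prompt string. The template wording, the field names, and the helper `build_summary_prompt` are assumptions made for this example; the disclosure does not fix a concrete template format.

```python
from string import Template

# Hypothetical layout of a summary rule prompt template.
SUMMARY_RULE_TEMPLATE = Template(
    "$general_prompt\n"
    "Summary strategy: $strategy\n"
    "Classification rule: $classification_rule\n"
    "Output format: $output_format\n"
    "Content to summarize:\n$content"
)


def build_summary_prompt(
    content: str,
    classification_rule: str,
    strategy: str = "summarize by title",
    output_format: str = "one bullet per item, grouped by category",
) -> str:
    """Fill the template placeholders to obtain the prompt sent to the model."""
    return SUMMARY_RULE_TEMPLATE.substitute(
        general_prompt="Summarize the following collected articles.",
        strategy=strategy,
        classification_rule=classification_rule,
        output_format=output_format,
        content=content,
    )


prompt = build_summary_prompt(
    content="1. Paper on sparse attention\n2. New open-source agent toolkit",
    classification_rule=(
        "divide into three categories: theoretical research, "
        "applications or tools, latest news"
    ),
)
```

Because the rule is ordinary template data, a user can change the categories without touching the code that calls the model.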

本公开实施例公开的网络资讯生成方法,通过分别采集预设的网络数据源的页面信息,得到各网络数据源对应的第一信息采集结果;对第一信息采集结果进行汇总处理,得到第二信息采集结果,以初步去除重复采集的信息内容,减少后续步骤处理数据量,提升信息处理效率;之后,基于预设的提示模板和第二信息采集结果,调用预设大规模语言模型,生成预设的网络数据源对应的网络资讯,结合页面信息自动递进采集和AIGC内容处理能力,实现信息采集和生成的全流程自动化,不再需要人工总结提炼,响应网络资讯生成需求的时间缩减到分钟级别,极大提升了网络资讯生成时效性。并且,采用本公开实施例公开的网络资讯生成方法,支持用户配置若干网络数据源,生成的网络资讯更加全面。支持用户以提示模板的方式配置信息处理规则,例如配置信息过滤规则、信息提取规则、总结规则、格式转换规则等,按需进行网络信息提取。另一方面,通过以提示模板的方式配置信息处理规则,如配置信息过滤规则、去重规则等,进一步提升生成网络资讯的质量。The network information generation method disclosed in the embodiments of the present disclosure collects the page information of each preset network data source to obtain a first information collection result corresponding to each network data source, and aggregates the first information collection results to obtain a second information collection result, which preliminarily removes repeatedly collected information content, reduces the amount of data processed in subsequent steps, and improves information processing efficiency. Then, based on the preset prompt template and the second information collection result, the preset large-scale language model is called to generate the network information corresponding to the preset network data sources. By combining automatic progressive collection of page information with AIGC content processing capabilities, the method automates the full process of information collection and generation, so manual summarization and refinement are no longer needed, and the time to respond to a network information generation request is reduced to the minute level, which greatly improves the timeliness of network information generation. In addition, the method supports user configuration of several network data sources, so the generated network information is more comprehensive. Users can configure information processing rules in the form of prompt templates, such as information filtering rules, information extraction rules, summary rules, and format conversion rules, and extract network information on demand. Furthermore, configuring information processing rules such as filtering rules and deduplication rules in the form of prompt templates further improves the quality of the generated network information.

基于上述实施例,本公开实施例还公开了一种网络资讯生成方法,应用于服务端,通过对上述网络资讯生成方法进行全流程封装,以网络服务的形式提供给用户,实现便捷、快速生成多网络数据源的网络资讯。Based on the above embodiments, the embodiments of the present disclosure also disclose a network information generation method, which is applied to the server. By encapsulating the entire process of the above network information generation method and providing it to users in the form of network services, it is possible to conveniently and quickly generate network information from multiple network data sources.

参照图4,网络资讯生成方法包括:步骤402至步骤410。Referring to FIG. 4, the network information generation method includes steps 402 to 410.

步骤402,响应于HTTP请求,获取HTTP请求中携带的网络数据源和预设信息处理操作对应的提示模板。Step 402: In response to the HTTP request, obtain the network data source carried in the HTTP request and the prompt template corresponding to the preset information processing operation.

在一些可选的实施例中,可以将上述实施例的技术全运行流程,封装为一个对外服务接口。不同的应用场景,可根据需求,周期或定时或根据应用触发,调用该服务接口,实现自动化获取网络资讯。获取网络资讯的频次、推送场景,均由场景侧决定。例如,在即时通信应用用户群推送应用场景,则由即时通信应用根据需要调用服务接口获取网络资讯,再通过即时通信应用预先注册的群内容推送接口,将该网络资讯推送至群组中。In some optional embodiments, the entire operation process of the technology in the above embodiments can be encapsulated as an external service interface. Different application scenarios can call the service interface according to demand, periodically or regularly, or according to application triggers to achieve automatic acquisition of network information. The frequency of obtaining network information and the push scenario are determined by the scenario side. For example, in the instant messaging application user group push application scenario, the instant messaging application calls the service interface as needed to obtain network information, and then pushes the network information to the group through the group content push interface pre-registered by the instant messaging application.

在一些可选的实施例中,将该网络数据源的页面信息采集方案和AIGC内容生成方案,整体封装为一个对外开放的HTTP POST(Hypertext Transfer Protocol Post,超文本传输协议-POST方法)服务接口。以使用Python(一种编程语言)来封装为例,具体的技术实现过程如下:In some optional embodiments, the page information collection scheme of the network data source and the AIGC content generation scheme are encapsulated as an open HTTP POST (Hypertext Transfer Protocol Post) service interface. Taking Python (a programming language) as an example, the specific technical implementation process is as follows:

首先,安装Python的Web(World Wide Web,万维网)框架(如Flask,Flask是一个用Python编写的轻量级Web应用框架)及其依赖项。例如,可以使用pip命令(一个管理Python的包的工具)来安装所需的库。First, install Python's Web (World Wide Web) framework (such as Flask, Flask is a lightweight Web application framework written in Python) and its dependencies. For example, you can use the pip command (a tool for managing Python packages) to install the required libraries.

之后,创建一个新的Python文件,导入所选框架的必要模块,并在文件中创建一个应用实例,定义一个路由来处理HTTP POST请求(即请求数据的HTTP请求),并在路由处理函数中,调用前述实施例中实现的网络数据源的页面信息采集解析方法和AIGC内容生成方案。After that, create a new Python file, import the necessary modules of the chosen framework, create an application instance in the file, define a route to handle HTTP POST requests (that is, HTTP requests that request data), and, in the route handler function, call the page information collection and parsing method for network data sources and the AIGC content generation scheme implemented in the foregoing embodiments.

最后,实现生成内容输出。例如,返回生成内容。Finally, implement the generated content output, for example, return the generated content.
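The three steps above can be sketched as a small service. The description names Flask as one possible framework; to stay free of third-party dependencies, this sketch instead uses Python's standard WSGI interface, which Flask applications also implement. The route `/generate`, the JSON field names `sources`, `templates`, and `information`, and the placeholder `generate_network_information` are assumptions for illustration and are not taken from the disclosure.

```python
import json
from io import BytesIO
from wsgiref.util import setup_testing_defaults

JSON_HEADERS = [("Content-Type", "application/json")]


def generate_network_information(sources, templates):
    """Hypothetical placeholder for the collection and LLM generation pipeline."""
    return f"digest of {len(sources)} source(s) using {len(templates)} template(s)"


def app(environ, start_response):
    """WSGI application exposing one POST route that returns generated content."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/generate":
        start_response("404 Not Found", JSON_HEADERS)
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        # Parse the request body and read the two configured inputs.
        payload = json.loads(environ["wsgi.input"].read(length))
        sources, templates = payload["sources"], payload["templates"]
    except (ValueError, KeyError, TypeError):
        start_response("400 Bad Request", JSON_HEADERS)
        return [b'{"error": "expected JSON with sources and templates"}']
    info = generate_network_information(sources, templates)
    body = json.dumps({"information": info}).encode("utf-8")
    start_response("200 OK", JSON_HEADERS + [("Content-Length", str(len(body)))])
    return [body]


def call(application, method, path, data=b""):
    """Invoke the WSGI app in-process, without opening a network socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(REQUEST_METHOD=method, PATH_INFO=path, CONTENT_LENGTH=str(len(data)))
    environ["wsgi.input"] = BytesIO(data)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)


request_body = json.dumps(
    {"sources": ["https://example.com"], "templates": ["filter", "summarize"]}
).encode("utf-8")
status, data = call(app, "POST", "/generate", request_body)
```

To serve real requests during local experiments, the same `app` can be passed to `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.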

以上封装方案仅仅是一种可行的封装方案,在另一些可选的实施例中,还可以采用其他方法对前述实施例中步骤102、步骤104和步骤106实现的网络资讯生成方法全流程进行封装,得到HTTP服务形式的网络资讯服务,本公开实施例中对封装前述网络数据源采集方法和前述AIGC内容生成方法得到网络资讯的具体实施方式不做限制。The above encapsulation scheme is only a feasible encapsulation scheme. In other optional embodiments, other methods can be used to encapsulate the entire process of the network information generation method implemented in steps 102, 104 and 106 in the aforementioned embodiments to obtain a network information service in the form of an HTTP service. The specific implementation method of encapsulating the aforementioned network data source collection method and the aforementioned AIGC content generation method to obtain network information is not limited in the embodiments of the present disclosure.

封装得到的网络资讯服务可以部署到服务端运行。The encapsulated network information service can be deployed to the server for operation.

在封装得到的网络资讯服务运行于服务端的过程中,服务端响应于接收到HTTP请求,解析接收到的HTTP请求,获取HTTP请求中携带的网络数据源和预设信息处理操作对应的提示模板。其中,网络数据源和提示模板可以为网络资讯服务调用端根据用户的配置生成的。网络数据源和提示模板的具体内容参见前文实施例中的描述,此处不再赘述。In the process of the encapsulated network information service running on the server, the server responds to receiving the HTTP request, parses the received HTTP request, and obtains the network data source carried in the HTTP request and the prompt template corresponding to the preset information processing operation. Among them, the network data source and the prompt template can be generated by the network information service caller according to the user's configuration. The specific contents of the network data source and the prompt template refer to the description in the previous embodiment, which will not be repeated here.

步骤404,分别采集网络数据源的页面信息,得到各网络数据源对应的第一信息采集结果。Step 404: collect page information of network data sources respectively to obtain first information collection results corresponding to each network data source.

步骤406,对第一信息采集结果进行汇总处理,得到第二信息采集结果。Step 406: Summarize the first information collection result to obtain a second information collection result.

步骤408,基于提示模板和第二信息采集结果,调用预设大规模语言模型,生成网络数据源对应的网络资讯。Step 408, based on the prompt template and the second information collection result, call the preset large-scale language model to generate network information corresponding to the network data source.

步骤404、步骤406和步骤408的具体实施方式,参见前文实施例中相关步骤的描述,此处不再赘述。For the specific implementation of step 404, step 406 and step 408, please refer to the description of the relevant steps in the previous embodiment, which will not be repeated here.

步骤410,针对HTTP请求,输出网络资讯。Step 410, output network information in response to the HTTP request.

服务端在运行上述网络资讯服务生成网络资讯后,向服务接口调用方,即HTTP请求的发送方返回生成内容。After the server runs the above network information service to generate network information, it returns the generated content to the service interface caller, that is, the sender of the HTTP request.

综上,本公开实施例公开的网络资讯生成方法,通过接收HTTP请求,并获取HTTP请求中携带的网络数据源和预设信息处理操作对应的提示模板,之后,分别采集预设的网络数据源的页面信息,得到各网络数据源对应的第一信息采集结果;对第一信息采集结果进行汇总处理,得到第二信息采集结果,以初步去除重复采集的信息内容,减少后续步骤处理数据量,提升信息处理效率;之后,基于预设的提示模板和第二信息采集结果,调用预设大规模语言模型,生成预设的网络数据源对应的网络资讯,结合页面信息自动递进采集和AIGC内容处理能力,实现信息采集和生成的全流程自动化,不再需要人工总结提炼,响应网络资讯生成需求的时间缩减到分钟级别,极大提升了网络资讯生成时效性。并且,采用本公开实施例公开的网络资讯生成方法,支持用户配置若干网络数据源,生成的网络资讯更加全面。支持用户以提示模板的方式配置信息处理规则,例如配置信息过滤规则、信息提取规则、总结规则、格式转换规则等,按需进行网络信息提取。另一方面,通过以提示模板的方式配置信息处理规则,如配置信息过滤规则、去重规则等,进一步提升生成网络资讯的质量。In summary, the network information generation method disclosed in the embodiments of the present disclosure receives an HTTP request and obtains the network data sources and the prompt templates corresponding to preset information processing operations carried in the HTTP request. It then collects the page information of each network data source to obtain a first information collection result corresponding to each network data source, and aggregates the first information collection results to obtain a second information collection result, which preliminarily removes repeatedly collected information content, reduces the amount of data processed in subsequent steps, and improves information processing efficiency. Then, based on the prompt templates and the second information collection result, the preset large-scale language model is called to generate the network information corresponding to the network data sources. By combining automatic progressive collection of page information with AIGC content processing capabilities, the method automates the full process of information collection and generation, so manual summarization and refinement are no longer needed, and the time to respond to a network information generation request is reduced to the minute level, which greatly improves the timeliness of network information generation. In addition, the method supports user configuration of several network data sources, so the generated network information is more comprehensive. Users can configure information processing rules in the form of prompt templates, such as information filtering rules, information extraction rules, summary rules, and format conversion rules, and extract network information on demand. Furthermore, configuring information processing rules such as filtering rules and deduplication rules in the form of prompt templates further improves the quality of the generated network information.

为了实现上述网络资讯生成方法,本公开实施例还公开了一种网络资讯生成系统,参照图5,网络资讯生成系统500包括:客户端501和服务端502,其中,In order to implement the above network information generation method, the embodiment of the present disclosure also discloses a network information generation system. Referring to FIG. 5 , the network information generation system 500 includes: a client 501 and a server 502, wherein:

客户端501,用于获取用户配置的网络数据源和预设信息处理操作对应的提示模板;The client 501 is used to obtain the network data source configured by the user and the prompt template corresponding to the preset information processing operation;

服务端502,用于分别采集网络数据源的页面信息,得到各网络数据源对应的第一信息采集结果;以及,对第一信息采集结果进行汇总处理,得到第二信息采集结果;The server 502 is used to collect page information of network data sources respectively to obtain first information collection results corresponding to each network data source; and to summarize the first information collection results to obtain second information collection results;

服务端502,还用于基于提示模板和第二信息采集结果,调用预设大规模语言模型,生成网络数据源对应的网络资讯;The server 502 is further used to call a preset large-scale language model based on the prompt template and the second information collection result to generate network information corresponding to the network data source;

服务端502,还用于向用户对应的资讯展示接口推送网络资讯。The server 502 is also used to push network information to the user's corresponding information display interface.

在一些可选的实施例中,资讯展示接口可以为用户登录的客户端501。In some optional embodiments, the information display interface may be a client 501 where the user logs in.

在另一些可选的实施例中,客户端501还用于获取用户配置的资讯展示接口,其中,资讯展示接口可以为用户通过客户端配置的资讯接收工具的接口、资讯上传地址等。In other optional embodiments, the client 501 is further used to obtain an information display interface configured by the user, where the information display interface may be, for example, the interface of an information receiving tool or an information upload address configured by the user through the client.
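The division of work between client and server, including the choice of information display interface, might be modelled as below. This is only a sketch under assumptions: the class names `UserConfig` and `Client`, the function `serve`, and the convention that a missing display interface means "push back to the logged-in client" are introduced here for illustration, and `generate` stands in for the collection and generation pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class UserConfig:
    """What the client gathers from the user (field names are hypothetical)."""
    sources: List[str]
    templates: List[str]
    # Optional user-configured display interface; None means "use the client".
    display_interface: Optional[Callable[[str], None]] = None


@dataclass
class Client:
    """Stand-in for client 501: stores whatever the server pushes to it."""
    inbox: List[str] = field(default_factory=list)

    def receive(self, info: str) -> None:
        self.inbox.append(info)


def serve(config: UserConfig, client: Client,
          generate: Callable[[List[str], List[str]], str]) -> str:
    """Stand-in for server 502: generate the information, then push it."""
    info = generate(config.sources, config.templates)
    push = config.display_interface or client.receive
    push(info)
    return info


def fake_generate(sources: List[str], templates: List[str]) -> str:
    return f"{len(sources)} source(s) processed"


client = Client()
serve(UserConfig(["https://example.com"], ["summarize"]), client, fake_generate)

uploaded: List[str] = []
serve(UserConfig(["a", "b"], ["summarize"], uploaded.append), client, fake_generate)
```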

其中,网络数据源和提示模板的具体内容参见前文实施例中的描述,此处不再赘述。本实施例中对网络数据源的数量和类型、提示模板的数量和内容不做限制。The specific contents of the network data source and the prompt template refer to the description in the previous embodiment, which will not be repeated here. In this embodiment, there is no restriction on the number and type of network data sources and the number and content of prompt templates.

分别采集网络数据源的页面信息,得到各网络数据源对应的第一信息采集结果,以及,对第一信息采集结果进行汇总处理,得到第二信息采集结果的具体实施方式,参见前文实施例中的描述,此处不再赘述。For specific implementation methods of respectively collecting page information of network data sources to obtain first information collection results corresponding to each network data source, and aggregating the first information collection results to obtain second information collection results, please refer to the description in the previous embodiment and will not be repeated here.

基于提示模板和第二信息采集结果,调用预设大规模语言模型,生成网络数据源对应的网络资讯的具体实施方式,参见前文实施例中的描述,此处不再赘述。For a specific implementation method of calling a preset large-scale language model based on the prompt template and the second information collection result to generate network information corresponding to the network data source, please refer to the description in the previous embodiment, which will not be repeated here.

综上,本公开实施例公开的网络资讯生成系统,由客户端获取用户配置的网络数据源和预设信息处理操作对应的提示模板,之后,由服务端分别采集网络数据源的页面信息,得到各网络数据源对应的第一信息采集结果,并对第一信息采集结果进行汇总处理,得到第二信息采集结果,之后,服务端基于提示模板和第二信息采集结果,调用预设大规模语言模型,生成网络数据源对应的网络资讯,并向用户对应的资讯展示接口推送网络资讯,用户只需要配置提示模板和网络数据源,即可获得从各网络数据源中提取的符合该用户需求的网络资讯,网络资讯的生成响应时间达到分钟级别,时效性高。并且,网络资讯生成系统对用户配置的网络数据源的数量和类型不做限制,支持若干网络数据源,从而可以获取更加全面的网络资讯。在用户通过客户端配置网络数据源和提示模板之后,启动全自动运行,并向用户反馈生成的网络资讯,无需人工整合,效率更高。In summary, in the network information generation system disclosed in the embodiments of the present disclosure, the client obtains the network data sources configured by the user and the prompt templates corresponding to preset information processing operations. The server then collects the page information of each network data source to obtain the first information collection result corresponding to each network data source, and aggregates the first information collection results to obtain the second information collection result. Based on the prompt templates and the second information collection result, the server calls the preset large-scale language model to generate the network information corresponding to the network data sources and pushes it to the information display interface corresponding to the user. The user only needs to configure the prompt templates and the network data sources to obtain network information that is extracted from each network data source and meets the user's needs; the response time for generating network information reaches the minute level, so timeliness is high. In addition, the system places no limit on the number or type of network data sources configured by the user and supports several network data sources, so more comprehensive network information can be obtained. After the user configures the network data sources and prompt templates through the client, the system runs fully automatically and feeds the generated network information back to the user without manual integration, which is more efficient.

需要说明的是,本公开实施例中可能会涉及到对用户数据的使用,在实际应用中,可以在符合所在国的适用法律法规要求的情况下(例如,用户明确同意,对用户切实通知,等),在适用法律法规允许的范围内在本文描述的方案中使用用户特定的个人数据。It should be noted that the embodiments of the present disclosure may involve the use of user data. In actual applications, user-specific personal data may be used in the scheme described herein within the scope permitted by applicable laws and regulations, subject to the requirements of applicable laws and regulations of the country where the user is located (for example, with the user's explicit consent, effective notification to the user, etc.).

需要说明的是,对于方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本公开实施例并不受所描述的动作顺序的限制,因为依据本公开实施例,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作并不一定是本公开实施例所必须的。It should be noted that, for simplicity of description, the method embodiments are each described as a series of action combinations, but those skilled in the art should be aware that the embodiments of the present disclosure are not limited by the described order of actions, because according to the embodiments of the present disclosure some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also be aware that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present disclosure.

在上述实施例的基础上,本实施例还提供了一种网络资讯生成装置,装置包括:Based on the above embodiment, this embodiment further provides a network information generating device, the device comprising:

网络数据源信息采集模块,用于分别采集预设的网络数据源的页面信息,得到各网络数据源对应的第一信息采集结果;A network data source information collection module is used to collect page information of preset network data sources respectively, and obtain first information collection results corresponding to each network data source;

信息汇总模块,用于对第一信息采集结果进行汇总处理,得到第二信息采集结果;An information aggregation module, used to aggregate the first information collection result to obtain a second information collection result;

资讯生成模块,用于基于预设的提示模板和第二信息采集结果,调用预设大规模语言模型,生成预设的网络数据源对应的网络资讯。The information generation module is used to call a preset large-scale language model based on a preset prompt template and a second information collection result to generate network information corresponding to a preset network data source.

在一些可选的实施例中,提示模板与预设信息处理操作对应,资讯生成模块,进一步用于:In some optional embodiments, the prompt template corresponds to a preset information processing operation, and the information generation module is further used to:

基于第二信息采集结果和提示模板,按照预设信息处理操作的预设执行顺序,链式调用预设大规模语言模型,生成预设的网络数据源对应的网络资讯。Based on the second information collection result and the prompt template, in accordance with the preset execution order of the preset information processing operation, the preset large-scale language model is chain-called to generate network information corresponding to the preset network data source.

在一些可选的实施例中,基于第二信息采集结果和提示模板,按照预设信息处理操作的预设执行顺序,链式调用预设大规模语言模型,生成预设的网络数据源对应的网络资讯,包括:In some optional embodiments, based on the second information collection result and the prompt template, according to the preset execution order of the preset information processing operation, the preset large-scale language model is chain-called to generate network information corresponding to the preset network data source, including:

基于第二信息采集结果和提示模板,按照预设信息处理操作的预设执行顺序,链式执行预设信息处理操作,直至生成预设的网络数据源对应的网络资讯,其中,预设信息处理操作包括以下一种或两种操作:Based on the second information collection result and the prompt template, the preset information processing operations are chained and executed in a preset execution order of the preset information processing operations until network information corresponding to the preset network data source is generated, wherein the preset information processing operations include one or both of the following operations:

第一种,基于第二信息采集结果,格式化当前预设信息处理操作对应的提示模板,生成第一提示词;以及,基于第一提示词调用预设大规模语言模型,得到生成内容;The first method is to format a prompt template corresponding to the current preset information processing operation based on the second information collection result to generate a first prompt word; and to call a preset large-scale language model based on the first prompt word to obtain generated content;

第二种,基于执行前一个预设信息处理操作得到的生成内容,格式化当前预设信息处理操作对应的提示模板,生成第二提示词;以及,基于第二提示词调用预设大规模语言模型,得到生成内容。The second method is to format a prompt template corresponding to the current preset information processing operation based on the generated content obtained by executing the previous preset information processing operation to generate a second prompt word; and to call a preset large-scale language model based on the second prompt word to obtain the generated content.

In some optional embodiments, the information aggregation module is further configured to:

perform incremental deduplication and aggregation on the information content in the first information collection result, based on the information content of the preset network data sources collected within a preset historical period, to obtain the second information collection result.

In some optional embodiments, performing incremental deduplication and aggregation on the information content in the first information collection result, based on the information content of the preset network data sources collected within a preset historical period, to obtain the second information collection result includes:

obtaining a historical collection result of the network data source, where the historical collection result is obtained by aggregating the information content of the network data source collected within a specified historical period;

deduplicating the repeatedly collected information content in the first information collection result, based on the article titles and/or article link addresses included in the information content of the historical collection result, to obtain incremental information content;

taking the incremental information content together with the information content in the historical collection result as the second information collection result for the preset network data source.
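The incremental deduplication steps above can be sketched as follows. The `Article` record and `merge_incremental` function are hypothetical names introduced for illustration. The sketch also makes two assumptions the text leaves open: an item counts as a duplicate when either its title or its link address has been seen before, and the history is placed ahead of the new items in the merged result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    url: str


def merge_incremental(
    history: list[Article], collected: list[Article]
) -> tuple[list[Article], list[Article]]:
    """Return (incremental items, second collection result)."""
    seen_titles = {a.title for a in history}
    seen_urls = {a.url for a in history}
    incremental: list[Article] = []
    for item in collected:
        # Assumption: a match on title OR link address marks a duplicate.
        if item.title in seen_titles or item.url in seen_urls:
            continue
        # Track new items too, so repeats within this run are also dropped.
        seen_titles.add(item.title)
        seen_urls.add(item.url)
        incremental.append(item)
    # Second collection result: historical content plus the incremental content.
    return incremental, history + incremental


history = [Article("Old story", "https://example.com/old")]
collected = [
    Article("Old story", "https://example.com/old?ref=home"),  # same title
    Article("New story", "https://example.com/new"),
    Article("New story", "https://example.com/new"),  # repeated in this run
]
incremental, second_result = merge_incremental(history, collected)
print([a.title for a in incremental])
print([a.title for a in second_result])
```

Matching on either field is the stricter reading of "titles and/or link addresses"; requiring both to match would keep more items and may suit sources that reuse headlines.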

In some optional embodiments, the network data source information collection module is further configured to:

collect the page information of each page of each preset network data source using an HTTP client toolkit;

in response to a failure to collect the page information of a target page among the pages, collect the page information of the target page using headless browser technology;

for each network data source, parse and extract the page information collected from the pages of the network data source to obtain the first information collection result corresponding to the network data source.
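The collect-then-fall-back flow above can be sketched as follows. The two fetchers are passed in as plain callables, so the sketch does not depend on any particular HTTP client or headless browser. All names (`fetch_with_fallback`, `collect_source`, `LinkExtractor`) are hypothetical. Treating any raised exception as a "collection failure", and extracting only link text and `href` values, are simplifying assumptions.

```python
from html.parser import HTMLParser
from typing import Callable

Fetcher = Callable[[str], str]  # takes a URL, returns HTML text


def fetch_with_fallback(url: str, http_fetch: Fetcher, headless_fetch: Fetcher) -> str:
    """Try the plain HTTP client first; fall back to the headless browser."""
    try:
        return http_fetch(url)
    except Exception:
        # Assumption: any exception from the HTTP fetcher counts as a failure.
        return headless_fetch(url)


class LinkExtractor(HTMLParser):
    """Collects (link text, href) pairs from anchor tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.links.append((title, self._href))
            self._href = None


def collect_source(page_urls, http_fetch: Fetcher, headless_fetch: Fetcher):
    """Build the first collection result for one data source."""
    items: list[tuple[str, str]] = []
    for url in page_urls:
        html = fetch_with_fallback(url, http_fetch, headless_fetch)
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        items.extend(parser.links)
    return items


def flaky_http(url: str) -> str:
    # Stub: pretends that dynamically rendered pages cannot be fetched directly.
    if "dynamic" in url:
        raise RuntimeError("empty body")
    return '<a href="/a1">Static story</a>'


def fake_headless(url: str) -> str:
    # Stub standing in for a headless browser render.
    return '<a href="/d1">Rendered story</a>'


items = collect_source(
    ["https://example.com/static", "https://example.com/dynamic"],
    flaky_http,
    fake_headless,
)
print(items)
```

Trying the lightweight client first keeps the common case cheap, and the slower browser render is paid for only on pages that actually need it.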

In some optional embodiments, the page information includes the page information of the first N screens of the corresponding page, where N is a natural number greater than or equal to 3.
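One way to read "the first N screens" is as N consecutive viewport-height slices starting from the top of the page. The helper below is a hypothetical sketch under that assumption: it only computes the vertical scroll offsets that a collector would visit, and leaves the actual scrolling to whatever browser automation tool is in use.

```python
def screen_offsets(viewport_height: int, n_screens: int = 3) -> list[int]:
    """Return the scroll offset (in pixels) of each of the first N screens."""
    if n_screens < 3:
        # The embodiment requires N to be a natural number >= 3.
        raise ValueError("n_screens must be at least 3")
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    return [i * viewport_height for i in range(n_screens)]


offsets = screen_offsets(800, 3)
print(offsets)
```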

In summary, the network information generation device disclosed in one embodiment of the present disclosure collects the page information of each preset network data source to obtain a first information collection result for each source. It aggregates the first information collection results into a second information collection result, which removes repeatedly collected content at an early stage, reduces the amount of data handled in later steps, and improves information processing efficiency. It then calls a preset large-scale language model, based on the preset prompt template and the second information collection result, to generate the network information corresponding to the preset network data sources. Combining automatic progressive collection of page information with AIGC content processing automates the full process of information collection and generation: manual summarizing and refining are no longer needed, and the time to respond to a network information generation request is reduced to the order of minutes, which greatly improves the timeliness of network information generation. In addition, the network information generation method disclosed in the embodiments of the present disclosure lets users configure multiple network data sources, so the generated network information is more comprehensive. Users can configure information processing rules in the form of prompt templates, for example information filtering rules, information extraction rules, summarization rules, and format conversion rules, so that network information is extracted on demand. Configuring rules such as filtering rules and deduplication rules as prompt templates also further improves the quality of the generated network information.

On the basis of the above embodiments, this embodiment further provides a network information generation device, which includes:

a request receiving module, configured to, in response to an HTTP request, obtain the network data sources carried in the HTTP request and the prompt templates corresponding to the preset information processing operations;

a network data source information collection module, configured to collect the page information of each network data source respectively, to obtain a first information collection result corresponding to each network data source;

an information aggregation module, configured to aggregate the first information collection results to obtain a second information collection result;

an information generation module, configured to call a preset large-scale language model, based on the prompt templates and the second information collection result, to generate the network information corresponding to the network data sources;

a request response module, configured to output the network information in response to the HTTP request.
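The division of labor among the five modules can be sketched as a single request handler. Everything specific here is an assumption made for illustration: the JSON field names `sources` and `prompt_templates`, the `news` field in the response, and the three injected callables that stand in for the collection, aggregation, and generation modules.

```python
import json
from typing import Callable


def handle_request(
    body: str,
    collect: Callable[[str], list],
    aggregate: Callable[[dict], list],
    generate: Callable[[list, list], str],
) -> str:
    """Map one HTTP request body to one response body."""
    # Request receiving module: read data sources and prompt templates.
    request = json.loads(body)
    sources = request["sources"]
    templates = request["prompt_templates"]
    # Collection module: one first collection result per data source.
    first_results = {src: collect(src) for src in sources}
    # Aggregation module: merge into the second collection result.
    second_result = aggregate(first_results)
    # Generation module: call the language model with the templates.
    news = generate(templates, second_result)
    # Request response module: return the generated network information.
    return json.dumps({"news": news}, ensure_ascii=False)


# Stubs standing in for real modules.
def stub_collect(src: str) -> list:
    return [f"headline from {src}"]


def stub_aggregate(results: dict) -> list:
    return sorted({item for items in results.values() for item in items})


def stub_generate(templates: list, items: list) -> str:
    return templates[0].format(input="; ".join(items))


body = json.dumps({"sources": ["s1", "s2"], "prompt_templates": ["Digest: {input}"]})
response = handle_request(body, stub_collect, stub_aggregate, stub_generate)
print(response)
```

Keeping the handler free of any web framework means the same function could sit behind whichever HTTP server the deployment uses.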

In summary, the network information generation device disclosed in this embodiment receives an HTTP request and obtains the network data sources and the prompt templates corresponding to the preset information processing operations carried in the request. It then collects the page information of each network data source to obtain a first information collection result for each source, and aggregates the first information collection results into a second information collection result, which removes repeatedly collected content at an early stage, reduces the amount of data handled in later steps, and improves information processing efficiency. It then calls a preset large-scale language model, based on the prompt templates and the second information collection result, to generate the network information corresponding to the network data sources. Combining automatic progressive collection of page information with AIGC content processing automates the full process of information collection and generation: manual summarizing and refining are no longer needed, and the time to respond to a network information generation request is reduced to the order of minutes, which greatly improves the timeliness of network information generation. In addition, the network information generation device disclosed in the embodiments of the present disclosure lets users configure multiple network data sources, so the generated network information is more comprehensive. Users can configure information processing rules in the form of prompt templates, for example information filtering rules, information extraction rules, summarization rules, and format conversion rules, so that network information is extracted on demand. Configuring rules such as filtering rules and deduplication rules as prompt templates also further improves the quality of the generated network information.

The embodiments of the present disclosure also provide a non-volatile readable storage medium storing one or more modules (programs). When the one or more modules are applied to a device, they cause the device to execute the instructions of the method steps in the embodiments of the present disclosure.

The embodiments of the present disclosure also provide a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of the embodiments of the present disclosure.

The embodiments of the present disclosure also provide an electronic device, including a processor and a memory communicatively connected to the processor. The memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the method of the embodiments of the present disclosure. In the embodiments of the present disclosure, electronic devices include servers, terminal devices, and the like.

The embodiments of the present disclosure may be implemented as an apparatus that uses any suitable hardware, firmware, software, or any combination thereof to achieve the desired configuration; the apparatus may include electronic devices such as a server (cluster) or a terminal. FIG. 6 schematically shows an exemplary apparatus 600 that can be used to implement the various embodiments of the present disclosure.

For one embodiment, FIG. 6 shows an exemplary apparatus 600 having one or more processors 602, a control module (chipset) 604 coupled to at least one of the processor(s) 602, a memory 606 coupled to the control module 604, a non-volatile memory (NVM)/storage device 608 coupled to the control module 604, one or more input/output devices 610 coupled to the control module 604, and a network interface 612 coupled to the control module 604.

The processor 602 may include one or more single-core or multi-core processors, and may include any combination of general-purpose processors and special-purpose processors (for example, graphics processors, application processors, baseband processors). In some embodiments, the apparatus 600 can serve as the server, the terminal, or another device in the embodiments of the present disclosure.

In some embodiments, the apparatus 600 may include one or more computer-readable media (for example, the memory 606 or the NVM/storage device 608) having instructions 614, and one or more processors 602 that are coupled with the one or more computer-readable media and configured to execute the instructions 614 to implement modules that perform the actions described in the present disclosure.

For one embodiment, the control module 604 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 602 and/or to any suitable device or component in communication with the control module 604.

The control module 604 may include a memory controller module to provide an interface to the memory 606. The memory controller module may be a hardware module, a software module, and/or a firmware module.

The memory 606 may be used, for example, to load and store data and/or instructions 614 for the apparatus 600. For one embodiment, the memory 606 may include any suitable volatile memory, for example a suitable DRAM (Dynamic Random Access Memory). In some embodiments, the memory 606 may include double data rate fourth-generation synchronous dynamic random access memory (DDR4 SDRAM).

For one embodiment, the control module 604 may include one or more input/output controllers to provide an interface to the NVM/storage device 608 and the input/output device(s) 610.

For example, the NVM/storage device 608 may be used to store data and/or instructions 614. The NVM/storage device 608 may include any suitable non-volatile memory (for example, flash memory) and/or any suitable non-volatile storage device(s) (for example, one or more hard disk drives (HDD), one or more compact disc (CD) drives, and/or one or more digital versatile disc (DVD) drives).

The NVM/storage device 608 may include storage resources that are physically part of the device on which the apparatus 600 is installed, or it may be accessible by that device without necessarily being part of it. For example, the NVM/storage device 608 may be accessed over a network via the input/output device(s) 610.

The input/output device(s) 610 may provide an interface for the apparatus 600 to communicate with any other suitable device, and may include communication components, audio components, sensor components, and the like. The network interface 612 may provide an interface for the apparatus 600 to communicate over one or more networks. The apparatus 600 may communicate wirelessly with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols, for example by accessing a wireless network based on a communication standard such as Bluetooth, WiFi (Wireless Fidelity), 2G, 3G, 4G, or 5G (second- through fifth-generation mobile communication technology), or a combination thereof.

For one embodiment, at least one of the processor(s) 602 may be packaged together with the logic of one or more controllers (for example, the memory controller module) of the control module 604. For one embodiment, at least one of the processor(s) 602 may be packaged together with the logic of one or more controllers of the control module 604 to form a System in Package (SiP). For one embodiment, at least one of the processor(s) 602 may be integrated on the same die as the logic of one or more controllers of the control module 604. For one embodiment, at least one of the processor(s) 602 may be integrated on the same die as the logic of one or more controllers of the control module 604 to form a System on Chip (SoC).

In various embodiments, the apparatus 600 may be, but is not limited to, a server or a terminal device such as a desktop computing device or a mobile computing device (for example, a laptop computing device, a handheld computing device, a tablet computer, or a netbook). In various embodiments, the apparatus 600 may have more or fewer components and/or a different architecture. For example, in some embodiments, the apparatus 600 includes one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an application-specific integrated circuit (ASIC), and a speaker.

In a detection device, a main control chip may be used as the processor or the control module; sensor data, location information, and the like are stored in the memory or the NVM/storage device; a sensor group may serve as the input/output device; and the communication interface may include the network interface.

The embodiments of the present disclosure also provide an electronic device, including a processor and a memory storing executable code which, when executed, causes the processor to perform one or more of the methods in the embodiments of the present disclosure. In the embodiments of the present disclosure, the memory may store various data, such as target files and file-application association data, and may also include user behavior data, so as to provide a data basis for various kinds of processing.

The embodiments of the present disclosure also provide one or more machine-readable media storing executable code which, when executed, causes a processor to perform one or more of the methods described in the embodiments of the present disclosure.

The embodiments of the present disclosure also provide a computer program product, including a computer program which, when executed by a processor, implements the method of any of the foregoing embodiments. The computer program is stored in a readable storage medium; at least one processor of an electronic device can read the computer program from the readable storage medium, and executing it causes the electronic device to carry out the network information generation method of any of the above method embodiments. The specific functions and achievable technical effects are not repeated here.

Because the device embodiments are essentially similar to the method embodiments, they are described only briefly; for the relevant details, refer to the corresponding description of the method embodiments.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the parts that are the same or similar across embodiments can be cross-referenced.

The embodiments of the present disclosure are described with reference to flowcharts and/or block diagrams of the methods, terminal devices (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing terminal device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing. The instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

Although preferred embodiments of the present disclosure have been described, those skilled in the art may make further changes and modifications to these embodiments once they have grasped the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present disclosure.

Finally, it should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.

The network information generation method, network information generation system, electronic device, and storage medium provided by the present disclosure have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the description of the above embodiments is intended only to help in understanding the method of the present disclosure and its core idea. A person of ordinary skill in the art may, following the idea of the present disclosure, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present disclosure.

Claims (12)

Translated fromChinese
一种网络资讯生成方法,其中,所述方法包括:A method for generating network information, wherein the method comprises:分别采集预设的网络数据源的页面信息,得到各所述网络数据源对应的第一信息采集结果;Collecting page information of preset network data sources respectively to obtain first information collection results corresponding to each of the network data sources;对所述第一信息采集结果进行汇总处理,得到第二信息采集结果;Summarizing the first information collection results to obtain second information collection results;基于预设的提示模板和所述第二信息采集结果,调用预设大规模语言模型,生成所述预设的网络数据源对应的网络资讯。Based on the preset prompt template and the second information collection result, a preset large-scale language model is called to generate network information corresponding to the preset network data source.根据权利要求1所述的方法,其中,所述提示模板与预设信息处理操作对应,所述基于预设的提示模板和所述第二信息采集结果,调用预设大规模语言模型,生成所述预设的网络数据源对应的网络资讯,包括:The method according to claim 1, wherein the prompt template corresponds to a preset information processing operation, and the calling of a preset large-scale language model based on the preset prompt template and the second information collection result to generate network information corresponding to the preset network data source comprises:基于所述第二信息采集结果和所述提示模板,按照所述预设信息处理操作的预设执行顺序,链式调用预设大规模语言模型,生成所述预设的网络数据源对应的网络资讯。Based on the second information collection result and the prompt template, in accordance with the preset execution order of the preset information processing operation, a preset large-scale language model is chain-called to generate network information corresponding to the preset network data source.根据权利要求2所述的方法,其中,所述基于所述第二信息采集结果和所述提示模板,按照所述预设信息处理操作的预设执行顺序,链式调用预设大规模语言模型,生成所述预设的网络数据源对应的网络资讯,包括:The method according to claim 2, wherein the step of chain-calling a preset large-scale language model based on the second information collection result and the prompt template in accordance with a preset execution order of the preset information processing operation to generate network information corresponding to the preset network data source 
comprises:基于所述第二信息采集结果和所述提示模板,按照所述预设信息处理操作的预设执行顺序,链式执行所述预设信息处理操作,直至生成所述预设的网络数据源对应的网络资讯,其中,所述预设信息处理操作包括以下一种或两种操作:Based on the second information collection result and the prompt template, the preset information processing operations are chained and executed in a preset execution order of the preset information processing operations until network information corresponding to the preset network data source is generated, wherein the preset information processing operations include one or both of the following operations:基于所述第二信息采集结果,格式化当前预设信息处理操作对应的提示模板,生成第一提示词;以及,基于所述第一提示词调用预设大规模语言模型,得到生成内容;Based on the second information collection result, formatting a prompt template corresponding to the current preset information processing operation to generate a first prompt word; and based on the first prompt word, calling a preset large-scale language model to obtain generated content;基于执行前一个预设信息处理操作得到的生成内容,格式化当前预设信息处理操作对应的提示模板,生成第二提示词;以及,基于所述第二提示词调用预设大规模语言模型,得到生成内容。Based on the generated content obtained by executing the previous preset information processing operation, formatting the prompt template corresponding to the current preset information processing operation to generate a second prompt word; and calling the preset large-scale language model based on the second prompt word to obtain the generated content.根据权利要求1-3中任一项所述的方法,其中,所述对所述第一信息采集结果进行汇总处理,得到第二信息采集结果,包括:The method according to any one of claims 1 to 3, wherein the aggregating the first information collection result to obtain the second information collection result comprises:基于预设历史时间内采集的所述预设的网络数据源的信息内容,对所述第一信息采集结果中的信息内容进行增量去重汇总处理,得到第二信息采集结果。Based on the information content of the preset network data source collected within a preset historical time, incremental deduplication and aggregation processing is performed on the information content in the first information collection result to obtain a second information collection 
result.

The method according to claim 4, wherein performing incremental deduplication and aggregation on the information content in the first information collection result, based on the information content of the preset network data source collected within a preset historical time, to obtain the second information collection result, comprises:
acquiring a historical collection result of the network data source, the historical collection result being obtained by aggregating the information content of the network data source collected within a specified historical time;
deduplicating repeatedly collected information content in the first information collection result, based on the article titles and/or article link addresses included in the information content of the historical collection result, to obtain incremental information content; and
taking the incremental information content and the information content in the historical collection result as the second information collection result for the information content of the preset network data source.

The method according to any one of claims 1 to 5, wherein respectively collecting page information of the preset network data sources to obtain the first information collection result corresponding to each network data source comprises:
collecting page information of each page of each preset network data source using a Hypertext Transfer Protocol (HTTP) client toolkit;
in response to a failure to collect the page information of a target page among the pages, collecting the page information of the target page using headless browser technology; and
for each network data source, parsing and extracting the page information collected from the pages of the network data source to obtain the first information collection result corresponding to the network data source.

The method according to claim 6, wherein the page information comprises page information of the first N screens of the corresponding page, where N is a natural number greater than or equal to 3.

A network information generation method, comprising:
in response to an HTTP request, obtaining the network data sources and the prompt template corresponding to a preset information processing operation that are carried in the HTTP request;
respectively collecting page information of the network data sources to obtain a first information collection result corresponding to each network data source;
aggregating the first information collection results to obtain a second information collection result;
calling a preset large-scale language model, based on the prompt template and the second information collection result, to generate network information corresponding to the network data sources; and
outputting the network information in response to the HTTP request.

A network information generation system, comprising a client and a server, wherein:
the client is configured to obtain user-configured network data sources and a prompt template corresponding to a preset information processing operation;
the server is configured to respectively collect page information of the network data sources to obtain a first information collection result corresponding to each network data source, and to aggregate the first information collection results to obtain a second information collection result;
the server is further configured to call a preset large-scale language model, based on the prompt template and the second information collection result, to generate network information corresponding to the network data sources; and
the server is further configured to push the network information to an information display interface corresponding to the user.

An electronic device, comprising a processor and a memory communicatively connected to the processor, wherein:
the memory stores computer-executable instructions; and
the processor executes the computer-executable instructions stored in the memory to implement the method according to any one of claims 1 to 8.

A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 8.

A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
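
For illustration only, the incremental deduplication step and the HTTP-first collection with a headless-browser fallback recited in the claims above can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the `Article` type, the function names, and the injected `http_fetch` / `headless_fetch` callables are hypothetical, a collected item is treated as a duplicate when its title or its link has already been seen (one possible reading of "and/or"), and a collection "failure" is assumed to surface as a raised exception.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Article:
    """One collected item. Field names are illustrative assumptions."""
    title: str
    url: str


def incremental_dedup(history: list[Article], collected: list[Article]) -> list[Article]:
    """Return the items of `collected` whose title and link are both unseen.

    Assumption: a match on either the title or the link marks a duplicate.
    """
    seen_titles = {a.title for a in history}
    seen_urls = {a.url for a in history}
    incremental = []
    for item in collected:
        if item.title in seen_titles or item.url in seen_urls:
            continue  # collected in an earlier run, or repeated within this batch
        seen_titles.add(item.title)
        seen_urls.add(item.url)
        incremental.append(item)
    return incremental


def second_collection_result(history: list[Article], collected: list[Article]) -> list[Article]:
    """Historical content followed by the incremental content."""
    return history + incremental_dedup(history, collected)


def collect_page(url: str,
                 http_fetch: Callable[[str], str],
                 headless_fetch: Callable[[str], str]) -> str:
    """Try the plain HTTP client first; fall back to the headless browser.

    Both fetchers are injected so the sketch does not depend on any
    particular HTTP or browser-automation library.
    """
    try:
        return http_fetch(url)
    except Exception:
        # Assumption: any exception from the HTTP client counts as a
        # collection failure for the target page.
        return headless_fetch(url)
```

In this sketch, an item whose title already appears in the historical result is dropped even if its link is new, so only unseen articles are appended to the history before the prompt is assembled. A stricter policy (for example, requiring both title and link to match) would only change the condition inside `incremental_dedup`.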
PCT/CN2024/125611 | 2023-10-27 | 2024-10-17 | Network information generation method and system, electronic device, and storage medium | Pending | WO2025087150A1 (en)

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
CN202311408987.6 | 2023-10-27
CN202311408987.6A, CN117609644A (en) | 2023-10-27 | 2023-10-27 | Network information generation method, system, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
WO2025087150A1 (en) | 2025-05-01

Family

ID=89955117

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
PCT/CN2024/125611, Pending, WO2025087150A1 (en) | Network information generation method and system, electronic device, and storage medium | 2023-10-27 | 2024-10-17

Country Status (2)

Country | Link
CN (1) | CN117609644A (en)
WO (1) | WO2025087150A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117609644A (en) * | 2023-10-27 | 2024-02-27 | 杭州阿里巴巴海外互联网产业有限公司 | Network information generation method, system, electronic equipment and storage medium
JP7720464B1 (en) * | 2024-10-18 | 2025-08-07 | 株式会社 日立産業制御ソリューションズ | Document evaluation criteria extraction device and document evaluation criteria extraction method
CN119337890A (en) * | 2024-12-20 | 2025-01-21 | 南京争锋信息科技有限公司 | A network information intelligent analysis method and system based on large model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106776808A (en) * | 2016-11-23 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Information data offering method and device based on artificial intelligence
CN116150613A (en) * | 2022-08-16 | 2023-05-23 | 马上消费金融股份有限公司 | Information extraction model training method, information extraction method and device
US20230252224A1 (en) * | 2021-01-22 | 2023-08-10 | Bao Tran | Systems and methods for machine content generation
CN116738060A (en) * | 2023-07-03 | 2023-09-12 | 陈利人 | Content generation method and device and electronic equipment
CN117609644A (en) * | 2023-10-27 | 2024-02-27 | 杭州阿里巴巴海外互联网产业有限公司 | Network information generation method, system, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106776808A (en) * | 2016-11-23 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Information data offering method and device based on artificial intelligence
US20230252224A1 (en) * | 2021-01-22 | 2023-08-10 | Bao Tran | Systems and methods for machine content generation
CN116150613A (en) * | 2022-08-16 | 2023-05-23 | 马上消费金融股份有限公司 | Information extraction model training method, information extraction method and device
CN116738060A (en) * | 2023-07-03 | 2023-09-12 | 陈利人 | Content generation method and device and electronic equipment
CN117609644A (en) * | 2023-10-27 | 2024-02-27 | 杭州阿里巴巴海外互联网产业有限公司 | Network information generation method, system, electronic equipment and storage medium

Also Published As

Publication number | Publication date
CN117609644A (en) | 2024-02-27

Similar Documents

Publication | Publication Date | Title
WO2025087150A1 (en) |  | Network information generation method and system, electronic device, and storage medium
CA2865187C (en) |  | Method and system relating to salient content extraction for electronic content
CN101777056B (en) |  | Data storage method and device
CN103390038B (en) |  | A kind of method of structure based on HBase and retrieval increment index
US20080189591A1 (en) |  | Method and system for generating a media presentation
CN104508660A (en) |  | User-defined loading of data onto the database
CN102332030A (en) |  | Data storage, management and query method and system for distributed key-value storage system
US20110078114A1 (en) |  | Independently Variably Scoped Content Rule Application in a Content Management System
CN114297204B (en) |  | Data storage and retrieval method and device for heterogeneous data sources
CN104424271B (en) |  | The automatic acquiring method and system of publication digital resource
CN105975495A (en) |  | Big data storage and search method and apparatus
CN114528813A (en) |  | File conversion management method, device, equipment and medium for online preview
CN117095419A (en) |  | PDF document data processing and information extracting device and method
CN110019169A (en) |  | A kind of method and device of data processing
CN111125485A (en) |  | Scrapy-based website URL crawling method
CN114297143A (en) |  | A method for searching files, a method, device and mobile terminal for displaying files
CN107180119B (en) |  | Digital product generation method and digital product generation device
CN118689850A (en) |  | Automatic filing method, device, electronic equipment and storage medium for electronic files
CN115758001A (en) |  | Web page information extraction method, device, electronic device and storage medium
CN110740046B (en) |  | Method and device for analyzing service contract
CN111833198A (en) |  | A method for intelligently handling insurance terms
US20230342375A1 (en) |  | Extension for Third Party Provider Data Access
CN104063386B (en) |  | A kind of method and apparatus of content object multiplexing
CN107391773A (en) |  | A kind of online text managemant method and apparatus
KR101921123B1 (en) |  | Field-Indexing Method for Message

Legal Events

Date | Code | Title | Description
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24881515; Country of ref document: EP; Kind code of ref document: A1

