CN120611042A - Dialogue abstract generation method, device, electronic equipment, medium and program product - Google Patents

Dialogue abstract generation method, device, electronic equipment, medium and program product

Info

Publication number
CN120611042A
CN120611042A (application CN202410235158.0A)
Authority
CN
China
Prior art keywords
information
target
field
dialogue
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410235158.0A
Other languages
Chinese (zh)
Inventor
张倩汶
饶孟良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202410235158.0A
Publication of CN120611042A
Legal status: Pending

Abstract

The embodiments of the application provide a dialogue abstract generation method and device, an electronic device, a medium, and a program product, which may relate to fields such as artificial intelligence and cloud technology, and in particular to large models and natural language processing within artificial intelligence. The method comprises: obtaining target dialogue information input by a target object; determining target prompt information corresponding to the target dialogue information, where the target prompt information comprises key fields and is used for prompting generation of a dialogue abstract based on the key fields; generating, through a large-scale language model, summary information of the target dialogue information based on the target dialogue information and the target prompt information; extracting field information of each target key field from the target dialogue information based on the target prompt information; and taking the summary information of the target dialogue information and the field information of each target key field as the abstract information of the target dialogue information. The method provided by the embodiments of the application can generate high-quality dialogue abstracts.

Description

Dialogue abstract generation method, device, electronic equipment, medium and program product
Technical Field
The application belongs to the technical field of computers, and relates to the fields of artificial intelligence, cloud technology, large models, natural language processing and the like, in particular to a dialog abstract generating method, a dialog abstract generating device, electronic equipment, a medium and a program product.
Background
A dialogue digest captures the key information of a dialogue. In some cases a historical dialogue is one conducted between an intelligent customer service agent and a user: when a human agent takes over the dialogue, the agent needs to know the history of that dialogue to communicate better; even after the dialogue with the intelligent agent ends, a manager may need to review the dialogue content to determine whether the intelligent agent solved the user's problem; and in some cases the dialogue between a user and customer service (human or intelligent) needs to be acquired for subsequent data analysis.
In the above scenarios, it is generally necessary to browse all historical dialogue records to review the dialogue content between the user and customer service, but browsing every record in this way is cumbersome, time-consuming, and inefficient.
Therefore, how to generate a high-quality dialogue digest from given dialogue information has become a key issue.
Disclosure of Invention
The embodiment of the application aims to provide a dialogue abstract generating method, a dialogue abstract generating device, electronic equipment, a medium and a program product, so as to generate a dialogue abstract corresponding to dialogue information. In order to achieve the above object, the technical solution provided by the embodiments of the present application is as follows:
in a first aspect, a method for generating a dialog abstract is provided, including:
acquiring target dialogue information input by a target object;
determining target prompt information corresponding to the target dialogue information, wherein the target prompt information comprises key fields and is used for prompting the generation of a dialogue abstract based on the key fields, and the key fields comprise at least one first key field;
The following operations are performed by the large-scale language model:
generating summary information of the target dialogue information based on the target dialogue information and the target prompt information, and
extracting field information of each target key field from the target dialogue information based on the target prompt information, wherein the target key fields comprise the at least one first key field and at least one second key field, and the at least one second key field is obtained based on the at least one first key field;
And taking the summary information of the target dialogue information and the field information of each target key field as abstract information of the target dialogue information.
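As a hedged illustration of the first-aspect flow (the function names, prompt wording, and stub model below are assumptions, not from the patent), a single LLM request can yield both the summary information and the key-field information, which together form the abstract information:

```python
# Illustrative sketch of the claimed flow; names and prompt wording are
# assumptions. One LLM request yields both the summary and the field
# information, which together form the abstract (digest) information.

def build_prompt(dialogue: str, key_fields: list) -> str:
    """Target prompt information: lists the key fields, asks for a summary
    plus their values, and invites unlisted important fields."""
    fields = ", ".join(key_fields)
    return (
        f"Summarize the dialogue below and extract these key fields: {fields}. "
        "Also output any other important fields you find.\n"
        f"Dialogue:\n{dialogue}"
    )

def generate_digest(dialogue: str, key_fields: list, llm) -> dict:
    """One model call; summary + field info form the digest."""
    output = llm(build_prompt(dialogue, key_fields))
    return {"summary": output["summary"], "fields": output["fields"]}

# Stub standing in for the fine-tuned large-scale language model.
def fake_llm(prompt: str) -> dict:
    return {
        "summary": "User reported a billing error; the agent issued a refund.",
        "fields": {"issue": "billing error", "resolution": "refund"},
    }

digest = generate_digest("User: I was double charged. Agent: Refunded.",
                         ["issue"], fake_llm)
```

The single-call design is the point: one request returns the whole digest rather than separate summarization and extraction requests.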
In one possible implementation, the manner of determining the at least one second key field includes:
acquiring a plurality of preset fields;
Determining at least one second key field from each preset field based on at least one of the correlation of each preset field with a target field or the similarity of each preset field with at least one first key field;
the target domain is an application domain to which the target dialogue information belongs.
In another possible implementation manner, extracting field information of at least one second key field from the target dialogue information based on the target prompt information includes:
identifying whether the target prompt information includes preset information;
and if the target prompt information includes the preset information, generating field information of the at least one second key field based on the target dialogue information and the target prompt information.
In another possible implementation manner, the determining at least one second key field from each preset field based on at least one of the correlation between each preset field and the target field or the similarity between each preset field and at least one first key field includes:
And determining at least one preset field which has a correlation with the target field larger than a preset correlation and has a similarity with at least one first key field larger than a first preset similarity in each preset field as at least one second key field.
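Under stated assumptions (a toy token-overlap similarity, made-up relevance scores, and illustrative thresholds), the dual-threshold selection above can be sketched as:

```python
# Sketch of second-key-field selection: keep preset fields whose relevance
# to the target domain exceeds a preset correlation AND whose similarity to
# some first key field exceeds a first preset similarity. The relevance
# scores and the Jaccard similarity here are illustrative stand-ins.

def token_jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split("_")), set(b.split("_"))
    return len(sa & sb) / len(sa | sb)

def select_second_fields(preset_fields, first_fields, domain_relevance,
                         rel_thresh=0.5, sim_thresh=0.3):
    selected = []
    for field in preset_fields:
        if domain_relevance.get(field, 0.0) <= rel_thresh:
            continue  # not relevant enough to the target domain
        if any(token_jaccard(field, k) > sim_thresh for k in first_fields):
            selected.append(field)
    return selected

second = select_second_fields(
    preset_fields=["order_id", "refund_amount", "weather"],
    first_fields=["order_number"],
    domain_relevance={"order_id": 0.9, "refund_amount": 0.8, "weather": 0.1},
)
# "order_id" shares the token "order" with "order_number" and is
# domain-relevant, so it is selected as a second key field.
```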
In another possible implementation manner, the plurality of preset fields belong to a target field set corresponding to the target field;
Determining at least one second key field from each preset field based on the similarity between each preset field and at least one first key field, wherein the determining comprises the following steps:
and determining, for each first key field, the fields in the target field set whose similarity to that first key field is greater than a second preset similarity, thereby obtaining the fields corresponding to each first key field, which serve as the at least one second key field corresponding to the target dialogue information.
In another possible implementation manner, the determining the target prompt information corresponding to the target dialogue information includes any one of the following:
acquiring configuration information related to the target prompt information input by the target object, and constructing target prompt information corresponding to the target dialogue information based on the configuration information;
Determining an application field corresponding to the target dialogue information, determining prompt information related to the application field from all preconfigured prompt information, and taking the determined prompt information as the target prompt information.
In another possible implementation, the configuration information is determined by at least one of:
Responding to a prompt message construction instruction triggered by the target object received through a configuration interface, and displaying configuration prompt messages related to the application field corresponding to the target dialogue information;
and receiving configuration information related to the target prompt information input by the target object based on the configuration prompt information.
In another possible implementation, the large-scale language model is trained by:
Obtaining a plurality of training samples, wherein each training sample comprises sample dialogue information, dialogue abstract samples corresponding to the sample dialogue information and prompt information corresponding to the sample dialogue information, and each dialogue abstract sample comprises summary information in the sample dialogue information and field information of at least one sample key field in the sample dialogue information;
Training an initial large-scale language model based on the training samples to obtain the large-scale language model, wherein the training loss of the large-scale language model is determined based on the difference between the dialogue abstract sample corresponding to each training sample and the dialogue abstract generated by the model.
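The fine-tuning loop described above can be sketched as follows; the model stub and the character-level loss are stand-ins (a real setup would compute token-level cross-entropy over the model's vocabulary):

```python
# Minimal shape of the supervised fine-tuning described in the claim: each
# sample pairs (dialogue, prompt) with a reference digest, and the training
# loss measures the difference between the reference digest and the digest
# the model generates. The loss below is a toy stand-in for cross-entropy.

def digest_loss(reference: str, generated: str) -> float:
    """Fraction of mismatched characters (toy stand-in for CE loss)."""
    n = max(len(reference), len(generated))
    matches = sum(a == b for a, b in zip(reference, generated))
    return 1.0 - matches / n

def train_epoch(model, samples) -> float:
    """One pass over the training samples; returns the mean loss."""
    losses = []
    for s in samples:
        generated = model(s["dialogue"], s["prompt"])
        losses.append(digest_loss(s["digest"], generated))
    return sum(losses) / len(losses)
```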
In another possible implementation, the large-scale language model is obtained by:
acquiring a plurality of trained candidate large-scale language models;
determining the performance of each candidate large-scale language model;
determining the large-scale language model from the candidate large-scale language models based on the performance of the candidate large-scale language models.
In another possible implementation, the performance of any candidate large-scale language model is determined by:
obtaining a frequently-asked-question (FAQ) set, wherein the FAQ set comprises at least one question and the answer corresponding to each question;
for each question, generating model input information corresponding to the question based on the question, wherein the model input information comprises at least one of dialogue information or abstract information;
for each question, inputting the model input information corresponding to the question into the candidate large-scale language model to obtain a predicted answer, and
matching the similarity between the predicted answer and the answer corresponding to the question to obtain a matching result;
and determining the performance of the candidate large-scale language model based on the matching result corresponding to each question.
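Under assumed names, the FAQ-based check can be sketched as follows; the word-overlap similarity is a simple stand-in for the matching step:

```python
# Sketch of the FAQ-based performance check: each question (optionally with
# dialogue or digest context) goes to the candidate model, the predicted
# answer is similarity-matched against the reference answer, and the match
# results are aggregated into a performance score.

def answer_similarity(pred: str, ref: str) -> float:
    pw, rw = set(pred.lower().split()), set(ref.lower().split())
    return len(pw & rw) / len(pw | rw) if pw | rw else 1.0

def evaluate(model, faq: dict) -> float:
    """faq maps each question to its reference answer."""
    scores = [answer_similarity(model(q), a) for q, a in faq.items()]
    return sum(scores) / len(scores)

faq = {"What was the issue?": "billing error",
       "How was it resolved?": "refund issued"}
perfect = evaluate(lambda q: faq[q], faq)  # a model echoing the references
```

A higher-scoring candidate would then be selected as the large-scale language model used in the method.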
In a second aspect, a dialog digest generation device is provided, the device comprising:
The acquisition module is used for acquiring target dialogue information input by a target object;
The first determining module is used for determining target prompt information corresponding to the target dialogue information, wherein the target prompt information comprises key fields and is used for prompting the generation of a dialogue abstract based on the key fields, and the key fields comprise at least one first key field;
the model execution processing module is used for executing the following operations through the large-scale language model:
generating summary information of the target dialogue information based on the target dialogue information and the target prompt information, and
extracting field information of each target key field from the target dialogue information based on the target prompt information, wherein the target key fields comprise the at least one first key field and at least one second key field, and the at least one second key field is obtained based on the at least one first key field;
And taking the summary information of the target dialogue information and the field information of each target key field as abstract information of the target dialogue information.
In one possible implementation, the apparatus further comprises a second determination module, wherein,
The second determining module is specifically configured to, when determining the at least one second key field:
acquiring a plurality of preset fields;
Determining at least one second key field from each preset field based on at least one of the correlation of each preset field with a target field or the similarity of each preset field with at least one first key field;
the target domain is an application domain to which the target dialogue information belongs.
In another possible implementation manner, the model execution processing module is specifically configured to, when extracting field information of at least one second key field from the target dialogue information based on the target prompt information:
identifying whether the target prompt information includes preset information;
and if the target prompt information includes the preset information, generating field information of the at least one second key field based on the target dialogue information and the target prompt information.
In another possible implementation manner, the second determining module is specifically configured to, when determining at least one second key field from each preset field based on at least one of a correlation between each preset field and a target field, or a similarity between each preset field and at least one first key field:
And determining at least one preset field which has a correlation with the target field larger than a preset correlation and has a similarity with at least one first key field larger than a first preset similarity in each preset field as at least one second key field.
In another possible implementation manner, the plurality of preset fields belong to a target field set corresponding to the target field;
The second determining module is specifically configured to, when determining at least one second key field from each preset field based on the similarity between each preset field and at least one first key field:
and determining, for each first key field, the fields in the target field set whose similarity to that first key field is greater than a second preset similarity, thereby obtaining the fields corresponding to each first key field, which serve as the at least one second key field corresponding to the target dialogue information.
In another possible implementation manner, when determining the target prompt information corresponding to the target dialogue information, the first determining module is specifically configured to any one of the following:
acquiring configuration information related to the target prompt information input by the target object, and constructing target prompt information corresponding to the target dialogue information based on the configuration information;
Determining an application field corresponding to the target dialogue information, determining prompt information related to the application field from all preconfigured prompt information, and taking the determined prompt information as the target prompt information.
In another possible implementation, the configuration information is determined by at least one of:
Responding to a prompt message construction instruction triggered by the target object received through a configuration interface, and displaying configuration prompt messages related to the application field corresponding to the target dialogue information;
and receiving configuration information related to the target prompt information input by the target object based on the configuration prompt information.
In another possible implementation, the large-scale language model is trained by:
Obtaining a plurality of training samples, wherein each training sample comprises sample dialogue information, dialogue abstract samples corresponding to the sample dialogue information and prompt information corresponding to the sample dialogue information, and each dialogue abstract sample comprises summary information in the sample dialogue information and field information of at least one sample key field in the sample dialogue information;
Training an initial large-scale language model based on the training samples to obtain the large-scale language model, wherein the training loss of the large-scale language model is determined based on the difference between the dialogue abstract sample corresponding to each training sample and the dialogue abstract generated by the model.
In another possible implementation, the large-scale language model is obtained by:
acquiring a plurality of trained candidate large-scale language models;
determining the performance of each candidate large-scale language model;
determining the large-scale language model from the candidate large-scale language models based on the performance of the candidate large-scale language models.
In another possible implementation, the performance of any candidate large-scale language model is determined by:
obtaining a frequently-asked-question (FAQ) set, wherein the FAQ set comprises at least one question and the answer corresponding to each question;
for each question, generating model input information corresponding to the question based on the question, wherein the model input information comprises at least one of dialogue information or abstract information;
for each question, inputting the model input information corresponding to the question into the candidate large-scale language model to obtain a predicted answer, and
matching the similarity between the predicted answer and the answer corresponding to the question to obtain a matching result;
and determining the performance of the candidate large-scale language model based on the matching result corresponding to each question.
In a third aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes a memory and a processor, and the memory stores a computer program, and the processor executes the computer program to implement a method for generating a dialog digest provided by any possible implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program, when executed by a processor, implements a method for generating a dialog digest provided by any possible implementation of the first aspect.
In a fifth aspect, embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements a method for generating a dialog digest as provided by any of the possible implementations of the first aspect.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
In the embodiments of the application, after target dialogue information input by a target object and the corresponding target prompt information are acquired, summary information for the target dialogue information can be generated through a large-scale language model based on the target prompt information. At least one second key field is obtained heuristically based on the first key fields in the target prompt information, and the field information of the first key fields and of the second key fields is extracted from the target dialogue information. The abstract information of the target dialogue information is then obtained from its summary information together with the field information of the first and second key fields. That is, besides the summary information and the field information of the first key fields, field information of second key fields can be obtained heuristically, so that abstract information meeting the requirements of the target object can be generated through the prompt information in a single request, thereby improving the experience of the target object.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1a is a schematic diagram of a dialogue summary generation system according to an embodiment of the present application;
FIG. 1b is a schematic diagram of another system for generating a dialogue summary according to an embodiment of the present application;
FIG. 1c is a schematic diagram of a dialogue summary generating system according to an embodiment of the present application;
FIG. 1d is a schematic flow chart of a method for generating a dialogue abstract according to an embodiment of the application;
FIG. 1e is a flowchart illustrating another method for generating a dialogue summary according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of generating a dialogue abstract in a practical application process according to an embodiment of the application;
FIG. 3a is a schematic diagram illustrating a method for generating a session abstract according to an embodiment of the application;
FIG. 3b is a schematic flow chart of an application process in a method for generating a dialogue abstract in the actual application process in the embodiment of the application;
FIG. 3c is a schematic flow chart of a training process in a method for generating a dialogue abstract in the actual application process of the embodiment of the application;
fig. 4 is a schematic structural diagram of a dialogue digest generating device according to an embodiment of the application;
fig. 5 is a schematic diagram of an apparatus structure of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and the technical solutions of the embodiments of the present application are not limited.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "comprises" and "comprising", when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" indicates at least one of the items it joins; e.g., "A and/or B" may be implemented as "A", as "B", or as "A and B". When a plurality of (two or more) items is described and the relationship between them is not explicitly defined, the description may refer to one, more, or all of them; for example, "the parameter A includes A1, A2, A3" may be implemented such that A includes A1, or A2, or A3, or at least two of A1, A2, and A3.
The embodiments of the application provide a dialogue abstract generation method and device, an electronic device, and a storage medium, used to generate a higher-quality digest result in a single request in human-machine interaction scenarios, so that the requester can understand the dialogue information better and faster.
In order to better understand and illustrate the solutions provided by the embodiments of the present application, some technical terms related to the embodiments of the present application are briefly described below.
Large Language Model (LLM): a large-scale, deep-learning-based natural language processing model that learns the grammar and semantics of natural language so that it can generate human-readable text. A "language model" is an AI model that processes only linguistic text (or symbol systems), discovers the patterns in it, and can automatically generate content conforming to those patterns based on a prompt.
Dialogue digest: a dialogue digest is a special case of text summarization whose core target is dialogue-class data. Dialogue data takes different forms, such as meetings, chit-chat, mail, dialogues, customer service, and the like. Different forms of dialogue digest have different application scenarios in their specific fields, but their core is consistent with the core of the summarization task: capturing the key information in the dialogue to help readers quickly understand its core content.
Prompt learning: prompt learning unifies all downstream tasks into the form of the pre-training task, converting downstream-task data into natural language via a specific template so as to fully exploit the capability of the pre-trained model. In essence, a template matching the upstream pre-training task is designed; through this template, the potential of the pre-trained model is tapped so that it can complete downstream tasks well while requiring as little labeled data as possible. The method includes:
1. designing the task of the pre-trained language model;
2. designing the input template (Prompt Engineering);
3. designing the label style and the way model outputs map to labels (Answer Engineering).
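As a hypothetical illustration of steps 2 and 3 (the template wording and format are assumptions, not the patent's actual template), a digest prompt might combine a task instruction, an input slot, and a fixed answer format:

```python
# Hypothetical prompt template in the spirit of Prompt/Answer Engineering:
# a task instruction, an input slot, and a fixed answer format so the model
# output can be mapped back to labels (summary + field values).

TEMPLATE = (
    "Task: produce a dialogue digest.\n"
    "Dialogue: {dialogue}\n"
    "Answer format:\n"
    "Summary: <one-paragraph summary>\n"
    "Fields: one '<field>: <value>' per line\n"
)

def fill(dialogue: str) -> str:
    """Instantiate the template with a concrete dialogue."""
    return TEMPLATE.format(dialogue=dialogue)
```

The fixed answer format is what makes the downstream mapping from raw model output back to labels mechanical.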
Prompt: text representing commands or instructions, used to indicate to the large language model what action needs to be performed or what output needs to be generated, i.e., what the model should do or produce when performing a particular task.
NLG (Natural Language Generation): natural language generation converts non-natural-language data into an easy-to-understand, usable natural language form using artificial intelligence and natural language processing techniques. Its main purpose is to narrow the communication gap between humans and machines by converting data in non-language formats into language that humans can understand.
The scheme provided by the embodiments of the application relates to the field of artificial intelligence. Artificial Intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence research studies the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline spanning a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include, for example, sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, pre-trained model technology, operation/interaction systems, and mechatronics. A pre-trained model, also called a large model or foundation model, can be widely applied to downstream tasks in all major directions of artificial intelligence after fine-tuning. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The scheme provided by the embodiments of the application may particularly relate to natural language processing and machine learning/deep learning within the AI field. Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing involves natural language, i.e., the language people use in daily life, so it is closely related to the study of linguistics as well as to computer science and mathematics. The pre-trained model, an important technique for model training in artificial intelligence, developed from the Large Language Model in the NLP field. Through fine-tuning, large language models can be widely applied to downstream tasks. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph technology, and the like.
Optionally, the data processing involved in the method provided by the embodiments of the application (for example, acquiring the training set in the training stage of the large language model, or processing target dialogue information with the trained large language model) may be implemented based on cloud computing. Cloud computing is a computing model that distributes computing tasks across a large pool of computers, enabling various application systems to acquire computing power, storage space, and information services as needed.
It should be noted that, in optional embodiments of the application, related data such as the target dialogue information and the target prompt information require the permission or consent of the object when the embodiments are applied to a specific product or technology, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. That is, if data related to the object is involved, its acquisition requires the object's approval and the approval of the relevant departments, in compliance with applicable laws, regulations, and standards. For example, where personal information is involved, all personal information must be acquired with personal consent; where sensitive information is involved, the separate consent of the information subject must be obtained; and the embodiments must be implemented under the authorized consent of the object.
In the related art, the conversation digest may include (1) a question/solution digest, which is a call-center-specific function for providing a digest of questions and solutions in a conversation between a customer service agent and a customer, (2) a chapter title digest, which provides suggested chapter titles for the input conversation, and (3) a narrative digest, which provides conversation notes, meeting notes, or chat digests of the input conversation.
The function in (1) is mainly suitable for online customer-service chat scenes; customer service scripts and user questions, such as questions raised by customers and answers given by customer service, are extracted from the dialogue contents, and can be used for subsequent analysis of hot user questions or construction of a customer service script library, so as to optimize a customer service robot. The title digest of function (2) loses a relatively large amount of information, and the narrative digest of function (3) has the problem that a user cannot find the important points.
In order to solve the above technical problems, the embodiment of the application provides a heuristic dialogue abstract generation scheme, which is applied to dialogue scenes (such as a meeting abstract, a customer service work order abstract, and the like). In various conversational business scenarios, because conversations are redundant, the "dimension reduction" processing of information is very important to business parties, and extracting the core information in a conversation can greatly reduce follow-up costs. In the embodiment of the application, an LLM is used to unify the two tasks of dialogue abstraction (abstractive summarization and extractive summarization); heuristic cues are added to the prompt template through prompt learning, so that when the LLM produces its output, the model is induced to generate a dialogue summary while also extracting key field information, and the LLM can thus better match the service requirements. In the embodiment of the application, the generation function and the extraction function are fused together through the prompt information, and heuristic information prompts are innovatively used to help the dialogue abstract extract key information, so that the extracted fields contained in the obtained abstract information are not limited to the provided field range; the LLM can, according to the prompt information, find and output fields that are important in the field but do not appear in the prompt information. Meanwhile, Frequently-Asked Questions (FAQ) are introduced into the summarization task to realize quality verification, help the model learn the key information in the dialogue, and expand the scenes in which the LLM can be used. The embodiment of the application innovatively provides a universal dialogue scheme which can ensure that a business party obtains a high-quality summary result in one request.
The embodiment of the application can be applied to dialogue-type service scenes, and can refine lengthy dialogue information in service scenes such as telephone, conference, and instant-messaging chat, so as to reduce follow-up costs.
Further, referring to fig. 1a, fig. 1a is a schematic structural diagram of a dialog abstract generating system according to an embodiment of the present application. The dialog digest generation system shown in fig. 1a includes a first server 110 and a terminal 120.
The terminal 120 may include, but is not limited to, one or more of a desktop computer, a notebook computer, a smartphone, a tablet computer, an internet-of-things device, a portable wearable device, or an immersive image display device. The internet-of-things device may be one or more of an intelligent sound box, an intelligent television, an intelligent air conditioner, or an intelligent vehicle-mounted device. The portable wearable device may be one or more of a smart watch, a smart bracelet, or a head-mounted device. Immersive image display devices include, but are not limited to, Augmented Reality (AR) devices, Virtual Reality (VR) devices, and the like.
The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network), big data, and artificial intelligence platforms.
Specifically, the first server 110 may be used as a training server to obtain a plurality of training sets including a plurality of training samples and perform training, so as to obtain a trained large-scale language model. The trained large-scale language model may then be deployed in the first server 110 or the terminal 120. When summary information needs to be generated for the target dialogue information, the target prompt information and the target dialogue information can be acquired through the terminal 120, and then the summary information of the target dialogue information can be generated based on the trained large-scale language model.
It may be understood that, when the trained large-scale language model is deployed on the first server 110, after the terminal 120 obtains the target prompt information and the target session information, the target prompt information and the target session information may be sent to the first server 110; the first server 110 generates the summary information of the target session information based on the target prompt information and the target session information, and the terminal 120 then receives the summary information of the target session information fed back by the first server 110 and displays it. When the trained large language model is deployed on the terminal 120, the terminal 120 may directly invoke the trained large language model to generate the summary information of the target session information, and may display the summary information after generating it.
The target session information may be obtained through the terminal 120 or may be obtained from another device, which is not limited herein.
In some scenarios, training may be performed through the first server 110, and the trained large-scale language model may be deployed on the first server 110; alternatively, training may be performed through the terminal 120, and the trained large-scale language model may be deployed on the terminal 120; in addition, the first server 110 and the terminal 120 may perform training cooperatively, which is not limited herein.
Referring to fig. 1b, fig. 1b is a schematic diagram of another dialogue digest generation system according to an embodiment of the application. The dialog digest generation system shown in fig. 1b includes a first server 110, a terminal 120, and a second server 130. In this embodiment, the first server 110 may be used as a training server, and the second server 130 may be used as a session digest generation server, that is, a session digest is generated in the second server 130 for the target session information.
In some scenarios, it may also be the first server 110, and at least one of the terminal 120 or the second server 130 co-trains, which is not limited herein.
Specifically, the first server 110 may train to obtain a trained large-scale language model and deploy the trained large-scale language model in the second server 130. The target prompt information and the target dialogue information may then be obtained through the terminal 120 and sent to the second server 130; after the second server 130 generates the abstract information of the target dialogue information, the abstract information is fed back to the terminal 120.
Referring to fig. 1c, fig. 1c is a schematic structural diagram of another dialogue digest generation system according to an embodiment of the present application. The dialogue digest generation system shown in fig. 1c includes a first server 110, a terminal 120, and a plurality of second servers 130.
The first server 110 may train to obtain a large-scale language model for generating abstract information of target dialogue information in each scene, and deploy each target large-scale language model in a different second server 130. After the target dialogue information is acquired through the terminal 120, the application scene of the target dialogue information is determined, so that the target dialogue information and the target prompt information are sent to the corresponding second server 130; the second server 130 that receives the target prompt information and the target dialogue information processes the target dialogue information by using the deployed target large-scale language model to obtain the abstract information corresponding to the target dialogue information, and feeds the abstract information back to the terminal 120.
It will be appreciated that the above dialogue digest generation systems are examples of some cases and do not constitute all cases of implementing the technical solution of the embodiments of the present application.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that, on the premise that the following embodiments do not conflict with each other, the following embodiments may refer to, reference or combine, and the description will not be repeated for the same terms, similar features, similar implementation steps, etc. in different embodiments.
Fig. 1d shows a method for generating a session abstract provided in the embodiment of the present application. The method may be executed by a terminal or by a server. For example, the method may be deployed in an application running in the terminal, where the application provides a man-machine interaction service and outputs abstract information of target session information for a user. That is, after the target session information and the target prompt information input by the user are obtained through the man-machine interaction interface of the application (the user interface of the application), the abstract information corresponding to the target session information may be generated and fed back to the user. The operation of generating the abstract information corresponding to the target session information may be executed in the user terminal or by the server; that is, the trained large-scale language model may be deployed in the terminal or in the server. Deploying the model in the server can reduce the resource occupation of the application on the terminal, and also facilitates updating or upgrading the model.
As shown in fig. 1d, the method for generating a dialogue digest according to the embodiment of the application may include the following steps S101 to S103.
S101, acquiring target dialogue information input by a target object.
For the embodiment of the application, the target dialogue information input by the target object can be dialogue information in a text format, can be dialogue information in a voice format, or can be partly dialogue information in a voice format and partly dialogue information in a text format. In the embodiment of the application, if the target dialogue information input by the target object has dialogue information in a voice format, the target dialogue information in the voice format can be converted into dialogue information in a text format so as to obtain the target dialogue information in the text format.
Further, the target object may also input target dialogue information in video format, extract dialogue information in speech format therefrom, and convert dialogue information in speech format into target dialogue information in text format.
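The format handling described in steps above can be illustrated with a minimal Python sketch. This is not part of the patent: the `normalize_dialogue` helper and the `transcribe_audio` callable are hypothetical names standing in for whatever speech-to-text component a real system would use.

```python
def normalize_dialogue(segments, transcribe_audio):
    """Convert mixed text/voice dialogue segments into plain-text dialogue.

    segments         -- list of (kind, payload) pairs, kind in {"text", "voice"}
    transcribe_audio -- ASR callable (hypothetical placeholder, not specified
                        by the source text)
    """
    texts = []
    for kind, payload in segments:
        if kind == "text":
            texts.append(payload)
        elif kind == "voice":
            # Voice-format dialogue is converted to text before summarization.
            texts.append(transcribe_audio(payload))
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return "\n".join(texts)

# Partly text, partly voice, as the embodiment allows.
dialog = [("text", "Agent: hello, how can I help?"),
          ("voice", b"\x00\x01")]
result = normalize_dialogue(dialog, lambda audio: "Customer: my car will not start")
```

The same path handles video input: the audio track would first be extracted, then fed through the same transcription callable.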
S102, determining target prompt information corresponding to the target dialogue information.
For the embodiment of the application, different dialogue information can correspond to the same prompt information, or different dialogue information can correspond to different prompt information. In the embodiment of the application, if different dialogue information corresponds to different prompt information, dialogue information in the same application field may correspond to the same prompt information, or dialogue information in the same application field may correspond to different prompt information.
Further, if different dialogue messages correspond to the same prompt message, the prompt message is fixed. If dialogue messages in the same application field correspond to the same prompt message, that is, one field corresponds to one prompt message, then after the target dialogue message is obtained, the application field to which the target dialogue message belongs is determined, so as to determine the prompt message corresponding to that application field as the target prompt message corresponding to the target dialogue message. If dialogue messages in the same application field can correspond to different prompt messages, the target prompt message corresponding to the target dialogue message can be determined in other manners. Furthermore, the prompt messages corresponding to different dialogue messages may be input by the target object or obtained according to the configuration of the target object, which is not limited in the embodiment of the present application.
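The "one field corresponds to one prompt message" case above reduces to a lookup with a fixed fallback. The sketch below is illustrative only; the table contents and names (`DOMAIN_PROMPTS`, `target_prompt_for`) are assumptions, not part of the patent.

```python
# One prompt per application field (illustrative contents).
DOMAIN_PROMPTS = {
    "automotive": ("Summarize the dialogue and extract key fields: "
                   "vehicle model, frame number, dealer, etc."),
    "customer_service": ("Summarize the dialogue and extract key fields: "
                         "question, solution, contact person, etc."),
}
# Fixed prompt used when all dialogue information shares the same prompt.
DEFAULT_PROMPT = "Summarize the dialogue and extract the key fields mentioned, etc."

def target_prompt_for(application_field):
    """Return the prompt for the application field of the target dialogue,
    falling back to the fixed prompt."""
    return DOMAIN_PROMPTS.get(application_field, DEFAULT_PROMPT)
```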
In the embodiment of the application, the at least one first key field is a key field carried in the target prompt information to prompt the extraction of field information corresponding to each first key field from the target dialogue information, that is, the dialogue abstract generated based on the target prompt information at least contains the field information corresponding to each first key field in the at least one first key field.
It should be noted that, step S102 may be performed before step S101, may be performed after step S101, may be performed simultaneously with step S101, and is not limited in the embodiment of the present application.
S103, generating summary information of the target dialogue information based on the target dialogue information and the target prompt information through a large-scale language model, and extracting field information of each target key field from the target dialogue information based on the target prompt information.
The summary information of the target dialogue information and the field information of each target key field are taken as the summary information of the target dialogue information.
After the summary information of the target dialogue information and the field information of each target key field are obtained in the above manner, the summary information of the target dialogue information and the field information of each target key field are taken as the abstract information of the target dialogue information. In the embodiment of the application, the target prompt information is used for prompting the generation of the dialogue abstract based on the target key fields in the target dialogue information, and the target key fields comprise at least one first key field and at least one second key field; that is, the generated abstract information of the target dialogue information comprises the summary information of the target dialogue information, the field information of the first key field, and the field information of the second key field.
Specifically, in the embodiment of the present application, the target prompt information includes at least one first key field, and at least one second key field can be heuristically obtained based on the at least one first key field, so that when the field information of the first key field is extracted from the target dialogue information, the field information of the second key field can be extracted.
It should be noted that the summary information of the target session information and the field information of the target key fields (the field information of the first key field and the field information of the second key field) may be displayed separately, that is, in different paragraphs, and the generated summary information of the target session information may also itself include the field information of the first key field and the field information of the second key field. Optionally, in practical applications, the LLM integrates the generation function and the extraction function through the target prompt information to help the conversation abstract extract key information, so that the scheme meets the service requirements. The LLM is used to generate the summary of the target dialogue information and to extract the field information of the first key field and the field information of the second key field from the target dialogue information to obtain the abstract information, so that the abstract information generated by the embodiment of the application better meets the service requirements and is of higher quality.
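The assembly of the final abstract information (summary plus the two groups of key fields) can be sketched as follows. This is a presentation-layer illustration under assumed names (`build_abstract`); the paragraph-per-field layout is one of the display options the text mentions, not the only one.

```python
def build_abstract(summary, first_fields, second_fields):
    """Combine the dialogue summary with extracted field information.

    first_fields  -- dict of fields named in the prompt (first key fields)
    second_fields -- dict of fields found heuristically (second key fields)
    Returns one paragraph per item, summary first.
    """
    lines = ["Summary: " + summary]
    for name, value in {**first_fields, **second_fields}.items():
        lines.append(f"{name}: {value}")
    return "\n".join(lines)
```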
Further, if the summary information of the target dialogue information is generated by the server side, the server side sends the generated summary information of the target dialogue information to the terminal so as to enable the terminal to display, and if the summary information of the target dialogue information is generated by the terminal, the terminal can directly display after generating the summary information of the target dialogue information.
Further, as can be seen from the above embodiment, the generated summary information includes the field information of each target key field, and the target key fields include the first key field; that is, the generated summary information includes the field information of the first key field. The target key fields may also include at least one second key field; that is, the field information of the target key fields included in the generated abstract information may include, in addition to the field information of the first key field, the field information of at least one second key field. In other words, even if the target prompt message includes only the first key field, the trained large-scale language model can be prompted in a heuristic manner: on the basis of extracting the field information of the first key field, the model is not limited to the provided field range, and can, according to the prompt, find and output fields that are important in the field but do not appear in the prompt message, namely extract the field information of the second key field. The generated abstract information can thus include a dialogue summary, the field information of each first key field, and the field information of each second key field, so that the quality of the generated abstract information is further improved and the experience of the target object is improved.
Specifically, the manner of determining the at least one second key field may specifically include step Sa and step Sb, as shown in fig. 1e, where fig. 1e only shows the processing flows of step Sa and step Sb.
It should be noted that, the steps Sa and Sb may be performed before the step S103, or may be performed during the step S103, which is not limited in the embodiment of the present application, where the steps Sa and Sb are as follows:
step Sa, obtaining a plurality of preset fields.
For the embodiment of the application, the electronic device may locally store a plurality of preset fields, that is, may acquire a plurality of preset fields from the local storage when determining at least one second key field, or may include a plurality of preset fields in the trained large-scale language model.
And step Sb, determining at least one second key field from the preset fields based on at least one of the correlation between the preset fields and the target field or the similarity between the preset fields and the at least one first key field.
The target field is an application field to which the target dialogue information belongs.
Specifically, in the embodiment of the present application, after the plurality of preset fields are acquired, the correlation between each preset field and the application field to which the target session information belongs is determined, that is, whether each preset field belongs to the application field to which the target session information belongs, and/or the similarity between each preset field and the at least one first key field is determined. Determining the similarity between each preset field and the at least one first key field may specifically include determining the similarity between each preset field and each first key field, where one first key field may have at least one similar preset field or may have no similar preset field, which is not limited in the embodiment of the present application.
That is, only the first key field may exist in the target prompt message to prompt that the field information of the first key field be included when generating the dialogue digest information according to the target dialogue information, and the at least one second key field may be determined heuristically based on the first key field, so that the digest generated based on the target dialogue information includes the field information of the first key field as well as the field information of the at least one second key field. For example, if the first key field in the target prompt information contains a contact name, the second key field may include a contact phone.
Specifically, determining the at least one second key field from the preset fields based on at least one of the correlation between the preset fields and the target field or the similarity between the preset fields and the at least one first key field comprises: determining, from the preset fields, at least one preset field whose correlation with the target field is greater than a preset correlation and whose similarity with the at least one first key field is greater than a first preset similarity, as the at least one second key field.
That is, after determining the application domain to which the target dialogue information belongs, that is, after determining the target domain, at least one preset field, which has a correlation with the target domain greater than a preset correlation and a similarity with at least one first key field greater than a first preset similarity, is determined from the preset fields, and the determined at least one preset field is determined as at least one second key field.
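Steps Sa and Sb can be sketched as a simple filter. This is an illustration under stated assumptions: the source does not specify the correlation or similarity measures, so `difflib.SequenceMatcher` is used as a stand-in similarity, and domain membership stands in for the correlation threshold; all names are hypothetical.

```python
from difflib import SequenceMatcher

def second_key_fields(preset_fields, first_key_fields, target_domain,
                      field_domains, sim_threshold=0.5):
    """Step Sa/Sb sketch: pick second key fields from preset fields.

    A preset field qualifies if it is correlated with the target domain
    (here: listed under that domain in field_domains) AND its similarity
    with at least one first key field exceeds the threshold.
    """
    def sim(a, b):
        # Stand-in similarity; a real system might use embeddings instead.
        return SequenceMatcher(None, a, b).ratio()

    picked = []
    for field in preset_fields:
        in_domain = target_domain in field_domains.get(field, ())
        similar = any(sim(field, key) > sim_threshold for key in first_key_fields)
        if in_domain and similar:
            picked.append(field)
    return picked

# "contact name" in the prompt heuristically pulls in "contact phone";
# "dish name" is filtered out because it belongs to another domain.
picked = second_key_fields(
    ["contact phone", "dish name"],
    ["contact name"],
    "automotive",
    {"contact phone": ("automotive",), "dish name": ("catering",)},
)
```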
Further, the plurality of preset fields may all belong to a target field set corresponding to the target field. In this case, when determining the at least one second key field of the target dialogue, it is only necessary to determine, from the plurality of preset fields, the fields whose similarity with the first key fields is greater than a second preset similarity. That is, determining the at least one second key field from the preset fields based on the similarity between the preset fields and the at least one first key field in step Sb specifically includes: for each first key field, determining the fields in the target field set whose similarity with that first key field is greater than the second preset similarity, so as to obtain, for each first key field, the fields whose similarity with it exceeds the second preset similarity, as the at least one second key field corresponding to the target dialogue information.
Specifically, if the target prompt message includes only one first key field, the fields whose similarity with that first key field is greater than the second preset similarity are determined from the target field set, so as to serve as the at least one second key field corresponding to the target dialogue message; of course, there may be no field in the target field set whose similarity with the first key field exceeds the second preset similarity. If the target prompt message includes at least two first key fields, then for each first key field, the fields whose similarity with that first key field is greater than the second preset similarity are determined from the target field set, where a given first key field may have no corresponding second key field, or may have one or more corresponding second key fields, which is not limited in the embodiment of the present application.
In order to further improve the experience of the target object and generate summary information that is of higher quality and meets the requirements of the target object: when the target object needs the generated summary information to contain the field information of the first key field and the field information of the second key field, the second key field is heuristically determined and its field information is extracted when the summary information is generated; if the target object only needs the first key field, the generated summary information may contain only the field information of the first key field and not the field information of the second key field.
That is, extracting the field information of the at least one second key field from the target dialogue information based on the target prompt information in step S103 may specifically include: identifying whether the target prompt information contains preset information, and, if the target prompt information includes the preset information, generating the field information of the at least one second key field based on the target dialogue information and the target prompt information.
In the embodiment of the application, the large-scale language model can identify whether the target prompt information contains preset information. If it is identified that the target prompt information contains the preset information, the intention of the target object is characterized as extracting the field information of the first key field from the target dialogue information and further extracting the field information of the second key field. If it is identified that the target prompt information does not contain the preset information, the intention of the target object is characterized as extracting only the field information of the first key field from the target dialogue information, without heuristically extracting the field information of the second key field. That is, in the embodiment of the application, the intention of the target object is accurately determined by identifying whether the target prompt information contains the preset information, which improves the quality of the generated abstract information while further improving the experience of the target object.
For example, the preset information may include preset words such as "etc." or "at least". That is, when the target prompt information is "extract the key field information mentioned in the dialogue, the fields including vehicle model, consultation problem, contact person, vehicle owner, frame number, dealer, etc." or "extract the key field information mentioned in the dialogue, the fields including at least vehicle model, consultation problem, contact person, vehicle owner, frame number, dealer", and it is identified that the target prompt information includes "etc." or "at least", other key fields related to these fields are extracted heuristically.
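The preset-information check in the example above amounts to scanning the prompt for the trigger words. The sketch below is a trivial illustration with assumed names (`PRESET_MARKERS`, `should_extract_second_fields`); a deployed system would rely on the LLM itself rather than substring matching.

```python
# Heuristic trigger words carried in the prompt ("etc.", "at least").
PRESET_MARKERS = ("etc.", "at least")

def should_extract_second_fields(prompt):
    """Return True when the prompt carries preset information, signalling
    that second key fields should also be extracted beyond those listed."""
    return any(marker in prompt for marker in PRESET_MARKERS)
```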
Further, extracting the field information of the at least one second key field from the target dialogue information based on the target prompt information in step S103 may specifically include determining the intention information of the target object based on the target prompt information. If the intention information characterizes that the field information of the second key field needs to be extracted, the field information of the at least one second key field is generated, that is, the generated abstract information includes the field information of the second key field in addition to the field information of the first key field; otherwise, if the intention information characterizes that the field information of the second key field does not need to be extracted, the generated abstract information does not include the field information of the second key field.
Further, the manner of generating the field information of the at least one second key field for the target dialogue information and the target prompt information is specifically described in the above embodiments, and is not repeated here.
Further, as can be seen from the above embodiments, when generating the summary information corresponding to the target session information, the target prompt information corresponding to the target session information needs to be determined. Specifically, determining the target prompt information may include any one of the following manners (manner 1 and manner 2), wherein:
Manner 1: obtaining configuration information related to the target prompt information input by the target object, and constructing the target prompt information corresponding to the target dialogue information based on the configuration information.
For the embodiment of the application, in one possible implementation manner, the target object can input the configuration information related to the target prompt information through the man-machine interaction interface (the input interface of the application program) corresponding to the trained large-scale language model, and the target prompt information corresponding to the target dialogue information is then obtained based on the configuration information so as to be displayed through the man-machine interaction interface. The target prompt information obtained based on the configuration information may be generated by the trained large-scale language model or in other manners (for example, generated by another model).
for example, the configuration information input by the target object may include "dialogue summary, vehicle model, consultation problem, contact person, vehicle owner, vehicle frame number, dealer, etc." and the target prompt information may be generated based on the configuration information as follows:
"Generate a work order abstract according to the following telephone conversation content. Requirements: 1. fluent expression with clear emphasis; 2. first summarize the whole conversation; 3. then give 1-2 conversation keywords; 4. extract the key field information mentioned in the conversation, the fields including vehicle model, consultation problem, contact person, vehicle owner, frame number, dealer, etc."
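Constructing the target prompt information from the configuration information can be sketched as simple template filling. The helper name `build_prompt` is an assumption for illustration; the wording mirrors the example prompt above, and a real system might instead have the model itself compose the prompt.

```python
def build_prompt(config_fields):
    """Assemble a work-order abstract prompt from the target object's
    configuration information (manner 1 sketch)."""
    field_list = ", ".join(config_fields)
    return ("Generate a work order abstract according to the following telephone "
            "conversation content. Requirements: 1. fluent expression with clear "
            "emphasis; 2. first summarize the whole conversation; 3. then give "
            "1-2 conversation keywords; 4. extract the key field information "
            f"mentioned in the conversation, the fields including {field_list}, etc.")

prompt = build_prompt(["vehicle model", "consultation problem", "contact person",
                       "vehicle owner", "frame number", "dealer"])
```

Note that the trailing "etc." doubles as the preset information that triggers heuristic extraction of second key fields.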
In another possible implementation manner, the electronic device may display each configuration information, so that the target object selects from each configuration information, so as to generate the target prompt information corresponding to the target dialogue information based on each selected configuration information. The specific manner of generating the target prompt information corresponding to the target dialogue information is detailed in the above embodiment, and will not be described herein.
Specifically, in the embodiment of the present application, the configuration information is determined in the following manners (Manner 3 and Manner 4), wherein,
Manner 3: in response to a prompt-information construction instruction triggered by the target object and received through the configuration interface, displaying configuration prompt information related to the application field corresponding to the target dialogue information.
That is, the target object may trigger the prompt-information construction instruction by triggering a virtual key or inputting a trigger instruction in the configuration interface, upon which configuration prompt information related to the application field corresponding to the target dialogue information is displayed, where the configuration prompt information is used to prompt the target object to input accurate configuration information.
For example, the configuration prompt information may be a configuration prompt template that prompts the target object which configuration information to select. For example, in a vehicle-enterprise scene, fields such as the vehicle model and the vehicle frame number receive more attention, so the configuration information may be set with the vehicle model, the vehicle frame number, and the like as heuristic words.
Further, the configuration prompt information may not be specific to a certain application field; that is, the configuration prompt template may be a general prompt template that prompts the target object which configuration information to select.
Manner 4: receiving configuration information related to the target prompt information that is input by the target object based on the configuration prompt information.
Further, the target object may input (select) the configuration information related to the target prompt information according to the configuration prompt information.
Manner 2: determining the application field corresponding to the target dialogue information, determining prompt information related to the application field from the preconfigured prompt information, and taking the determined prompt information as the target prompt information.
Specifically, prompt information may be preconfigured for each field; for example, prompt information 1 is preconfigured for the automobile field, prompt information 2 for the food field, and so on. After the application field corresponding to the target dialogue information is determined, the prompt information related to that field is determined from the preconfigured prompt information; for example, if the application field corresponding to the target dialogue information is the automobile field, prompt information 1 is determined to be the target prompt information.
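Manner 2 amounts to a lookup of preconfigured prompt information keyed by application field, which can be sketched as follows (the mapping contents are illustrative placeholders):

```python
# Sketch of Manner 2: select preconfigured prompt information by application field.
# The field names and prompt texts below are illustrative placeholders.

PRECONFIGURED_PROMPTS = {
    "automobile": "prompt information 1",
    "food": "prompt information 2",
}

def select_target_prompt(application_field, default=None):
    """Return the prompt information preconfigured for the given application field."""
    return PRECONFIGURED_PROMPTS.get(application_field, default)
```

The returned prompt information can then be used directly, or displayed for adjustment, before summary generation.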
Further, after the target prompt information is determined, it need not be displayed through the human-computer interaction interface (the input interface of the application program) corresponding to the trained large-scale language model and can be used directly to generate the abstract information of the target dialogue information; alternatively, it may be displayed through the human-computer interaction interface, in which case the target object may also adjust the prompt information during display, so that the abstract information corresponding to the target dialogue information is subsequently generated according to the adjusted prompt information. Further, in the embodiment of the present application, a plurality of prompt information may be configured in advance: a target prompt information may be configured for the target object according to the target object's habits, or the target object may select one prompt information from the plurality of preset prompt information as the target prompt information.
Further, besides being input by the target object as described above, the target dialogue information may be imported into the large-scale language model, for example in a file format; alternatively, the target object may select a dialogue box with a certain contact through a selection operation, so that the dialogue information with that contact is imported into the large-scale language model to generate the abstract information. For example, as shown in fig. 2, the left side of fig. 2 may display all sessions, e.g., the dialogues corresponding to guest 1, guest 2, guest 3, guest 4, and guest 5, respectively. The target object may select one of the dialogues, e.g., the dialogue of guest 3, so that the history message record of guest 3 is displayed in the middle part of fig. 2, and may then click "create a work order" so that the target dialogue information and the prompt information are input to enable abstract generation for the target dialogue information.
The large-scale language model is trained by acquiring a plurality of training samples, and training the initial large-scale language model based on the plurality of training samples to obtain the large-scale language model.
Each training sample comprises sample dialogue information, dialogue abstract samples corresponding to the sample dialogue information and prompt information corresponding to the sample dialogue information, wherein the dialogue abstract samples comprise summary information in the sample dialogue information and field information of at least one sample key field in the sample dialogue information.
Specifically, the training samples are constructed to include various types of text dialogue information (if the source is audio, e.g., from a telephone call or conference, it is converted to text by automatic speech recognition (Automatic Speech Recognition, ASR)). A plurality of prompt information is constructed based on the various collected dialogue texts; each prompt is constructed by first stating the task and then giving the requirement details. The related tasks include dialogue summarization, key fields, and high-quality FAQ, and subsequent tasks are extensible, such as summary titles.
For example, the dialogue information is denoted by D, the prompt information by P, and the dialogue summary, key fields, and high-quality FAQ by T1, T2, and T3, respectively, where capital letters denote sets. A group of dialogues d in D corresponds to the answers t1, t2, and t3, and a requirement p in P corresponds to any combination of t1, t2, and t3. The training data is constructed according to these correspondences.
For example, the prompt p corresponding to a group of dialogues d may be: "Please generate a work order summary according to the following telephone conversation content. Requirements: 1. The expression should be fluent and focused; 2. First summarize the whole conversation; 3. Then summarize 1-2 conversation keywords; 4. Extract the key field information mentioned in the conversation, where the fields include vehicle model, consultation problem, contact person, vehicle owner, vehicle frame number, dealer, etc." The corresponding output is the dialogue summary and the key fields, i.e., the data of t1 and t2.
For another example, the prompt p corresponding to a group of dialogues d may be: "Given the following conversation record between XXXX customer service and a customer, please make a conversation summary. Requirements: 1. Key information should be retained; 2. High-quality questions and answers should be extracted." The corresponding output is the dialogue summary and the high-quality FAQ, i.e., the data of t1 and t3.
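The correspondence-based construction of training data described above can be sketched as follows (a simplified illustration; the data layout and names are assumptions):

```python
# Sketch: build training samples by pairing each dialogue d with each prompt p,
# where the target output is the combination of t1 (summary), t2 (key fields),
# and t3 (high-quality FAQ) that the prompt's requirement asks for.

def build_training_samples(dialogues, prompts):
    """dialogues: dicts with keys 'd', 't1', 't2', 't3';
    prompts: dicts with keys 'p' and 'tasks' (a subset of ('t1', 't2', 't3'))."""
    samples = []
    for dlg in dialogues:
        for pr in prompts:
            target = {task: dlg[task] for task in pr["tasks"]}
            samples.append({"input": (dlg["d"], pr["p"]), "output": target})
    return samples

dialogues = [{"d": "dialogue text", "t1": "summary", "t2": "fields", "t3": "faq"}]
prompts = [
    {"p": "summary + key fields prompt", "tasks": ("t1", "t2")},
    {"p": "summary + FAQ prompt", "tasks": ("t1", "t3")},
]
samples = build_training_samples(dialogues, prompts)
```

Each dialogue thus yields one training sample per prompt requirement combination.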
Further, training loss of the large-scale language model is determined based on differences between the dialog digest samples corresponding to each training sample and the dialog digests generated by the model.
Specifically, training is performed with the training samples shown in the above embodiments. During training, the model iteratively adjusts its parameter values until it can correctly predict the next token from the preceding input token sequence. To this end, the model uses self-supervised learning, in which the parameters are adjusted to maximize the likelihood of correctly predicting the next token in the training examples. After fine-tuning, the LLM can easily be adapted to perform multiple tasks using a relatively small supervised data set.
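The next-token objective described above is, in essence, a negative log-likelihood minimized over token positions; a generic sketch (not the embodiment's specific implementation) is:

```python
import math

# Generic sketch of the next-token objective: the loss is the average negative
# log-likelihood the model assigns to the actual next token at each position.

def next_token_loss(token_probs):
    """token_probs: probabilities the model assigned to the correct next tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# A model that assigns high probability to each correct next token has low loss,
# so maximizing the likelihood is equivalent to minimizing this quantity.
good_model_loss = next_token_loss([0.9, 0.8, 0.95])
bad_model_loss = next_token_loss([0.1, 0.2, 0.05])
```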
Further, a large-scale language model for abstract generation may be trained by the training method shown in the foregoing embodiment. To improve the quality of abstract generation, a plurality of candidate large-scale language models may be trained, and the large-scale language model used for abstract generation of the target dialogue information is determined from them. The candidate large-scale language models may have different model architectures, that is, models of different architectures may be trained to obtain the candidates; alternatively, the candidates may be models trained at different stages (each candidate may be trained from the initial model using a different amount of samples).
That is, the large-scale language model is obtained by acquiring a plurality of trained candidate large-scale language models, determining the performance of each candidate, and determining the trained large-scale language model from the candidates based on their performance. In the embodiment of the present application, the candidate large-scale language model with the highest performance is determined from the candidates and used as the large-scale language model for abstract generation of the target dialogue information.
The performance of any candidate large-scale language model is determined as follows: acquire a set of frequently asked questions and answers; for each question, generate the model input information corresponding to the question based on the question and input it into the candidate large-scale language model to obtain a predicted answer; perform similarity matching between the predicted answer and the answer corresponding to the question to obtain a matching result; and determine the performance of the candidate large-scale language model based on the matching results corresponding to all the questions.
The frequently-asked-question set includes at least one question and the answer corresponding to each question, and the model input information includes at least one of dialogue information or abstract information. In the embodiment of the present application, dialogue-format information and abstract-format information corresponding to each question are generated based on the questions in the set and are respectively input into the candidate large-scale language model to obtain predicted answers; each predicted answer is matched against the answer corresponding to the question, and the higher the matching degree represented by the matching results, the higher the performance of the corresponding large-scale language model.
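The similarity-matching evaluation can be sketched with a simple token-overlap score standing in for the embodiment's unspecified similarity measure (all names are illustrative):

```python
# Sketch: evaluate a candidate model on a FAQ set by matching its predicted
# answers against the reference answers with a simple Jaccard token overlap.

def answer_similarity(pred, ref):
    """Jaccard overlap between the token sets of two answers, in [0, 1]."""
    a, b = set(pred.split()), set(ref.split())
    return len(a & b) / len(a | b) if a | b else 1.0

def evaluate(model_fn, faq):
    """faq: list of (model_input, reference_answer) pairs; returns mean similarity."""
    scores = [answer_similarity(model_fn(x), ref) for x, ref in faq]
    return sum(scores) / len(scores)

# The candidate whose predictions match the reference answers most closely
# (highest mean score) is selected as the model used for abstract generation.
```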
Furthermore, checking whether the questions in the frequently-asked-question set can be answered when the original dialogue and the dialogue summary (abstract) are respectively taken as inputs can better guide the improvement of the training data.
Referring to fig. 3a, the incoming or imported telephone/conference audio is converted into dialogue text by ASR and then input into the LLM; fig. 3a shows two examples of prompt information under different scenes. After the target prompt information is determined, it is input into the LLM together with the dialogue text, and the LLM can output the corresponding dialogue summary, i.e., the summary of the currently requested dialogue, output in the form of a text paragraph, as well as the field information of the key fields. Specifically, when the target prompt information contains a keyword prompt, the key fields in the dialogue information can be extracted heuristically. In addition, a high-quality FAQ can be extracted, that is, customer hotspot questions and customer-service answers can be extracted from the dialogue content, which can be used for subsequent analysis of customer hotspot issues, construction of a customer-service knowledge base, and optimization of a customer-service robot.
It should be noted that the dialogue and the target prompt information input to the LLM may be as shown in fig. 3a.
An implementation procedure of an alternative embodiment of the present application is described below with reference to fig. 3b and 3c, where the implementation procedure includes a model training phase and a model application phase, as shown in fig. 3b, where the model application phase may include step S11, step S12, and step S13, and as shown in fig. 3c, the model training phase may include step S21 and step S22, and each step of the two phases is described below, where the model application phase is as follows:
step S11, target dialogue information input by a target object is acquired.
Optionally, as an example, the target dialogue information input by the target object may be as shown in Table 1 below, where the partial dialogue information contained in Table 1 is dialogue information between customer service and a customer in a telephone scene.
Table 1
Step S12, determining target prompt information corresponding to the target dialogue information, wherein the target prompt information is used for prompting generation of a dialogue abstract based on target key fields in the target dialogue information, and the key fields comprise at least one first key field.
Specifically, the target prompt information corresponding to the target dialogue information may be input by the target object through the human-computer interaction interface corresponding to the LLM, or may be imported into the human-computer interaction interface in other ways. As an example, the target prompt information shown in Table 2 below is obtained for the dialogue in Table 1 above, wherein,
Table 2
The first key fields in Table 2 are, respectively, the vehicle model, consultation problem, contact person, vehicle owner, vehicle frame number, and dealer.
And step S13, generating abstract information of the target dialogue information based on the target dialogue information and the target prompt information through the trained large-scale language model, wherein the abstract information includes the summarization information of the target dialogue information, the field information of each first key field, and the field information of at least one heuristically obtained second key field.
In the embodiment of the present application, the target dialogue information and the target prompt information are input into the large-scale language model to obtain abstract information containing the summarization information of the target dialogue information, the field information of each first key field, and the field information of at least one heuristically obtained second key field, and the abstract information is output in the human-computer interaction interface. The abstract information generated based on Table 1 and Table 2 may be as shown in Table 3, wherein,
Table 3
Wherein "contact phone: 138XXXXXXXX" is the field information of the heuristically obtained second key field.
Continuing, as shown in fig. 3c, the model training phase may include step S21 and step S22, wherein,
Step S21, a plurality of training samples are acquired.
Each training sample comprises sample dialogue information, dialogue abstract samples corresponding to the sample dialogue information and prompt information corresponding to the sample dialogue information, wherein the dialogue abstract samples comprise summary information in the sample dialogue information and field information of at least one sample key field in the sample dialogue information;
and step S22, training the initial large-scale language model based on a plurality of training samples to obtain a trained large-scale language model.
Wherein the training loss of the large-scale language model is determined based on the difference between the dialogue digest sample corresponding to each training sample and the dialogue digest generated by the model.
It is to be understood that the various alternative embodiments of the present application may be implemented alone or in combination in actual practice without conflict between the embodiments.
Based on the same principle as the method provided by the embodiment of the present application, the embodiment of the present application further provides a dialogue abstract generation apparatus; as shown in fig. 4, the dialogue abstract generation apparatus 40 may include:
an obtaining module 41, configured to obtain target dialogue information input by a target object;
a first determining module 42, configured to determine target prompt information corresponding to the target dialogue information, where the target prompt information includes a key field, and is configured to prompt that the dialogue abstract is generated based on the key field, and the key field includes at least one first key field;
the model execution processing module 43 is configured to execute the following operations through the trained large-scale language model:
Generating summary information of the target dialogue information based on the target dialogue information and the target prompt information, and
Extracting field information of each target key field from the target dialogue information based on the target prompt information, wherein the target key field comprises at least one first key field and at least one second key field, and the at least one second key field is obtained based on the at least one first key field;
the summary information of the target dialogue information and the field information of each target key field are taken as the summary information of the target dialogue information.
In a possible implementation manner of the embodiment of the present application, the apparatus 40 further includes a second determining module, where,
The second determining module is specifically configured to, when determining the at least one second key field:
acquiring a plurality of preset fields;
determining at least one second key field from each preset field based on at least one of the correlation of each preset field with the target field or the similarity of each preset field with at least one first key field;
The target field is an application field to which the target dialogue information belongs.
In another possible implementation manner of the embodiment of the present application, the model execution processing module 43 is specifically configured to, when extracting field information of at least one second key field from the target dialogue information based on the target prompt information:
carrying out recognition of preset information on the target prompt information;
If the target prompt information contains preset information, generating field information of at least one second key field based on the target dialogue information and the target prompt information.
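The preset-information check described by this module can be sketched as follows (the trigger marker, here the trailing "etc." from the example prompts, is an assumption; the embodiment does not specify the preset information):

```python
# Sketch: heuristic extraction of second key fields is enabled only when the
# target prompt information contains the preset information. The marker "etc."
# (as in "...vehicle frame number, dealer, etc.") is an assumed example.

PRESET_MARKERS = ("etc.",)

def contains_preset_info(target_prompt):
    """Return True if the prompt contains any preset marker."""
    return any(marker in target_prompt for marker in PRESET_MARKERS)
```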
In another possible implementation manner of the embodiment of the present application, the second determining module is specifically configured to, when determining at least one second key field from each preset field based on at least one of a correlation between each preset field and the target field or a similarity between each preset field and at least one first key field:
And determining at least one preset field which is in each preset field, has a correlation with the target field greater than the preset correlation and has a similarity with at least one first key field greater than the first preset similarity as at least one second key field.
In another possible implementation manner of the embodiment of the present application, a plurality of preset fields belong to a target field set corresponding to a target field;
the second determining module is specifically configured to, when determining at least one second key field from each preset field based on the similarity between each preset field and at least one first key field:
And determining a field with the similarity larger than a second preset similarity in the target field set for each first key field to obtain a field corresponding to each first key field respectively, wherein the fields are used as at least one second key field corresponding to the target dialogue information.
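The similarity-based selection of second key fields can be sketched as follows (a crude character-overlap score stands in for the embodiment's unspecified similarity measure; the threshold and field names are illustrative):

```python
# Sketch: for each first key field, pick candidates from the target field set
# whose similarity exceeds a threshold; those become the second key fields.

def field_similarity(f1, f2):
    """Crude similarity between two field names via shared-character overlap."""
    a, b = set(f1), set(f2)
    return len(a & b) / len(a | b) if a | b else 1.0

def second_key_fields(first_key_fields, target_field_set, threshold=0.5):
    found = []
    for key in first_key_fields:
        for candidate in target_field_set:
            if (candidate not in first_key_fields and candidate not in found
                    and field_similarity(key, candidate) > threshold):
                found.append(candidate)
    return found

# e.g. "contact person" may heuristically yield "contact phone" as a second key field.
extra = second_key_fields(["contact person"], ["contact phone", "dealer"])
```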
In another possible implementation manner of the embodiment of the present application, when determining the target prompt information corresponding to the target dialogue information, the first determining module 42 is specifically configured to perform any one of the following:
Acquiring configuration information related to target prompt information input by a target object, and constructing target prompt information corresponding to target dialogue information based on the configuration information;
Determining an application field corresponding to the target dialogue information, determining prompt information related to the application field from all preconfigured prompt information, and taking the determined prompt information as the target prompt information.
Another possible implementation manner of the embodiment of the present application, the configuration information is determined by the following manner:
Responding to a prompt message construction instruction triggered by a target object received through a configuration interface, and displaying configuration prompt messages related to the application field corresponding to the target dialogue information;
receiving configuration information related to the target prompt information that is input by the target object based on the configuration prompt information.
Another possible implementation of the embodiment of the present application, the large-scale language model is trained by:
Obtaining a plurality of training samples, wherein each training sample comprises sample dialogue information, dialogue abstract samples corresponding to the sample dialogue information and prompt information corresponding to the sample dialogue information, and each dialogue abstract sample comprises summary information in the sample dialogue information and field information of at least one sample key field in the sample dialogue information;
Training the initial large-scale language model based on a plurality of training samples to obtain a large-scale language model, wherein the training loss of the large-scale language model is determined based on the difference between the dialogue abstract sample corresponding to each training sample and the dialogue abstract generated by the model.
Another possible implementation of an embodiment of the present application, the large-scale language model is obtained by:
acquiring a plurality of trained candidate large-scale language models;
Determining the performance of each candidate large-scale language model;
a large-scale language model is determined from the candidate large-scale language models based on the performance of the candidate large-scale language models.
In another possible implementation of the embodiment of the present application, the performance of any candidate large-scale language model is determined by:
Obtaining common problem solutions, wherein the common problem solutions comprise at least one problem and answers corresponding to each problem;
for each question, generating model input information corresponding to the question based on the question, the model input information including at least one of dialogue information or summary information;
for each question, inputting the model input information corresponding to the question into any target large-scale language model to obtain a predicted answer, and
Matching the similarity between the predicted answer and the answer corresponding to the question to obtain a matching result;
and determining the performance of any large-scale language model based on the matching result corresponding to each problem.
It should be noted that the first determining module 42 and the second determining module may be the same module or different modules, which is not limited in the embodiment of the present application.
According to the embodiment of the present application, after the target dialogue information input by the target object and the target prompt information corresponding to the target dialogue information are acquired, the summarization information of the target dialogue information can be generated by the large-scale language model based on the target prompt information; at least one second key field is obtained heuristically based on the first key fields in the target prompt information, and the field information of the first key fields and the field information of the second key fields is extracted from the target dialogue information, so that the abstract information of the target dialogue information is obtained based on the summarization information, the field information of the first key fields, and the field information of the second key fields. That is, in the embodiment of the present application, besides the summarization information of the target dialogue information and the field information of the first key fields, the field information of the second key fields can be obtained heuristically, so that abstract information meeting the requirements of the target object can be generated through a single prompt, which further improves the experience of the target object.
The device of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the device of the embodiment of the present application correspond to steps in the method of the embodiment of the present application, and detailed functional descriptions of each module of the device may be referred to the descriptions in the corresponding methods shown in the foregoing, which are not repeated herein.
There is also provided in an embodiment of the application an electronic device comprising at least one processor configured to perform the steps of the method provided in any of the alternative embodiments of the application. Optionally, the electronic device may further comprise a transceiver and/or a memory coupled to the processor, the memory having stored therein a computer program which, when executed by the processor, may implement the solutions provided by any of the alternative embodiments of the present application. Alternatively, the electronic device may be a user terminal or a server.
Fig. 5 shows a schematic structural diagram of an electronic device, which is applicable to the embodiment of the present application, and as shown in fig. 5, the electronic device may be a server or a user terminal, for example, and the electronic device may be used to implement the method provided in any embodiment of the present application.
As shown in fig. 5, the electronic device 2000 may mainly include at least one processor 2001 (one is shown in fig. 5), a memory 2002, a communication module 2003, and input/output interface 2004, etc., and optionally, the components may be in communication with each other through a bus 2005. It should be noted that the structure of the electronic device 2000 shown in fig. 5 is only schematic, and does not limit the electronic device to which the method provided in the embodiment of the present application is applicable.
The memory 2002 may be used to store an operating system, application programs, and the like, which may include computer programs that implement the methods of the embodiments of the present invention when called by the processor 2001, and may also include programs for implementing other functions or services. The memory 2002 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and computer programs, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disk storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The processor 2001 is connected to the memory 2002 via the bus 2005, and executes the corresponding functions by calling the application programs stored in the memory 2002. The processor 2001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, which may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the present disclosure. The processor 2001 may also be a combination implementing computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The electronic device 2000 may be coupled to a network through a communication module 2003 (which may include, but is not limited to, components such as a network interface) to enable interaction of data, such as sending data to or receiving data from other devices, through communication of the network with other devices, such as user terminals or servers, etc. Among other things, the communication module 2003 may include a wired network interface and/or a wireless network interface, etc., i.e., the communication module may include at least one of a wired communication module or a wireless communication module.
The electronic device 2000 may be connected to a desired input/output device, such as a keyboard, a display device, etc., through an input/output interface 2004, and the electronic device 2000 itself may have a display device, or may be externally connected to other display devices through the interface 2004. Optionally, a storage device, such as a hard disk, may be connected to the interface 2004, so that data in the electronic device 2000 may be stored in the storage device, or data in the storage device may be read, and data in the storage device may be stored in the memory 2002. It will be appreciated that the input/output interface 2004 may be a wired interface or a wireless interface. The device connected to the input/output interface 2004 may be a component of the electronic device 2000 or may be an external device connected to the electronic device 2000 when necessary, depending on the actual application scenario.
The bus 2005, which is used to connect the various components, may include a path to transfer information between the components. The bus 2005 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 2005 can be classified into an address bus, a data bus, a control bus, and the like according to function.
Alternatively, for the solution provided by the embodiment of the present application, the memory 2002 may be used for storing a computer program for executing the solution of the present application, and the processor 2001 executes the computer program, where the processor 2001 executes the computer program to implement the actions of the method or the apparatus provided by the embodiment of the present application.
Based on the same principle as the method provided by the embodiments of the present application, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the corresponding content of the foregoing method embodiments.
Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the respective aspects of the method embodiments described above.
It should be noted that the terms "first," "second," "third," "fourth," "1," "2," and the like in the description and claims of the present application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although the flowcharts of the embodiments of the present application indicate the various operation steps with arrows, the order in which these steps are performed is not limited to the order indicated by the arrows. In some implementation scenarios of the embodiments of the present application, the steps in the flowcharts may be performed in other orders as required, unless explicitly stated herein. Furthermore, depending on the actual implementation scenario, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages. Some or all of these sub-steps or stages may be performed at the same time, or each may be performed at a different time. When performed at different times, the execution order of these sub-steps or stages can be flexibly configured as required, which is not limited by the embodiments of the present application.
In the present embodiment, the term "module" or "unit" refers to a computer program or a part of a computer program having a predetermined function and working together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of the module or unit.
The foregoing is merely an optional implementation of some of the implementation scenarios of the present application. It should be noted that, for those skilled in the art, other similar implementations adopted based on the technical ideas of the present application, without departing from the technical concept of the scheme of the present application, also fall within the protection scope of the embodiments of the present application.

Claims (14)

1. A dialogue summary generation method, characterized in that the method comprises:
acquiring target dialogue information input by a target object;
determining target prompt information corresponding to the target dialogue information, wherein the target prompt information contains key fields and is used to prompt generation of a dialogue summary based on the key fields, the key fields comprising at least one first key field;
performing the following operations through a large-scale language model:
generating summarization information of the target dialogue information based on the target dialogue information and the target prompt information; and
extracting, based on the target prompt information, field information of each target key field from the target dialogue information, the target key fields comprising the at least one first key field and at least one second key field, the at least one second key field being obtained based on the at least one first key field;
wherein the summarization information of the target dialogue information and the field information of each target key field are taken as the summary information of the target dialogue information.

2. The method according to claim 1, wherein determining the at least one second key field comprises:
acquiring a plurality of preset fields;
determining at least one second key field from the preset fields based on at least one of the relevance of each preset field to a target domain, or the similarity between each preset field and at least one of the first key fields;
wherein the target domain is the application domain to which the target dialogue information belongs.

3. The method according to claim 2, wherein extracting field information of the at least one second key field from the target dialogue information based on the target prompt information comprises:
performing recognition of preset information on the target prompt information;
if the target prompt information contains the preset information, generating the field information of the at least one second key field based on the target dialogue information and the target prompt information.

4. The method according to claim 2, wherein determining at least one second key field from the preset fields based on at least one of the relevance of each preset field to the target domain or the similarity between each preset field and at least one of the first key fields comprises:
determining, among the preset fields, at least one preset field whose relevance to the target domain is greater than a preset relevance and whose similarity to at least one of the first key fields is greater than a first preset similarity, as the at least one second key field.

5. The method according to claim 2, wherein the plurality of preset fields belong to a target field set corresponding to the target domain;
determining at least one second key field from the preset fields based on the similarity between each preset field and at least one of the first key fields comprises:
for each first key field, determining the fields in the target field set whose similarity to that first key field is greater than a second preset similarity, so as to obtain the fields respectively corresponding to the first key fields as the at least one second key field corresponding to the target dialogue information.

6. The method according to claim 1, wherein determining the target prompt information corresponding to the target dialogue information comprises any one of the following:
acquiring configuration information related to the target prompt information input by the target object, and constructing the target prompt information corresponding to the target dialogue information based on the configuration information; or
determining the application domain corresponding to the target dialogue information, determining prompt information related to the application domain from pre-configured prompt information, and taking the determined prompt information as the target prompt information.

7. The method according to claim 6, wherein the configuration information is determined by:
in response to a prompt information construction instruction triggered by the target object and received through a configuration interface, displaying configuration prompt information related to the application domain corresponding to the target dialogue information;
receiving configuration information related to the target prompt information, input by the target object based on the configuration prompt information.

8. The method according to claim 1, wherein the large-scale language model is trained by:
acquiring a plurality of training samples, wherein each training sample comprises: sample dialogue information, a dialogue summary sample corresponding to the sample dialogue information, and prompt information corresponding to the sample dialogue information, the dialogue summary sample comprising summarization information of the sample dialogue information and field information of at least one sample key field in the sample dialogue information;
training an initial large-scale language model based on the plurality of training samples to obtain the large-scale language model, wherein the training loss of the large-scale language model is determined based on the difference between the dialogue summary sample corresponding to each training sample and the dialogue summary generated by the model.

9. The method according to any one of claims 1 to 8, wherein the large-scale language model is obtained by:
acquiring a plurality of trained candidate large-scale language models;
determining the performance of each candidate large-scale language model;
determining the large-scale language model from the candidate large-scale language models based on the performance of each candidate large-scale language model.

10. The method according to claim 9, wherein the performance of any candidate large-scale language model is determined by:
acquiring frequently asked questions, the frequently asked questions comprising at least one question and an answer corresponding to each question;
for each question, generating model input information corresponding to the question based on the question, the model input information comprising at least one of dialogue information or summary information;
for each question, inputting the model input information corresponding to the question into the candidate large-scale language model to obtain a predicted answer, and
performing similarity matching between the predicted answer and the answer corresponding to the question to obtain a matching result;
determining the performance of the candidate large-scale language model based on the matching results corresponding to the questions.

11. A dialogue summary generation apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire target dialogue information input by a target object;
a first determination module, configured to determine target prompt information corresponding to the target dialogue information, wherein the target prompt information contains key fields and is used to prompt generation of a dialogue summary based on the key fields, the key fields comprising at least one first key field;
a model execution processing module, configured to perform the following operations through a large-scale language model:
generating summarization information of the target dialogue information based on the target dialogue information and the target prompt information; and
extracting, based on the target prompt information, field information of each target key field from the target dialogue information, the target key fields comprising the at least one first key field and at least one second key field, the at least one second key field being obtained based on the at least one first key field;
wherein the summarization information of the target dialogue information and the field information of each target key field are taken as the summary information of the target dialogue information.

12. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor, when running the computer program, performs the dialogue summary generation method according to any one of claims 1 to 10.

13. A computer-readable storage medium, wherein a computer program is stored in the storage medium, and the computer program, when executed by a processor, implements the dialogue summary generation method according to any one of claims 1 to 10.

14. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the dialogue summary generation method according to any one of claims 1 to 10.
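Claims 2, 4, and 5 select the second key fields by thresholding each preset field's relevance to the target domain and its similarity to the first key fields. A minimal sketch of the claim 4 rule, assuming relevance and similarity scores have already been computed elsewhere (the function name, example fields, scores, and thresholds below are illustrative, not taken from the patent):

```python
def select_second_key_fields(preset_fields, first_key_fields,
                             relevance, similarity,
                             min_relevance=0.5, min_similarity=0.7):
    """Keep preset fields whose domain relevance exceeds min_relevance
    and whose similarity to at least one first key field exceeds
    min_similarity (the selection rule described in claim 4)."""
    selected = []
    for field in preset_fields:
        if relevance[field] <= min_relevance:
            continue  # not related enough to the target domain
        if any(similarity[(field, f1)] > min_similarity
               for f1 in first_key_fields):
            selected.append(field)
    return selected

# Illustrative scores for a hypothetical customer-service domain.
preset = ["order_id", "refund_amount", "weather"]
first = ["order_number"]
rel = {"order_id": 0.9, "refund_amount": 0.8, "weather": 0.1}
sim = {("order_id", "order_number"): 0.95,
       ("refund_amount", "order_number"): 0.2,
       ("weather", "order_number"): 0.05}

print(select_second_key_fields(preset, first, rel, sim))
# ['order_id']: relevant to the domain and similar to a first key field
```

In practice the relevance and similarity scores would come from an embedding model or other text-matching component; the thresholds correspond to the "preset relevance" and "first preset similarity" of claim 4 and would be tuned per application domain.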
CN202410235158.0A | 2024-02-29 | Dialogue abstract generation method, device, electronic equipment, medium and program product | Pending | CN120611042A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410235158.0A | 2024-02-29 | 2024-02-29 | CN120611042A (en) Dialogue abstract generation method, device, electronic equipment, medium and program product

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202410235158.0A | 2024-02-29 | 2024-02-29 | CN120611042A (en) Dialogue abstract generation method, device, electronic equipment, medium and program product

Publications (1)

Publication Number | Publication Date
CN120611042A | 2025-09-09

Family

ID=96924534

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202410235158.0A | CN120611042A (en), Pending | 2024-02-29 | 2024-02-29

Country Status (1)

Country | Link
CN | CN120611042A (en)


Legal Events

Code | Title
PB01 | Publication
