CN113919336A


Info

Publication number: CN113919336A
Application number: CN202111223411.3A
Authority: CN
Inventors: 马亿凯
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-10-20
Filing date: 2021-10-20
Publication date: 2022-01-11

Abstract

The application relates to artificial intelligence technology and provides a deep learning-based article generation method and apparatus, a computer device and a storage medium. The method comprises the following steps: acquiring a hot topic set, and screening candidate topics from the hot topic set according to a first preset rule; identifying the type of each candidate topic, and matching a corresponding target outline according to the type; obtaining topic content that corresponds to the candidate topic and meets a preset condition; extracting abstract texts from the topic content according to the target outline, and filling the abstract texts into the target areas corresponding to the target outline to obtain an initial soft text; and acquiring a target illustration corresponding to the abstract text, and filling the target illustration into a target position of the initial soft text to obtain the target soft text. The application can improve the efficiency of soft text generation and promote the rapid development of smart cities.

Description

Article generation method and device based on deep learning and related equipment

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to an article generation method and apparatus based on deep learning, a computer device, and a medium.

Background

At present, more and more enterprises advertise, and to make their advertising reach users more effectively, many enterprises commonly use soft texts (advertorials, i.e. promotional content written in the form of ordinary articles). Compared with conventional articles, a well-written product soft text is often more likely to make customers identify with it, prompting them to purchase the related products sooner.

In the process of implementing the present application, the applicant found the following technical problems in the prior art. Product soft texts in the industry currently come mainly from internal operation teams and from external purchasing, and both sources have pain points. The former places high demands on operators, who must not only write well but also understand the related product services; such talent is extremely scarce, so high-quality soft texts are produced inefficiently. The latter yields more professional content, but purchasing it is very costly for the company, so the production cost of high-quality soft texts is too high.

Therefore, it is necessary to provide a deep learning-based article generation method that can improve the efficiency of article generation.

Disclosure of Invention

In view of the above, it is desirable to provide a deep learning-based article generation method and apparatus, a computer device and a medium that can improve the efficiency of article generation.

A first aspect of the embodiments of the present application provides a deep learning-based article generation method, which includes:

acquiring a hot topic set, and screening candidate topics from the hot topic set according to a first preset rule;

identifying the type of the candidate topic, and matching a corresponding target outline according to the type;

obtaining topic content that corresponds to the candidate topic and meets a preset condition;

extracting abstract texts from the topic content according to the target outline, and filling the abstract texts into the target areas corresponding to the target outline to obtain an initial soft text;

and acquiring a target illustration corresponding to the abstract text, and filling the target illustration into the target position of the initial soft text to obtain the target soft text.

Further, in the article generating method based on deep learning provided by the embodiment of the present application, the acquiring a trending topic set includes:

obtaining a chat record set from a first target platform system;

extracting topic keywords corresponding to each chat record in the chat record set, and selecting target topic keywords with the occurrence frequency higher than a preset frequency threshold value as internal topics;

collecting a plurality of news lists from a second target platform system;

acquiring the target news lists that match preset vocabulary as external topics;

and combining the internal topics and the external topics to obtain the trending topic set.

Further, in the above deep learning-based article generation method provided by the embodiments of the present application, before identifying the type of the candidate topic, the method further includes:

acquiring historical soft text sets corresponding to different types of hot topics;

extracting event element information corresponding to each historical soft text in the historical soft text set to generate an initial event set;

calculating the importance degree of each event element in the initial event set for describing the event, and selecting the target event elements whose importance degree is higher than a preset importance threshold to form an initial summary frame corresponding to the event;

and determining the target outline corresponding to each type of hot topic based on the initial summary frame.

Further, in the above deep learning-based article generating method provided by the embodiment of the present application, the identifying the type of the candidate topic includes:

performing word segmentation processing and part-of-speech tagging on the candidate topics by using a word segmentation and part-of-speech tagging combined model constructed by fusing external knowledge to obtain word segmentation results carrying part-of-speech tagging;

detecting whether the word segmentation result contains preset keywords or not;

and when the word segmentation result contains the preset keyword, determining the type corresponding to the preset keyword as the type of the candidate topic.

Further, in the above deep learning-based article generation method provided by the embodiments of the present application, extracting an abstract text from the topic content according to the target outline includes:

clustering the topic content sentence by sentence to obtain a clustering result;

calling a pre-trained topic word extraction model to extract topic words from the clustering result to obtain the topic words corresponding to the topic content;

counting the words in the topic content that have the same or similar semantics as the topic words, together with their word frequencies, and combining these words with the topic words to obtain a high-frequency word set for the topic content;

constructing a text network graph for the topic content, and extracting an abstract based on the text network graph and the high-frequency word set to obtain a candidate abstract sentence group;

and removing redundancy from the candidate abstract sentence group to obtain an initial abstract text, and optimizing the initial abstract text to obtain a target abstract text.

Further, in the above deep learning-based article generating method provided in an embodiment of the present application, the acquiring a target illustration corresponding to the abstract text includes:

acquiring the topic word corresponding to the abstract text;

acquiring the label set corresponding to the illustrations in a preset database;

detecting whether the label set contains a target label whose similarity to the topic word exceeds a preset similarity threshold;

and when the similarity between the target label and the topic word exceeds the preset similarity threshold, determining the illustration corresponding to the target label as the target illustration.

Further, in the above deep learning-based article generating method provided in the embodiment of the present application, the filling the target illustration to the target position of the initial soft text to obtain the target soft text includes:

acquiring a target abstract text corresponding to the target illustration;

querying a preset typesetting format according to the target abstract text to obtain the target position corresponding to the target illustration;

and filling the target illustration to the target position to obtain the target soft text.
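The illustration steps above can be sketched as follows. This is a minimal sketch, not the application's implementation. The text does not say how label similarity is measured, so `difflib.SequenceMatcher` character similarity stands in for it, and the 0.6 threshold is a placeholder. The typesetting format is modeled as a hypothetical table that says whether the image goes before or after a module's text.

```python
from difflib import SequenceMatcher


def tag_similarity(tag: str, topic_word: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the unspecified measure."""
    return SequenceMatcher(None, tag.lower(), topic_word.lower()).ratio()


def match_illustration(topic_word, illustration_tags, threshold=0.6):
    """Return the id of the illustration whose best tag exceeds the threshold, else None.

    illustration_tags maps a (hypothetical) illustration id to its label set.
    """
    best_id, best_score = None, threshold
    for illus_id, tags in illustration_tags.items():
        for tag in tags:
            score = tag_similarity(tag, topic_word)
            if score > best_score:  # strictly above the preset similarity threshold
                best_id, best_score = illus_id, score
    return best_id


def place_illustration(sections, layout_rules, module, illus_id):
    """Insert an image marker into one module at the position a layout rule gives.

    layout_rules is a hypothetical typesetting table: module -> "before" or "after".
    """
    marker = f"[image:{illus_id}]"
    text = sections[module]
    if layout_rules.get(module, "after") == "before":
        sections[module] = f"{marker}\n{text}"
    else:
        sections[module] = f"{text}\n{marker}"
    return sections
```

When several labels pass the threshold, the sketch keeps the most similar one. The text only requires that a label exceed the threshold.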

A second aspect of the embodiments of the present application further provides a deep learning-based article generation apparatus, which includes:

the topic screening module, used for acquiring a hot topic set and screening candidate topics from the hot topic set according to a first preset rule;

the outline matching module, used for identifying the types of the candidate topics and matching the corresponding target outlines according to the types;

the content acquisition module, used for acquiring topic content that corresponds to the candidate topic and meets a preset condition;

the abstract extraction module, used for extracting abstract texts from the topic content according to the target outline and filling the abstract texts into the target areas corresponding to the target outline to obtain an initial soft text;

and the illustration filling module, used for acquiring a target illustration corresponding to the abstract text and filling the target illustration into the target position of the initial soft text to obtain the target soft text.

A third aspect of embodiments of the present application further provides a computer device, where the computer device includes a processor, and the processor is configured to implement the deep learning-based article generation method according to any one of the above items when executing a computer program stored in a memory.

The fourth aspect of the embodiments of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the article generation method based on deep learning according to any one of the above-mentioned methods is implemented.

With the deep learning-based article generation method and apparatus, computer device and computer-readable storage medium, candidate topics are screened from the hot topic set and target soft texts are generated from them. Hot events can therefore be captured automatically and comprehensively, and manual searching and screening of hot topics is avoided. This keeps the target soft texts timely and improves the efficiency of generating them. In addition, the abstract text is generated automatically and filled into the target area to obtain the soft text. This improves the accuracy and efficiency of abstract generation and hence of soft text generation. The application can be applied to various functional modules of smart cities, such as smart government affairs and smart transportation. For example, a deep learning-based article generation module for smart government affairs can promote the rapid development of smart cities.

Drawings

Fig. 1 is a flowchart of an article generation method based on deep learning according to an embodiment of the present application.

Fig. 2 is a block diagram of an article generation apparatus based on deep learning according to a second embodiment of the present application.

Fig. 3 is a schematic structural diagram of a computer device provided in the third embodiment of the present application.

The following detailed description will further illustrate the present application in conjunction with the above-described figures.

Detailed Description

In order that the above objects, features and advantages of the present application can be more clearly understood, a detailed description of the present application will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

In the following description, numerous specific details are set forth to provide a thorough understanding of the present application; the described embodiments are only some, not all, of the embodiments of the present application.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

The deep learning-based article generation method provided by the embodiments of the present application is executed by a computer device, and correspondingly the deep learning-based article generation apparatus runs in the computer device. Fig. 1 is a flowchart of a deep learning-based article generation method according to an embodiment of the present application. As shown in Fig. 1, the method may include the following steps; depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted:

S11, obtaining a trending topic set, and screening candidate topics from the trending topic set according to a first preset rule.

In at least one embodiment of the present application, operators often need to follow all kinds of trending topics to keep the subject of a soft text fresh. When a hot topic appears, the soft text must be written and published as soon as possible. However, as the internet era matures, a hot topic may now last only one day, half a day, or even 2 to 3 hours. This places heavy demands on operators and on the company's review process. The present application provides a hot topic monitoring system that connects to a first target platform system and a second target platform system and acquires the hot topic set from both. In one embodiment, the first target platform system may be an internal customer service system, and the second target platform system may be an external news information system.

The trending topic set comprises a plurality of internal topics and a plurality of external topics. Internal topics are acquired from the internal customer service system, and external topics are acquired from the external news information system. For example, internal topics may be "what is a policyholder" or "what is a policy loan". External topics may be "traffic accident in Tai'an, Shandong leaves 4 dead and 2 injured" or "more than 40 vehicles collide on the Bao Mao route in Shanxi", without limitation. A candidate topic is a topic in the trending topic set that is used to generate the target soft text. In one embodiment, the first preset rule may be random selection, that is, the candidate topic may be a topic selected at random from the trending topic set.
In other embodiments, the first preset rule may select topics according to user reading preferences. The reading preferences of a target user over a period of time are obtained, and topics similar to those preferences are selected from the trending topic set as candidate topics, without limitation.
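A minimal sketch of the two variants of the first preset rule follows. It assumes each trending topic has already been reduced to a keyword list. The text leaves "similar to the reading preference" open, so keyword-set (Jaccard) overlap is used here as a stand-in. `select_random` and `select_by_preference` are hypothetical names.

```python
import random


def select_random(topics, k, seed=None):
    """First preset rule, variant 1: pick k candidate topics at random."""
    return random.Random(seed).sample(list(topics), k)


def jaccard(a, b):
    """Overlap of two keyword collections, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def select_by_preference(topic_keywords, preference_keywords, k):
    """First preset rule, variant 2: keep the k topics closest to the reading preference.

    topic_keywords maps each trending topic to its keywords (hypothetical input);
    preference_keywords summarizes what the target user read recently.
    """
    ranked = sorted(
        topic_keywords,
        key=lambda topic: jaccard(topic_keywords[topic], preference_keywords),
        reverse=True,  # stable sort: ties keep their original order
    )
    return ranked[:k]
```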

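Optionally, acquiring the trending topic set includes:

obtaining a chat record set from the first target platform system;

extracting the topic keywords corresponding to each chat record in the chat record set, and selecting the target topic keywords whose occurrence frequency is higher than a preset frequency threshold as internal topics;

collecting a plurality of news lists from the second target platform system;

acquiring the target news lists that match preset vocabulary as external topics;

and combining the internal topics and the external topics to obtain the trending topic set.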

Communication connections are established with the internal customer service system and with the external news information system. Internal topics are obtained from the former and external topics from the latter, and the two are combined to obtain the trending topic set. The chat record set may contain chats between customers and an intelligent customer service agent, or between customers and a human customer service agent. Most chat records are customers' questions about concepts related to products. For example, a customer may ask "what is medical insurance" or "what is life insurance". The occurrence frequency of each question's topic keyword is calculated to decide whether it is a hot topic. Topic keywords are product-related words such as "medical insurance" and "life insurance". When the occurrence frequency of a target topic keyword is higher than the preset frequency threshold, the keyword is determined to be an internal topic; when it is lower, it is not. The preset frequency threshold is a preset value for evaluating whether a question is an internal topic.
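The internal-topic selection can be sketched as below. The sketch assumes a fixed list of product-related topic keywords is available, and that a keyword "occurs" in a chat record when it appears in it as a substring. Frequency is counted per record, and only keywords strictly above the threshold are kept, mirroring the rule described above.

```python
from collections import Counter


def internal_topics(chat_records, topic_keywords, freq_threshold):
    """Return topic keywords mentioned in more than freq_threshold chat records.

    Substring matching against a known keyword list stands in for the
    keyword-extraction step, which the text does not specify.
    """
    counts = Counter()
    for record in chat_records:
        text = record.lower()
        for keyword in topic_keywords:
            if keyword.lower() in text:
                counts[keyword] += 1  # at most one count per record
    return sorted(kw for kw, n in counts.items() if n > freq_threshold)
```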

The preset vocabulary consists of preset words associated with a product and can be stored in a preset database. For privacy and reliability of data storage, the preset database may be a node in a blockchain. Take an insurance product and an insurance soft text as an example. The preset vocabulary may be the common insurance-industry terms in an insurance glossary. The whole book has more than 1,300 entries, grouped into 8 classes, and the basic entries are listed as English-Chinese pairs in English alphabetical order. To decide whether a news list can serve as an external topic, the system detects whether the external news information system contains a news list matching the preset vocabulary. If a matching news list exists, it is determined to be an external topic. If none exists, no target news list is taken as an external topic.
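The sketch below covers external-topic matching and the final merge. It assumes news items are represented by their titles and that matching is a case-insensitive substring test against the preset vocabulary. The function names are illustrative only.

```python
def external_topics(news_titles, preset_vocabulary):
    """Keep news items whose title contains at least one preset vocabulary term."""
    vocab = [term.lower() for term in preset_vocabulary]
    return [title for title in news_titles if any(term in title.lower() for term in vocab)]


def trending_topic_set(internal, external):
    """Merge internal and external topics, dropping duplicates but keeping order."""
    return list(dict.fromkeys([*internal, *external]))
```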

S12, identifying the type of the candidate topic, and matching the corresponding target outline according to the type.

In at least one embodiment of the present application, taking insurance soft texts as an example, candidate topics can be roughly divided into four types: concept introduction, objection processing, product interpretation and claim case. The application provides different modules for different topic types. For example, the target outline for a claim-case candidate topic has four modules: customer situation, insurance experience, claim settlement result and story takeaways. Each type has its own target outline, which specifies the modules that make up the soft text, with one outline item per module. Thus, when the target outline has four items, the soft text also has four modules.

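Optionally, before identifying the type of the candidate topic, the method further comprises:

acquiring historical soft text sets corresponding to different types of hot topics;

extracting event element information corresponding to each historical soft text in the historical soft text set to generate an initial event set;

calculating the importance degree of each event element in the initial event set for describing the event, and selecting the target event elements whose importance degree is higher than a preset importance threshold to form an initial summary frame corresponding to the event;

and determining the target outline corresponding to each type of hot topic based on the initial summary frame.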

The historical soft text set comprises a plurality of historical soft texts corresponding to different types of hot topics. The historical soft texts may be articles written by operators and stored in a preset database. Each historical soft text is divided into several paragraphs by category, one category per paragraph, and the categories are not limited. In one embodiment, the historical soft texts in the set may be those whose read count is higher than a preset read-count threshold. This threshold is a preset value for evaluating how well a text is received by its audience. Frames are extracted from soft texts whose read counts exceed this threshold, so the resulting target outlines reflect readers' preferences. This improves the accuracy of the target outline and therefore of the generated target soft text.

The main task of event extraction is to find events in massive network data and structure them around event elements. For example, a sentence splitter divides the historical soft texts into sentences. A natural language processing tool then performs lexical and syntactic analysis on them, parsing them into syntax trees and identifying dependency relations. Named entity recognition is then performed on the historical soft texts, based on the structural features of words in the syntax tree and on an entity element database. This mines the event element information of each event, such as the event name, time, location, participants, cause and impact. The information is stored in a preset data format to obtain the initial event set. The preset data format is a predefined format for storing event element information.

The more critical an event element is for describing the event, the larger its importance degree, which ranges from 0 to 1. In one embodiment, the importance degree of an event element is determined by its co-occurrence frequency in the historical soft texts. The more often an event element occurs in the historical soft texts, the more important it is; the less often it occurs, the less important it is. The target event elements whose importance degrees exceed the preset importance threshold are selected to form the initial summary frame corresponding to the event. The preset importance threshold is a preset value for evaluating the importance of event elements.
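The importance calculation can be sketched as follows. The text says only that importance grows with co-occurrence frequency and lies between 0 and 1. One assumed way to get such a value is to divide the number of historical soft texts that contain an element by the total number of texts.

```python
def importance_degrees(event_sets):
    """Share of historical soft texts in which each event element occurs, in [0, 1].

    event_sets holds one collection of event element names per historical soft text.
    """
    if not event_sets:
        return {}
    counts = {}
    for elements in event_sets:
        for element in set(elements):  # count each element once per text
            counts[element] = counts.get(element, 0) + 1
    return {element: c / len(event_sets) for element, c in counts.items()}


def initial_summary_frame(event_sets, importance_threshold):
    """Keep the event elements whose importance exceeds the preset threshold."""
    degrees = importance_degrees(event_sets)
    return {e for e, degree in degrees.items() if degree > importance_threshold}
```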

Determining the target outline for each type based on the initial summary frame may include two steps. First, the categories of the event elements in the initial summary frame are obtained. Second, a preset mapping relation between categories and outlines is traversed according to those categories to obtain the corresponding target outlines. The categories may include, but are not limited to, party information, party experience, event result and cause analysis. The outlines may include, but are not limited to, customer situation, insurance experience, claim settlement result and story takeaways. In one embodiment, several necessary event elements may be set for each category. When the initial summary frame contains the necessary event elements of a category, the event elements are determined to belong to that category. For example, suppose only the event name and event participant elements appear in a paragraph of a historical soft text. Since {event participant} is a necessary event element of the party information category, the paragraph belongs to that category, and traversing the category-outline mapping yields the customer situation outline.
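Below is a sketch of mapping a paragraph's event elements to an outline module. Pairing the four listed categories with the four listed outline items in order is an assumption. Apart from {event participant}, every necessary-element set below is a hypothetical placeholder. When several categories match, the sketch picks the one with the most necessary elements, so that the more specific category wins.

```python
# Category -> outline pairs, matched in the order the text lists them (an assumption).
CATEGORY_TO_OUTLINE = {
    "party information": "customer situation",
    "party experience": "insurance experience",
    "event result": "claim settlement result",
    "cause analysis": "story takeaways",
}

# Necessary event elements per category. Only {event participant} comes from the
# text's example; the other sets are hypothetical placeholders.
NECESSARY_ELEMENTS = {
    "party information": {"event participant"},
    "party experience": {"event participant", "event time"},
    "event result": {"event result"},
    "cause analysis": {"event cause"},
}


def paragraph_outline(elements):
    """Return the outline module for a paragraph's event elements, or None."""
    found = set(elements)
    matches = [c for c, required in NECESSARY_ELEMENTS.items() if required <= found]
    if not matches:
        return None
    best = max(matches, key=lambda c: len(NECESSARY_ELEMENTS[c]))  # most specific match
    return CATEGORY_TO_OUTLINE[best]
```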

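Optionally, identifying the type of the candidate topic comprises:

performing word segmentation and part-of-speech tagging on the candidate topic using a joint word segmentation and part-of-speech tagging model built by fusing external knowledge, to obtain a word segmentation result carrying part-of-speech tags;

detecting whether the word segmentation result contains a preset keyword;

and when the word segmentation result contains the preset keyword, determining the type corresponding to the preset keyword as the type of the candidate topic.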

Generally, a syntactic analysis tool splits a piece of text into sentences and then performs word segmentation (Segmentor), part-of-speech tagging (Postagger) and syntactic parsing (Parser) in sequence to obtain the word segmentation result. For example, the candidate topic "what is a policyholder" may be segmented as "what", "is", "policyholder". Here "what" is a preset keyword associated with two types, "concept introduction" and "product interpretation". Which type applies is decided by whether the segmentation result includes an insurance product name. This result contains no product name, so the candidate topic is of type "concept introduction". Similarly, the candidate topic "what is critical illness insurance" may be segmented as "what", "is", "critical illness insurance". "What" is again the preset keyword, and since the result includes an insurance product name, the candidate topic is of type "product interpretation". In addition, when the preset keyword is "why", the corresponding candidate topic type may be "objection processing". For external topics, the corresponding candidate topic type may be "claim case".
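The keyword rules in the examples above can be sketched as a small classifier. It assumes segmentation has already produced English tokens; a real system would run a Chinese segmentation and part-of-speech model, such as the joint model described above. `PRODUCT_NAMES` is a hypothetical product list.

```python
# Hypothetical product-name list; in practice it would come from a product catalogue.
PRODUCT_NAMES = {"critical illness insurance", "medical insurance"}


def classify_topic(tokens, is_external=False, product_names=PRODUCT_NAMES):
    """Map a segmented candidate topic to one of the four insurance topic types.

    tokens is the word segmentation result; segmentation itself is assumed
    to have happened upstream.
    """
    if is_external:  # news-derived topics are treated as claim cases
        return "claim case"
    if "why" in tokens:
        return "objection processing"
    if "what" in tokens:
        # "what" maps to two types; an insurance product name decides between them
        if any(token in product_names for token in tokens):
            return "product interpretation"
        return "concept introduction"
    return None  # no preset keyword found
```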

Optionally, the matching of the corresponding target outline according to the type includes:

acquiring a preset mapping relation between the type and the outline;

and traversing the mapping relation according to the type to obtain a target outline matched with the type.
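Below is a sketch of the type-to-outline lookup. The text spells out only the claim-case outline, so the table contains only that entry. A plain dictionary lookup plays the role of traversing the mapping relation. The helper `empty_soft_text` is a hypothetical name that shows one target area per outline item.

```python
# Type -> outline mapping. Only the claim-case outline is spelled out in the text;
# the other three types would need their own entries.
TYPE_TO_OUTLINE = {
    "claim case": (
        "customer situation",
        "insurance experience",
        "claim settlement result",
        "story takeaways",
    ),
}


def match_outline(topic_type, mapping=TYPE_TO_OUTLINE):
    """Look up the target outline for a topic type; None if no mapping exists."""
    outline = mapping.get(topic_type)
    return list(outline) if outline is not None else None


def empty_soft_text(outline):
    """One target area per outline item, so four items give four modules."""
    return {module: "" for module in outline}
```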

S13, obtaining topic content that corresponds to the candidate topic and meets a preset condition.

In at least one embodiment of the present application, each candidate topic has corresponding topic content. Candidate topics can be divided into external topics and internal topics according to how they were acquired. When the candidate topic is an external topic, topic content meeting the preset condition may be acquired from the second target platform system corresponding to the external topic. Optionally, obtaining the topic content that corresponds to the candidate topic and meets the preset condition includes:

acquiring the text link corresponding to each candidate topic;

crawling the initial topic content corresponding to the candidate topic according to the text link;

and preprocessing the initial topic content to obtain the target topic content.

External topics can be crawled from the external news information system, in which each external topic has a corresponding text link. The initial topic content of the candidate topic can be crawled through this link. The initial topic content contains information irrelevant to soft text generation, such as the editing time and the editor. Preprocessing removes this information to obtain target topic content that meets the preset condition. The preset condition may be, for example, a preset format condition that the topic content must satisfy, without limitation.
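Preprocessing of crawled content might look like the sketch below. The text names editing time and editor as irrelevant information but gives no rules, so the regular expressions here are hypothetical examples. Fetching the page by its text link is omitted.

```python
import re

# Hypothetical patterns for lines that carry editing metadata rather than content.
METADATA_PATTERNS = [
    re.compile(r"^\s*(editor|edited by|source)\s*[:：]", re.IGNORECASE),
    re.compile(r"^\s*\d{4}-\d{2}-\d{2}(\s+\d{2}:\d{2}(:\d{2})?)?\s*$"),  # bare timestamps
]


def preprocess(raw_text):
    """Drop blank lines and metadata lines, keeping the article body in order."""
    kept = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue
        if any(pattern.search(line) for pattern in METADATA_PATTERNS):
            continue
        kept.append(line.strip())
    return "\n".join(kept)
```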

When the candidate topic is an internal topic, topic content meeting the preset condition may be acquired from the first target platform system corresponding to the internal topic. Optionally, obtaining the topic content that corresponds to the candidate topic and meets the preset condition includes:

acquiring a target chat record set corresponding to the candidate topic;

determining the chat parties of each target chat record in the target chat record set and the initial text corresponding to each chat party;

and combining and preprocessing the initial texts to obtain the target topic content.

The target chat record set is a deduplicated set of chat records related to the candidate topic. Each target chat record involves two chat parties: a customer and either an intelligent or a human customer service agent. Each chat party has corresponding initial text. For example, when a customer asks "what is a policyholder", that question is the customer's initial text. The intelligent or human customer service agent then answers, and the specific answer is the agent's initial text. Further sub-questions may arise from "what is a policyholder" during the conversation, each with its own initial text. A candidate topic may therefore have one or more initial texts. When there are multiple initial texts, they are combined and preprocessed to obtain the target topic content. The preprocessing may be, for example, stop-word removal, without limitation.
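Below is a sketch of combining and preprocessing the initial texts of a candidate topic. It assumes whitespace tokenization instead of Chinese word segmentation, and a tiny, hypothetical stop-word list. Verbatim repeated texts are skipped as a simple form of deduplication.

```python
STOP_WORDS = {"a", "an", "the", "is", "um", "please"}  # hypothetical stop-word list


def combine_initial_texts(initial_texts, stop_words=STOP_WORDS):
    """Merge the chat parties' initial texts into one topic content string."""
    seen = set()
    merged = []
    for text in initial_texts:
        if text in seen:  # skip verbatim repeats
            continue
        seen.add(text)
        words = [w for w in text.split() if w.lower() not in stop_words]
        merged.append(" ".join(words))
    return " ".join(merged)
```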

S14, extracting abstract texts from the topic content according to the target outline, and filling the abstract texts into the target areas corresponding to the target outline to obtain an initial soft text.

In at least one embodiment of the present application, an abstract text is the text corresponding to a target outline item. Each outline item has one abstract text, which is filled into the target area for that item to obtain the initial soft text. An association relation between each abstract text and its target area is established, and the abstract text is filled into the target area by querying this association relation.
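Filling abstracts into target areas can be sketched with a dictionary that acts as the association relation between outline modules and abstract texts. The plain "module name, newline, text" layout is an assumption made only for illustration.

```python
def fill_outline(outline, abstract_by_module):
    """Fill each outline module's target area with its associated abstract text.

    abstract_by_module is the association relation (module -> abstract text);
    modules without an abstract are left with an empty area.
    """
    areas = []
    for module in outline:
        text = abstract_by_module.get(module, "")
        areas.append(f"{module}\n{text}")
    return "\n\n".join(areas)
```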

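Optionally, extracting the abstract text from the subject content according to the target outline includes:

clustering the subject content with sentences as units to obtain a clustering result;

invoking a pre-trained subject term extraction model to extract subject terms from the clustering result to obtain the subject terms corresponding to the subject content;

counting the words in the subject content that have the same or similar semantics as the subject terms, together with their word frequencies, and merging them with the subject terms to obtain a high-frequency word set corresponding to the subject content;

constructing a text network graph for the subject content, and performing abstract extraction based on the text network graph and the high-frequency word set to obtain a candidate abstract sentence group;

removing redundancy from the candidate abstract sentence group to obtain an initial abstract text, and optimizing the initial abstract text to obtain a target abstract text.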

The subject content is clustered with the conventional k-means algorithm to obtain a clustering result; k-means clustering is prior art and is not described again here. Subject terms are then extracted from the clustering result with Latent Dirichlet Allocation (hereinafter, the LDA topic model) to obtain the subject terms corresponding to the subject content. The LDA topic model can identify latent topic information in a large-scale text collection and expresses it as probability distributions. Applying LDA to the clustered subject content therefore yields the topic of the texts in each cluster, which in turn reveals the main meaning of those texts. To find words with the same or similar semantics, the words can be vectorized and the Euclidean distance between word vectors computed to obtain a similarity value between words.
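The merge of subject terms with same-or-similar words can be sketched as below. It assumes word vectors are already available as a dictionary of equal-length lists (the source does not say how they are trained) and treats a word as similar when its Euclidean distance to some subject term is at most a hypothetical `max_distance`; the k-means and LDA steps are not reproduced here.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def high_frequency_word_set(subject_terms, tokens, vectors, max_distance=0.5):
    """Merge subject terms with same/similar words from the subject content.

    A content word is treated as similar to a subject term when the Euclidean
    distance between their vectors is at most max_distance (assumed value).
    Returns {word: frequency in the content}.
    """
    freq = Counter(tokens)
    result = {word: freq[word] for word in subject_terms}
    for word, count in freq.items():
        if word in result or word not in vectors:
            continue
        if any(t in vectors and euclidean(vectors[word], vectors[t]) <= max_distance
               for t in subject_terms):
            result[word] = count
    return result


toy_vectors = {"insurance": [1.0, 0.0], "policy": [0.9, 0.1], "weather": [0.0, 1.0]}
tokens = ["insurance", "policy", "policy", "weather"]
print(high_frequency_word_set(["insurance"], tokens, toy_vectors))
# prints: {'insurance': 1, 'policy': 2}
```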

Sentences are taken as the nodes of the text, and a text network graph is constructed by determining whether a similarity relation exists between each pair of nodes: if the similarity between two nodes is greater than a set threshold, an edge exists between them and the similarity value is the weight of that edge; otherwise, no edge exists. The set threshold is 0.0001.
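A sketch of the text network graph with the stated 0.0001 threshold. Two parts are assumptions not specified in the text: sentence similarity is measured here as bag-of-words cosine similarity, and sentence weights are computed with a weighted TextRank iteration, one common way to rank nodes of such a graph.

```python
import math
from collections import Counter

EDGE_THRESHOLD = 0.0001  # set threshold stated in the text


def cosine(a, b):
    """Bag-of-words cosine similarity between two token lists (assumed measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def build_text_graph(sentences):
    """Sentences are nodes; an edge with weight = similarity exists only if
    the similarity exceeds EDGE_THRESHOLD. Returns an adjacency matrix."""
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(sentences[i], sentences[j])
            if sim > EDGE_THRESHOLD:
                weights[i][j] = weights[j][i] = sim
    return weights


def textrank(weights, damping=0.85, iterations=50):
    """Weighted TextRank score per sentence node (assumed ranking method)."""
    n = len(weights)
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n) if weights[j][i] > 0)
            for i in range(n)
        ]
    return scores


sentences = [["policy", "loan"], ["policy", "loan", "rate"], ["weather", "today"]]
graph = build_text_graph(sentences)
print(textrank(graph))  # the isolated third sentence gets the lowest score
```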

The initial abstract text is optimized by obtaining the order in which each of its sentences appears in the original text and attaching a sequence-number label to each sentence. Because the sentences in the initial abstract text are sorted by weight, adjacent sentences may not read continuously; outputting them in the order in which they appear in the original text gives the generated abstract a degree of semantic coherence. The text matching algorithm therefore obtains the sequence-number labels of all sentences in the initial abstract text in turn and outputs the sentences in ascending order of label, yielding the target abstract text.
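The reordering step amounts to sorting the selected sentences by their original positions. A minimal sketch, assuming sentences are compared as exact strings (the function name and sample sentences are illustrative):

```python
def order_by_source(selected, original_sentences):
    """Label each selected sentence with its position in the original text,
    then output the sentences in ascending order of that label."""
    position = {sentence: index for index, sentence in enumerate(original_sentences)}
    labelled = [(position[sentence], sentence) for sentence in selected]
    return [sentence for _, sentence in sorted(labelled)]


original = ["Policy loans use cash value.", "Rates vary.", "Repay on time.", "Ask an agent."]
ranked = ["Repay on time.", "Policy loans use cash value."]  # weight order
print(order_by_source(ranked, original))
# prints: ['Policy loans use cash value.', 'Repay on time.']
```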

In other embodiments, before the initial abstract text is optimized to obtain the target abstract text, the method further includes: acquiring a preset abstract word-count requirement; and preprocessing the initial abstract text according to that requirement, for example by removing stop words, to obtain a target abstract text that meets the word-count requirement.
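One way to apply the word-count requirement is sketched below. The text only says that stop words and the like are removed; the choice here to then keep whole sentences in rank order until the limit would be exceeded is an assumption, as are the whitespace tokenization and the sample stop words.

```python
def fit_word_limit(ranked_sentences, max_words, stop_words):
    """Drop stop words, then keep sentences (in rank order) while the total
    word count stays within max_words."""
    kept, total = [], 0
    for sentence in ranked_sentences:
        words = [w for w in sentence.split() if w.lower() not in stop_words]
        if total + len(words) > max_words:
            break  # adding this sentence would exceed the requirement
        kept.append(" ".join(words))
        total += len(words)
    return kept


ranked = ["the policy loan is cheap", "a premium is due monthly", "rates change"]
print(fit_word_limit(ranked, max_words=6, stop_words={"the", "a", "is"}))
# prints: ['policy loan cheap', 'premium due monthly']
```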

In other embodiments, the subject terms contained in the target abstract text and the words in the high-frequency word set corresponding to the subject content can be highlighted and automatically given jump links, so that on the front-end page these words change from the <text> type to the <url> type. A user can click a highlighted keyword to view more content related to the topic directly, which deepens the customer's understanding and promotes conversion.
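A minimal sketch of highlighting keywords with jump links in HTML. The base URL and the `highlight` CSS class are hypothetical placeholders; the sketch escapes the text, matches longer keywords first so that a short keyword does not split a longer one, and percent-encodes each keyword into the link.

```python
import html
import re
from urllib.parse import quote


def add_jump_links(text, keywords, base_url="https://example.com/topic?q="):
    """Wrap each keyword occurrence in an HTML <a> link (base_url is hypothetical)."""
    escaped = html.escape(text)
    # Longest keywords first so a short keyword does not split a longer one.
    ordered = sorted({k for k in keywords if k}, key=len, reverse=True)
    if not ordered:
        return escaped
    pattern = re.compile("|".join(re.escape(html.escape(k)) for k in ordered))

    def to_link(match):
        word = html.unescape(match.group(0))
        return f'<a class="highlight" href="{base_url}{quote(word)}">{match.group(0)}</a>'

    return pattern.sub(to_link, escaped)


print(add_jump_links("A policy loan uses the policy.", ["policy loan", "policy"]))
```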

S15: Acquire a target illustration corresponding to the abstract text, and fill the target illustration into the target position of the initial soft text to obtain the target soft text.

In at least one embodiment of the present application, a preset database stores a large number of product-related illustrations. Each illustration carries a tag that marks its main content; the tag may be set manually or obtained automatically by analyzing the illustration's content. Illustration content analysis is prior art and is not described here.

Optionally, acquiring the target illustration corresponding to the abstract text includes:

acquiring a subject term corresponding to the abstract text;

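acquiring the label set corresponding to the illustrations in a preset database;

detecting whether the label set contains a target label whose similarity to the subject term exceeds a preset similarity threshold;

when such a target label exists, determining the illustration corresponding to the target label as the target illustration.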

Specifically, a label vector is calculated for each label in the label set and a subject vector is calculated for the subject term, and the similarity between a label and the subject term is determined from the Euclidean distance between the label vector and the subject vector. The preset similarity threshold is a preset threshold for evaluating the distance between the two vectors.
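The label-matching step can be sketched as follows, assuming label and subject vectors are already computed. Because a smaller Euclidean distance means greater similarity, the sketch converts distance to a similarity score with `1 / (1 + distance)`; that mapping, the threshold value, and the file names are assumptions for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pick_illustrations(subject_vec, label_vectors, illustrations_by_label, threshold=0.8):
    """Return illustrations whose label similarity to the subject term exceeds
    threshold, most similar first. similarity = 1 / (1 + distance) (assumed)."""
    matches = []
    for label, vec in label_vectors.items():
        similarity = 1.0 / (1.0 + euclidean(subject_vec, vec))
        if similarity > threshold:
            matches.append((similarity, illustrations_by_label[label]))
    matches.sort(reverse=True)
    return [image for _, image in matches]


labels = {"car accident": [0.9, 0.1], "hospital": [0.0, 1.0]}
images = {"car accident": "img_car.png", "hospital": "img_hospital.png"}
print(pick_illustrations([1.0, 0.0], labels, images))
# prints: ['img_car.png']
```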

In one embodiment, the target soft text may be typeset according to a preset typesetting format, for example the sequence {major title, minor title, illustration, summary}. The target soft text contains several positions into which target illustrations can be filled.

Optionally, filling the target illustration into the target position of the initial soft text to obtain the target soft text includes:

acquiring a target abstract text corresponding to the target illustration;

querying the preset typesetting format according to the target abstract text to obtain the target position corresponding to the target illustration;

filling the target illustration into the target position to obtain the target soft text.
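A sketch of locating the target position from the example typesetting format {major title, minor title, illustration, summary}. It assumes the initial soft text is held as a list of hypothetical `(kind, content)` blocks and that, per that format, an illustration slot sits immediately before the summary it belongs to.

```python
def insert_illustration(blocks, target_summary, illustration):
    """blocks: list of (kind, content) pairs forming the initial soft text.

    Following the {major title, minor title, illustration, summary} format,
    the illustration is placed immediately before its summary block.
    """
    for index, (kind, content) in enumerate(blocks):
        if kind == "summary" and content == target_summary:
            return blocks[:index] + [("illustration", illustration)] + blocks[index:]
    raise ValueError("target summary not found in the initial soft text")


initial = [("major_title", "Why a policy loan?"), ("minor_title", "Case"),
           ("summary", "The customer repaid the loan early.")]
target = insert_illustration(initial, "The customer repaid the loan early.", "img_loan.png")
print(target)
```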

The deep learning-based article generation method provided by the embodiments of the present application connects to both the external news information system and the internal customer service system, obtains the hot topic set from them, screens candidate topics, and generates a target soft text for each candidate topic. Hot events can thus be tracked automatically and comprehensively without manually searching for and screening hot topics, which keeps the target soft texts timely and improves their generation efficiency. In addition, extracting the framework of historical soft texts whose reading count exceeds the preset reading-count threshold yields target outlines that match readers' preferences, which improves the accuracy of the target outlines and hence of the generated target soft texts. Further, the abstract text is generated automatically by a subject term extraction model together with an analysis of the words and word frequencies in the subject content, and is filled into the target areas to obtain the soft text, which improves the accuracy and efficiency of generating both the abstract text and the soft text. The application can be applied to functional modules of smart cities such as smart government affairs and smart traffic; for example, a deep learning-based article generation module for smart government affairs can promote the rapid development of smart cities.

In some embodiments, the deep learning-based article generation apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer programs of the various program segments in the deep learning-based article generation apparatus 20 may be stored in a memory of a computer device and executed by at least one processor to perform the functions of deep learning-based article generation (detailed in FIG. 1).

In this embodiment, the deep learning-based article generation apparatus 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a topic screening module 201, an outline matching module 202, a content obtaining module 203, an abstract extraction module 204 and an illustration filling module 205. A module as referred to herein is a series of computer program segments that is stored in a memory and can be executed by at least one processor to perform a fixed function. The functions of the modules are described in detail in the following embodiments.

The topic screening module 201 may be configured to obtain a hot topic set and screen candidate topics from the hot topic set according to a first preset rule.

In at least one embodiment of the present application, to keep soft-text topics sufficiently fresh, operators need to be sensitive to all kinds of hot spots: when a hot spot emerges, the soft text must be written and published as quickly as possible. However, as the internet matures, the life cycle of a hot spot may be only 1 day, half a day, or 2-3 hours, which severely tests both the operators' capabilities and the company's review process. The application provides a hot topic monitoring system that connects to a first target platform system and a second target platform system and acquires the hot topic set from both. In one embodiment, the first target platform system may be an internal customer service system and the second target platform system may be an external news information system. The hot topic set includes a plurality of internal topics and a plurality of external topics. Internal topics may be, for example, "what is a policyholder" or "what is a policy loan"; external topics may be, for example, "traffic accident in Tai'an, Shandong leaves 4 dead and 2 injured" or "40 vehicles collide on the Baomao Expressway in Shaanxi"; these examples are not limiting. A candidate topic is a topic in the hot topic set that is used to generate a target soft text. In one embodiment, the first preset rule may be random selection, that is, the candidate topic may be selected at random from the hot topic set, which is not limited herein.

Optionally, obtaining the hot topic set includes:

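obtaining a chat record set from a first target platform system;

extracting the topic keyword corresponding to each chat record in the chat record set, and selecting target topic keywords whose occurrence frequency is higher than a preset frequency threshold as internal topics;

collecting several news lists from a second target platform system;

acquiring target news lists that match preset vocabulary as external topics;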

combining the internal topics and the external topics to obtain the hot topic set.

Specifically, communication connections are established with the internal customer service system and the external news information system, internal topics are obtained from the internal customer service system and external topics from the external news information system, and the internal and external topics are combined into the hot topic set. The chat record set may consist of chats between customers and the intelligent customer service or between customers and human customer service agents. Most chat records reflect customers' conceptual questions about related products; for example, a customer may ask "what is medical insurance" or "what is life insurance". Whether a question is a hot topic is determined from the occurrence frequency of the topic keyword corresponding to the question: when the occurrence frequency is higher than the preset frequency threshold, the target topic keyword is determined to be an internal topic; when it is lower than the preset frequency threshold, the target topic keyword is not an internal topic. The preset frequency threshold is a preset threshold for evaluating whether a question is an internal topic.
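The hot-topic-set steps can be sketched as follows. Assumptions: topic keywords have already been extracted from each chat record (one keyword per record), news list entries are plain title strings, "matching preset vocabulary" means containing at least one vocabulary word, and the random-selection embodiment of the first preset rule uses Python's `random.choice`. All sample data is made up.

```python
import random
from collections import Counter


def internal_topics(chat_keywords, freq_threshold):
    """Keywords (one per chat record) occurring more than freq_threshold times."""
    counts = Counter(chat_keywords)
    return [kw for kw, count in counts.items() if count > freq_threshold]


def external_topics(news_titles, preset_vocabulary):
    """News list entries that contain at least one preset vocabulary word."""
    return [t for t in news_titles if any(w in t for w in preset_vocabulary)]


def hot_topic_set(chat_keywords, news_titles, freq_threshold, preset_vocabulary):
    combined = (internal_topics(chat_keywords, freq_threshold)
                + external_topics(news_titles, preset_vocabulary))
    return list(dict.fromkeys(combined))  # combine, drop duplicates, keep order


def pick_candidate(topics, rng=random):
    """First preset rule in one embodiment: random selection."""
    return rng.choice(topics)


chat_keywords = ["policy loan"] * 3 + ["policyholder"] * 5 + ["claims"]
news = ["Traffic accident in Tai'an leaves 4 dead",
        "Stock market closes higher",
        "40-vehicle collision on expressway"]
topics = hot_topic_set(chat_keywords, news, 2, ["accident", "collision"])
print(topics)
print(pick_candidate(topics, random.Random(0)))
```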

The outline matching module 202 may be configured to identify the type of the candidate topic and match the corresponding target outline according to the type.

In at least one embodiment of the present application, taking insurance soft texts as an example, the types of candidate topics can be roughly divided into concept introduction, objection handling, product interpretation and claim cases. The application provides different templates for different topic types; for example, for a candidate topic of the claim-case type, the corresponding target outline has 4 modules: customer situation, incident process, claim settlement result and lessons from the story. Each type has its own target outline, where the target outline specifies the module contents of the soft text and each outline item corresponds to one module. Thus, when the target outline has four items, the soft text also has four module contents.

Optionally, before identifying the type of the candidate topic, the method further includes:

acquiring historical soft text sets corresponding to different types of hot topics;

extracting the event element information corresponding to each historical soft text in the historical soft text sets to generate an initial event set;

calculating the importance of each event element in the initial event set for characterizing the event, and selecting the target event elements whose importance is higher than a preset importance threshold to form the initial summary framework corresponding to the event;

determining the target outline corresponding to each type of hot topic based on the initial summary framework.

The historical soft text set includes a plurality of historical soft texts corresponding to different types of hot topics. The historical soft texts may be articles written by operators and stored in a preset database; each is divided into several paragraphs by category, with one category per paragraph, and the categories are not limited. In one embodiment, the historical soft texts in the set may be those whose reading count is higher than a preset reading-count threshold, which is a preset threshold for evaluating how well a text is received by its audience. Extracting the framework from soft texts whose reading count exceeds this threshold yields target outlines that match readers' preferences, which improves the accuracy of the target outlines and hence of the generated target soft texts.
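A sketch of forming the initial summary framework. The text does not define how the importance of an event element is calculated, so this sketch assumes importance is the share of qualifying historical soft texts that contain the element; it also assumes each historical soft text is given as a hypothetical `(read_count, event_elements)` pair with elements already extracted, and keeps only texts whose read count exceeds the reading-count threshold.

```python
from collections import Counter


def initial_summary_frame(historical_texts, min_reads, importance_threshold):
    """historical_texts: list of (read_count, event_elements) pairs.

    Importance of an element = share of qualifying texts that contain it
    (an assumed measure). Returns elements above the threshold, most
    important first, ties broken alphabetically.
    """
    popular = [set(elements) for reads, elements in historical_texts if reads > min_reads]
    if not popular:
        return []
    counts = Counter(element for elements in popular for element in elements)
    selected = [e for e, c in counts.items() if c / len(popular) > importance_threshold]
    return sorted(selected, key=lambda e: (-counts[e], e))


history = [
    (5000, ["customer situation", "incident process", "claim result"]),
    (8000, ["customer situation", "claim result", "lessons"]),
    (100, ["weather"]),  # below the reading-count threshold, ignored
]
print(initial_summary_frame(history, min_reads=1000, importance_threshold=0.5))
# prints: ['claim result', 'customer situation']
```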

Optionally, identifying the type of the candidate topic includes:

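performing word segmentation and part-of-speech tagging on the candidate topic with a joint word segmentation and part-of-speech tagging model built by fusing external knowledge, to obtain a word segmentation result carrying part-of-speech tags;

detecting whether the word segmentation result contains a preset keyword;

when the word segmentation result contains the preset keyword, determining the type corresponding to the preset keyword as the type of the candidate topic;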

acquiring a preset mapping relation between the type and the outline;
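The keyword-to-type detection and the type-to-outline mapping can be sketched as two dictionaries. The segmentation and part-of-speech tagging model is not reproduced; its output is assumed to be `(word, part_of_speech)` pairs. The keyword table and the concept-introduction outline are hypothetical placeholders; only the four claim-case modules come from the text.

```python
# Hypothetical keyword table: the real preset keywords are not given in the text.
TYPE_KEYWORDS = {
    "concept introduction": {"what", "meaning", "definition"},
    "claim case": {"claim", "accident"},
}

# Preset type -> outline mapping. The claim-case outline follows the text;
# the concept-introduction outline is a made-up placeholder.
OUTLINES = {
    "claim case": ["Customer situation", "Incident process",
                   "Claim settlement result", "Lessons from the story"],
    "concept introduction": ["Definition", "Key points", "Example"],
}


def identify_type(tagged_tokens):
    """tagged_tokens: (word, part_of_speech) pairs from the segmentation step."""
    words = {word for word, _pos in tagged_tokens}
    for topic_type, keywords in TYPE_KEYWORDS.items():
        if words & keywords:  # segmentation result contains a preset keyword
            return topic_type
    return None


def match_outline(tagged_tokens):
    topic_type = identify_type(tagged_tokens)
    return OUTLINES.get(topic_type) if topic_type else None


print(match_outline([("traffic", "n"), ("accident", "n"), ("claim", "v")]))
```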

The content obtaining module 203 is configured to obtain subject content that meets a preset condition with respect to the candidate topic.

acquiring the text link corresponding to each candidate topic;

crawling the initial subject content corresponding to the candidate topic through the text link;

and preprocessing the initial subject content to obtain target subject content.

External topics can be crawled from the external news information system, in which each external topic has a corresponding text link; the initial subject content corresponding to a candidate topic can be crawled through its text link. Because the initial subject content contains information irrelevant to soft-text generation, such as the editing time and the editor, this information is removed by preprocessing to obtain target subject content that meets the preset condition. The preset condition may be, for example, a preset format condition that the subject content must meet, and is not limited herein. When the candidate topic is an internal topic, optionally, obtaining the subject content that meets the preset condition with respect to the candidate topic includes:

acquiring a target chat record set corresponding to the candidate subject;

and combining the initial texts to obtain the subject contents related to the candidate subjects.

The target chat record set is the de-duplicated set of chat records related to the candidate topic. Each target chat record in the set has two chat participants: either the intelligent customer service and a customer, or a human customer service agent and a customer. Each chat participant has a corresponding initial text. For example, when a customer asks "what is a policyholder", that question can be used as the initial text of the customer; the intelligent or human customer service then answers the question, and the specific answer can be used as the initial text of the customer service participant. Further sub-questions may arise from the question "what is a policyholder" during the conversation, each with its own initial text; that is, a candidate topic may correspond to one or more initial texts. When there are multiple initial texts, they are combined to obtain the subject content related to the candidate topic.

The abstract extraction module 204 may be configured to extract an abstract text from the subject content according to the target outline, and fill the abstract text into the target area corresponding to the target outline to obtain an initial soft text.

The illustration filling module 205 may be configured to obtain a target illustration corresponding to the abstract text, and fill the target illustration into the target position of the initial soft text to obtain a target soft text.

acquiring a subject term corresponding to the abstract text;

acquiring a label set corresponding to an illustration in a preset database;

acquiring a target abstract text corresponding to the target illustration;

FIG. 3 is a schematic structural diagram of a computer device according to the third embodiment of the present application. In a preferred embodiment of the present application, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.

Those skilled in the art will appreciate that the structure of the computer device shown in FIG. 3 does not limit the embodiments of the present application: it may be a bus-type or a star-type structure, and the computer device 3 may include more or less hardware or software than shown, or a different arrangement of components.

In some embodiments, the computer device 3 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, etc.

It should be noted that the computer device 3 is only an example, and other existing or future electronic products, such as those that may be adapted to the present application, are also included in the scope of the present application and are incorporated herein by reference.

In some embodiments, the memory 31 stores a computer program that, when executed by the at least one processor 32, implements all or part of the steps of the deep learning-based article generation method described above. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium capable of carrying or storing data.

Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.

The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing information about a batch of network transactions, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.

In some embodiments, the at least one processor 32 is the control unit of the computer device 3. It connects the components of the entire computer device 3 through various interfaces and lines, and performs the functions of the computer device 3 and processes its data by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31. For example, when executing the computer program stored in the memory, the at least one processor 32 implements all or part of the steps of the deep learning-based article generation method described in the embodiments of the present application, or all or part of the functions of the deep learning-based article generation apparatus. The at least one processor 32 may consist of integrated circuits, for example a single packaged integrated circuit or several packaged integrated circuits with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.

In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.

Although not shown, the computer device 3 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so that charging, discharging and power consumption are managed through the power management device. The power supply may also include any components such as one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators. The computer device 3 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described herein again.

The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a computer device, or a network device) or a processor to execute part of the methods according to the embodiments of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other elements, and the singular does not exclude the plural. A plurality of units or means recited in the specification may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present application and not for limiting, and although the present application is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application.

Claims


1. A deep learning-based article generation method, characterized in that the method comprises:

obtaining a hot topic set, and screening candidate topics from the hot topic set according to a first preset rule;

identifying the type of the candidate topic, and matching the corresponding target outline according to the type;

obtaining subject content that meets a preset condition with respect to the candidate topic;

extracting abstract text from the subject content according to the target outline, and filling the abstract text into the target area corresponding to the target outline to obtain an initial soft text;

acquiring a target illustration corresponding to the abstract text, and filling the target illustration into the target position of the initial soft text to obtain a target soft text.

2. The deep learning-based article generation method according to claim 1, wherein obtaining the hot topic set comprises:

obtaining a chat record set from a first target platform system;

extracting the topic keyword corresponding to each chat record in the chat record set, and selecting target topic keywords whose occurrence frequency is higher than a preset frequency threshold as internal topics;

collecting several news lists from a second target platform system;

acquiring target news lists that match preset vocabulary as external topics;

combining the internal topics and the external topics to obtain the hot topic set.

3. The deep learning-based article generation method according to claim 1, wherein, before identifying the type of the candidate topic, the method further comprises:

acquiring historical soft text sets corresponding to different types of hot topics;

extracting the event element information corresponding to each historical soft text in the historical soft text sets to generate an initial event set;

calculating the importance of each event element in the initial event set for characterizing the event, and selecting the target event elements whose importance is higher than a preset importance threshold to form the initial summary framework corresponding to the event;

determining the target outline corresponding to each type of hot topic based on the initial summary framework.

4. The deep learning-based article generation method according to claim 1, wherein identifying the type of the candidate topic comprises:

performing word segmentation and part-of-speech tagging on the candidate topic with a joint word segmentation and part-of-speech tagging model built by fusing external knowledge, to obtain a word segmentation result carrying part-of-speech tags;

detecting whether the word segmentation result contains a preset keyword;

when the detection result is that the word segmentation result contains the preset keyword, determining the type corresponding to the preset keyword as the type of the candidate topic.

5. The deep learning-based article generation method according to claim 1, wherein extracting the abstract text from the subject content according to the target outline comprises:

clustering the subject content with sentences as units to obtain a clustering result;

invoking a pre-trained subject term extraction model to extract subject terms from the clustering result to obtain the subject terms corresponding to the subject content;

counting the words in the subject content that have the same or similar semantics as the subject terms, together with their word frequencies, and merging them with the subject terms to obtain a high-frequency word set corresponding to the subject content;

constructing a text network graph for the subject content, and performing abstract extraction based on the text network graph and the high-frequency word set to obtain a candidate abstract sentence group;

removing redundancy from the candidate abstract sentence group to obtain an initial abstract text, and optimizing the initial abstract text to obtain a target abstract text.

6. The deep learning-based article generation method according to claim 1, wherein acquiring the target illustration corresponding to the abstract text comprises:

acquiring the subject term corresponding to the abstract text;

acquiring the label set corresponding to the illustrations in a preset database;

detecting whether the label set contains a target label whose similarity to the subject term exceeds a preset similarity threshold;

when the detection result is that a target label whose similarity to the subject term exceeds the preset similarity threshold exists, determining the illustration corresponding to the target label as the target illustration.

7. The deep learning-based article generation method according to claim 1, wherein filling the target illustration into the target position of the initial soft article to obtain the target soft article comprises:

obtaining the target abstract text corresponding to the target illustration;

querying a preset typesetting format according to the target abstract text to obtain the target position corresponding to the target illustration;

filling the target illustration into the target position to obtain the target soft article.
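A minimal sketch of the position lookup in claim 7, under three assumptions:

- The initial soft article is a list of `(section, text)` pairs.
- The preset typesetting format is a hypothetical mapping from an outline section to whether an illustration goes "before" or "after" that section's abstract text.
- A `"default"` entry covers sections that the format does not list.

The actual typesetting format is not disclosed in these claims. This sketch only models how a target position could be derived from the target abstract text.

```python
def fill_illustration(article, target_abstract, image, layout):
    """article: list of (section, text) pairs; inserted images use the section name 'image'."""
    for index, (section, content) in enumerate(article):
        if content == target_abstract:
            # Query the preset typesetting format for this abstract text's section.
            placement = layout.get(section, layout["default"])
            position = index if placement == "before" else index + 1
            # Build a new list so the initial soft article is left unchanged.
            return article[:position] + [("image", image)] + article[position:]
    raise ValueError("target abstract text not found in the initial soft article")
```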

8. A deep learning-based article generation device, wherein the deep learning-based article generation device comprises:

a topic screening module, configured to obtain a hot topic set and screen candidate topics from the hot topic set according to a first preset rule;

an outline matching module, configured to identify the type of the candidate topic and match a corresponding target outline according to the type;

a content acquisition module, configured to acquire subject content that satisfies a preset condition with respect to the candidate topic;

an abstract extraction module, configured to extract abstract text from the subject content according to the target outline, and fill the abstract text into a target area corresponding to the target outline to obtain an initial soft article;

an illustration filling module, configured to acquire a target illustration corresponding to the abstract text, and fill the target illustration into a target position of the initial soft article to obtain a target soft article.
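The five modules of claim 8 can be pictured as a pipeline in which each module is a replaceable callable. The sketch below is an assumed structure, not the application's implementation. The field names and signatures are hypothetical, and the modules are injected so that trained models or rule-based stand-ins can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ArticleGenerationDevice:
    # Each field stands in for one module of claim 8; the signatures are assumptions.
    screen_topics: Callable[[List[str]], List[str]]                        # topic screening module
    match_outline: Callable[[str], List[str]]                              # outline matching module
    acquire_content: Callable[[str], List[str]]                            # content acquisition module
    extract_abstracts: Callable[[List[str], List[str]], Dict[str, str]]    # abstract extraction module
    fill_illustrations: Callable[[Dict[str, str]], List[str]]              # illustration filling module

    def generate(self, hot_topics: List[str]) -> List[List[str]]:
        articles = []
        for topic in self.screen_topics(hot_topics):
            outline = self.match_outline(topic)
            content = self.acquire_content(topic)
            # Initial soft article: each outline section maps to the abstract text filled into it.
            initial = self.extract_abstracts(outline, content)
            articles.append(self.fill_illustrations(initial))
        return articles
```

For example, simple lambdas are enough to run the control flow end to end: a `#`-prefix filter as the first preset rule, a fixed two-section outline, and one placeholder image.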

9. A computer device, wherein the computer device comprises a processor, and the processor is configured to implement the deep learning-based article generation method according to any one of claims 1 to 7 when executing a computer program stored in a memory.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the deep learning-based article generation method according to any one of claims 1 to 7.