CN111598239A - Method and device for extracting process system of article based on graph neural network

Info

Publication number
CN111598239A
Authority
CN
China
Prior art keywords
title
level
node
level title
article
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010727219.7A
Other languages
Chinese (zh)
Other versions
CN111598239B (en)
Inventor
宋永生 (Song Yongsheng)
王楠 (Wang Nan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenling Technology Beijing Co ltd
Original Assignee
Jiangsu United Industrial Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu United Industrial Ltd By Share Ltd
Priority to CN202010727219.7A
Publication of CN111598239A
Application granted
Publication of CN111598239B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a method and a device for extracting a process system of an article based on a graph neural network, relating to the technical field of artificial intelligence. The format information of a first article is parsed to identify the hierarchical structure of its titles at different levels. Each title is then checked for whether it is a behavior word describing a first process. When a first-level title is a behavior word describing the first process, a time vector is established between the first-level title and each second-level title in the lower layer where the first-level title sits, and a belonging vector is established from the lower-layer titles to the upper-layer title of the first-level title. A first title network graph is then built from the time vectors and belonging vectors, and unsupervised graph-neural-network learning is performed over it together with a large number of second title network graphs built from second articles, yielding the first process system and its sequence of steps. This achieves the technical effect of maximizing the accuracy of the graph neural network's iterative learning over article title hierarchies.

Description

Method and device for extracting process system of article based on graph neural network
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for extracting a process system of an article based on a graph neural network.
Background
The basis of machine intelligence is the cognitive architecture of computers, which includes two broad categories. One is static conceptual systems, such as classification systems based on attribute characteristics, structural systems based on physical connection, and relationship systems based on logical relationships. The other is dynamic event (process) systems: a process that occurs in a particular spatio-temporal context is an event. The identification and extraction of process systems are therefore an indispensable step for a computer to acquire machine intelligence, form the basis for a computer to judge historical events and predict future events, and are an important direction of current machine-intelligence research.
Identifying the layout and hierarchy of article titles is a mature technology in the industry, because commonly used text formats (such as Word, PDF, and HTML) carry format information, and authors use title numbering, font emphasis, paragraph indentation, alignment, and the like to highlight the hierarchy of titles and paragraphs. A computer can therefore obtain rich information for identifying the hierarchy of an article's titles. The identified title hierarchy itself reflects the relationship between a process and its steps: a title node is a step of the upper-layer title it points to and a process name for the lower-layer titles pointing to it, so the belonging vectors (edges) of a title network graph can be constructed from the hierarchical information of the article's title structure alone. When determining how many steps a process includes and in what order, however, the information about the process and its attached steps provided by a single article's title structure is often incomplete: even if two steps look "adjacent" in relative time within one article, other steps may in fact be hidden between them. Traditional mathematical statistics requires similarity aggregation over a large number of article title structures, together with irreversibility and consistency checks on the addition and removal of sequence elements in each step, and so on.
However, the applicant of the present invention finds that the prior art has at least the following technical problem:
existing mathematical statistics can only count steps that actually appear in the data and has no capability to infer unknown steps, and when the step information of the same process reflected by different articles conflicts, consistency verification causes a loss of accuracy in the final result.
Disclosure of Invention
The embodiment of the invention provides a method and a device for extracting a process system of an article based on a graph neural network. It solves the technical problems in the prior art that mathematical statistics can only be performed on steps that actually appear, that unknown steps cannot be inferred, and that consistency verification loses accuracy in the final result when the step information of the same process reflected by different articles conflicts. It thereby achieves the technical effects of continuous iterative learning based on the graph neural network, a certain capability of mining hidden steps, and maximal accuracy of the graph neural network's iterative learning results.
In view of the above problems, the present application provides a method and an apparatus for extracting a process system of an article based on a graph neural network.
In a first aspect, the present invention provides a method for extracting a process system of an article based on a graph neural network, the method comprising: obtaining first article format information of a first article; identifying a title hierarchy of the first article according to the first article format information to obtain a first-level title, wherein the first-level title comprises a first paragraph corresponding to the first-level title; judging whether the first-level title is a behavior word describing a first process; when the first-level title is a behavior word describing the first process, determining an upper-layer title of the first-level title and a lower-layer title where the first-level title is located; obtaining a second-level title describing the first process in the lower-layer title, wherein the second-level title contains a second paragraph corresponding to the second-level title; establishing a belonging vector according to the upper-layer title and the lower-layer title, and identifying the first paragraph and the second paragraph according to time to establish a time vector of the first-level title and the second-level title; establishing a first title network graph according to the first-level title, the second-level title, the upper-layer title, the belonging vector, and the time vector; obtaining a plurality of second articles, and correspondingly establishing a plurality of second title network graphs according to the plurality of second articles, wherein the article names of the second articles and the first article are synonyms; and inputting the first title network graph and the plurality of second title network graphs into a graph neural network for deep learning to obtain a first process system and a sequence of steps of the first process system.
Preferably, the first article format information includes a first article text format, a first article font format, and a first article paragraph format.
Preferably, establishing the belonging vector according to the upper-layer title and the lower-layer title includes:
determining an upper-layer node according to the upper-layer title; determining the lower-layer title according to the first-level title and the second-level title; determining a lower-layer node according to the lower-layer title; and obtaining, according to the lower-layer node and the upper-layer node, the belonging vector in which the lower-layer node points to the upper-layer node.
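As a minimal sketch of the belonging-vector construction just described (function and variable names are illustrative, not from the patent), each lower-layer step title yields one directed edge pointing to its upper-layer process-name title:

```python
def belonging_edges(upper_title, lower_titles):
    """Belonging vectors: each lower-layer title (a step) points to the
    upper-layer title (the process name). Edges are (source, target) pairs."""
    return [(lower, upper_title) for lower in lower_titles]

# Using the patent's litigation example from the detailed description:
edges = belonging_edges("litigation", ["prosecution", "court"])
# edges == [("prosecution", "litigation"), ("court", "litigation")]
```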
Preferably, identifying the first paragraph and the second paragraph according to time to establish the time vector of the first-level title and the second-level title comprises:
obtaining a first-level title node of the first-level title; obtaining a second-level title node of the second-level title; obtaining a first time quantum according to the first paragraph corresponding to the first-level title; obtaining a second time quantum according to the second paragraph corresponding to the second-level title; judging the time sequence of the first time quantum and the second time quantum; when the first time quantum precedes the second time quantum, judging whether the first-level title node and the second-level title node are adjacent nodes; and when the first-level title node and the second-level title node are adjacent nodes, obtaining the time vector pointing from the first-level title node to the second-level title node.
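The time-vector steps above can be sketched as follows. The 4-digit-year regex is only a naive stand-in for whatever temporal extractor an implementation would use for the "time quantum", and all names are hypothetical:

```python
import re

def first_year(paragraph):
    """Naive stand-in for the time quantum: the first 4-digit year, if any."""
    m = re.search(r"\b(1\d{3}|2\d{3})\b", paragraph)
    return int(m.group(1)) if m else None

def time_edges(titled_paragraphs):
    """titled_paragraphs: (title, paragraph) pairs sharing one belonging vector.
    Returns earlier -> later time-vector edges between temporally adjacent
    sibling titles, skipping titles whose paragraph carries no time quantum."""
    timed = [(t, first_year(p)) for t, p in titled_paragraphs]
    timed = [(t, y) for t, y in timed if y is not None]
    timed.sort(key=lambda item: item[1])
    return [(timed[i][0], timed[i + 1][0]) for i in range(len(timed) - 1)]

edges = time_edges([("court", "The court heard the case in 2019."),
                    ("prosecution", "Charges were filed in 2018.")])
# edges == [("prosecution", "court")]
```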
Preferably, the method further comprises:
inputting the first title network graph and the second title network graphs into the graph neural network for training to obtain a plurality of first title state functions hv, where hv = f(xv, xco[v], hne[v], xne[v]). Here hv is the vectorized representation of a node v, used to judge whether the node v describes the first process; f(·) is the local transition function, shared by all nodes, which updates a node's state according to the input neighborhood information; xv is the feature representation of the node v; xco[v] is the feature representation of the edges connected to the node v, i.e. of the belonging vectors and time vectors; hne[v] is the state of the nodes adjacent to v; and xne[v] is the feature representation of the nodes adjacent to the node v;
aggregating the plurality of first title state functions hv to obtain a first title state function set H, expressed as H = F(H, X), where F(·) is the global transition function obtained by stacking the local transition functions and X is the feature set of the nodes v;
iteratively learning the first title state function set H over time to obtain an iterative function Ht+1, expressed as Ht+1 = F(Ht, X), where Ht+1 is the title state function set at the next time step t+1 and Ht is the first title state function set at time t;
and when Ht+1 = Ht, i.e. the iteration has reached a fixed point, computing the iterative function Ht+1 to obtain the first process system and the sequence of steps of the first process system.
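The fixed-point iteration Ht+1 = F(Ht, X) can be sketched as below on toy scalar node states. The patent does not specify the local transition f; the tanh-plus-neighbour-mean form here is an illustrative contraction chosen so the iteration provably converges (|w| < 1), not the patent's actual function:

```python
import math

def iterate_states(x, neighbors, w=0.5, tol=1e-9, max_steps=1000):
    """Fixed-point iteration H_{t+1} = F(H_t, X) on scalar node states.
    x: list of node features; neighbors: list of neighbour-index lists.
    Each node's new state is tanh of its own feature plus w times the mean
    state of its neighbours; |w| < 1 makes the map a contraction, so the
    iteration stops when H_{t+1} == H_t (within tol)."""
    h = [0.0] * len(x)
    for _ in range(max_steps):
        h_next = [
            math.tanh(x[v] + w * (sum(h[u] for u in nbrs) / len(nbrs) if nbrs else 0.0))
            for v, nbrs in enumerate(neighbors)
        ]
        if max(abs(a - b) for a, b in zip(h_next, h)) < tol:  # H_{t+1} == H_t
            return h_next
        h = h_next
    return h

# Path graph for the litigation example: prosecution - litigation - court.
h = iterate_states([0.3, -0.1, 0.5], [[1], [0, 2], [1]])
```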
Preferably, the method further comprises:
determining the node v as one of a plurality of first steps Ov describing the first process according to the plurality of first title state functions hv, where Ov = g(hv, xv) and g(·) is the local output function;
aggregating the plurality of first steps Ov to obtain a first step set O of the first process system, expressed as O = G(H, X), where G(·) is the global output function obtained by stacking the local output functions.
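A node-wise readout matching Ov = g(hv, xv) and O = G(H, X) might look like the following sketch, where the scorer g and its threshold are assumptions for illustration, not the patent's actual output function:

```python
def readout(h, x, threshold=0.0):
    """Node-wise local output o_v = g(h_v, x_v); stacking every o_v row-wise
    gives the global output O = G(H, X). Here g is a toy scorer that flags a
    node as a step of the process when its averaged state-plus-feature score
    exceeds an assumed threshold."""
    return [(hv + xv) / 2 > threshold for hv, xv in zip(h, x)]

o = readout([0.4, -0.2, 0.6], [0.3, -0.1, 0.5])
# o == [True, False, True]
```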
In a second aspect, the present invention provides an apparatus for extracting a process system of an article based on a graph neural network, the apparatus comprising:
a first obtaining unit, configured to obtain first article format information of a first article;
a second obtaining unit, configured to identify a title hierarchy of the first article according to the first article format information to obtain a first-level title, where the first-level title includes a first paragraph corresponding to the first-level title;
the first judging unit is used for judging whether the first-level title is a behavior word describing a first process;
a first determining unit, configured to determine, when the first-level title is a behavior word describing the first process, an upper-layer title of the first-level title and a lower-layer title where the first-level title is located;
a third obtaining unit, configured to obtain a second-level title that describes the first process in the lower-level title, where the second-level title includes a second paragraph corresponding to the second-level title;
a first constructing unit, configured to establish a belonging vector according to the upper-layer title and the lower-layer title, and to identify the first paragraph and the second paragraph according to time to establish a time vector of the first-level title and the second-level title;
a second constructing unit, configured to establish a first title network map according to the first level title, the second level title, the upper layer title, the belonging vector, and the time vector;
a third constructing unit, configured to obtain a plurality of second articles and correspondingly establish a plurality of second title network graphs according to the plurality of second articles, where the article names of the second articles and the first article are synonyms;
a fourth obtaining unit, configured to input the first title network graph and the plurality of second title network graphs into a graph neural network for deep learning, so as to obtain a first process system and a sequence of steps of the first process system.
Preferably, the first article format information includes a first article text format, a first article font format, and a first article paragraph format.
Preferably, the establishing, by the first constructing unit, of the belonging vector according to the upper-layer title and the lower-layer title includes:
a second determining unit configured to determine an upper node according to the upper header;
a third determining unit configured to determine the lower-layer title according to the first-level title and the second-level title;
a fourth determining unit configured to determine a lower node according to the lower title;
a fifth obtaining unit, configured to obtain, according to the lower node and the upper node, the belonging vector of the lower node pointing to the upper node.
Preferably, the establishing, by the first constructing unit, of the time vector of the first-level title and the second-level title by identifying the first paragraph and the second paragraph according to time includes:
a sixth obtaining unit configured to obtain a first-level title node of the first-level title;
a seventh obtaining unit configured to obtain a second-level title node of the second-level title;
an eighth obtaining unit, configured to obtain a first time quantum according to the first paragraph corresponding to the first-level title;
a ninth obtaining unit, configured to obtain a second time quantum according to the second paragraph corresponding to the second-level title;
a second judging unit, configured to judge the time sequence of the first time quantum and the second time quantum;
a third judging unit, configured to judge, when the first time quantum precedes the second time quantum, whether the first-level title node and the second-level title node are adjacent nodes;
a tenth obtaining unit, configured to obtain, when the first-level title node and the second-level title node are adjacent nodes, the time vector pointing from the first-level title node to the second-level title node.
Preferably, the apparatus further comprises:
a tenth obtaining unit, configured to input the first title network graph and the second title network graphs into the graph neural network for training, and to obtain a plurality of first title state functions hv, where hv = f(xv, xco[v], hne[v], xne[v]). Here hv is the vectorized representation of a node v, used to judge whether the node v describes a first process; f(·) is the local transition function, shared by all nodes, which updates a node's state according to the input neighborhood information; xv is the feature representation of the node v; xco[v] is the feature representation of the edges connected to the node v, i.e. of the belonging vectors and time vectors; hne[v] is the state of the nodes adjacent to v; and xne[v] is the feature representation of the nodes adjacent to the node v;
an eleventh obtaining unit, configured to aggregate the plurality of first title state functions hv to obtain a first title state function set H, expressed as H = F(H, X), where F(·) is the global transition function obtained by stacking the local transition functions and X is the feature set of the nodes v;
a twelfth obtaining unit, configured to iteratively learn the first title state function set H over time to obtain an iterative function Ht+1, expressed as Ht+1 = F(Ht, X), where Ht+1 is the title state function set at the next time step t+1 and Ht is the first title state function set at time t;
a thirteenth obtaining unit, configured to compute, when Ht+1 = Ht, the iterative function Ht+1 to obtain the first process system and the sequence of steps of the first process system.
Preferably, the apparatus further comprises:
a fifth determining unit, configured to determine the node v as one of a plurality of first steps Ov describing the first process according to the plurality of first title state functions hv, where Ov = g(hv, xv) and g(·) is the local output function;
a fourteenth obtaining unit, configured to aggregate the plurality of first steps Ov to obtain a first step set O of the first process system, expressed as O = G(H, X), where G(·) is the global output function obtained by stacking the local output functions.
In a third aspect, the present invention provides an apparatus for extracting a process system of an article based on a graph neural network, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any one of the above methods when executing the program.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the method and the device for extracting the process system of the article based on the graph neural network provided by the embodiment of the invention are characterized in that first article format information of a first article is obtained; identifying a title hierarchy of the first article according to the first article format information to obtain a first-level title, wherein the first-level title comprises a first paragraph corresponding to the first-level title; judging whether the first-level title is a behavior word describing a first process; when the first-level title is a behavior word describing the first process, determining an upper-layer title of the first-level title and a lower-layer title where the first-level title is located; obtaining a second-level title describing the first process in the lower-level title, wherein the second-level title contains a second paragraph corresponding to the second-level title; establishing a vector according to the upper layer title and the lower layer title, and identifying the first paragraph and the second paragraph according to time to establish a time vector of the first-level title and the second-level title; establishing a first title network graph according to the first-level title, the second-level title, the upper-layer title, the affiliated vector and the time vector; obtaining a plurality of second articles, and correspondingly establishing a plurality of second title network graphs according to the plurality of second articles, wherein the article names of the second articles and the first articles belong to synonyms; the first headline network graph and the plurality of second headline network graphs are input into a graph neural network for deep learning to obtain the step sequence of the first process system and the first process system, so that the technical problems that in the prior art, mathematical statistics can only be performed on the steps which appear, the capacity of unknown steps is 
not deduced, and when step information of the same process reflected by different articles conflicts, consistency verification can cause the loss of the accuracy of the final result are solved, continuous iterative learning based on the graph neural network is achieved, certain capacity of excavating hidden steps is achieved, and the technical effect of maximizing the accuracy of the result of the iterative learning of the graph neural network is ensured.
The foregoing description is only an overview of the technical solutions of the present invention. The embodiments of the present invention are described below so that the technical means of the present invention can be understood more clearly and the above and other objects, features, and advantages of the present invention become more readily understandable.
Drawings
FIG. 1 is a flowchart illustrating a method for extracting a process architecture of an article based on a graph neural network according to an embodiment of the present invention;
FIG. 2 is a block diagram of an apparatus for a process architecture for article extraction based on graph neural networks according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of another apparatus for extracting an article based on a graph neural network according to an embodiment of the present invention.
Description of reference numerals: a first obtaining unit 11, a second obtaining unit 12, a first judging unit 13, a first determining unit 14, a third obtaining unit 15, a first constructing unit 16, a second constructing unit 17, a third constructing unit 18, a fourth obtaining unit 19, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304, and a bus interface 306.
Detailed Description
The embodiment of the invention provides a method and a device for extracting a process system of an article based on a graph neural network, which are used to solve the technical problems in the prior art that mathematical statistics can only be performed on steps that actually appear, that unknown steps cannot be inferred, and that consistency verification causes a loss of accuracy in the final result when the step information of the same process reflected by different articles conflicts.
The general idea of the technical scheme provided by the invention is as follows: obtain first article format information of a first article; identify a title hierarchy of the first article according to the first article format information to obtain a first-level title, where the first-level title comprises a first paragraph corresponding to the first-level title; judge whether the first-level title is a behavior word describing a first process; when the first-level title is a behavior word describing the first process, determine an upper-layer title of the first-level title and a lower-layer title where the first-level title is located; obtain a second-level title describing the first process in the lower-layer title, where the second-level title contains a second paragraph corresponding to the second-level title; establish a belonging vector according to the upper-layer title and the lower-layer title, and identify the first paragraph and the second paragraph according to time to establish a time vector of the first-level title and the second-level title; establish a first title network graph according to the first-level title, the second-level title, the upper-layer title, the belonging vector, and the time vector; obtain a plurality of second articles, and correspondingly establish a plurality of second title network graphs according to the plurality of second articles, where the article names of the second articles and the first article are synonyms; and input the first title network graph and the plurality of second title network graphs into a graph neural network for deep learning to obtain a first process system and its sequence of steps. Continuous iterative learning based on the graph neural network is thereby achieved, a certain capability of mining hidden steps is obtained, and the technical effect of maximizing the accuracy of the graph neural network's iterative learning results is ensured.
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples explain, rather than limit, the technical solutions of the present application, and the technical features in the embodiments and examples of the present application may be combined with each other without conflict.
The term "and/or" herein merely describes an association between associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the former and latter related objects.
Example one
Fig. 1 is a flowchart illustrating a method for extracting a process system of an article based on a graph neural network according to an embodiment of the present invention. As shown in fig. 1, an embodiment of the present invention provides a method for extracting a process architecture of an article based on a graph neural network, where the method includes:
step 110: first article format information of a first article is obtained.
Step 120: and identifying the title hierarchy of the first article according to the first article format information to obtain a first-level title, wherein the first-level title comprises a first paragraph corresponding to the first-level title.
Further, the first article format information includes a first article text format, a first article font format, and a first article paragraph format.
Specifically, the first article text format, the first article font format, and the first article paragraph format of the first article are analyzed, including, for example, the title font, the title font size, paragraph indentation, and alignment. According to the first article text format, the first article font format, the first article paragraph format, and the like in the first article format information, the title hierarchy of the first article is identified, and the grade of each title is obtained. "First-level title" is a collective term for each title in the hierarchy identified in the first article, covering first-grade, second-grade, third-grade titles, and so on. A first-level title comprises a first paragraph corresponding to it, where the first paragraph is the specific text content that describes or further expands the first-level title and belongs to the content attached to the first-level title.
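A heuristic of this kind for format-based title-level identification could be sketched as follows; the thresholds and parameter names are illustrative assumptions, since the patent does not specify the identification rules:

```python
def heading_level(font_size, is_bold, indent, body_size=10.5):
    """Map format features to a title grade: larger fonts and bold faces sit
    higher in the hierarchy. Thresholds are illustrative only; a real system
    would calibrate them per document. Returns None for body text."""
    if font_size >= body_size + 6:
        return 1  # first-grade title
    if font_size >= body_size + 3 or (is_bold and indent == 0):
        return 2  # second-grade title
    if is_bold:
        return 3  # third-grade title
    return None   # plain paragraph text, not a title

assert heading_level(18, True, 0) == 1
assert heading_level(14, False, 0) == 2
assert heading_level(10.5, True, 2) == 3
```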
Step 130: and judging whether the first-level title is a behavior word describing a first process.
Step 140: and when the first-level title is a behavior word describing the first process, determining an upper-layer title of the first-level title and a lower-layer title where the first-level title is located.
Step 150: obtaining a second level title describing the first process in the lower level title, wherein the second level title includes a second paragraph corresponding to the second level title.
Specifically, process identification is performed on each title in the title hierarchy identified in the first article, that is, it is determined which process each first-level title describes. When a first-level title is a behavior word describing a first process, the upper-layer title of the first-level title and the lower-layer title where the first-level title is located are determined: the name of the first process is the upper-layer title of the layer where the first-level title sits, and the lower-layer title is the current layer where the first-level title sits. The same process identification is performed on all the titles of the layer where the first-level title is located, and all second-level titles describing the first process in the lower-layer title are obtained, where the second-level titles and the first-level title belong to the same layer. A second-level title comprises a second paragraph corresponding to it, where the second paragraph is the specific text content that describes or further expands the second-level title and belongs to the content attached to the second-level title. For example, if a title describing "prosecution" and a title describing "court" both have the same upper-level title "litigation", the process is named "litigation", and both "prosecution" and "court" are steps in the litigation process.
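The behavior-word judgment can be sketched as a simple lexicon lookup; the lexicon below is a hypothetical stand-in for whatever classifier or dictionary an actual implementation would use, with entries drawn from the litigation example:

```python
# Hypothetical action-word lexicon; the patent does not define how behavior
# words are recognized, so this set is purely illustrative.
ACTION_WORDS = {"prosecution", "court", "filing", "appeal", "litigation"}

def is_behavior_word(title):
    """Return True if the title matches the assumed action-word lexicon."""
    return title.strip().lower() in ACTION_WORDS

assert is_behavior_word("Prosecution")
assert not is_behavior_word("Background")
```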
Step 160: and establishing a belonging vector according to the upper layer title and the lower layer title, and identifying the first paragraph and the second paragraph according to time to establish a time vector of the first-level title and the second-level title.
Further, establishing the belonging vector according to the upper-layer title and the lower-layer title includes: determining an upper-layer node according to the upper-layer title; determining the lower-layer title according to the first-level title and the second-level title; determining a lower-layer node according to the lower-layer title; and obtaining, according to the lower-layer node and the upper-layer node, the belonging vector in which the lower-layer node points to the upper-layer node. Further, identifying the first paragraph and the second paragraph according to time to establish the time vector of the first-level title and the second-level title includes: obtaining a first-level title node of the first-level title; obtaining a second-level title node of the second-level title; obtaining a first time quantum according to the first paragraph corresponding to the first-level title; obtaining a second time quantum according to the second paragraph corresponding to the second-level title; judging the time sequence of the first time quantum and the second time quantum; when the first time quantum precedes the second time quantum, judging whether the first-level title node and the second-level title node are adjacent nodes; and when the first-level title node and the second-level title node are adjacent nodes, obtaining the time vector pointing from the first-level title node to the second-level title node.
Specifically, each title in the article is taken as a node of the title network graph: the upper-layer title is determined to be the upper-layer node and the lower-layer titles to be lower-layer nodes, wherein the lower-layer titles comprise the first-level title and the second-level titles. The belonging vector of a lower-layer node pointing to the upper-layer node is obtained from the two nodes, that is, an edge is drawn from each step title in the lower layer to the process-name title in the upper layer and used as the belonging vector. A first-level title node is obtained for the first-level title and a second-level title node for each second-level title; a first time amount is obtained from the first paragraph corresponding to the first-level title, and a second time amount from the second paragraph corresponding to the second-level title. The time order of the first time amount and the second time amount is judged, and when the first time amount precedes the second, it is judged whether the first-level title node and the second-level title node are adjacent nodes, that is, whether the next step after the first-level title node is the second-level title node. When the two nodes are adjacent, the time vector pointing from the first-level title node to the second-level title node is obtained. In other words, a time amount is found in the text paragraph of every step title whose belonging vector points to the same process, adjacent nodes are found among the titles containing time amounts, and an edge connecting two adjacent nodes is drawn in chronological order as the time vector.
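The time-amount comparison described above can be sketched as follows, assuming (purely for illustration) that a "time amount" is the first day-count found in a step's paragraph; the regex, paragraph texts, and function names are hypothetical, and real articles would need proper temporal-expression recognition.

```python
import re

def first_time_amount(paragraph: str):
    """Return the first day-count mentioned in the paragraph, or None."""
    match = re.search(r"(\d+)\s*days?", paragraph)
    return int(match.group(1)) if match else None

def time_edge(node_a: str, para_a: str, node_b: str, para_b: str):
    """Directed edge (earlier -> later) between two adjacent step nodes,
    ordered by the time amounts found in their paragraphs."""
    ta, tb = first_time_amount(para_a), first_time_amount(para_b)
    if ta is None or tb is None:
        return None  # no time amount found: no time vector can be drawn
    return (node_a, node_b) if ta <= tb else (node_b, node_a)

edge = time_edge("prosecution", "the complaint is filed within 15 days",
                 "trial", "the hearing is held after 30 days")
# edge == ("prosecution", "trial")
```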
Step 170: and establishing a first title network graph according to the first-level title, the second-level title, the upper-layer title, the affiliated vector and the time vector.
Specifically, the first-level title, the second-level titles and the upper-layer title are taken as nodes, and the belonging vectors and time vectors as edges, to establish the first title network graph. That is, a first-level title and the adjacent second-level titles linked by its belonging vector and time vectors form nodes of the first title network graph, and all steps included under each first process are linked in this manner to form the first title network graph of the first process.
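As a minimal sketch (not the patent's implementation), the assembly of the first title network graph from belonging and time edges can be written with plain dictionaries; the node names are hypothetical:

```python
def build_title_graph(process_title: str, ordered_steps: list) -> dict:
    """Nodes are the process-name title and its step titles; 'belonging'
    edges point from each step up to the process node, and 'time' edges
    link chronologically adjacent steps."""
    graph = {"nodes": [process_title] + ordered_steps, "edges": []}
    for step in ordered_steps:
        graph["edges"].append((step, process_title, "belonging"))
    for earlier, later in zip(ordered_steps, ordered_steps[1:]):
        graph["edges"].append((earlier, later, "time"))
    return graph

g = build_title_graph("litigation", ["prosecution", "trial"])
```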
Step 180: and obtaining a plurality of second articles, and correspondingly establishing a plurality of second title network graphs according to the plurality of second articles, wherein the article names of the second articles and the first articles belong to synonyms.
Step 190: and inputting the first title network graph and the plurality of second title network graphs into a graph neural network for deep learning to obtain a first process system and a step sequence of the first process system.
In particular, in any single article, the titles describing process steps under a process-name title do not necessarily cover all steps of the process; they are incomplete. To obtain the complete set of steps of the process, all steps of the same process occurring in a large number of articles must be analyzed. To do this, the present embodiment collates the title hierarchies of a large number of articles into a data set whose elementary units are individual "process names and their subordinate steps", although each unit may contain steps that are incomplete and out of order. Thus, a large number of second articles are obtained, where the article names of the second articles and the first article are synonyms, i.e. the second articles and the first article belong to the same type of article. A plurality of second title network graphs are correspondingly established from the plurality of second articles, and the first title network graph and the second title network graphs are input into a graph neural network for deep learning: new nodes from the second title network graphs are added to the first title network graph, or the positions of existing nodes are adjusted. Through continuous iterative learning, when the gradient function of the nodes between the second title network graphs and the first title network graph tends to zero, a first process system of extremely high completeness and the step sequence of the first process system are obtained. In the continuously iterative learning process, the graph neural network can infer the definition (label) of a core node from the information of surrounding nodes and edges, so it has a certain capacity for mining hidden nodes (steps), and thereby yields a process system of high completeness and consistency.
Further, the method further comprises: inputting the first title network graph and the second title network graphs into the graph neural network for training to obtain a plurality of first title state functions hv, expressed as hv = f(xv, xco[v], hne[v], xne[v]), where hv is the vectorized representation of a node v and is used to judge whether the node v describes the first process; f(·) is a local transfer function, shared by all nodes, which updates the state of a node according to the input domain information; xv is the feature representation of the node v; xco[v] is the feature representation of the edges connected to the node v, i.e. of the belonging vector and the time vector; hne[v] is the state of the nodes adjacent to v; xne[v] is the feature representation of the nodes adjacent to v. The plurality of first title state functions hv are aggregated to obtain a first title state function set H, expressed as H = F(H, X), where F(·) is the set of local transfer functions and X is the feature set of the nodes. The first title state function set H is iteratively learned over time to obtain an iterative function Ht+1, expressed as Ht+1 = F(Ht, X), where Ht+1 is the title state function set at the next time step t+1 and Ht is the first title state function set at time t. When the iterative function Ht+1 = Ht, the iterative function Ht+1 is calculated to obtain the first process system and the step sequence of the first process system.
Further, the method further comprises: determining, according to the plurality of first title state functions hv, the node v as one of a plurality of first steps Ov describing the first process, where Ov is expressed as Ov = g(hv, xv), g(·) being a local output function; and aggregating the plurality of first steps Ov to obtain a first step set O of the first process system, expressed as O = G(H, X), where G(·) is the set of local output functions.
Specifically, the first title network graph and the second title network graphs are input into the graph neural network for training to obtain the plurality of first title state functions hv, expressed as hv = f(xv, xco[v], hne[v], xne[v]); a first title state function converts a node of the graph neural network formed from a first-level title into a numeric representation of its state. The second title state functions of the plurality of second title network graphs and the first title state functions are collected to obtain the first title state function set H, i.e. the states of all nodes in the first and second title network graphs. The set H is then iteratively learned over time, and all nodes in it are ordered chronologically, yielding the iterative function Ht+1. The process is to find the first time amount in the first paragraph under a first-level title and compare it with the times of adjacent nodes: if the time order is correct, the position is left unchanged; if it is not, the position of the first-level title node in the graph is adjusted until the order is correct. When the iterative function Ht+1 = Ht, the iterative function Ht+1 is calculated to obtain the first process system and the step sequence of the first process system. That is, adjacent states of the node set are related: each learning pass of the graph neural network is one iteration, and each iteration adds a new node to the graph, adjusts the position of an existing node, or both. A gradient (loss) function can be constructed from some function L of (Ht+1 − Ht), and the aim of continuous iterative learning is to drive this gradient function toward zero.
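A toy numeric version of the fixed-point iteration Ht+1 = F(Ht, X) may make this concrete. The transfer function below is an illustrative contraction (each state is the node's feature plus a damped mean of its neighbours' states), not the patent's learned f; the features, neighbourhood, and tolerance are assumptions.

```python
def fixed_point_states(features: dict, neighbours: dict,
                       damping: float = 0.5, tol: float = 1e-9) -> dict:
    """Iterate h_v = x_v + damping * mean(h of neighbours) until the
    update changes nothing, i.e. until H_{t+1} == H_t (to tolerance)."""
    h = {v: 0.0 for v in features}  # H_0
    while True:
        h_next = {}
        for v in features:  # local transfer function f, shared by all nodes
            nbrs = neighbours.get(v, [])
            nbr_mean = sum(h[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
            h_next[v] = features[v] + damping * nbr_mean
        if max(abs(h_next[v] - h[v]) for v in features) < tol:
            return h_next  # fixed point reached: Ht+1 = Ht
        h = h_next

features = {"prosecution": 1.0, "trial": 2.0}
neighbours = {"prosecution": ["trial"], "trial": ["prosecution"]}
states = fixed_point_states(features, neighbours)
# states converge to prosecution = 8/3, trial = 10/3
```

The damping factor keeps the update a contraction, which is what guarantees that the iteration actually reaches the fixed point Ht+1 = Ht.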
When no new node can be added and no existing node position needs adjusting, the gradient function is zero; that is, no matter how many articles and titles are added, the number of processes the graph neural network can find no longer changes, and the step sequence of each found process no longer changes, i.e. the iterative function Ht+1 = Ht, and a first process system of extremely high completeness and its step sequence are obtained. During the iterative learning of the first and second title network graphs by the graph neural network, the node v can be determined, according to the plurality of first title state functions hv, as one of a plurality of first steps Ov describing the first process, and the plurality of first steps Ov are collected into the first step set O of the first process system; that is, all first steps in the second title network graphs are merged with the first steps in the first title network graph to determine the complete set of first steps.
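The merging of step sets from many article graphs can be sketched as below. This is a simplification under stated assumptions: each graph contributes an ordered step list, and a simple insertion pass that respects pairwise precedence stands in for the patent's iterative node-position adjustment; step names are hypothetical.

```python
def merge_step_sequences(sequences: list) -> list:
    """Merge partial, possibly incomplete step lists of the same process
    into one sequence, inserting each unseen step just after its latest
    already-known predecessor."""
    order = []
    for seq in sequences:
        for i, step in enumerate(seq):
            if step not in order:
                known_preds = [order.index(p) for p in seq[:i] if p in order]
                pos = max(known_preds) + 1 if known_preds else len(order)
                order.insert(pos, step)
    return order

merged = merge_step_sequences([
    ["prosecution", "trial"],                          # from the first article
    ["prosecution", "evidence", "trial", "judgment"],  # from a second article
])
# merged == ["prosecution", "evidence", "trial", "judgment"]
```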
Example two
Based on the same inventive concept as the method for extracting a process system of an article based on a graph neural network in the foregoing embodiment, the present invention further provides an apparatus for extracting a process system of an article based on a graph neural network. As shown in fig. 2, the apparatus includes:
a first obtaining unit 11, where the first obtaining unit 11 is configured to obtain first article format information of a first article;
a second obtaining unit 12, where the second obtaining unit 12 is configured to identify a title hierarchy of the first article according to the first article format information to obtain a first-level title, where the first-level title includes a first paragraph corresponding to the first-level title;
a first judging unit 13, where the first judging unit 13 is configured to judge whether the first-level title is a behavior word describing a first process;
a first determining unit 14, where the first determining unit 14 is configured to determine an upper-layer title of the first-level title and a lower-layer title where the first-level title is located when the first-level title is a behavior word describing the first process;
a third obtaining unit 15, where the third obtaining unit 15 is configured to obtain a second-level title describing the first process in the lower-layer title, where the second-level title includes a second paragraph corresponding to the second-level title;
a first constructing unit 16, where the first constructing unit 16 is configured to establish a belonging vector according to the upper-layer title and the lower-layer title, and to identify the first paragraph and the second paragraph according to time to establish a time vector of the first-level title and the second-level title;
a second constructing unit 17, where the second constructing unit 17 is configured to establish a first title network graph according to the first-level title, the second-level title, the upper-layer title, the belonging vector, and the time vector;
a third constructing unit 18, where the third constructing unit 18 is configured to obtain a plurality of second articles, and correspondingly establish a plurality of second title network graphs according to the plurality of second articles, where the article names of the second articles and the first article are synonyms;
a fourth obtaining unit 19, where the fourth obtaining unit 19 is configured to input the first title network graph and the plurality of second title network graphs into a graph neural network for deep learning, so as to obtain a first process system and a step sequence of the first process system.
Further, the first article format information includes a first article text format, a first article font format, and a first article paragraph format.
Further, the establishing, by the first constructing unit, of the belonging vector according to the upper layer title and the lower layer title includes:
a second determining unit configured to determine an upper node according to the upper header;
a third determining unit configured to determine the lower-layer title according to the first-level title and the second-level title;
a fourth determining unit configured to determine a lower node according to the lower title;
a fifth obtaining unit, configured to obtain, according to the lower node and the upper node, the belonging vector of the lower node pointing to the upper node.
Further, the establishing, by the first constructing unit, of the time vector of the first-level title and the second-level title by identifying the first paragraph and the second paragraph according to time includes:
a sixth obtaining unit configured to obtain a first-level title node of the first-level title;
a seventh obtaining unit configured to obtain a second-level title node of the second-level title;
an eighth obtaining unit, configured to obtain a first amount of time according to the first paragraph corresponding to the first level title;
a ninth obtaining unit, configured to obtain a second amount of time according to the second paragraph corresponding to the second level title;
a second judging unit, configured to judge a time sequence of the first time amount and the second time amount;
a third judging unit configured to judge whether the first-level title node and the second-level title node are adjacent nodes when the first amount of time is before the time of the second amount of time;
a tenth obtaining unit configured to obtain the time vector pointing from the first-level title node to the second-level title node when the first-level title node and the second-level title node are adjacent nodes.
Further, the apparatus further comprises:
a first training unit, configured to input the first title network graph and the second title network graphs into the graph neural network for training, so as to obtain a plurality of first title state functions hv, expressed as hv = f(xv, xco[v], hne[v], xne[v]), where hv is the vectorized representation of a node v and is used to determine whether the node v describes a first process; f(·) is a local transfer function, shared by all nodes, which updates the state of a node according to the input domain information; xv is the feature representation of the node v; xco[v] is the feature representation of the edges connected to the node v, i.e. of the belonging vector and the time vector; hne[v] is the state of the adjacent nodes; xne[v] is the feature representation of the nodes adjacent to the node v;
an eleventh obtaining unit, configured to aggregate the plurality of first title state functions hv to obtain a first title state function set H, denoted as H = F(H, X), where F(·) is the set of local transfer functions and X is the feature set of the nodes;
a twelfth obtaining unit, configured to iteratively learn the first title state function set H over time to obtain an iterative function Ht+1, denoted as Ht+1 = F(Ht, X), where Ht+1 is the title state function set at the next time step t+1 and Ht is the first title state function set at time t;
a thirteenth obtaining unit, configured to, when the iterative function Ht+1 = Ht, calculate the iterative function Ht+1 to obtain the first process system and the step sequence of the first process system.
Further, the apparatus further comprises:
a fifth determining unit, configured to determine, according to the plurality of first title state functions hv, the node v as one of a plurality of first steps Ov describing the first process, where Ov is denoted as Ov = g(hv, xv), g(·) being a local output function;
a fourteenth obtaining unit, configured to aggregate the plurality of first steps Ov to obtain a first step set O of the first process system, denoted as O = G(H, X), where G(·) is the set of local output functions.
Various changes and specific examples of the method for extracting a process system of an article based on a graph neural network in the first embodiment of fig. 1 are also applicable to the apparatus for extracting a process system of an article based on a graph neural network in the present embodiment, and through the foregoing detailed description of the method for extracting a process system of an article based on a graph neural network, those skilled in the art can clearly know an implementation method of the apparatus for extracting a process system of an article based on a graph neural network in the present embodiment, so for the brevity of the description, detailed descriptions are not further provided here.
EXAMPLE III
Based on the same inventive concept as the method for extracting a process system of an article based on a graph neural network in the foregoing embodiments, the present invention further provides an apparatus for extracting a process system of an article based on a graph neural network. As shown in fig. 3, the apparatus includes a memory 304, a processor 302, and a computer program stored on the memory 304 and operable on the processor 302, where the processor 302, when executing the program, implements the steps of any one of the foregoing methods for extracting a process system of an article based on a graph neural network.
In fig. 3, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges and links together various circuits, including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface 306 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing data used by the processor 302 in performing operations.
Example four
Based on the same inventive concept as the method for extracting a process system of an article based on a graph neural network in the foregoing embodiments, the present invention also provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the following steps: obtaining first article format information of a first article; identifying a title hierarchy of the first article according to the first article format information to obtain a first-level title, wherein the first-level title comprises a first paragraph corresponding to the first-level title; judging whether the first-level title is a behavior word describing a first process; when the first-level title is a behavior word describing the first process, determining an upper-layer title of the first-level title and a lower-layer title where the first-level title is located; obtaining a second-level title describing the first process in the lower-layer title, wherein the second-level title contains a second paragraph corresponding to the second-level title; establishing a belonging vector according to the upper-layer title and the lower-layer title, and identifying the first paragraph and the second paragraph according to time to establish a time vector of the first-level title and the second-level title; establishing a first title network graph according to the first-level title, the second-level title, the upper-layer title, the belonging vector and the time vector; obtaining a plurality of second articles, and correspondingly establishing a plurality of second title network graphs according to the plurality of second articles, wherein the article names of the second articles and the first article are synonyms; and inputting the first title network graph and the plurality of second title network graphs into a graph neural network for deep learning to obtain a first process system and a step sequence of the first process system.
In a specific implementation, when the program is executed by a processor, any method step in the first embodiment may be further implemented.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the method and the device for extracting a process system of an article based on a graph neural network provided by the embodiments of the invention obtain first article format information of a first article; identify a title hierarchy of the first article according to the first article format information to obtain a first-level title, wherein the first-level title comprises a first paragraph corresponding to the first-level title; judge whether the first-level title is a behavior word describing a first process; when the first-level title is a behavior word describing the first process, determine an upper-layer title of the first-level title and a lower-layer title where the first-level title is located; obtain a second-level title describing the first process in the lower-layer title, wherein the second-level title contains a second paragraph corresponding to the second-level title; establish a belonging vector according to the upper-layer title and the lower-layer title, and identify the first paragraph and the second paragraph according to time to establish a time vector of the first-level title and the second-level title; establish a first title network graph according to the first-level title, the second-level title, the upper-layer title, the belonging vector and the time vector; obtain a plurality of second articles and correspondingly establish a plurality of second title network graphs, wherein the article names of the second articles and the first article are synonyms; and input the first title network graph and the plurality of second title network graphs into a graph neural network for deep learning to obtain the first process system and its step sequence. This solves the technical problems in the prior art that mathematical statistics can only be performed on the steps that actually appear, that unknown steps cannot be inferred, and that, when the step information of the same process reflected by different articles conflicts, consistency verification causes a loss of accuracy in the final result. Continuous iterative learning based on the graph neural network is thereby achieved, a certain capacity for mining hidden steps is obtained, and the technical effect of maximizing the accuracy of the result of the iterative learning of the graph neural network is ensured.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

CN202010727219.7A · filed 2020-07-27 · Method and device for extracting process system of article based on graph neural network · Active · granted as CN111598239B (en)

Priority Applications (1)

CN202010727219.7A · priority date 2020-07-27 · filing date 2020-07-27 · Method and device for extracting process system of article based on graph neural network


Publications (2)

CN111598239A · published 2020-08-28
CN111598239B · published 2020-11-06




Legal Events

PB01 · Publication
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant
TR01 · Transfer of patent right · effective date of registration: 2022-05-13
Patentee after: Wenling Technology (Beijing) Co.,Ltd., Room 408, unit 2, building 15, courtyard 16, Yingcai North Third Street, future science city, Changping District, Beijing 102200
Patentee before: Jiangsu United Industrial Limited by Share Ltd., Room 1502, Tongfu building, 501 Zhongshan South Road, Qinhuai District, Nanjing, Jiangsu 210006
