Disclosure of Invention
The application provides a text labeling method, a text labeling device, a text labeling terminal and a storage medium, which are used to solve the technical problems that text labeling is currently performed manually, that the labeling accuracy depends on the individual annotator, and that the labeling accuracy drops markedly when the number of labels is large and associated labels are far apart.
First, a first aspect of the present application provides a text annotation method, including:
acquiring text data to be marked;
performing word segmentation processing on the text data to obtain vocabulary information contained in the text data, and identifying entity words in the vocabulary information;
determining word meaning labeling information of each entity word in a semantic recognition mode according to the entity words and context keywords;
according to word meaning labeling information of a first entity word, in combination with a preset word meaning association relationship, screening out from the entity words a second entity word and word meaning relation information of the first entity word and the second entity word, wherein the second entity word is an entity word whose word meaning labeling information has an association relationship with the word meaning labeling information of the first entity word;
and generating word meaning association labeling information according to the first entity words, the second entity words and the word meaning relation information.
Preferably, the identifying entity words in the vocabulary information further comprises:
and generating text positioning information of the entity words according to the positions of the entity words in the text data.
Preferably, after determining word sense labeling information of each entity word by a semantic recognition mode according to the entity word and in combination with a context text of the entity word, the method further includes:
and determining a first label display area corresponding to the entity words according to the text positioning information of the entity words respectively so as to display the word meaning label information of the entity words on the first label display area.
Preferably, after generating the word sense associated tagging information according to the first entity word, the second entity word and the word sense relation information, the method further includes:
determining a second label display area according to the text positioning information of the first entity word and the second entity word so as to display the word meaning associated label information on the second label display area, wherein the word meaning associated label information comprises: word sense association label text and word sense association vector graphics.
Meanwhile, a second aspect of the present application provides a text labeling apparatus, including:
the text acquisition unit is used for acquiring text data to be marked;
the entity word recognition unit is used for performing word segmentation processing on the text data to obtain vocabulary information contained in the text data and recognizing entity words in the vocabulary information;
the word meaning labeling processing unit is used for determining word meaning labeling information of each entity word in a semantic recognition mode according to the entity words and in combination with context keywords;
the associated entity word identification unit is used for screening a second entity word from the entity words and word meaning relation information of the first entity word and the second entity word by combining a preset word meaning associated relation according to word meaning tagging information of the first entity word, wherein the second entity word is an entity word of which the word meaning tagging information and the word meaning tagging information of the first entity word have an associated relation;
and the word meaning association labeling information generating unit is used for generating word meaning association labeling information according to the first entity words, the second entity words and the word meaning relation information.
Preferably, the apparatus further comprises:
and the entity word positioning unit is used for generating text positioning information of the entity words according to the positions of the entity words in the text data.
Preferably, the apparatus further comprises:
and the word meaning label display unit is used for determining a first label display area corresponding to the entity word according to the text positioning information of the entity word so as to display the word meaning label information of the entity word on the first label display area.
Preferably, the apparatus further comprises:
a word sense associated label display unit, configured to determine a second label display area according to text positioning information of the first entity word and the second entity word, so as to display the word sense associated label information on the second label display area, where the word sense associated label information includes: word sense association label text and word sense association vector graphics.
A third aspect of the present application provides a text annotation terminal, including: a memory and a processor;
the memory is used for storing program codes, and the program codes correspond to the text labeling method of the first aspect of the application;
the processor is configured to execute the program code.
A fourth aspect of the present application provides a storage medium having stored therein program code corresponding to the text annotation method of any one of the first aspects of the present application.
According to the technical scheme, the method has the following advantages:
The application provides a text labeling method, which comprises the following steps: acquiring text data to be marked; performing word segmentation processing on the text data to obtain vocabulary information contained in the text data, and identifying entity words in the vocabulary information; determining word meaning labeling information of each entity word in a semantic recognition mode according to the entity words and context keywords; according to word meaning labeling information of a first entity word, in combination with a preset word meaning association relationship, screening out from the entity words a second entity word and word meaning relation information of the first entity word and the second entity word, wherein the second entity word is an entity word whose word meaning labeling information has an association relationship with the word meaning labeling information of the first entity word; and generating word meaning association labeling information according to the first entity words, the second entity words and the word meaning relation information.
The method uses the entity words obtained by performing word segmentation on the text to be labeled, together with the word meaning labeling information of a first entity word and a preset word meaning association relationship, to screen out a second entity word from the entity words and to determine the word meaning relation information of the first entity word and the second entity word, and then generates word meaning association labeling information for the first entity word and the second entity word. The entity words in the text and the association relationships between them are thereby labeled automatically, which improves the accuracy of text labeling and solves the technical problem that, with the existing manual labeling mode, the labeling accuracy drops markedly when the number of labels is large and associated labels are far apart.
Detailed Description
The embodiment of the application provides a text labeling method, a text labeling device, a text labeling terminal and a storage medium, which are used to solve the technical problems that the existing text labeling mode is manual labeling, that the labeling accuracy depends on the individual annotator, and that the labeling accuracy drops markedly when the number of labels is large and associated labels are far apart.
In order to make the objects, features and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the embodiments described below are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a first embodiment of the present application provides a text annotation method, including:
step 101, obtaining text data to be marked.
Step 102, performing word segmentation processing on the text data to obtain vocabulary information contained in the text data, and identifying entity words in the vocabulary information.
It should be noted that the text data to be labeled is first acquired and then segmented into individual words; the vocabulary information is formed from these words, and the entity words among them are identified by combining the parts of speech of the words.
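As a minimal sketch of step 102, the following Python fragment segments a sentence and keeps the words tagged as nouns as entity words. The lexicon, the tag names and the function names are hypothetical and chosen only for illustration; the application does not specify a segmenter or a tag set, and a practical system would likely use a trained segmenter and part-of-speech tagger instead of whitespace splitting and a lookup table.

```python
# Toy lexicon mapping each known word to a part-of-speech tag (hypothetical data).
LEXICON = {
    "the": "det",
    "college": "noun",
    "hired": "verb",
    "a": "det",
    "teacher": "noun",
}


def segment(text):
    """Split text on whitespace, strip edge punctuation and lowercase each word."""
    tokens = []
    for raw in text.split():
        word = raw.strip(".,;:!?").lower()
        if word:
            tokens.append(word)
    return tokens


def identify_entity_words(tokens, lexicon=LEXICON):
    """Keep tokens tagged 'noun' in the lexicon (a stand-in for entity detection)."""
    return [t for t in tokens if lexicon.get(t) == "noun"]


tokens = segment("The college hired a teacher.")
entities = identify_entity_words(tokens)
print(tokens)
print(entities)
```

Words missing from the lexicon are simply not treated as entity words in this sketch.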
Step 103, determining word meaning labeling information of each entity word in a semantic recognition mode according to the entity words and the context keywords.
It should be noted that, based on the entity words identified in step 102 and in combination with the keyword information of the context of each entity word, the word sense or paraphrase of the entity word in the text is identified through a semantic recognition method, such as a machine learning semantic recognition method, and the word sense tagging information corresponding to each entity word is determined, so that the word senses of the entity words are tagged automatically.
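The idea of step 103 can be illustrated with a simple keyword-overlap rule: each candidate sense of an entity word has a set of cue keywords, and the sense whose cues overlap most with the context is chosen. This is only an assumed stand-in for the machine learning semantic recognition mentioned above; the cue table, the sense labels and the function name are hypothetical.

```python
# Hypothetical cue keywords for each candidate sense of an entity word.
SENSE_CUES = {
    "college": {
        "school": {"students", "teacher", "campus"},
        "electoral body": {"votes", "electors"},
    },
    "teacher": {
        "employee": {"hired", "salary", "college"},
    },
}


def label_word_sense(entity, context_tokens, sense_cues=SENSE_CUES):
    """Return the sense whose cue set shares the most words with the context.

    Returns None when the entity is unknown or no cue word appears in the context.
    """
    context = set(context_tokens)
    best_label, best_score = None, 0
    for label, cues in sense_cues.get(entity, {}).items():
        score = len(cues & context)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


context = ["the", "college", "hired", "a", "teacher"]
sense_labels = {
    word: label_word_sense(word, context) for word in ["college", "teacher"]
}
print(sense_labels)
```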
Step 104, according to the word meaning labeling information of the first entity word, in combination with a preset word meaning association relationship, screening out from the entity words a second entity word and word meaning relationship information of the first entity word and the second entity word, wherein the second entity word is an entity word whose word meaning labeling information has an association relationship with the word meaning labeling information of the first entity word.
It should be noted that one entity word is arbitrarily selected from the entity words identified in step 102 as the first entity word. According to the word sense tagging information of the first entity word, in combination with the preset word sense association relationship, an entity word whose word sense tagging information has an association relationship with that of the first entity word is screened out from the entity words as the second entity word, and then, based on the word sense association relationship, the word sense relationship information of the first entity word and the second entity word is determined.
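The screening in step 104 can be sketched as a table lookup, assuming the preset word sense association relationship is stored as a mapping from a pair of sense labels to a relation name. The table contents, the relation names "employs" and "employed by", and the function name are illustrative assumptions rather than data from the application.

```python
# Hypothetical preset association table: (first sense, second sense) -> relation.
SENSE_RELATIONS = {
    ("school", "employee"): "employs",
    ("employee", "school"): "employed by",
}


def screen_second_entities(first, sense_labels, relations=SENSE_RELATIONS):
    """Return (second entity word, relation) pairs associated with `first`.

    `sense_labels` maps every entity word to its word sense label.
    """
    first_label = sense_labels[first]
    matches = []
    for candidate, label in sense_labels.items():
        if candidate == first:
            continue  # an entity word is not paired with itself
        relation = relations.get((first_label, label))
        if relation is not None:
            matches.append((candidate, relation))
    return matches


sense_labels = {"college": "school", "teacher": "employee", "campus": "place"}
print(screen_second_entities("college", sense_labels))
print(screen_second_entities("teacher", sense_labels))
```

Because any entity word may serve as the first entity word, the table in this sketch lists both directions of the relation.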
Step 105, generating word meaning association labeling information according to the first entity words, the second entity words and the word meaning relationship information.
It should be noted that, word meaning associated labeling information associated with the first entity word and the second entity word is generated according to the word meaning relationship information of the first entity word and the second entity word, so as to implement automatic labeling of the entity word association relationship.
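One possible representation of the word meaning association labeling information generated in step 105 is a small record holding the two entity words and their relation. The record type, its field names and the text rendering below are assumptions made for illustration; the application does not prescribe a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationAnnotation:
    """Hypothetical record for one piece of association labeling information."""

    first_entity: str
    second_entity: str
    relation: str

    def as_text(self):
        """Render the annotation as a single readable line."""
        return f"{self.first_entity} -[{self.relation}]-> {self.second_entity}"


def generate_annotations(first, screened):
    """Build one annotation per (second entity word, relation) pair."""
    return [AssociationAnnotation(first, second, rel) for second, rel in screened]


annotations = generate_annotations("college", [("teacher", "employs")])
for annotation in annotations:
    print(annotation.as_text())
```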
The embodiment of the application utilizes the entity words obtained by performing word segmentation processing on the text to be labeled and the word meaning labeling information of the first entity word, and combines the preset word meaning association relationship to screen out the second entity word and the word meaning relationship information of the first entity word and the second entity word from the entity words, and generate the word meaning association labeling information of the first entity word and the second entity word, so that the automatic labeling of the entity words and the association relationship of the entity words in the text is realized, the accuracy of text labeling is improved, the existing labeling mode is solved, and the technical problem that the labeling accuracy is obviously reduced when the labeling quantity is large and the associated labels are far away is solved.
The above is a detailed description of a first embodiment of a text annotation method provided in the present application, and the following is a detailed description of a second embodiment of a text annotation method provided in the present application.
Referring to fig. 2 and fig. 4, a second embodiment of the present application provides a text annotation method based on the first embodiment, including:
step 201, obtaining text data to be marked.
Step 202, performing word segmentation processing on the text data to obtain vocabulary information contained in the text data, and identifying entity words in the vocabulary information.
It should be noted that the text data to be labeled is first acquired and then segmented into individual words; the vocabulary information is formed from these words, and the entity words among them are identified by combining the parts of speech of the words.
Step 2001, generating text positioning information of the entity words according to the positions of the entity words in the text data.
It should be noted that, in this embodiment, after the text characters are split, a pair of (x, y) coordinate locations is given to each entity word as the text location information of the entity word, so as to implement fast location of the entity word and/or be used for label display processing in subsequent steps.
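A minimal sketch of step 2001 follows, assuming that x is the character offset of the entity word within its line and y is the zero-based line index. The application only states that an (x, y) coordinate pair is assigned to each entity word, so this coordinate convention and the function name are hypothetical.

```python
def locate_entity_words(text, entity_words):
    """Map each entity word to the (x, y) position of its first occurrence.

    x is the character offset inside the line and y is the zero-based line index.
    Matching is a case-insensitive substring search, so a word is also found
    inside a longer word; words that never occur are left out of the result.
    """
    positions = {}
    for y, line in enumerate(text.splitlines()):
        lowered = line.lower()
        for word in entity_words:
            x = lowered.find(word.lower())
            if x != -1 and word not in positions:
                positions[word] = (x, y)
    return positions


text = "The college hired\na new teacher."
positions = locate_entity_words(text, ["college", "teacher"])
print(positions)
```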
Step 203, determining word meaning labeling information of each entity word in a semantic recognition mode according to the entity words and the context keywords.
Step 2002, determining a first label display area corresponding to the entity word according to the text positioning information of the entity word, so as to display the word meaning label information of the entity word on the first label display area.
As shown in fig. 4, based on the text positioning information obtained in step 2001, a region for labeling entities or relationships is calculated from the text positioning information and used as the first label display area, such as an adjacent region above or below the entity word, so that the word meaning label information of the entity word can subsequently be displayed in the first label display area. For example, taking the entity word "college": once the word meaning label information corresponding to "college" has been identified as "school", a first label display area may be formed close to "college", so that the entity word and its word meaning label information are displayed together.
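The calculation of the first label display area can be sketched as follows, assuming a fixed-width layout in which every character occupies one cell and every text line reserves a strip for labels at its top and bottom. The pixel constants, the rectangle type and the function name are hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    width: int
    height: int


CHAR_W = 10   # assumed pixel width of one character cell
LINE_H = 40   # assumed pixel height of one text line, including label strips
LABEL_H = 14  # assumed pixel height of a label strip


def first_label_area(position, word, above=True):
    """Return the label rectangle directly above or below an entity word.

    `position` is the (x, y) text positioning information of the word,
    with x counted in characters and y in lines.
    """
    x, y = position
    left = x * CHAR_W
    line_top = y * LINE_H
    top = line_top if above else line_top + LINE_H - LABEL_H
    return Rect(left, top, len(word) * CHAR_W, LABEL_H)


area_above = first_label_area((4, 0), "college")
area_below = first_label_area((4, 0), "college", above=False)
print(area_above)
print(area_below)
```

The label text, for example "school" for the entity word "college", would then be drawn inside the returned rectangle.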
Step 204, according to the word meaning labeling information of the first entity word, in combination with a preset word meaning association relationship, screening a second entity word and word meaning relationship information of the first entity word and the second entity word from the entity words, wherein the second entity word is an entity word in the entity words, and the word meaning labeling information of the second entity word and the word meaning labeling information of the first entity word have an association relationship.
Step 205, generating word meaning associated labeling information according to the first entity word, the second entity word and the word meaning relation information.
Step 2003, determining a second label display area according to the text positioning information of the first entity word and the second entity word, so as to display word meaning associated label information on the second label display area, wherein the word meaning associated label information includes: word sense association label text and word sense association vector graphics.
As shown in fig. 4, based on the text positioning information obtained in step 2001, a region for labeling entities or relationships is calculated from the text positioning information and used as the second label display area, such as a region above or below the first entity word and the second entity word, so that the word meaning associated label information corresponding to the first entity word and the second entity word can subsequently be displayed in the second label display area. For example, suppose the preset word sense association relationship between a school/unit and an employee is hiring/being hired, and the current first entity word is "colleges". When the entity word "mental professional teacher", whose word sense tagging information is "employee", is screened out as the second entity word, the second label display area is determined according to the text positioning information of the first entity word and the second entity word, and the word sense association tagging information is displayed on it. When the first entity word is "mental professional teacher" or another entity word, the word sense association tagging information is generated in the same manner, which is not described again here.
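A rough sketch of step 2003 is given below, again assuming a fixed-width layout. It computes a strip spanning both entity words as the second label display area and builds an SVG fragment (a connecting line plus the relation text) as one possible form of the word sense association vector graphic. The pixel constants, the use of SVG and the function names are assumptions, not details from the application.

```python
CHAR_W = 10   # assumed pixel width of one character cell
LINE_H = 40   # assumed pixel height of one text line
LABEL_H = 14  # assumed pixel height of a label strip


def second_label_area(pos_a, word_a, pos_b, word_b):
    """Return (left, top, width, height) of a strip spanning both words.

    The strip starts at the left edge of the leftmost word, ends at the right
    edge of the rightmost word, and sits at the top of the upper word's line.
    """
    (xa, ya), (xb, yb) = pos_a, pos_b
    left = min(xa, xb) * CHAR_W
    right = max(xa + len(word_a), xb + len(word_b)) * CHAR_W
    top = min(ya, yb) * LINE_H
    return (left, top, right - left, LABEL_H)


def association_svg(pos_a, word_a, pos_b, word_b, relation):
    """Return an SVG fragment joining the two words and naming the relation.

    The relation text is inserted as-is, so it must not contain markup.
    """
    def anchor(pos, word):
        x, y = pos
        return (x * CHAR_W + len(word) * CHAR_W // 2, y * LINE_H + LABEL_H)

    (x1, y1), (x2, y2) = anchor(pos_a, word_a), anchor(pos_b, word_b)
    mx, my = (x1 + x2) // 2, (y1 + y2) // 2
    return (
        f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="black"/>'
        f'<text x="{mx}" y="{my}">{relation}</text>'
    )


area = second_label_area((4, 0), "college", (20, 0), "teacher")
svg = association_svg((4, 0), "college", (20, 0), "teacher", "employs")
print(area)
print(svg)
```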
It should be noted that steps 201, 202, 203, 204, and 205 of this embodiment correspond to steps 101 to 105 of the first embodiment, and these steps are not described again here.
The above is a detailed description of the second embodiment of the text labeling method provided in the present application, and the following is a detailed description of the first embodiment of the text labeling apparatus provided in the present application.
Referring to fig. 3, a third embodiment of the present application provides a text annotation device, including:
a text acquiring unit 301, configured to acquire text data to be labeled;
the entity word recognition unit 302 is configured to perform word segmentation processing on the text data to obtain vocabulary information included in the text data, and recognize entity words in the vocabulary information;
a word sense tagging processing unit 303, configured to determine, according to the entity words, word sense tagging information of each entity word in a semantic recognition manner in combination with the context keywords;
the associated entity word recognition unit 304 is configured to, according to the word sense tagging information of the first entity word, in combination with a preset word sense association relationship, screen out a second entity word and word sense relationship information of the first entity word and the second entity word from the entity words, where the second entity word is an entity word in which the word sense tagging information and the word sense tagging information of the first entity word have an association relationship;
a word sense associated labeling information generating unit 305 for generating word sense associated labeling information based on the first entity word, the second entity word and the word sense relation information.
Further, the device also includes:
the entity word positioning unit 3001 is configured to generate text positioning information of an entity word according to a position of the entity word in the text data.
Further, the device also includes:
the word sense tagging display unit 3002 is configured to determine, according to the text positioning information of the entity word, a first tagging display area corresponding to the entity word, so as to display word sense tagging information of the entity word on the first tagging display area.
Further, the device also includes:
a word sense associated label display unit 3003, configured to determine a second label display area according to the text positioning information of the first entity word and the second entity word, so as to display word sense associated label information on the second label display area, where the word sense associated label information includes: word sense association label text and word sense association vector graphics.
The above is a detailed description of an embodiment of a text labeling apparatus provided in the present application, and the following is a detailed description of a text labeling terminal and a storage medium provided in the present application.
A third aspect of the present application provides a text annotation terminal, including: a memory and a processor;
the memory is used for storing program codes, and the program codes correspond to the text labeling methods mentioned in the first embodiment or the second embodiment of the application;
the processor is used for executing the program codes to realize the text labeling method mentioned in the first embodiment or the second embodiment of the application.
A fourth aspect of the present application provides a storage medium having stored therein program code corresponding to the text labeling method as mentioned in the first embodiment or the second embodiment of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.