Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
It will be appreciated that in particular embodiments of the present invention, data relating to user information and the like may be subject to user approval or consent, and that the collection, use and processing of the relevant data may be subject to relevant national and regional laws and regulations and standards.
In this embodiment, a data processing method is provided, and as shown in fig. 1, an application scenario diagram of a text extraction method is shown. The application scenario may include a terminal 101 and a server 102, where data exchange may be performed between the terminal 101 and the server 102 through a network. The terminal 101 may be a mobile phone, a tablet computer, an intelligent bluetooth device, a computer, etc.; the server 102 may be a single server or a server cluster formed by a plurality of servers.
The user can upload the image to be extracted to the server 102 through the terminal 101, and the server 102 can identify the text in the image to be extracted to obtain a first text in a first format; determining a text fragment from the first text based on a first grammar rule corresponding to the first format, and a fragment type corresponding to the text fragment; the first text is converted into the second text using the conversion hint words and the segment type of each text segment. The conversion prompt word may be specifically input into the conversion model, so that the conversion model converts the first text into the second text.
The second text may then be rationalized with the check hint word to obtain a third text. Namely, inputting the verification prompt word into a verification model to realize verification of the second text, and finally obtaining a third text passing through rationality verification. And finally, utilizing the mapping prompt word to guide the mapping model to establish a mapping relation between the third text and the specified field so as to extract the specified text corresponding to the specified field from the third text.
The specified field may also be provided by the user through the terminal 101, for example, in fig. 1, the terminal 101 may also send the specified field to the server 102. After obtaining the specified text, the server 102 may send the specified text to the terminal 101 for presentation.
In this embodiment, a text extraction method is provided, as shown in fig. 2, and the specific flow of the method may be as follows:
S110, identifying the text in the image to be extracted to obtain a first text in a first format.
The image to be extracted refers to an image required for text extraction, and the image may refer to an image of a non-editable text document, such as a document in PDF format, or a photograph, picture, or the like containing text.
The first format may be a specific format set in advance according to actual needs, and in the embodiment of the present invention, the first format may be a Markdown format, where Markdown is a lightweight markup language, and a language rule of the Markdown is intended to achieve a "easy-to-read and easy-to-write" goal, so that text content is easier to read and write. After the text is identified from the image to be extracted, the text can be converted into a first format to retain information such as layout of the text in the image, and the first text refers to the text in the image with the first format.
In some embodiments, to obtain the first text in the first format, optical character recognition (Optical Character Recognition, OCR) may be performed on the image to be extracted to obtain editable text; and converting the editable text into a first text in a first format. OCR is a technique that uses specific recognition algorithms to convert printed or handwritten characters into a text format that can be edited and processed by a computer. In order to improve the accuracy of OCR recognition, the image to be extracted can be firstly subjected to denoising and other treatments, and then the denoised image is subjected to text recognition by an OCR recognition technology.
Optionally, when text recognition is performed, layout analysis may be performed to identify different regions and layouts in the image to be extracted, where the regions may be index questions, paragraphs, tables, and the like, then text in the corresponding region may be identified for characteristics in the different regions, in order to improve accuracy of text recognition, error correction processing may be performed on the identified text, and finally the identified text may be organized according to the first format, so that the first text in the first format may be obtained.
S120, determining a text fragment and a fragment type corresponding to the text fragment from the first text based on a first grammar rule corresponding to the first format.
The first grammar rule refers to a grammar rule that needs to be satisfied by the first text in the first format, that is, the first text is organized according to the first grammar rule. For facilitating subsequent processing, a text segment may be determined from the first text based on the first grammar rule, and a segment type corresponding to the text segment.
In some embodiments, when determining text fragments and fragment types, it may be to obtain segmentation markers and classification markers; dividing the first text into a plurality of text segments based on a segmentation marker; and classifying each text segment based on the classification mark to obtain the segment type corresponding to the text segment.
In this embodiment, the segmentation mark may be "\n\n", that is, each character in the first text may be sequentially identified, and when the segmentation mark "\n\n" is identified, all contents before the segmentation mark "\n\n" are used as a text segment; and deleting the text fragments from the first text, and repeating the steps until the first text is completely divided into a plurality of text fragments.
The classification labels are symbols in the first grammar rules, in this embodiment the classification labels are "|". For each text segment, a classification flag may be detected in the text segment to determine whether the classification flag is included in the text segment; if the text segment contains a classification mark, determining the segment type of the text segment as a table type; if the text segment does not contain the classification mark, determining the segment type of the text segment as a common type.
S130, converting the first text into a second text in a second format based on the conversion prompt words and the segment type of each text segment.
The conversion prompt word is a prompt word for guiding the large language model to perform text conversion on the text segment, and the second format is a final format of the text to be extracted from the image. It can be understood that the first grammar rule and the second grammar rule are completely different grammar rules, and the first text needs to be analyzed and processed in combination with the first grammar rule, and then the analyzed content is converted according to the second grammar rule to obtain the second text.
In some embodiments, when the second text in the second format is acquired, a conversion prompt word may be generated by using the first text and the conversion template, where the conversion prompt word includes an output format and a plurality of conversion steps; executing the plurality of conversion steps according to the fragment type of the text fragment and the second grammar rule for each text fragment to obtain a conversion fragment corresponding to the text fragment; and organizing the conversion fragments based on the output format to obtain a second text in a second format.
The conversion template refers to a preset prompt word template for conversion, and the conversion template can comprise conversion slots, a plurality of conversion steps and an output format. The conversion slots are slots to be filled in the conversion template, and the conversion slots can comprise text slots. That is, only the first text is filled into the text slots in the form of text fragments, and the conversion prompt word can be generated.
For example, the relimitter= "" in the conversion template is a text slot, and a text segment is filled in the text slot.
In the embodiment of the invention, a specific conversion template is as follows:
delimiter = "```"
system_prompt = f"""
The markdown text bracketed with { relimiter } above is converted into structured data in json format (second format) according to the following procedure. Extracting all key information in the text, and organizing the information according to json format requirements. The detailed description of each step will be separated by { relimitter }.
Step 1 { reliimiter } parses the markdown text, sorts all text elements, such as title (#), sub-title (#), unordered list (-or) Links ([ text ] (link)), and text paragraphs. The title is roughly divided into { < other information analyzed >, "< higher-level title name >" [ { < dictionary containing all lower-level titles as keys > } ], < other information analyzed > }.
Step 2 { reliimiter } maps the parsed elements into the structure of json. Each title should be translated into a key of the json object, the following as the corresponding value. Simple example: { < other information parsed >, "< higher-level title name >" [ { < dictionary containing all lower-level titles as keys > }, other content under this title ], < other information parsed > }. Unordered and ordered lists should be as in json text (taking this step paragraph as an example): { < other information parsed >, "ordered list 1" [ < content of step 1 >, < content of step 2 >, < content of step 3 >, < content of step 4 > ], < other information parsed > } should be converted into an array. The link should be stored as an object containing both "text" and "link".
Step 3 { reliimiter } processes text elements in special format, firstly, it should be confirmed whether there is an unmodified feature, such as phone number, bank card account, customer name and link, etc., and when conversion, it is necessary to ensure the migration as it is. For special format text, such as bolded, italic, or code blocks. The special format is removed, but it should be noted that these elements are likely to have non-modifiable characteristics. It is determined how to reasonably represent these elements in the json structure.
Step 4 { relimitter } organizes all the converted information and ensures that the final output is in an efficient json format. Ensuring that each key-value pair is correctly matched, the array and object are correctly constructed. For complex nested structures, it is guaranteed that their hierarchy and association in json is correct.
The final output should follow the following format:
step 1 { relimit } < result of step 1 >
Step 2 { relimit } < result of step 2 >
Step 3 { relimit } < result of step 3 >
Step4 { relimit } < result of step4 >
Json format output { relimitter } < json text after conversion >
After each step is completed, { relimitter } is used to separate the results of the step from json structured data.
Based on the foregoing, the conversion prompt word may include a plurality of conversion steps, an output format, and the like, where the conversion steps are specific steps that need to be executed when the conversion model obtains the second text in the second format; the format to which the results output by the format conversion model are to be subjected is output. Based on the conversion step, the text segment can be converted into a corresponding conversion segment, and then the conversion segment is organized according to the output format, so that the second text can be obtained.
As one implementation, the conversion hinting words may be input into a conversion model that processes the text segments to convert each text segment into conversion segments according to the conversion steps in the conversion hinting words. The output format is the format of the content output by the conversion model, and the conversion fragments are organized according to the output format to obtain a second text in a second format. The output format may be that, for each conversion step, a result corresponding to the conversion step is output, and finally, a conversion fragment is output. By using the output format, the conversion fragment can be rapidly determined from the output content, so that the conversion fragment can be directly extracted to obtain the second text. In other words, the second text includes a plurality of conversion fragments, the number of which corresponds to the number of text fragments.
Optionally, when converting the text segment into the conversion segment, for each text segment, analyzing the text segment according to the segment type and the first grammar rule to obtain a text element, an element type of the text element, and an element style of the text element; converting the text element into an intermediate element according to the element type of the text element and a second grammar rule; performing style adjustment on the intermediate element by utilizing an element style and a designated style corresponding to the text element to obtain a target element; and organizing all the target elements based on the second grammar rule to obtain conversion fragments.
For each text segment, the corresponding segment type, namely the common type or the table type, is determined, and the text segment can be parsed based on the segment type and the first grammar rule to obtain text elements in the text segment, the element type corresponding to each text element and the element style corresponding to the text element.
The first grammar rule defines element identifiers corresponding to text elements with different element types, for example, "#" is used for representing titles, one# is used for representing primary titles, and two# is used for representing secondary titles; list available "-" or ""Means. The element style also defines different style identifications, for example,A is italics; indicating that a is bolded.
Thus, for each text segment, an element identification may be identified in the text segment to parse out text elements therein, e.g., titles, lists, tables, links, etc. If the text segment is of a common type, the form cannot be parsed therefrom, and if the text segment is of a form type, the form can be parsed therefrom. Based on the segment type corresponding to the text segment, whether the analyzed text element is accurate or not can be checked, so that the accuracy of text element analysis is improved. Specifically, after the text element is analyzed, if the text element does not contain a table and the fragment type is a common type, the analysis can be judged to be accurate; if the table is not analyzed, but the fragment type is the table type, the inaccuracy of analysis can be judged, and the re-analysis is needed.
After the text element is parsed, the text element may be converted into an intermediate element using an element type and a second grammar rule corresponding to the text element. For example, when the element type is a title, the title may be used as a key, and text content corresponding to the title may be used as a value, so as to obtain an intermediate element.
When the element type is a list, the list may be further divided into an ordered list and an unordered list, and the difference between the ordered list uses numbers or letters to represent each item in the list in order, and the unordered list uses item symbols such as dots, squares, arrows, or the like to represent each item in the list. For a list, the list may be named, e.g., ordered list 1, unordered list 2, etc., with the naming as a key, and the specific content of the list as a value to form an object, thereby obtaining an intermediate element.
When the element type is a table, the table can be used as a key according to the sequence of appearance, the list group is used as a value, and the elements of the list are dictionaries. The dictionary in the list contains the attributes of each column of the table, and the corresponding value is the value of the corresponding row, so that the ith row corresponds to the ith dictionary in the list.
When the element type is a link, the element type is stored as an object containing text and a link field to obtain an intermediate element. The parsed text element may have its corresponding element style, for example, bold, italic, etc., and the element style of the corresponding text element may be modified to a specified style, which may refer to no style. That is, the element style of the text element is directly removed, e.g., italics and bolded styles are removed, to obtain the text element having the specified style. When the style of the intermediate element is adjusted, only style adjustment is performed, and text content is not modified, so that conversion accuracy is ensured, and a target element is obtained. Finally, each target element is organized according to a second grammar rule, and the hierarchy and nesting in the target element are ensured to be correct, so that the conversion fragment can be obtained.
It should be noted that the above conversion step may be performed by a conversion model, and the conversion model may be a large language model. Specifically, a conversion hint word may be input into the conversion model, which may process the text segments according to the conversion steps in the conversion hint word to convert each text segment into a conversion segment. The output format is the format of the content output by the conversion model, and finally, the conversion fragments can be organized according to the output format to obtain a second text in a second format. The output format may be that, for each conversion step, a result corresponding to the conversion step is output, and finally, a conversion fragment is output. With this output format, which part of the content that can be output is the conversion fragment, so that the conversion fragment can be directly extracted, resulting in the second text.
And S140, carrying out rationality verification on the second text by using the verification prompt word so as to obtain a third text passing the rationality verification.
The verification prompt word is used for guiding the large language model to carry out rationality verification on the second text, and the rationality verification is used for verifying whether the second text is reasonable and accurate compared with the first text. And using the verification prompt word to carry out rationality verification on the second text so as to obtain a third text passing the rationality verification.
As one embodiment, the step may include generating, for each text segment, a check prompt by using a segment group and a check template, where the segment group includes the text segment and a conversion segment corresponding to the text segment, and the check prompt includes sub-checks corresponding to multiple element types and a preset threshold; executing a plurality of sub-checks on the fragment group through a check model to obtain sub-check results corresponding to the sub-checks; generating a verification score of the fragment group according to the sub-verification result; if the check score is not smaller than the preset threshold, taking the conversion fragments in the fragment group as fragments to be combined; if the check score is smaller than the preset threshold value, modifying the conversion fragments in the fragment group to obtain fragments to be combined; and combining all the fragments to be combined to obtain a third text passing the rationality check.
As can be seen from the foregoing, the second text includes a plurality of conversion fragments, wherein one conversion fragment corresponds to one text fragment, and the conversion fragment and the text fragment having the correspondence are denoted as fragment groups. And generating a verification prompt word by using the fragment group and the verification template. The verification template refers to a preset prompt word template for verification, and the verification template may include verification slots, where the verification slots are slots that need to be filled in the verification template, for example, text slots, conversion slots, and threshold slots, where the text slots are used to fill text fragments in a fragment group, the conversion slots are used to fill conversion fragments in the same fragment group, and the preset threshold in the threshold slots may be filled in advance as part of the template. Illustratively, relimitter= "" is text slot and transition slot; threshold=90 is a threshold slot, and 90 is a preset threshold, which can be adjusted according to actual needs.
In the embodiment of the invention, a specific verification template is as follows:
delimiter = "```"
In a threshold=90# practical system, the threshold can be modified according to the scene.
System_prompt 2=f "" "is divided using { relimiter } according to the following procedure, and a markdown text (text fragment) bracketed with { relimiter } and a json text (conversion fragment), which is the result of information extraction of the markdown text, are evaluated and checked. It is necessary to ensure that all information is correctly converted from markdown (first format) to json (second format) without any errors or omissions.
Step 1 { relimmer } compares the markdown primitive with the json output to ensure that the title and subtitle are correctly converted into the keys of the json object, note the hierarchical relationship between the titles, and the lower level title should be contained in the dictionary of the higher level title, and the related content is taken as the corresponding value. Recursive parsing, thereby forming a nested dictionary format. Special attention is paid to whether there is a header level that is erroneously translated or missing. Simple example: { < other information parsed >, "< higher-level title name >" [ { < dictionary containing all lower-level titles as keys > }, other content under this title ], < other information parsed > }.
And 2, converting the table in the markdown into a json key according to the sequence of occurrence, wherein the json key is a table name, the value is a list, and the element of the list is a dictionary. The dictionary in the list contains the attributes of each column of the table, and the corresponding value is the value of the corresponding row, so that the ith row corresponds to the ith dictionary in the list. Simple example: { < other information parsed >, "staff information table" [ { "name": "Zhang Sanj", "age": 22}, { "name": "Lifour", "age": 28} ], < other information parsed > }, the length of the list should be equal to the number of rows of the table-1 (the header row is removed).
Step 3 { reliimiter } numbering the unordered list and ordered list in markdown according to the appearance order, "unordered list < number >" or "ordered list < number >" as keys, each item of content is converted into a list as a value. Ensuring that each item in the list is accurately transformed without losing or erroneously identifying a list item. Taking the step of the present hint word as a simple example, it should be converted into { < parsed other information >, "ordered list 1" [ < content of step 1 >, < content of step 2 >, < content of step 3 >, < content of step 4 >, < content of step 5 >, < content of step 6 > ], < parsed other information > }.
Step 4 { reliimiter } verifies whether the link in markdown is correctly converted into json key-value pairs containing "< hyperlink text >" and "< link URL >" properties. The text and URL of the link must be converted intact and not be modified at all.
Step 5 { reliimiter } confirms that irrelevant key values are not linked together, e.g. checking if birth year is incorrectly marked as age, or if a conflict has occurred between different keywords. If there is a conflict, the evaluation is not passed directly.
Step 6 { relimmer } ensures that all important information in the markdown text is converted and contained in the json result, without missing any critical data, special attention should be paid to fonts in special formats such as bold and italics.
After each of the above steps, please provide an evaluation report to ensure that the evaluation covers the following:
all conversion errors or inaccuracies detected
Suggested corrective measures
-For any ambiguous or complex transition case, the proposed processing advice
Fraction of accuracy (percentage system)
Fraction of integrity (percentage system)
Whether or not the evaluation (Boolean value, based on whether or not both are greater than { threshold })
Modified json text (if no evaluation is passed)
The scoring of accuracy must be strict, and conflicts or tampering with critical content (e.g., telephone numbers, bank card accounts, customer names, etc. that should not be modified) should be treated directly as 0 points. Other cases make decisions based on the characteristics of the fields, simple synonym substitutions for non-critical information, a summary of the statement is possible.
The assessment report was completed using the following format:
Step 1 results { reliimiter } < question identified in step 1 and suggestion >
Step 2 results { reliimiter } < question identified in step 2 and suggestion >
...
Step 6 results { reliimiter } < question identified in step 6 and advice >
Evaluation report: { reliimiter } < evaluation report in json format > "" ".
It should be noted that, the verification prompt word may include sub-verification of multiple element types, the verification model is a language model for performing rationality verification, the verification prompt word is input into the verification model, and the verification model may be guided to perform verification on the converted text and the text segment according to the verification step in the verification prompt word.
For each text segment, it is necessary to determine whether the text segment is accurately and comprehensively converted into a converted segment, and the process can use a verification model to make a determination. When performing a plurality of sub-checks on the segment group based on the check model, the method may include the steps of: aiming at the text fragments in the fragment group, acquiring the element type of each text element in the text fragments to obtain a target type; determining a target sub-check from the plurality of sub-checks according to the target type; and executing the target sub-verification on the fragment group through a verification model to obtain a verification result corresponding to the target sub-verification.
When converting the text segment into the conversion segment, the text segment is parsed to obtain the text element and the type of the text element in each text segment, and the parsing result is output together with the output of the conversion model, so that the element type of each text element in the text segment can be directly obtained from the output result of the conversion model to obtain the target type.
Text elements of different element types, which are different in the sub-checks to be correspondingly executed, that is, based on the target type of the text element, the sub-check to be executed currently can be determined from a plurality of sub-checks and recorded as the target sub-check. For example, the sub-checks of the plurality of element types include a sub-check of a title, a sub-check of a list, a sub-check of a table, and the like, and if a text fragment includes text elements of a list type and a title type, the target type includes a title and a list, and the target sub-check includes a sub-check of a title, a sub-check of a list, and the like.
Then, a target sub-check may be performed on the segment group through the check model to determine whether the text segments in the segment group are accurately and comprehensively converted.
For example, for sub-verification of a title, the verification model may compare text fragments and conversion fragments in a fragment group, ensure that the title, sub-title, etc. are accurately converted, ensure that their hierarchical relationships are accurate, and if not, require that the verification model record where not accurate, and give modified results.
Aiming at sub-verification of the table, the conversion model takes the table name of the table as a key, the list as a value, and elements in the list as a dictionary, wherein the dictionary in the list contains the attribute of each column of the table, and the corresponding value is the value of the corresponding row. For example, { "name": "Zhang San", "age": 22}, { "name": "Lifour", "age": 28}, in the conversion fragment, where name and age are the attributes of each column, and the corresponding value is the value of the corresponding row. And the length of the list in the conversion fragment satisfies the line number of the table minus 1. Therefore, in verification, whether the length of the list in the dictionary is the number of rows of the table minus 1 can be verified, and whether the relationship between the attribute and the value is accurate can be verified. If the verification has a problem, the verification model is required to record the place where the problem occurs, and the modified result is given.
For sub-verification of links, links in the conversion fragment are key-value pairs with text of the link as a key and URL of the link as a value. The verification model can verify whether the text of the link in the conversion fragment and the URL of the link are consistent with each other in the text fragment. If the verification is inconsistent, a place where the verification model records are inconsistent is required to be verified, and a modified result is given.
After each target verification is executed, a verification result corresponding to each target verification can be obtained, and a corresponding verification score is generated by using the verification result. For example, a first number of errors with inaccurate conversion can be determined according to the verification result, the first number is mapped to an accuracy score, a second number of errors with incomplete conversion is similarly determined from the verification result, the second number is mapped to an integrity score, and the accuracy score and the integrity score are weighted and summed by using preset weights to obtain the verification score.
If the check score is not smaller than the preset threshold, the conversion fragments in the fragment group are considered to be accurate and complete, and the conversion fragments in the fragment group can be used as fragments to be combined. If the verification score is smaller than the preset threshold value, the verification model modifies the conversion fragments, and the modified conversion fragments are used as fragments to be combined. And finally, combining all the fragments to be combined to obtain a third text passing the rationality detection.
And S150, based on the mapping prompt words, guiding a mapping model to establish a mapping relation between the third text and the specified field, and extracting the specified text corresponding to the specified field from the third text based on the mapping relation.
The mapping hint word refers to a hint word that is used to establish a mapping relationship between the third text and the specified field, and the mapping model is a language model that analyzes the third text and the specified field to establish the mapping relationship.
The specified field is a preset field, which may refer to a field where the user wants to obtain the corresponding content, and by establishing a mapping relationship between the third text and the specified field, the specified text corresponding to the specified field may be extracted from the third text, so that the user may conveniently and quickly extract the desired content. The specified field may also be a field defined in a preset database, and the third text may be stored in a form of an adaptation database by establishing a mapping relationship between the third text and the specified field, and when the method is used subsequently, relevant data may be directly obtained from the data.
In some embodiments, when extracting the specified text corresponding to the specified field, generating a mapping prompt word based on the specified field, the third text and the mapping template, wherein the mapping prompt word comprises a plurality of mapping steps; executing the plurality of mapping steps through a mapping model to establish a mapping relationship between the third text and the specified field; and extracting the specified text corresponding to the specified field from the third text according to the mapping relation and the specified field.
The mapping template is a preset prompt word template for establishing a mapping relation, the mapping template can comprise mapping slots, the mapping slots are slots which need to be filled in the mapping template, the mapping slots can comprise slots to be mapped and field slots, namely, the third text obtained in the last step is filled in the text slots, and the appointed field is filled in the field slots, so that the mapping prompt word can be obtained.
For example, the relimitter= "" in the mapping template is a slot to be mapped, the third text is filled in, the [ specified field set ] is a field slot, and the specified field is filled in.
In the embodiment of the invention, a specific conversion template is as follows:
delimiter = "```"
predefined_set= [ specified field set ]
system_prompt3 = f"""
Now, the mapping from the generated json text field (third text field) to the set of specified fields is done using { relimitter } separation according to the following procedure. The objective is to process and map close fields to ensure that all critical information correctly corresponds to a predefined set of specified fields. json text content (third text content) is a bracketed portion of { relimitter }. The specified field set includes: { predefined_set }.
Step 1 { relimitter } first, the generated json text is reviewed to identify all fields that need to be mapped. It is important to note whether these fields are close to any of the keywords in the predefined set of keywords.
Step 2 { relimitter } next, define mapping rules for each identified field. For example, if "age" is contained in json text, it should be mapped to "age" in the keyword set.
Step 3 { relimitter } then applies the mapping rule to map all identified fields onto the corresponding key. In this process, please see the percentage to ensure the accuracy and disambiguation of the mapping, and there cannot be any word substitution.
And 4, finally, verifying the accuracy of the mapping result. It is necessary to ensure that each json field is mapped correctly into a predefined set of keys and that no field is mapped or missed in error.
After each of the above steps, please provide a detailed mapping report. The report needs to contain the following:
each identified and mapped field and its corresponding key
-Applied mapping rules
Any challenges encountered during the mapping process and how to solve these challenges
Final validation of the mapping result, ensuring consistency and accuracy
Finally giving json text (text in second format) to which mapping rules are applied
Mapping reporting is done using the following format:
step 1 results { relimiter } < identified field and predetermined keyword >
Step 2 results { relimiter } < defined mapping rule >
...
Step 4 results, { reliimiter } < confirmation of mapping results >
Final results: { relimiter } < json text after mapping > "" "".
And inputting the mapping prompt word into a mapping template, namely guiding the mapping model to establish a mapping relation according to a plurality of mapping steps, and extracting a specified text corresponding to the specified field from the third text based on the mapping relation and the specified field.
The mapping prompt word may include a plurality of mapping steps, and the mapping relation between the third text and the specified field may be established by executing the plurality of mapping steps through the mapping model. As an embodiment, the step may specifically include: extracting a field to be mapped from the third text through the mapping model; for each specified field, calculating semantic similarity between the specified field and each field to be mapped; and establishing a mapping relation between each specified field and the field to be mapped by using the semantic similarity and the similarity threshold.
After the mapping prompt word is input into the mapping model, a plurality of mapping steps are executed by the mapping model, and first, the third text can be identified to identify the field to be mapped. It can be understood that in the third text in the second format, text elements according to different element types have different expressions, but most of the text elements are in a key value pair manner, so that the third text can be structurally parsed, and keys are used as fields to be mapped according to the element types.
And then, calculating the semantic similarity between the specified field and each field to be mapped according to each specified field, and establishing a mapping relation between the specified field and the field to be mapped by utilizing the semantic similarity and a similarity threshold value. Alternatively, the field to be mapped with the semantic similarity larger than the similarity threshold value can be determined as the candidate field; if the number of the candidate fields is larger than the specified number, determining the candidate field with the largest semantic similarity as a target field; if the number of the candidate fields is smaller than or equal to the preset number, determining the candidate fields as target fields; and establishing a mapping relation between the specified field and the target field.
The target field is the field with the highest semantic similarity with the field to be mapped, so that the mapping relation between the target field and the appointed field can be established. According to the flow, a mapping relation between each specified field and the field to be mapped can be established, and then the target field corresponding to each specified field can be obtained. For example, the field to be specified is "age", and the target field is "age"; as another example, the designated field is "graduation university" and the target field is "educational history".
When extracting the specified text corresponding to the specified field from the third text, mapping the third text based on the mapping relation to obtain a fourth text; performing accuracy verification on the fourth text; and if the fourth text passes the accuracy verification, acquiring the specified text corresponding to the specified field from the fourth text.
In some embodiments, in order to ensure the accuracy of the mapping, the target field in the third text may be directly replaced by the specified field corresponding to the target field in the mapping by using the mapping relationship, so as to obtain the fourth text.
Further, in order to ensure the accuracy and integrity of the mapping, the mapping relationship may also be used to perform accuracy verification on the fourth text. For example, each mapping relation is determined to be used without omission, and the mapped third text and fourth text are determined to be different in field only, and the value corresponding to the field is not changed; in the case where there is no omission and the mapped value does not change, the fourth text may be considered to be an accurately mapped text. Thus, the content corresponding to the specified field can be directly used as the specified text in the fourth text.
For a clearer view of the above procedure, reference is made to fig. 3, which shows a schematic diagram of the architecture of the text extraction method. From the image to be extracted, the text image in the image to be extracted can be directly recognized as a first text in a first format by an OCR recognition technology. And then, carrying out segmentation processing on the first text to obtain a plurality of text fragments, and inputting each text fragment into the LLM-Chain to obtain a corresponding appointed text. Wherein the LLM-Chain is a sequence Chain in a topological structure, and referring to FIG. 4, a structural schematic diagram of the LLM-Chain is shown. That is, the text segment in the first format may be input into the conversion model, the conversion model outputs a second text in a corresponding second format, the second text is input into the verification model in the form of the conversion segment, and the verification model outputs a third text passing the verification; the third text is then entered into the mapping model to obtain the specified text.
The text extraction method provided by the embodiment of the invention can be applied to a plurality of scenes, for example, legal scenes, and specific terms, case facts or judgment basis can be extracted from a large number of legal documents so as to assist legal staff in researching and analyzing cases. In the medical scene, the needed information can be extracted from the case report and the research literature, and support is provided for medical research and clinical decision. In a financial scenario, text may be extracted from imaged news stories and finance stories to capture market trends. In the scene of social media, specified public opinion and user feedback in the social media can be automatically extracted, and data support is provided for marketing and brand management.
The method provided by the embodiment of the invention can identify the first text in the first format from the image to be extracted, convert the first text into the second text based on the conversion prompt word and the corresponding conversion model, and can realize accurate conversion through the capability of the large language model. And then, checking the second text by using the check prompt word and the corresponding check model to ensure that a third text with accurate and complete conversion can be obtained. And finally, establishing a mapping relation between the third text and the specified field by using the mapping prompt word and the mapping model so as to extract the specified text corresponding to the specified field from the third text. In the process of extracting the text, the understanding capability of a large language model is fully exerted based on the prompt words, so that the required text is rapidly and accurately extracted from the image.
In order to better implement the method, the embodiment of the invention also provides a text extraction device which can be integrated in electronic equipment, wherein the electronic equipment can be a terminal, a server and the like. The terminal can be a mobile phone, a tablet personal computer, an intelligent Bluetooth device, a notebook computer, a personal computer and other devices; the server may be a single server or a server cluster composed of a plurality of servers.
For example, in this embodiment, a method according to an embodiment of the present invention will be described in detail by taking a specific integration of a text extraction device in a server as an example.
For example, as shown in fig. 5, the text extraction apparatus 200 may include an identification module 210, a determination module 220, a conversion module 230, a verification module 240, and an extraction module 250.
The recognition module 210 is configured to recognize a text in an image to be extracted, so as to obtain a first text in a first format;
a determining module 220, configured to determine a text segment and a segment type corresponding to the text segment from the first text based on a first grammar rule corresponding to the first format;
a conversion module 230, configured to convert the first text into a second text in a second format based on a conversion prompt word and a segment type of each text segment, where a second grammar rule corresponding to the second format is different from the first grammar rule;
the verification module 240 is configured to perform a rationality verification on the second text by using a verification prompt word, so as to obtain a third text that passes the rationality verification;
The extracting module 250 is configured to guide the mapping model to establish a mapping relationship between the third text and the specified field based on the mapping prompt word, and extract the specified text corresponding to the specified field from the third text based on the mapping relationship.
In some embodiments, the conversion module 230 is specifically configured to:
Generating a conversion prompt word by the first text and the conversion template, wherein the conversion prompt word comprises an output format and a plurality of conversion steps;
Executing the plurality of conversion steps according to the fragment type of the text fragment and the second grammar rule for each text fragment to obtain a conversion fragment corresponding to the text fragment;
And organizing the conversion fragments based on the output format to obtain a second text in a second format.
In some embodiments, the conversion module 230 is specifically configured to:
for each text segment, analyzing the text segment according to the segment type and the first grammar rule to obtain a text element, an element type of the text element and an element style of the text element;
converting the text element into an intermediate element according to the element type of the text element and a second grammar rule;
performing style adjustment on the intermediate element by utilizing an element style and a designated style corresponding to the text element to obtain a target element;
and organizing all the target elements based on the second grammar rule to obtain conversion fragments.
In some embodiments, the verification module 240 is specifically configured to:
Generating a verification prompt word by using a fragment group and a verification template aiming at each text fragment, wherein the fragment group comprises the text fragments and conversion fragments corresponding to the text fragments, and the verification prompt word comprises sub-verification and preset thresholds corresponding to a plurality of element types;
executing a plurality of sub-checks on the fragment group through a check model to obtain sub-check results corresponding to the sub-checks;
generating a verification score of the fragment group according to the sub-verification result;
if the check score is not smaller than the preset threshold, taking the conversion fragments in the fragment group as fragments to be combined;
if the check score is smaller than the preset threshold value, modifying the conversion fragments in the fragment group to obtain fragments to be combined;
and combining all the fragments to be combined to obtain a third text passing the rationality check.
In some embodiments, the verification module 240 is specifically configured to:
Aiming at the text fragments in the fragment group, acquiring the element type of each text element in the text fragments to obtain a target type;
determining a target sub-check from the plurality of sub-checks according to the target type;
And executing the target sub-verification on the fragment group through a verification model to obtain a verification result corresponding to the target sub-verification.
In some embodiments, the extraction module 250 is specifically configured to:
Generating a mapping prompt word based on the specified field, the third text and the mapping template, wherein the mapping prompt word comprises a plurality of mapping steps;
executing the plurality of mapping steps through a mapping model to establish a mapping relationship between the third text and the specified field;
and extracting the specified text corresponding to the specified field from the third text according to the mapping relation and the specified field.
In some embodiments, the extraction module 250 is specifically configured to:
extracting a field to be mapped from the third text through the mapping model;
for each specified field, calculating semantic similarity between the specified field and each field to be mapped;
and establishing a mapping relation between each specified field and the field to be mapped by using the semantic similarity and the similarity threshold.
In some embodiments, the extraction module 250 is specifically configured to:
Mapping the third text based on the mapping relation to obtain a fourth text;
performing accuracy verification on the fourth text;
and if the fourth text passes the accuracy verification, acquiring the specified text corresponding to the specified field from the fourth text.
In the implementation, each module may be implemented as an independent entity, or may be combined arbitrarily, and implemented as the same entity or several entities, and the implementation of each module may be referred to the foregoing method embodiment, which is not described herein again.
As can be seen from the above, the text extraction device of the present embodiment can identify the first text in the first format from the image to be extracted, and convert the first text into the second text based on the conversion prompt word and the corresponding conversion model, so that accurate conversion can be realized through the capability of the large language model. And then, checking the second text by using the check prompt word and the corresponding check model to ensure that a third text with accurate and complete conversion can be obtained. And finally, establishing a mapping relation between the third text and the specified field by using the mapping prompt word and the mapping model so as to extract the specified text corresponding to the specified field from the third text. In the process of extracting the text, the understanding capability of a large language model is fully exerted based on the prompt words, so that the required text is rapidly and accurately extracted from the image.
The embodiment of the invention also provides electronic equipment which can be a terminal, a server and other equipment. The terminal can be a mobile phone, a tablet computer, an intelligent Bluetooth device, a notebook computer, a personal computer and the like; the server may be a single server, a server cluster composed of a plurality of servers, or the like.
In some embodiments, the text extraction apparatus may also be integrated in a plurality of electronic devices, for example, the text extraction apparatus may be integrated in a plurality of servers, and the text extraction method of the present invention is implemented by the plurality of servers.
In this embodiment, a detailed description will be given taking an example that the electronic device of this embodiment is a server, for example, as shown in fig. 6, which shows a schematic structural diagram of the electronic device according to the embodiment of the present invention, specifically:
The electronic device may include one or more processor cores 310, one or more computer-readable storage media memory 320, a power supply 330, an input module 340, and a communication module 350, among other components. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 6 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
The processor 310 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 320, and invoking data stored in the memory 320. In some embodiments, processor 310 may include one or more processing cores; in some embodiments, processor 310 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 310.
The memory 320 may be used to store software programs and modules, and the processor 310 performs various functional applications and data processing by executing the software programs and modules stored in the memory 320. The memory 320 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 320 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, memory 320 may also include a memory controller to provide processor 310 with access to memory 320.
The electronic device also includes a power supply 330 that powers the various components, and in some embodiments, the power supply 330 may be logically connected to the processor 310 via a power management system to perform functions such as managing charging, discharging, and power consumption via the power management system. The power supply 330 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may also include an input module 340, which input module 340 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The electronic device may also include a communication module 350, and in some embodiments the communication module 350 may include a wireless module, through which the electronic device may wirelessly transmit over a short distance, thereby providing wireless broadband internet access to the user. For example, the communication module 350 may be used to assist a user in e-mail, browsing web pages, accessing streaming media, and the like.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In this embodiment, the processor 310 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 320 according to the following instructions, and the processor 310 executes the application programs stored in the memory 320, so as to implement the steps in the method of the embodiments of the present invention.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
From the above, the invention can identify the first text in the first format from the image to be extracted, convert the first text into the second text based on the conversion prompt word and the corresponding conversion model, and accurately convert the first text through the capability of the large language model. And then, checking the second text by using the check prompt word and the corresponding check model to ensure that a third text with accurate and complete conversion can be obtained. And finally, establishing a mapping relation between the third text and the specified field by using the mapping prompt word and the mapping model so as to extract the specified text corresponding to the specified field from the third text. In the process of extracting the text, the understanding capability of a large language model is fully exerted based on the prompt words, so that the required text is rapidly and accurately extracted from the image.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present invention provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform any of the steps of the text extraction method provided by the embodiments of the present invention.
Wherein the storage medium may include: read Only Memory (ROM), random access Memory (RAM, random Access Memory), magnetic or optical disk, and the like.
According to one aspect of the present invention, there is provided a computer program product or computer program comprising computer programs/instructions stored in a computer readable storage medium. The computer program/instructions are read from a computer-readable storage medium by a processor of an electronic device, which executes the computer program/instructions, causing the electronic device to perform the methods provided in the various alternative implementations of the text extraction aspects provided in the above-described embodiments.
The steps in any text extraction method provided by the embodiment of the present invention can be executed by the instructions stored in the storage medium, so that the beneficial effects that any text extraction method provided by the embodiment of the present invention can achieve can be achieved, and detailed descriptions of the previous embodiments are omitted herein.
The text extraction method, the text extraction device and the electronic equipment provided by the embodiment of the invention are described in detail, and specific examples are applied to the description of the principle and the implementation mode of the invention, and the description of the above examples is only used for helping to understand the method and the core idea of the invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present invention, the present description should not be construed as limiting the present invention.