Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, embodiments of the present application are described in further detail below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline that spans a wide range of fields, covering both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning. The embodiments of the present application relate to natural language processing technology within artificial intelligence.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; because it concerns natural language, the language people use every day, it is closely related to linguistic research. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
To facilitate understanding of the technical processes of the embodiments of the present application, some terms used in the embodiments are explained below:
BERT (Bidirectional Encoder Representations from Transformers): a natural language processing model released by Google. The BERT model comprises a plurality of operation layers, namely a plurality of Transformers, each of which extracts and encodes the features of a text based on an attention mechanism. The BERT model is pre-trained; when applying it, a developer only needs to fine-tune the parameters of the model for a specific natural language processing task, which effectively reduces the training difficulty and the time consumed by model training. For example, in the embodiments of the present application, the developer further trains the BERT model on a text error correction task, so that the BERT model can recognize wrong characters in a text and correct them into the right characters.
Attention mechanism: a means of quickly screening high-value information from a large amount of information using limited attention resources. The attention mechanism has two aspects: deciding which part of the input information needs to be focused on, and allocating the limited information processing resources to that important part.
Fig. 1 is a schematic diagram of an implementation environment of a text error correction method according to an embodiment of the present application. Referring to fig. 1, the implementation environment includes a terminal 110 and a server 140.
The terminal 110 installs and runs an application program supporting text error correction, for example a browser application or an information application. When a user inputs a piece of text for a data search, the application can correct the text input by the user to improve search accuracy, correct wrong characters in the search results based on the text error correction function, or filter out search results containing too many wrong characters. It should be noted that the embodiments of the present application do not limit the type of the application program. Optionally, the terminal 110 is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or the like; the device type of the terminal 110 is not limited in the embodiments of the present application. Illustratively, the terminal 110 is a terminal used by a user, and the application running in the terminal 110 is logged in with a user account. The terminal 110 generally refers to one of a plurality of terminals, and this embodiment is illustrated with the terminal 110 only.
The server 140 may be at least one of a single server, a plurality of servers, a cloud computing platform, and a virtualization center. The server 140 provides background services for applications that support text error correction. Optionally, the server 140 undertakes the primary text correction work and the terminal 110 undertakes the secondary text correction work; alternatively, the server 140 undertakes the secondary work and the terminal 110 the primary work; alternatively, the server 140 or the terminal 110 can each undertake the text correction work alone.
Optionally, the server 140 includes an access server, a natural language processing server, and a database. The access server provides access services for the terminal 110. The natural language processing server provides background services related to text error correction; it can be equipped with a natural language processing processor and supports multithreaded parallel computation on that processor. There may be one or more natural language processing servers. When there are multiple natural language processing servers, at least two of them provide different services, and/or at least two of them provide the same service, for example in a load-balanced manner, which is not limited in the embodiments of the present application. The natural language processing server can host a text error correction model and supports parallel operation of the processor during model training and application. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms.
The terminal 110 and the server 140 may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application.
The text error correction method provided by the embodiments of the present application can be combined with various application scenarios and applied in various application programs. For example, in an information application, the quality of the information content seriously affects user experience; such content often contains many wrongly written characters and is of poor quality, and wrong characters may even be set intentionally to evade machine review.
Fig. 2 is a flowchart of a text error correction method according to an embodiment of the present application. The method can be applied to the terminal or the server, both of which can be regarded as a computer device. In the embodiments of the present application, the text error correction method is described with the computer device as the execution entity. Referring to fig. 2, in one possible implementation, the embodiment includes the following steps:
201. The computer device obtains a first text to be corrected, the first text including at least two characters.
Optionally, the first text is a text segment stored in the computer device, a text segment captured by the computer device from the network, a text segment input by the user, or a text segment obtained by recognizing speech data; the embodiments of the present application do not limit which kind of text is used. In the embodiments of the present application, the first text includes a plurality of characters, for example Chinese characters or foreign-language characters, and optionally the plurality of characters includes an erroneous character.
202. The computer device obtains the font feature, the pronunciation feature and the semantic feature of each character based on the structure, the pronunciation and the context information of each character in the first text.
Wherein the structure of a character is its stroke structure; optionally, the pronunciation of a character is represented by its pinyin or by an audio file of the character being read aloud, which is not limited in the embodiments of the present application; the context information is used to indicate the meaning of the character in the text.
In the embodiment of the application, the features of each character in the text are extracted from multiple dimensions, the font and pronunciation of each character and the context semantics of each character in the first text are fully considered, and subsequent text error correction is performed based on the multi-dimensional features, so that the comprehensiveness and accuracy of error correction can be improved.
203. The computer device performs weighted fusion of the font feature, the pronunciation feature, and the semantic feature of each character to obtain the fusion feature of each character.
In a possible implementation, the font feature, the pronunciation feature, and the semantic feature of a character correspond to different weights, that is, different features have different degrees of importance in the subsequent text error correction process. The computer device weights the font feature, the pronunciation feature, and the semantic feature based on these different weights and then fuses the weighted features to obtain the fusion feature. In the subsequent text error correction process, the correct character is predicted based on the fusion feature, so that the computer device pays more attention to the important features and can predict the correct character accurately. The method for fusing the features is not limited in the embodiments of the present application.
204. The computer device decodes the obtained at least two fusion features to obtain at least two target characters, and the at least two target characters form a second text, which is the text after the erroneous characters in the first text have been corrected.
In a possible implementation, the computer device maps each fusion feature to a target character, where the target character is the correct character predicted by the computer device; the computer device then determines the arrangement order of the target characters and sorts the at least two target characters in that order to form the second text. The embodiments of the present application do not limit the method used to decode the fusion features.
According to the technical solution provided by the embodiments of the present application, when the first text is corrected, the font feature and pronunciation feature of each character, as well as its contextual semantic feature in the first text, are fully considered, and the features of these three dimensions are fused to predict the correct character. Any character appearing in the first text can be identified and corrected, which effectively expands the coverage of text error correction, and the multi-dimensional feature fusion improves the accuracy of text error correction.
The above embodiment is a brief introduction to the embodiments of the present application. In a possible implementation, the text error correction process is implemented based on a text error correction model, which is a trained model; optionally, the text error correction model is a model stored in the computer device or a model in the network. In a possible implementation, the text error correction model is constructed based on BERT. Fig. 3 is a schematic diagram of a text error correction model provided in an embodiment of the present application. Referring to fig. 3, the text error correction model includes an input layer 301, a feature extraction layer 302, and an output layer 303. The input layer 301 includes a glyph analysis network, a voice recognition network, and a semantic recognition network, wherein the glyph analysis network is used to extract the font features (Shape Embedding) of characters, the voice recognition network is used to extract the pronunciation features (Pinyin Embedding) of characters, and the semantic recognition network is used to extract the semantic features (Char Embedding) of characters in the text. The feature extraction layer 302 comprises a feature fusion network, composed of the backbone network of a BERT model and an attention module, which performs feature fusion on the features extracted by the input layer based on an attention mechanism. The output layer 303 outputs the error-corrected text. It should be noted that the above description of the text error correction model is only exemplary, and the structure of the text error correction model is not limited in the embodiments of the present application. Fig. 4 is a flowchart of a text error correction method provided in an embodiment of the present application. The text error correction method based on the above model is described below with reference to fig. 3 and fig. 4. In a possible implementation, the embodiment includes the following steps:
401. The computer device acquires a first text to be corrected and inputs the first text into the text error correction model.
In one possible implementation, the computer device obtains the first text to be corrected in response to a text error correction instruction. Illustratively, the computer device is a terminal used by a user, and the terminal installs and runs an application program supporting the text error correction function, for example an information application in which the user can search information content. In one possible implementation, a search operation by the user in the application triggers both a search instruction and a text error correction instruction: in response to the search instruction, the computer device queries search results based on the search keyword provided by the user, and in response to the text error correction instruction, performs the subsequent text error correction steps on the search results. Optionally, the computer device takes all the text in the obtained search results as the first text to be corrected, or takes the title of each search result as the first text, which is not limited by the embodiments of the present application.
In the embodiments of the present application, the text error correction model is constructed based on the BERT model as an example. In a possible implementation, before the computer device inputs the first text into the text error correction model, the first text is preprocessed. Optionally, the terminal unifies the at least two characters in the first text into a reference font, where the reference font is set by a developer and is not limited in the embodiments of the present application; for example, if the first text includes both traditional and simplified Chinese characters, the computer device converts the traditional characters in the first text into the corresponding simplified characters. Optionally, the computer device removes foreign-language characters from the first text; for example, if the error correction model is directed at correcting Chinese characters, the computer device may remove foreign characters in the first text that are not within the error correction range. Optionally, the computer device may also remove special characters in the first text, perform case conversion on English characters, and the like. Of course, the computer device may also preprocess the first text by other methods, which is not limited in the embodiments of the present application. A sketch of such preprocessing follows.
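The following is a minimal preprocessing sketch, not part of the claimed method: it assumes the third-party OpenCC package for traditional-to-simplified conversion, and the specific character classes kept by the regular expression are illustrative choices.

```python
# Minimal preprocessing sketch (assumption: the OpenCC package is installed,
# e.g. opencc-python-reimplemented; the regex rules are illustrative).
import re
from opencc import OpenCC

_t2s = OpenCC("t2s")  # traditional -> simplified converter

def preprocess(text: str) -> str:
    text = _t2s.convert(text)  # unify characters into the reference (simplified) font
    text = text.lower()        # case conversion for English characters
    # keep Chinese characters, ASCII letters/digits and basic punctuation; drop other symbols
    text = re.sub(r"[^\u4e00-\u9fa5a-z0-9，。！？,.!? ]", "", text)
    return text

print(preprocess("這是一段測試文本 Test123 ★"))  # -> "这是一段测试文本 test123 "
```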
402. Through the glyph analysis network in the text error correction model, the computer device extracts the font features of the at least two characters in the first text based on the structures of the at least two characters.
In one possible implementation, the method for the computer device to obtain the glyph features of a character includes any one of the following implementations:
In a first implementation, a character set is stored in the computer device, the character set including a plurality of reference characters. For any character, the computer device generates, through the glyph analysis network, a character node graph based on the structure of that character and the structures of at least two reference characters, where the character node graph indicates the association, in the structural dimension, between that character and the at least two reference characters; the computer device then performs feature extraction on the character node graph to obtain the glyph feature of that character. In the embodiments of the present application, the glyph analysis network can decompose characters, for example based on their components or based on their strokes, and constructs the character node graph from the structures of the characters, that is, from the decomposition results. Fig. 5 is a schematic diagram of a character node graph provided in an embodiment of the present application. Taking the character "rain" as an example, the computer device generates the character node graph according to the structural similarity between the character "rain" and the reference characters; each character serves as a node, and nodes having an association are connected by edges. In one possible implementation, the glyph analysis network assigns a different weight to each edge in the character node graph. Taking node 501 and node 502 in fig. 5 as an example, the computer device determines the weight of the edge between node 501 and node 502 based on the numbers of strokes and the numbers of adjacent nodes of the characters corresponding to the two nodes. Illustratively, the computer device determines the weight of an edge between nodes by equation (1), whose variables are defined as follows:
where w_ij represents the weight of the edge connecting the node of character i and the node of character j, s_i represents the number of strokes of character i, s_j represents the number of strokes of character j, and d_j represents the number of nodes adjacent to the node of character j.
In one possible implementation, the computer device vectorizes the character node graph based on the graph and the weights of its edges; illustratively, the computer device converts the character node graph into a glyph vector based on the node2vec algorithm, and the glyph vector represents the glyph feature of the character. Of course, the glyph feature may also be expressed in other forms, which is not limited in the embodiments of the present application. In this embodiment, the association between characters with similar glyphs is obtained by constructing the character node graph, and the glyph feature of each character is generated based on these structural associations, which ensures the objectivity and accuracy of the obtained glyph features. A sketch of this graph-based approach follows.
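The sketch below illustrates the graph-plus-node2vec idea under stated assumptions: the stroke counts, the edge set, and the edge-weight rule are invented for illustration (the actual weights come from equation (1)), and it uses the third-party networkx and node2vec packages.

```python
# Sketch: glyph features via a character node graph + node2vec.
# Assumptions: toy stroke counts and edges; the weight rule below is only a
# stand-in for equation (1) of the embodiment.
import networkx as nx
from node2vec import Node2Vec  # pip install node2vec

strokes = {"雨": 8, "两": 7, "雷": 13, "雪": 11}  # illustrative stroke counts

G = nx.Graph()
G.add_nodes_from(strokes)
# connect structurally similar characters; weight stands in for equation (1)
for a, b in [("雨", "两"), ("雨", "雷"), ("雨", "雪")]:
    G.add_edge(a, b, weight=1.0 / (1 + abs(strokes[a] - strokes[b])))

# weighted random walks + skip-gram over the graph -> one vector per character
n2v = Node2Vec(G, dimensions=32, walk_length=10, num_walks=50, weight_key="weight")
model = n2v.fit(window=3, min_count=1)
glyph_vector = model.wv["雨"]  # the glyph feature of the character "雨"
print(glyph_vector.shape)      # (32,)
```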
In a second implementation, the glyph analysis network has an image processing function and can perform image feature extraction; illustratively, it is constructed based on a convolutional neural network and includes a plurality of convolutional layers for image feature extraction. The computer device obtains a character image corresponding to each character, the character image indicating the structure of the character; optionally, the character image is obtained by taking a screenshot of each character in the first text. The computer device performs image feature extraction on the character image of each character through the glyph analysis network to obtain the glyph feature of each character. The embodiments of the present application do not limit the structure of the glyph analysis network or the method of extracting image features. In the embodiments of the present application, the image features of the character images are extracted directly and used as the glyph features of the characters, so that the glyph features can be acquired efficiently and quickly. A sketch of this image-based variant follows.
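A minimal sketch of the image-based variant, assuming PyTorch and 32x32 grayscale renderings of each character; the layer sizes are illustrative and not specified by the embodiment.

```python
# Sketch: glyph features from character images with a small CNN.
# Assumptions: 32x32 grayscale character images; layer sizes are illustrative.
import torch
import torch.nn as nn

class GlyphCNN(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 16 -> 8
        )
        self.proj = nn.Linear(32 * 8 * 8, feat_dim)

    def forward(self, imgs: torch.Tensor) -> torch.Tensor:
        # imgs: (batch, 1, 32, 32) character images -> (batch, feat_dim) glyph features
        h = self.conv(imgs).flatten(1)
        return self.proj(h)

glyph_net = GlyphCNN()
fake_images = torch.rand(5, 1, 32, 32)  # 5 character images
print(glyph_net(fake_images).shape)     # torch.Size([5, 128])
```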
It should be noted that the above description of methods for extracting the glyph features of characters is only an exemplary description of possible implementations, and the embodiments of the present application do not limit which method is used to obtain the glyph features. In the embodiments of the present application, glyph-dimension features are incorporated into the text error correction process, so that the model can fully learn the association between correct and erroneous glyphs; this makes the text error correction process closer to the way humans correct wrong characters and further improves the accuracy of the text error correction results.
403. Through the voice recognition network in the text error correction model, the computer device extracts the pronunciation features of the at least two characters in the first text based on the pronunciations of the at least two characters.
In one possible implementation, the method for the computer device to obtain the pronunciation features of a character includes any one of the following implementations:
In a first implementation, the computer device obtains the pinyin corresponding to each character, the pinyin indicating the pronunciation of the character, and encodes the pinyin of each character through the voice recognition network to obtain the pronunciation feature of each character. Illustratively, the computer device stores a correspondence table between characters and pinyin, determines the pinyin of each character by querying the table, maps each pinyin to a pinyin vector, and uses the pinyin vector to indicate the pronunciation feature of the character. Optionally, the computer device maps each letter in the pinyin to a sub-vector and then concatenates the sub-vectors in the order of the letters in the pinyin to obtain the pinyin vector; of course, the computer device may also obtain the pinyin vector of each character in other ways, which is not limited in the embodiments of the present application. The embodiments of the present application take the case where the pronunciation features of characters are expressed as vectors only as an example; the pronunciation features may also be expressed in other forms such as matrices.
In one possible implementation, before encoding the pinyin of each character, the computer device may also normalize similar-sounding pinyin so that characters with similar pronunciations correspond to the same pronunciation feature. Illustratively, the computer device processes the pinyin of each character through the voice recognition network based on a reference mapping condition, and then encodes the processed pinyin through the voice recognition network to obtain the pronunciation feature of each character. The reference mapping condition is set by a developer and is not limited in the embodiments of the present application; illustratively, it includes at least one of mapping a retroflex ("warped-tongue") initial in the pinyin to the corresponding dental ("flat-tongue") initial, mapping a nasal initial to the corresponding lateral initial, mapping a back nasal final to the corresponding front nasal final, and removing the tone of the pinyin. In the embodiments of the present application, the pronunciation features of characters are extracted based on their pinyin, and similar-sounding pinyin are normalized so that they map to the same pronunciation feature, which gives the model good performance in correcting near-sound characters, as in the sketch below.
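The following sketch implements one reading of the reference mapping condition (retroflex to dental, nasal to lateral, back nasal to front nasal, tone removal) plus the letter-wise encoding; the mapping table, padding letter, and embedding sizes are illustrative assumptions.

```python
# Sketch: pinyin normalization + letter-wise encoding.
# Assumptions: tones are written as trailing digits (e.g. "shang4");
# the embedding dimensions and padding letter "_" are illustrative.
import re
import torch
import torch.nn as nn

def normalize_pinyin(pinyin: str) -> str:
    pinyin = re.sub(r"[1-5]$", "", pinyin)  # remove the tone: "shang4" -> "shang"
    for src, dst in [("zh", "z"), ("ch", "c"), ("sh", "s")]:  # retroflex -> dental
        if pinyin.startswith(src):
            pinyin = dst + pinyin[len(src):]
    if pinyin.startswith("n"):              # nasal initial -> lateral initial
        pinyin = "l" + pinyin[1:]
    for src, dst in [("ang", "an"), ("eng", "en"), ("ing", "in")]:  # back -> front nasal
        if pinyin.endswith(src):
            pinyin = pinyin[: -len(src)] + dst
    return pinyin

letters = "abcdefghijklmnopqrstuvwxyz_"     # "_" is the padding letter
letter_emb = nn.Embedding(len(letters), 8)  # one sub-vector per letter

def pinyin_vector(pinyin: str, max_len: int = 8) -> torch.Tensor:
    p = normalize_pinyin(pinyin).ljust(max_len, "_")[:max_len]
    ids = torch.tensor([letters.index(c) for c in p])
    return letter_emb(ids).flatten()        # sub-vectors concatenated in letter order

print(normalize_pinyin("shang4"))           # "san" - merged with "san4", a near-sound pinyin
print(pinyin_vector("zhang1").shape)        # torch.Size([64])
```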
In a second implementation, the voice recognition network has an audio processing function and can perform audio feature extraction; illustratively, it is constructed based on a convolutional neural network and includes a plurality of convolutional layers for audio feature extraction. The computer device obtains an audio file corresponding to each character, where one audio file contains the voice information of one character being read aloud; optionally, the audio files are pre-recorded and stored in the computer device. The computer device performs audio feature extraction on the audio file of each character through the voice recognition network to obtain the pronunciation feature of each character. It should be noted that the structure of the voice recognition network and the method for extracting the audio features are not limited in the embodiments of the present application. In the embodiments of the present application, extracting the pronunciation features from the audio files corresponding to the characters allows the pronunciation feature of each character to be acquired efficiently and quickly. A sketch of this audio-based variant follows.
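A sketch of the audio-based variant, assuming the torchaudio package and a per-character WAV recording; the file name "yu3.wav" and the network shape are illustrative assumptions.

```python
# Sketch: pronunciation features from a per-character audio file.
# Assumptions: torchaudio is available; "yu3.wav" is a recording of one character.
import torch
import torch.nn as nn
import torchaudio

waveform, sample_rate = torchaudio.load("yu3.wav")  # (channels, samples)
mel = torchaudio.transforms.MelSpectrogram(sample_rate, n_mels=40)(waveform)  # (ch, 40, frames)

conv = nn.Sequential(                               # illustrative audio CNN
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),
)
feat = conv(mel[:1].unsqueeze(0)).flatten(1)        # (1, 16*4*4) pronunciation feature
print(feat.shape)                                   # torch.Size([1, 256])
```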
It should be noted that the above description of methods for extracting the pronunciation features of characters is only an exemplary description of possible implementations, and the embodiments of the present application do not limit which method is used. In the embodiments of the present application, the pronunciation features of characters are fused into the text error correction process, so that the model can fully learn the similarity between the pronunciations of correct and erroneous characters and performs well in the subsequent correction of near-sound characters.
404. Through the semantic recognition network in the text error correction model, the computer device extracts the semantic features of the at least two characters in the first text based on their context information in the first text.
In a possible implementation, the semantic recognition network is the network in the input layer of the BERT model used for extracting semantic features; the BERT model maps each character of the input first text into a Char Embedding, which is the semantic feature of the character. In a possible implementation, the semantic recognition network may also be constructed from a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), or the like; the structure of the semantic recognition network is not limited in the embodiments of the present application. Illustratively, the computer device performs bidirectional feature extraction on the first text in the semantic recognition network, so that the semantic feature of each character is fused with the context information of the first text. That is, the semantic recognition network extracts semantic features for each character sequentially from left to right and from right to left; each pass of feature extraction produces one hidden-layer feature per character, so after the two passes each character corresponds to two hidden-layer features. Take feature extraction of a first character and an adjacent second character in the first text as an example, where the first character is located to the left of the second character. In the left-to-right pass, after extracting the hidden-layer feature of the first character, the computer device passes it to the second character and generates the hidden-layer feature of the second character in combination with the hidden-layer feature of the first character; that is, the hidden-layer feature of each character fuses semantic information from the preceding character. In the right-to-left pass, after obtaining the hidden-layer feature of the second character, the computer device passes it to the first character and generates the hidden-layer feature of the first character in combination with the hidden-layer feature of the second character; that is, the hidden-layer feature of each character fuses semantic information from the following character. The computer device then fuses the two hidden-layer features of each character to obtain its semantic feature. It should be noted that the above description of the semantic feature acquisition method is only an exemplary description of one possible implementation, and the embodiments of the present application do not limit which method is used. In the embodiments of the present application, semantic features containing the context information of the text are obtained, and erroneous characters are identified and corrected in combination with the text context, which improves the accuracy of text error correction. A sketch of such a bidirectional extractor follows.
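The sketch below uses a BiLSTM as a stand-in for the bidirectional pass described above; the vocabulary size, dimensions, and the choice of concatenation as the fusion of the two hidden-layer features are illustrative assumptions.

```python
# Sketch: bidirectional semantic feature extraction over a character sequence.
# Assumptions: a BiLSTM stands in for the two directional passes; concatenation
# is the fusion of the two hidden-layer features; sizes are illustrative.
import torch
import torch.nn as nn

char_emb = nn.Embedding(5000, 64)  # character id -> input vector
bilstm = nn.LSTM(64, 64, batch_first=True, bidirectional=True)

char_ids = torch.tensor([[11, 52, 7, 903]])  # one 4-character text
hidden, _ = bilstm(char_emb(char_ids))       # (1, 4, 128): left-to-right and
                                             # right-to-left features, concatenated
semantic_features = hidden                   # each character fuses both directions
print(semantic_features.shape)               # torch.Size([1, 4, 128])
```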
It should be noted that steps 402 to 404 are the steps of acquiring the font feature, the pronunciation feature, and the semantic feature of each character based on the structure and pronunciation of each character and its context information in the first text. In the embodiments of the present application, information external to the text, such as the pronunciation and glyph features of the characters, is fully fused and then combined with the semantic features of the characters in the text, which effectively expands the coverage of text error correction and improves the accuracy of the text error correction results.
405. The computer device determines a first weight corresponding to the font feature, a second weight corresponding to the pronunciation feature, and a third weight corresponding to the semantic feature of each character.
In the embodiments of the present application, the computer device can assign different weights to different features, so that in the subsequent feature decoding process more attention is paid to the important features, that is, the features with larger weights. In a possible implementation, the process by which the computer device obtains the weight of each feature includes the following two steps:
Step one: for any character, the computer device performs feature fusion on the font feature, the pronunciation feature, and the semantic feature of the character through the feature fusion network in the text error correction model, namely the backbone network of the BERT model, to obtain the initial fusion feature (Bert Hidden) corresponding to the character. It should be noted that the method by which the BERT model extracts the initial fusion feature is not limited in the embodiments of the present application.
Step two: based on the initial fusion feature of any character, the computer device determines the first weight corresponding to the font feature, the second weight corresponding to the pronunciation feature, and the third weight corresponding to the semantic feature of that character. In a possible implementation, the computer device first obtains the text semantic feature corresponding to the first text; that is, it performs overall feature extraction on the first text through the input layer of the text error correction model, namely the input layer of the BERT model, to obtain the text feature (CLS) of the first text. Then, for any character, the computer device fuses the text feature, the initial fusion feature, and the font feature of the character to obtain a first intermediate feature; fuses the text feature, the initial fusion feature, and the pronunciation feature of the character to obtain a second intermediate feature; and fuses the text feature, the initial fusion feature, and the semantic feature of the character to obtain a third intermediate feature. Finally, the computer device determines the first weight, the second weight, and the third weight based on the first, second, and third intermediate features, respectively. In one possible implementation, this process is expressed by the following formulas (2) to (5):
f_ij = softmax(ReLU(W_1 · [H_cls; H_i; E_ij] + b_1))   (2)

h_ij = ReLU(W_2 · [H_cls; H_i; E_ij] + b_2) ⊙ f_ij   (3)

u_ij = W_3 · [H_i; h_ij] + b_3   (4)

α_ij = exp(u_ij) / Σ_j' exp(u_ij')   (5)

where α_ij represents the weight corresponding to the j-th feature of the i-th character, with j ∈ {font feature, pronunciation feature, semantic feature}; H_cls represents the text semantic feature of the first text; H_i represents the initial fusion feature of the i-th character; E_ij represents the j-th feature of the i-th character; f_ij represents the intermediate gating feature obtained by fusing the text semantic feature H_cls, the initial fusion feature H_i, and the j-th feature of the character; h_ij represents the intermediate feature corresponding to the j-th feature of the character; [·;·] denotes concatenation and ⊙ denotes element-wise multiplication; and W_1, W_2, W_3, b_1, b_2, and b_3 are parameters whose values are set by the developer.
Alternatively, the computer device may, for example, obtain the first weight, the second weight, and the third weight by taking the dot product of the initial fusion feature with the font feature, the pronunciation feature, and the semantic feature of the character respectively, which is not limited in the embodiments of the present application. In the embodiments of the present application, different weights are assigned to the features of different dimensions based on the attention mechanism: for erroneous characters with similar glyphs, the weight of the font feature is larger, and for erroneous characters with similar pronunciations, a larger weight is assigned to the pronunciation feature. The computer device can thus pay more attention to the features with larger weights, that is, the more important features, in the subsequent text error correction process, thereby improving the accuracy of text error correction.
406. Based on the first weight, the second weight, and the third weight, the computer device performs weighted fusion of the font feature, the pronunciation feature, and the semantic feature of any character to obtain the fusion feature of that character.
In a possible implementation, the computer device may apply the first, second, and third weights directly to the font feature, the pronunciation feature, and the semantic feature, or may weight the intermediate features obtained in step 405, that is, perform weighted fusion of the first, second, and third intermediate features based on the first, second, and third weights respectively to obtain the fusion feature. The embodiments of the present application are described taking the weighted fusion of the intermediate features as an example: the intermediate features fuse the overall text semantic feature of the first text and therefore contain richer information, so performing feature fusion based on the intermediate features yields a fusion feature containing richer, multi-dimensional information, which further improves the accuracy of subsequent character prediction. In one possible implementation, the feature fusion process is expressed by the following formula (6):
Z = Σ_j α_ij · h_ij   (6)

where Z represents the fusion feature of the character, j ∈ {font feature, pronunciation feature, semantic feature}, α_ij represents the weight corresponding to the j-th feature of the i-th character, and h_ij represents the intermediate feature corresponding to the j-th feature of the character.
The above steps 405 and 406 are the steps of performing weighted fusion of the font feature, the pronunciation feature, and the semantic feature of each character to obtain the fusion feature of each character. Fig. 6 is a schematic diagram of a feature fusion method provided in an embodiment of the present application; the processes of steps 405 and 406 are described below with reference to fig. 6, taking the fusion of the features of one character as an example. The computer device fuses the text semantic feature (CLS) and the initial fusion feature (Bert Hidden) of the first text with the font feature (Shape Embedding), the pronunciation feature (Pinyin Embedding), and the semantic feature (Char Embedding) of the character respectively, obtaining a first intermediate feature 601, a second intermediate feature 602, and a third intermediate feature 603, and then performs weighted fusion of the three intermediate features based on the first weight, the second weight, and the third weight to obtain the fusion feature. A sketch of this fusion module follows.
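The following is a minimal sketch of formulas (2) to (6) under stated assumptions: all features share one dimension d, formula (5) is taken as a softmax normalization of the scalar scores u_ij over the three feature types, and all layer shapes are illustrative.

```python
# Sketch of the attention-based weighted fusion, formulas (2)-(6).
# Assumptions: every feature has dimension d; the softmax in (5) normalizes
# the scalar scores u_ij over {font, pronunciation, semantic}.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    def __init__(self, d: int = 128):
        super().__init__()
        self.w1 = nn.Linear(3 * d, d)  # gate over [H_cls; H_i; E_ij]   (2)
        self.w2 = nn.Linear(3 * d, d)  # intermediate feature h_ij      (3)
        self.w3 = nn.Linear(2 * d, 1)  # scalar score u_ij              (4)

    def forward(self, h_cls, h_i, feats):
        # h_cls: (d,) text feature; h_i: (d,) initial fusion feature;
        # feats: (3, d) font / pronunciation / semantic features E_ij
        h_list, u_list = [], []
        for e_ij in feats:
            x = torch.cat([h_cls, h_i, e_ij])
            f_ij = F.softmax(F.relu(self.w1(x)), dim=-1)    # (2)
            h_ij = F.relu(self.w2(x)) * f_ij                # (3)
            u_list.append(self.w3(torch.cat([h_i, h_ij])))  # (4)
            h_list.append(h_ij)
        alpha = F.softmax(torch.cat(u_list), dim=-1)        # (5) weights over 3 features
        return sum(a * h for a, h in zip(alpha, h_list))    # (6) fusion feature Z

fusion = AttentionFusion()
z = fusion(torch.rand(128), torch.rand(128), torch.rand(3, 128))
print(z.shape)  # torch.Size([128])
```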
407. The computer device decodes the obtained at least two fusion features to obtain at least two target characters, and the at least two target characters form the second text.
Wherein the second text is the text in which the erroneous characters of the first text have been corrected.
In one possible implementation, the computer device decodes each fusion feature into a classification vector through the output layer of the text error correction model, namely the output layer of the BERT model, where each element in the classification vector indicates the probability that the fusion feature corresponds to one candidate character, and the computer device determines each target character as the candidate character indicated by the element with the largest value in the corresponding classification vector. It should be noted that the above description of the target character determination method is only an exemplary description of one possible implementation, and the embodiments of the present application do not limit which method is used to determine the target characters. The computer device sorts the target characters to obtain the second text, where the order of the target characters is the same as the order of the corresponding fusion features; illustratively, the computer device determines the order of each fusion feature based on the position in the first text of the character corresponding to that fusion feature. A sketch of this decoding step follows.
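A minimal decoding sketch, assuming a toy four-character candidate vocabulary and an illustrative linear classifier in place of the BERT output layer.

```python
# Sketch: each fusion feature -> classification vector -> largest element.
# Assumptions: a toy candidate vocabulary; the linear classifier is illustrative.
import torch
import torch.nn as nn

vocab = ["天", "气", "很", "好"]                  # candidate characters (illustrative)
classifier = nn.Linear(128, len(vocab))           # fusion feature -> classification vector

fusion_feats = torch.rand(4, 128)                 # one fusion feature per character position
class_vectors = classifier(fusion_feats)          # (4, |vocab|) scores per candidate character
target_ids = class_vectors.argmax(dim=-1)         # element with the largest value
second_text = "".join(vocab[i] for i in target_ids)  # characters kept in position order
print(second_text)
```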
Fig. 7 is a schematic diagram of a text error correction method according to an embodiment of the present application; the text error correction process is described below with reference to fig. 7. As shown in fig. 7, in a possible implementation, the computer device inputs the first text into the text error correction model, namely the BERT model, and the BERT model extracts the font feature (Shape Embedding), the pronunciation feature (Pinyin Embedding), and the semantic feature (Char Embedding) corresponding to each character in the first text; optionally, the BERT model further extracts a position feature (Position Embedding) and a segmentation feature (Segment Embedding) of each character. The computer device performs feature fusion based on the extracted multi-dimensional features to obtain the initial fusion feature of each character and the text semantic feature of the first text. For any character, the computer device performs weighted fusion of the features of each dimension based on the attention mechanism, that is, executes steps 405 and 406, to obtain the fusion feature, maps the fusion feature into a classification vector (classifier), determines the target character based on the classification vector, and then obtains the second text.
According to the technical solution provided by the embodiments of the present application, when the first text is corrected, the font feature and pronunciation feature of each character, as well as its contextual semantic feature in the first text, are fully considered, and the features of these three dimensions are fused to predict the correct character. Any character appearing in the first text can be identified and corrected, which effectively expands the coverage of text error correction, and the multi-dimensional feature fusion improves the accuracy of text error correction.
Fig. 8 is an interface schematic diagram of an information application provided in an embodiment of the present application. As shown in fig. 8 (a), when a user searches information content, the search results may contain wrongly written characters, for example the search result shown in area 801. With the technical solution provided in the embodiments of the present application, the computer device may perform text error correction on the search results and replace the erroneous characters, as shown in fig. 8 (b); alternatively, the computer device may directly filter out content containing erroneous characters, or move search content containing erroneous characters to a later position in the search result interface. Combining the text error correction method provided by the embodiments of the present application with an information application can effectively improve the quality of the information content browsed by the user, accurately filter malicious content in the application, and improve the user experience of the application.
The text error correction model in the above embodiments is a pre-trained model stored in the computer device; it may be a model trained by the computer device itself or a model trained by another device. Fig. 9 is a flowchart of a training method for the text error correction model provided in an embodiment of the present application. Referring to fig. 9, in a possible implementation, the training method includes the following steps:
901. The computer device obtains a text error correction model to be trained and at least two training samples.
In the embodiments of the present application, the text error correction model to be trained is a BERT model that has been pre-trained. Each training sample comprises a first training text and a second training text, where the first training text is a text containing erroneous characters and the second training text is the text in which the erroneous characters of the first training text have been corrected. In one possible implementation, the computer device trains the text error correction model on equal-length sequences, that is, every text input into the model contains the same number of characters; illustratively, before each training text is input into the model, the computer device adjusts its length, for example by appending placeholder characters at the end of the training text, so that all training texts have the same length, as in the sketch below.
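A minimal sketch of the equal-length adjustment; the choice of "□" as the placeholder character is an illustrative assumption.

```python
# Sketch: pad every training text to the same length with a placeholder character.
# Assumption: "□" is the placeholder appended at the end of shorter texts.
def pad_batch(texts: list[str], pad: str = "□") -> list[str]:
    max_len = max(len(t) for t in texts)
    return [t + pad * (max_len - len(t)) for t in texts]

print(pad_batch(["天气很好", "今天天气真不错"]))
# ['天气很好□□□', '今天天气真不错']
```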
902. The computer device inputs the at least two training samples into the text error correction model and obtains the error between the output result of the text error correction model and the correct result.
In the embodiments of the present application, the computer device corrects the first training text in a training sample through the text error correction model to obtain the output result of the model; the error correction process for the first training text is the same as steps 401 to 407 above. The computer device obtains the error between the output result of the model and the correct result, namely the corresponding second training text, based on a loss function. The loss function may be a cross-entropy loss function, which is not limited in the embodiments of the present application, nor is the method for obtaining the error between the model output and the correct result.
903. The computer device adjusts the parameters of the text error correction model based on the error until the model convergence condition is met, obtaining the trained text error correction model.
In one possible implementation, the computer device compares the obtained error with an error threshold; if the error is greater than the error threshold, the computer device back-propagates the error through the text error correction model and updates the parameters of the model using an Adaptive Moment Estimation (Adam) optimization algorithm combined with the SWA (Stochastic Weight Averaging) technique. If the error obtained by the computer device is smaller than the error threshold, the output result of the model is determined to be correct, and the computer device continues to read the next set of training samples and performs step 902 again. The error threshold is set by a developer and is not limited in the embodiments of the present application. A sketch of one such training step follows.
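The following sketch shows one training step combining a cross-entropy loss, Adam, and PyTorch's SWA utilities; the stand-in model, batch shapes, and learning rate are illustrative assumptions.

```python
# Sketch: one training step with cross-entropy loss, Adam, and SWA.
# Assumptions: a linear layer stands in for the text error correction model;
# batch shapes and the learning rate are illustrative.
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel

model = nn.Linear(128, 5000)                  # stand-in for the error correction model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
swa_model = AveragedModel(model)              # keeps the running weight average
criterion = nn.CrossEntropyLoss()

fusion_feats = torch.rand(8, 128)             # a batch of fusion features
target_chars = torch.randint(0, 5000, (8,))   # ids of the correct characters

loss = criterion(model(fusion_feats), target_chars)  # error vs. the second training text
loss.backward()                               # back-propagate the error
optimizer.step()                              # Adam parameter update
optimizer.zero_grad()
swa_model.update_parameters(model)            # SWA: fold current weights into the average
```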
In a possible implementation, if the number of correct output results obtained by the computer device reaches a reference number, or all the training samples have been read, it is determined that the text error correction model meets the model convergence condition, and the trained text error correction model is obtained. The reference number is set by a developer and is not limited in the embodiments of the present application. The model convergence condition may also be set to other contents, which is likewise not limited in the embodiments of the present application.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
Fig. 10 is a schematic structural diagram of a text error correction apparatus according to an embodiment of the present application. Referring to fig. 10, the apparatus includes:
a text obtaining module 1001, configured to obtain a first text to be corrected, where the first text includes at least two characters;
a feature obtaining module 1002, configured to obtain the font feature, the pronunciation feature, and the semantic feature of each character based on the structure and pronunciation of each character and its context information in the first text;
a feature fusion module 1003, configured to perform weighted fusion of the font feature, the pronunciation feature, and the semantic feature of each character to obtain the fusion feature of each character;
a feature decoding module 1004, configured to decode the obtained at least two fusion features to obtain at least two target characters and form a second text from the at least two target characters, where the second text is the text in which the erroneous characters of the first text have been corrected.
In one possible implementation, the feature obtaining module 1002 includes:
a first obtaining sub-module, configured to extract the font features of the at least two characters based on their structures through the glyph analysis network in the text error correction model;

a second obtaining sub-module, configured to extract the pronunciation features of the at least two characters based on their pronunciations through the voice recognition network in the text error correction model;

and a third obtaining sub-module, configured to extract the semantic features of the at least two characters based on their context information in the first text through the semantic recognition network in the text error correction model.
In one possible implementation, the first obtaining sub-module is configured to:
generate, through the glyph analysis network, a character node graph corresponding to any character based on the structure of that character and the structures of at least two reference characters, where the character node graph indicates the association, in the structural dimension, between that character and the at least two reference characters;

and perform feature extraction on the character node graph corresponding to that character to obtain the font feature of that character.
In one possible implementation, the first obtaining sub-module is configured to:
acquiring a character image corresponding to each character, wherein the character image is used for indicating the structure of the character;
and perform image feature extraction on the character image corresponding to each character through the glyph analysis network to obtain the font feature of each character.
In one possible implementation, the second obtaining sub-module includes:
a pinyin obtaining unit, configured to obtain a pinyin corresponding to each character, where the pinyin is used to indicate the pronunciation of the character;
and the pinyin coding unit is used for coding the pinyin corresponding to each character through the voice recognition network to obtain the pronunciation characteristics of each character.
In one possible implementation manner, the pinyin coding unit is configured to:
process the pinyin corresponding to each character through the voice recognition network based on a reference mapping condition, where the reference mapping condition includes at least one of mapping a retroflex initial in the pinyin to the corresponding dental initial, mapping a nasal initial to the corresponding lateral initial, mapping a back nasal final to the corresponding front nasal final, and removing the tone of the pinyin;
and encode the processed pinyin through the voice recognition network to obtain the pronunciation feature of each character.
In one possible implementation, the second obtaining sub-module is configured to:
acquiring audio files corresponding to each character, wherein one audio file comprises voice information for reading one character;
and extracting audio features of the audio file corresponding to each character through the voice recognition network to obtain the pronunciation features of each character.
In one possible implementation, the feature fusion module 1003 includes:
a first fusion sub-module, configured to perform, for any character, feature fusion on the font feature, the pronunciation feature, and the semantic feature of that character through the feature fusion network in the text error correction model to obtain the initial fusion feature corresponding to that character;
the weight determining submodule is used for respectively determining a first weight corresponding to the font feature, a second weight corresponding to the pronunciation feature and a third weight corresponding to the semantic feature of any character based on the initial fusion feature corresponding to the any character;
and the second fusion submodule is used for performing weighted fusion on the font characteristic, the pronunciation characteristic and the semantic characteristic of any character based on the first weight, the second weight and the third weight to obtain the fusion characteristic of any character.
In one possible implementation, the weight determination submodule is configured to:
acquiring text semantic features corresponding to the first text;
for any character, performing feature fusion on the text feature, the initial fusion feature and the font feature of the character to obtain a first intermediate feature;
performing feature fusion on the text feature, the initial fusion feature and the pronunciation feature of any character to obtain a second intermediate feature;
performing feature fusion on the text feature, the initial fusion feature and the semantic feature of any character to obtain a third intermediate feature;
determining the first weight, the second weight, and the third weight based on the first intermediate feature, the second intermediate feature, and the third intermediate feature, respectively.
In one possible implementation, the second fusion submodule is configured to:
and performing weighted fusion on the first intermediate feature, the second intermediate feature and the third intermediate feature based on the first weight, the second weight and the third weight respectively to obtain the fused feature.
In one possible implementation, the feature decoding module 1004 is configured to:
decoding each fusion feature into a classification vector respectively, wherein one element in the classification vector is used for indicating the probability that the fusion feature corresponds to one candidate character;
and determine each target character as the candidate character indicated by the element with the largest value in the corresponding classification vector.
In one possible implementation, the apparatus further comprises at least one of:
the font adjusting module is used for unifying the at least two characters in the first text into a reference font;
and the character removing module is used for removing foreign characters in the first text.
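Purely as an illustration of these two preprocessing modules, and assuming that "foreign characters" means characters outside the CJK range (the embodiments leave the definition open) and that unifying into a reference font means rendering each character with a single font via the Pillow library, a sketch could be:

```python
import re
from PIL import Image, ImageDraw, ImageFont

def remove_foreign_characters(text: str) -> str:
    """Drop characters outside the CJK Unified Ideographs block and common
    Chinese punctuation; what counts as 'foreign' is an assumption here."""
    return re.sub(r"[^\u4e00-\u9fff\u3000-\u303f，。！？；：]", "", text)

def render_in_reference_font(char: str, font_path: str = "SimSun.ttf",
                             size: int = 32) -> Image.Image:
    """Render one character in a single reference font so that glyph
    features are comparable; font path and image size are illustrative."""
    font = ImageFont.truetype(font_path, size)
    image = Image.new("L", (size, size), color=255)  # white grayscale canvas
    ImageDraw.Draw(image).text((0, 0), char, fill=0, font=font)
    return image
```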
According to the device provided by the embodiment of the application, when the first text is corrected, the font feature and the pronunciation feature of each character as well as the contextual semantic feature of each character in the first text are fully considered, and the features of these three dimensions are fused to predict the correct character. Any character appearing in the first text can therefore be identified and corrected, which effectively expands the coverage of text error correction, while the multi-dimensional feature fusion improves its accuracy.
It should be noted that the division into the above functional modules in the text error correction device provided by the above embodiment is merely illustrative. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the text error correction device and the text error correction method provided by the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not described again here.
The computer device provided in the foregoing technical solution may be implemented as a terminal or a server. For example, Fig. 11 is a schematic structural diagram of a terminal provided in this embodiment of the present application. The terminal 1100 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1100 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 1100 includes one or more processors 1101 and one or more memories 1102.
The processor 1101 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1101 may be implemented in at least one hardware form among a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 1102 may include one or more computer-readable storage media, which may be non-transitory. The memory 1102 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 1102 is used to store at least one computer program that is executed by the processor 1101 to implement the text error correction method provided by the method embodiments herein.
In some embodiments, the terminal 1100 may further include a peripheral interface 1103 and at least one peripheral. The processor 1101, the memory 1102 and the peripheral interface 1103 may be connected by buses or signal lines. Each peripheral may be connected to the peripheral interface 1103 by a bus, a signal line, or a circuit board. Specifically, the peripherals include at least one of a radio frequency circuit 1104, a display screen 1105, a camera assembly 1106, an audio circuit 1107, a positioning assembly 1108, and a power supply 1109.
The peripheral interface 1103 may be used to connect at least one I/O (Input/Output) related peripheral to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, the memory 1102, and the peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102 and the peripheral interface 1103 may be implemented on separate chips or circuit boards, which is not limited by this embodiment.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices via electromagnetic signals, converting an electric signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 1104 may communicate with other terminals via at least one wireless communication protocol, including but not limited to metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 1105 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, it also has the ability to capture touch signals on or over its surface. Such a touch signal may be input to the processor 1101 as a control signal for processing. In this case, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, providing the front panel of the terminal 1100; in other embodiments, there may be at least two display screens 1105, respectively disposed on different surfaces of the terminal 1100 or in a folded design; in still other embodiments, the display screen 1105 may be a flexible display disposed on a curved surface or a folded surface of the terminal 1100. The display screen 1105 may even be arranged in an irregular, non-rectangular shape, that is, a shaped screen. The display screen 1105 may be an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like.
The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be combined to realize a background blurring function, and the main camera and the wide-angle camera can be combined to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1106 may also include a flash, which can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 1107 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, convert the sound waves into electric signals, and input them to the processor 1101 for processing or to the radio frequency circuit 1104 for voice communication. For stereo capture or noise reduction purposes, there may be multiple microphones, each disposed at a different location of the terminal 1100. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electric signals from the processor 1101 or the radio frequency circuit 1104 into sound waves, and may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can not only convert an electric signal into sound waves audible to humans, but also convert an electric signal into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 1107 may also include a headphone jack.
The positioning component 1108 is used to locate the current geographic position of the terminal 1100 for navigation or LBS (Location Based Service). The positioning component 1108 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1109 is used to supply power to the various components in the terminal 1100. The power supply 1109 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 1109 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, terminal 1100 can also include one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyro sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.
The acceleration sensor 1111 may detect the magnitudes of acceleration on the three coordinate axes of a coordinate system established with the terminal 1100; for example, it may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1101 may control the display screen 1105 to display the user interface in a landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1111. The acceleration sensor 1111 may also be used to collect motion data of a game or a user.
The gyro sensor 1112 may detect the body direction and rotation angle of the terminal 1100, and may cooperate with the acceleration sensor 1111 to acquire the user's 3D motion with respect to the terminal 1100. From the data collected by the gyro sensor 1112, the processor 1101 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1113 may be disposed on a side bezel of the terminal 1100 and/or under the display screen 1105. When the pressure sensor 1113 is disposed on the side bezel of the terminal 1100, a user's holding signal on the terminal 1100 can be detected, and the processor 1101 performs left/right-hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed under the display screen 1105, the processor 1101 controls operability controls on the UI according to the user's pressure operation on the display screen 1105. The operability controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The fingerprint sensor 1114 is used to collect the user's fingerprint, and the processor 1101 identifies the user according to the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 itself identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1101 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1114 may be disposed on the front, back, or side of the terminal 1100. When a physical button or vendor logo is provided on the terminal 1100, the fingerprint sensor 1114 may be integrated with the physical button or vendor logo.
The optical sensor 1115 is used to collect the ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the display screen 1105 based on the ambient light intensity collected by the optical sensor 1115: when the ambient light intensity is high, the display brightness of the display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the display screen 1105 is reduced. In another embodiment, the processor 1101 may also dynamically adjust the shooting parameters of the camera assembly 1106 based on the ambient light intensity collected by the optical sensor 1115.
The proximity sensor 1116, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 1100 and is used to capture the distance between the user and the front face of the terminal 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal 1100 gradually decreases, the processor 1101 controls the display screen 1105 to switch from a screen-on state to a screen-off state; when the proximity sensor 1116 detects that the distance gradually increases, the processor 1101 controls the display screen 1105 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in Fig. 11 does not constitute a limitation of the terminal 1100, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
Fig. 12 is a schematic structural diagram of a server 1200 according to an embodiment of the present application. The server 1200 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1201 and one or more memories 1202, where the one or more memories 1202 store at least one computer program that is loaded and executed by the one or more processors 1201 to implement the methods provided by the foregoing method embodiments. Of course, the server 1200 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including at least one computer program, is also provided, where the at least one computer program is executable by a processor to perform the text error correction method in the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, the computer program product comprising at least one computer program, the at least one computer program being stored in a computer readable storage medium. The at least one computer program is read by a processor of the computer device from the computer-readable storage medium, and the at least one computer program is executed by the processor to cause the computer device to implement the operations performed by the text error correction method.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.