Display device, text error correction method and server

Info

Publication number: CN114118064B
Authority: CN (China)
Prior art keywords: text, corrected, characters, similar, matrix
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202010879686.1A
Other languages: Chinese (zh)
Other versions: CN114118064A
Inventors: 王敏, 修媛媛, 杨善松
Current assignee: Hisense Visual Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Hisense Visual Technology Co Ltd
Application filed by Hisense Visual Technology Co Ltd; priority to CN202010879686.1A

Abstract


The embodiment of the present application provides a display device, a text correction method and a server, wherein the display device includes a display and a controller, wherein the controller is configured to: in response to receiving a voice command input by a user, perform voice conversion on the voice command to obtain a text to be corrected; control the display to display the text to be corrected; correct the text to be corrected based on a similar sound-shape confusion set and a graph attention mechanism to obtain an initial correction text, perform candidate recall on the text to be corrected and the initial correction text, and obtain a final correction text based on the sorting results of the recalled texts; and control the display to refresh the text to be corrected to the final correction text. The embodiment of the present application generates a pronunciation-similar knowledge graph and a shape-similar knowledge graph based on the confusion set corresponding to the text to be corrected, integrates the pinyin and glyph-related knowledge of Chinese characters into a graph neural network, extracts deep semantic information between similar characters, and can effectively utilize the knowledge of similar sound and shape to improve the accuracy and recall rate of error detection and correction.

Description

Display device, text error correction method and server
Technical Field
The present application relates to the field of display devices, and in particular, to a display device, a text error correction method, and a server.
Background
With the development of computers, big data, and machine learning, spelling error correction technology has been widely used in many fields, such as Chinese and English input methods, document editing tools, search tools, OCR, and speech recognition. Spelling error correction was first proposed for English, the most widely used language worldwide, and through decades of development, rule-based, statistics-based, and feature-based techniques have appeared in succession, achieving considerable accuracy. By comparison, Chinese error correction started late; Chinese is more complex than English, and researchers have invested less in Chinese error correction research, so traditional Chinese error correction has lower performance and accuracy, and few mature, usable tools exist.
The accuracy of Chinese input data is a basic premise for common natural language processing tasks and is also key to improving the performance of upper-layer applications. In the related art, error detection based on LSTM+CRF is difficult to deploy in practice because it relies on a large number of labeled samples, and error detection based on N-grams suffers from low algorithm performance due to its rigid discriminant rules, resulting in low error detection efficiency.
Disclosure of Invention
In order to solve the above technical problems, the present application provides a display device, a text error correction method, and a server.
In a first aspect, the present application provides a display device comprising:
a display;
A controller coupled to the display, the controller configured to:
In response to receiving a voice command input by a user, performing voice conversion on the voice command to obtain a text to be corrected;
Controlling the display to display the text to be corrected;
Correcting the text to be corrected based on the sound-shape similar confusion set and the graph attention mechanism to obtain an initial correction text;
Performing candidate recall on the text to be corrected and the initial correction text, and obtaining a final correction text according to the sorting result of the recalled texts;
And controlling the display to refresh the text to be corrected to the final correction text.
In some embodiments, the correcting the text to be corrected based on the sound-shape similar confusion set and the graph attention mechanism includes:
Extracting features of the text to be corrected to obtain an initial characterization matrix;
Creating an adjacency matrix for each character in the text to be corrected according to the sound-shape similar confusion set;
Inputting the initial characterization matrix and the adjacency matrix into a multi-layer graph convolutional neural network to obtain a next-layer characterization matrix;
Obtaining the last-layer characterization matrix of the multi-layer graph convolutional neural network according to the graph attention mechanism;
And generating characters through a fully connected layer and a probability normalization function (a code sketch of this pipeline follows below).
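To make the data flow of these steps concrete, the following is a minimal PyTorch-style sketch of the pipeline, shown for illustration only: the class name, the dimensions (hidden_dim=768, vocab_size=5000), the ReLU activation, the multi-head attention fusion, and the summing of the two knowledge-graph branches are assumptions, not the actual implementation of this application.

```python
import torch
import torch.nn as nn

class SoundShapeCorrector(nn.Module):
    """Sketch: Bert features -> multi-layer GCN over sound/shape adjacency
    -> attention fusion -> fully connected layer + softmax."""

    def __init__(self, bert_encoder, hidden_dim=768, vocab_size=5000, num_layers=3):
        super().__init__()
        self.bert = bert_encoder  # feature extraction (initial characterization matrix)
        self.gcn = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(num_layers))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=8, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)  # fully connected layer

    @staticmethod
    def gcn_layer(H, A, linear):
        # adjacency estimation matrix A~ = A + I, diagonal estimation matrix D~,
        # next-layer characterization: ReLU(D~^-1/2 A~ D~^-1/2 H W)
        A_hat = A + torch.eye(A.size(0), device=A.device)
        d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
        return torch.relu(linear(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H))

    def forward(self, input_ids, A_pron, A_shape):
        # For simplicity the adjacency matrices here cover only the characters
        # of the input sequence; the application describes N x N graphs over
        # the whole word stock.
        H = self.bert(input_ids).last_hidden_state.squeeze(0)  # (t, hidden_dim)
        layers = []
        for linear in self.gcn:
            # summing the pronunciation and shape branches is an assumption
            H = self.gcn_layer(H, A_pron, linear) + self.gcn_layer(H, A_shape, linear)
            layers.append(H)
        h = H.unsqueeze(0)
        H_att, _ = self.attn(h, h, h)  # attention characterization matrix
        H_last = H_att.squeeze(0) + torch.stack(layers).sum(dim=0)
        return torch.softmax(self.fc(H_last), dim=-1)  # per-character distribution
```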
In some embodiments, the creating an adjacency matrix for each character in the text to be corrected according to the sound-shape similar confusion set includes:
Acquiring, from the sound-shape similar confusion set, the pronunciation-similar characters and the shape-similar characters of each character in the text to be corrected;
Establishing a pronunciation-similar adjacency matrix with the characters in the text to be corrected, the pronunciation-similar characters, and the characters in the word stock as nodes and the relations among the characters as edges;
And establishing a shape-similar adjacency matrix with the characters in the text to be corrected, the shape-similar characters, and the characters in the word stock as nodes and the relations among the characters as edges.
In some embodiments, inputting the initial characterization matrix and the adjacency matrix into a multi-layer graph convolutional neural network to obtain a next-layer characterization matrix includes:
Adding the adjacency matrix and the identity matrix to obtain an adjacency estimation matrix;
Calculating the diagonal matrix corresponding to the adjacency estimation matrix to obtain a diagonal estimation matrix;
And obtaining the next-layer characterization matrix according to the adjacency estimation matrix, the diagonal estimation matrix, and the initial characterization matrix.
In some embodiments, the deriving the last-layer characterization matrix of the multi-layer graph convolutional neural network according to the graph attention mechanism includes:
Calculating a knowledge-fused attention characterization matrix using the attention mechanism;
And obtaining the last-layer characterization matrix according to the sum of the attention characterization matrix and the characterization matrices of each layer (written out below).
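One consistent instantiation of these two steps, written out (the scaled dot-product attention form is an assumption; this application does not fix the exact attention formula), is:

$$H^{att} = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V, \qquad H^{last} = H^{att} + \sum_{l=1}^{L} H^{(l)},$$

where $Q$, $K$, and $V$ are linear projections of the top graph-convolution layer's output, $d$ is the hidden dimension, and $H^{(l)}$ is the characterization matrix of layer $l$.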
In a second aspect, an embodiment of the present application provides a text error correction method, for a display device, including:
Correcting the text to be corrected based on the sound-shape similar confusion set and the graph attention mechanism to obtain an initial correction text;
Performing candidate recall on the text to be corrected and the initial correction text to obtain recalled texts;
And sorting the recalled texts, and obtaining a final correction text corresponding to the text to be corrected according to the sorting result.
In a third aspect, embodiments of the present application provide a server configured to:
receiving text to be corrected from a display device;
Correcting the text to be corrected based on the sound-shape similar confusion set and the graph attention mechanism to obtain an initial correction text;
Performing candidate recall on the text to be corrected and the initial correction text, and obtaining a final correction text according to the sorting result of the recalled texts;
and sending the final error correction text to the display device.
The display device, the text error correction method and the server provided by the application have the beneficial effects that:
According to the embodiments of the present application, a pronunciation-similar knowledge graph and a shape-similar knowledge graph are generated from the confusion set corresponding to the text to be corrected, pinyin and glyph knowledge of Chinese characters is integrated into a graph neural network, and deep semantic information between similar characters is extracted, so that knowledge of similar sounds and shapes can be used effectively, improving the accuracy and recall of error detection and correction.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
A schematic diagram of an operation scenario between a display device and a control apparatus according to some embodiments is exemplarily shown in fig. 1;
A hardware configuration block diagram of a display device 200 according to some embodiments is exemplarily shown in fig. 2;
A hardware configuration block diagram of the control apparatus 100 according to some embodiments is exemplarily shown in fig. 3;
A schematic diagram of the software configuration in a display device 200 according to some embodiments is exemplarily shown in fig. 4;
An icon control interface display schematic of an application in a display device 200 according to some embodiments is exemplarily shown in fig. 5;
An overall flow diagram of text error correction according to some embodiments is exemplarily shown in fig. 6;
A flow diagram of a text error correction method according to some embodiments is exemplarily shown in fig. 7;
A schematic diagram of an end-to-end error detection and correction model according to some embodiments is exemplarily shown in fig. 8;
A flow diagram of a method of parsing text to be corrected according to some embodiments is exemplarily shown in fig. 9;
A flow diagram of a method of creating an adjacency matrix according to some embodiments is exemplarily shown in fig. 10;
A schematic diagram of a voice interaction interface according to some embodiments is exemplarily shown in fig. 11;
A schematic diagram of a voice interaction interface according to some embodiments is exemplarily shown in fig. 12;
A schematic diagram of a voice interaction interface according to some embodiments is exemplarily shown in fig. 13.
Detailed Description
For the purposes of making the objects, embodiments and advantages of the present application more apparent, an exemplary embodiment of the present application will be described more fully hereinafter with reference to the accompanying drawings in which exemplary embodiments of the application are shown, it being understood that the exemplary embodiments described are merely some, but not all, of the examples of the application.
Based on the exemplary embodiments described herein, all other embodiments that may be obtained by one of ordinary skill in the art without making any inventive effort are within the scope of the appended claims. Furthermore, while the present disclosure has been described in terms of an exemplary embodiment or embodiments, it should be understood that each aspect of the disclosure can be practiced separately from the other aspects.
It should be noted that the brief description of the terminology in the present application is for the purpose of facilitating understanding of the embodiments described below only and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first", "second", "third", and the like in the description, the claims, and the above figures are used to distinguish between similar objects or entities and do not necessarily describe a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments of the application are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" as used in this disclosure refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
The term "remote control" as used herein refers to a component of an electronic device (such as a display device as disclosed herein) that can be controlled wirelessly, typically over a relatively short distance. Typically, the electronic device is connected to the electronic device using infrared and/or Radio Frequency (RF) signals and/or bluetooth, and may also include functional modules such as WiFi, wireless USB, bluetooth, motion sensors, etc. For example, a hand-held touch remote control replaces most of the physical built-in hard keys in a typical remote control device with a touch screen user interface.
The term "gesture" as used herein refers to a user behavior by which a user expresses an intended idea, action, purpose, and/or result through a change in hand shape or movement of a hand, etc.
A schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment is exemplarily shown in fig. 1. As shown in fig. 1, a user may operate the display apparatus 200 through the mobile terminal 300 and the control device 100.
In some embodiments, the control apparatus 100 may be a remote controller, and the communication between the remote controller and the display device includes infrared protocol communication or bluetooth protocol communication, and other short-range communication modes, etc., and the display device 200 is controlled by a wireless or other wired mode. The user may control the display device 200 by inputting user instructions through keys on a remote control, voice input, control panel input, etc. For example, the user can input corresponding control commands through volume up and down keys, channel control keys, up/down/left/right movement keys, voice input keys, menu keys, on/off keys and the like on the remote controller, thereby realizing the function of controlling the display device 200.
In some embodiments, mobile terminals, tablet computers, notebook computers, and other smart devices may also be used to control the display device 200. For example, the display device 200 is controlled using an application running on a smart device. The application program, by configuration, can provide various controls to the user in an intuitive User Interface (UI) on a screen associated with the smart device.
In some embodiments, the mobile terminal 300 may install a software application with the display device 200, implement connection communication through a network communication protocol, and achieve the purpose of one-to-one control operation and data communication. For example, a control instruction protocol can be established between the mobile terminal 300 and the display device 200, the remote control keyboard is synchronized to the mobile terminal 300, and the functions of controlling the display device 200 are realized by controlling the user interface on the mobile terminal 300. The audio/video content displayed on the mobile terminal 300 can also be transmitted to the display device 200, so as to realize the synchronous display function.
As also shown in fig. 1, the display device 200 is also in data communication with the server 400 via a variety of communication means. The display device 200 may be allowed to make communication connections via a local area network (LAN), a wireless local area network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display device 200. For example, by sending and receiving information and interacting with an electronic program guide (EPG), the display device 200 receives software program updates or accesses a remotely stored digital media library. The server 400 may be one cluster or multiple clusters, and may include one or more types of servers. Other web service content, such as video on demand and advertising services, is provided through the server 400.
The display device 200 may be a liquid crystal display, an OLED display, a projection display device. The particular display device type, size, resolution, etc. are not limited, and those skilled in the art will appreciate that the display device 200 may be modified in performance and configuration as desired.
In addition to the broadcast receiving television function, the display device 200 may additionally provide a smart network television function with computer support, including, but not limited to, a network television, a smart television, an Internet Protocol television (IPTV), and the like.
A hardware configuration block diagram of the display device 200 according to an exemplary embodiment is illustrated in fig. 2.
In some embodiments, at least one of the controller 250, the modem 210, the communicator 220, the detector 230, the input/output interface 255, the display 275, the audio output interface 285, the memory 260, the power supply 290, the user interface 265, and the external device interface 240 is included in the display apparatus 200.
In some embodiments, the display 275 is configured to receive image signals from the first processor output, and to display video content and images and components of the menu manipulation interface.
In some embodiments, display 275 includes a display screen assembly for presenting pictures, and a drive assembly for driving the display of images.
In some embodiments, the displayed video content may come from broadcast television content or from various broadcast signals received via wired or wireless communication protocols, or various image content received from a network server via network communication protocols may be displayed.
In some embodiments, the display 275 is used to present a user-manipulated UI interface generated in the display device 200 and used to control the display device 200.
In some embodiments, depending on the type of display 275, a drive assembly for driving the display is also included.
In some embodiments, display 275 is a projection display and may further include a projection device and a projection screen.
In some embodiments, communicator 220 is a component for communicating with external devices or external servers according to various communication protocol types. For example, the communicator may include at least one of a Wifi chip, a bluetooth communication protocol chip, a wired ethernet communication protocol chip, or other network communication protocol chip or a near field communication protocol chip, and an infrared receiver.
In some embodiments, the display apparatus 200 may establish control signal and data signal transmission and reception between the communicator 220 and the external control device 100 or the content providing apparatus.
In some embodiments, the user interface 265 may be used to receive infrared control signals from the control device 100 (e.g., an infrared remote control, etc.).
In some embodiments, the detector 230 is a component used by the display device 200 to collect signals from the external environment or to interact with the external environment.
In some embodiments, the detector 230 includes an optical receiver, a sensor for collecting the intensity of ambient light, so that display parameters can be adapted to the ambient light, and so on.
In some embodiments, the detector 230 may further include an image collector, such as a camera, a video camera, etc., which may be used to collect external environmental scenes, collect attributes of a user or interact with a user, adaptively change display parameters, and recognize a user gesture to realize an interaction function with the user.
In some embodiments, the detector 230 may also include a temperature sensor or the like, such as by sensing ambient temperature.
In some embodiments, the display device 200 may adaptively adjust the display color temperature of the image. For example, the display device 200 may be adjusted to display the image in a colder color temperature when the ambient temperature is high, or in a warmer color tone when the temperature is low.
In some embodiments, the detector 230 may also be a sound collector, such as a microphone, that may be used to receive the user's sound. Illustratively, it receives a voice signal containing a control instruction by which the user controls the display device 200, or collects environmental sounds to recognize the type of environmental scene so that the display device 200 can adapt to environmental noise.
In some embodiments, as shown in fig. 2, the input/output interface 255 is configured to enable data transfer between the controller 250 and external other devices or other controllers 250. Such as receiving video signal data and audio signal data of an external device, command instruction data, or the like.
In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of a high definition multimedia interface (HDMI), an analog or data high definition component input interface, a composite video input interface, a USB input interface, an RGB port, and the like. The plurality of interfaces may form a composite input/output interface.
In some embodiments, as shown in fig. 2, the modem 210 is configured to receive the broadcast television signal by a wired or wireless receiving manner, and may perform modulation and demodulation processes such as amplification, mixing, and resonance, and demodulate the audio/video signal from the plurality of wireless or wired broadcast television signals, where the audio/video signal may include a television audio/video signal carried in a television channel frequency selected by a user, and an EPG data signal.
In some embodiments, the frequency point demodulated by the modem 210 is controlled by the controller 250, and the controller 250 may send a control signal according to the user selection, so that the modem responds to the television signal frequency selected by the user and modulates and demodulates the television signal carried by the frequency.
In some embodiments, the broadcast television signal may be classified into a terrestrial broadcast signal, a cable broadcast signal, a satellite broadcast signal, an internet broadcast signal, or the like according to a broadcasting system of the television signal. Or may be differentiated into digital modulation signals, analog modulation signals, etc., depending on the type of modulation. Or it may be classified into digital signals, analog signals, etc. according to the kind of signals.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like. In this way, the set-top box outputs the television audio and video signals modulated and demodulated by the received broadcast television signals to the main body equipment, and the main body equipment receives the audio and video signals through the first input/output interface.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored on the memory. The controller 250 may control the overall operation of the display apparatus 200. For example, in response to receiving a user command to select a UI object to be displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any one of selectable objects, such as a hyperlink or an icon. Operations related to the selected object, such as an operation of displaying a link to a hyperlink page, a document, an image, or the like, or an operation of executing a program corresponding to the icon. The user command for selecting the UI object may be an input command through various input means (e.g., mouse, keyboard, touch pad, etc.) connected to the display device 200 or a voice command corresponding to a voice uttered by the user.
As shown in fig. 2, the controller 250 includes at least one of a random access memory 251 (RAM), a read-only memory 252 (ROM), a video processor 270, an audio processor 280, other processors 253 (e.g., a graphics processing unit, GPU), a central processing unit 254 (CPU), a communication interface, and a communication bus 256, which connects the respective components.
In some embodiments, RAM 251 is used to store temporary data for the operating system or other on-the-fly programs.
In some embodiments, ROM 252 is used to store instructions for various system boots.
In some embodiments, ROM 252 is used to store a basic input/output system (BIOS), which comprises a driver program and a boot operating system; the driver program is used to complete the power-on self-test of the system, the initialization of each functional module in the system, and the basic input/output of the system.
In some embodiments, upon receiving a power-on signal, the display device 200 starts up, and the CPU runs the system boot instructions in ROM 252 and copies the temporary data of the operating system stored in memory into RAM 251 in order to start or run the operating system. After the operating system is started, the CPU copies the temporary data of various applications in memory into RAM 251 to facilitate starting or running the various applications.
In some embodiments, CPU processor 254 is used to execute the operating system and application program instructions stored in memory, and to execute various applications, data, and content according to various interactive instructions received from the outside, so as to finally display and play various audio and video content.
In some exemplary embodiments, the CPU processor 254 may comprise a plurality of processors. The plurality of processors may include one main processor and one or more sub-processors: a main processor for performing some operations of the display device 200 in the pre-power-up mode and/or for displaying pictures in the normal mode, and one or more sub-processors for performing operations in standby mode and the like.
In some embodiments, the graphics processor 253 is used to generate various graphical objects, such as icons, operation menus, and graphics displayed for user input instructions. It comprises an arithmetic unit, which performs operations on the various interaction instructions input by the user and displays various objects according to their display attributes, and a renderer, which renders the objects produced by the arithmetic unit for display on the display.
In some embodiments, the video processor 270 is configured to receive an external video signal and perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image composition according to the standard codec protocol of the input signal, to obtain a signal that can be displayed or played directly on the display device 200.
In some embodiments, video processor 270 includes a demultiplexing module, a video decoding module, an image compositing module, a frame conversion module, a display formatting module, and the like.
The demultiplexing module is used for demultiplexing the input audio/video data stream, such as the input MPEG-2, and demultiplexes the input audio/video data stream into video signals, audio signals and the like.
And the video decoding module is used for processing the demultiplexed video signals, including decoding, scaling and the like.
And an image synthesis module, such as an image synthesizer, for performing superposition mixing processing on the graphic generator and the video image after the scaling processing according to the GUI signal input by the user or generated by the graphic generator, so as to generate an image signal for display.
The frame rate conversion module is used for converting the frame rate of the input video, such as converting a 60 Hz frame rate into a 120 Hz or 240 Hz frame rate, commonly by means of frame interpolation.
The display formatting module is used for converting the frame-rate-converted video signal into an output signal conforming to the display format, such as an RGB data signal.
In some embodiments, the graphics processor 253 may be integrated with the video processor or configured separately. The integrated configuration can process graphics signals output to the display, while the separate configuration can perform different functions respectively, such as a GPU+FRC (Frame Rate Conversion) architecture.
In some embodiments, the audio processor 280 is configured to receive an external audio signal, decompress and decode the audio signal according to a standard codec protocol of an input signal, and perform noise reduction, digital-to-analog conversion, and amplification processing, so as to obtain a sound signal that can be played in a speaker.
In some embodiments, video processor 270 may include one or more chips. The audio processor may also comprise one or more chips.
In some embodiments, video processor 270 and audio processor 280 may be separate chips or may be integrated together with the controller in one or more chips.
In some embodiments, the audio output, such as the speaker 286, receives the sound signal output by the audio processor 280 under the control of the controller 250. Besides the speaker carried by the display device 200 itself, the sound may be output to the sound output terminal of an external sound generating device, such as an external sound interface or an earphone interface, and the communication interface may further include a near field communication module, such as a Bluetooth module for outputting sound through a Bluetooth speaker.
The power supply 290 supplies power input from an external power source to the display device 200 under the control of the controller 250. The power supply 290 may include a built-in power circuit installed inside the display device 200, or may be an external power source installed in the display device 200, and a power interface for providing an external power source in the display device 200.
The user interface 265 is used to receive an input signal from a user and then transmit the received user input signal to the controller 250. The user input signal may be a remote control signal received through an infrared receiver, and various user control signals may be received through a network communication module.
In some embodiments, a user inputs a user command through the control apparatus 100 or the mobile terminal 300, the user input interface responds to the user input through the controller 250, and the display device 200 then responds to the user input.
In some embodiments, a user may input a user command through a Graphical User Interface (GUI) displayed on the display 275, and the user input interface receives the user input command through the Graphical User Interface (GUI). Or the user may input the user command by inputting a specific sound or gesture, the user input interface recognizes the sound or gesture through the sensor, and receives the user input command.
In some embodiments, a "user interface" is a media interface for interaction and exchange of information between an application or operating system and a user that enables conversion between an internal form of information and a form acceptable to the user. A commonly used presentation form of a user interface is a graphical user interface (Graphic User Interface, GUI), which refers to a graphically displayed user interface that is related to computer operations. It may be an interface element such as an icon, a window, a control, etc. displayed in a display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
The memory 260 includes memory storing various software modules for driving the display device 200. Such as various software modules stored in the first memory, including at least one of a base module, a detection module, a communication module, a display control module, a browser module, various service modules, and the like.
The base module is a bottom software module for signal communication between the various hardware in the display device 200 and for sending processing and control signals to the upper modules. The detection module is used for collecting various information from various sensors or user input interfaces and carrying out digital-to-analog conversion and analysis management.
For example, the voice recognition module includes a voice analysis module and a voice instruction database module. The display control module is used for controlling the display to display the image content, and can be used for playing the multimedia image content, the UI interface and other information. And the communication module is used for carrying out control and data communication with external equipment. And the browser module is used for executing data communication between the browsing servers. And the service module is used for providing various services and various application programs. Meanwhile, the memory 260 also stores received external data and user data, images of various items in various user interfaces, visual effect maps of focus objects, and the like.
Fig. 3 exemplarily shows a block diagram of a configuration of the control apparatus 100 in accordance with an exemplary embodiment. As shown in fig. 3, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply.
The control apparatus 100 is configured to control the display device 200: it receives a user's input operation instruction and converts the operation instruction into an instruction that the display device 200 can recognize and respond to, serving as an intermediary between the user and the display device 200. For example, when the user operates the channel up/down keys on the control apparatus 100, the display device 200 responds to the channel up/down operation.
In some embodiments, the control apparatus 100 may be a smart device. For example, the control apparatus 100 may install various applications for controlling the display device 200 according to the user's needs.
In some embodiments, as shown in fig. 1, a mobile terminal 300 or other intelligent electronic device may function similarly to the control apparatus 100 after installing an application for manipulating the display device 200. For example, a user may implement the functions of the physical keys of the control apparatus 100 by installing various function keys or virtual buttons of a graphical user interface available on the mobile terminal 300 or other intelligent electronic device.
The controller 110 includes a processor 112 and RAM 113 and ROM 114, a communication interface 130, and a communication bus. The controller is used to control the operation and operation of the control device 100, as well as the communication collaboration among the internal components and the external and internal data processing functions.
The communication interface 130 enables communication of control signals and data signals with the display device 200 under the control of the controller 110. Such as by sending received user input signals to the display device 200. The communication interface 130 may include at least one of a WiFi chip 131, a bluetooth module 132, an NFC module 133, and other near field communication modules.
A user input/output interface 140, wherein the input interface includes at least one of a microphone 141, a touchpad 142, a sensor 143, keys 144, and other input interfaces. For example, the user can realize the user instruction input function through actions such as voice, touch, gesture, pressing and the like, and the input interface converts the received analog signals into digital signals and converts the digital signals into corresponding instruction signals to be sent to the display device 200.
The output interface includes an interface that transmits the received user instruction to the display device 200. In some embodiments, an infrared interface may be used, as well as a radio frequency interface. For example, when the infrared signal interface is used, the user input instruction needs to be converted into an infrared control signal according to an infrared control protocol, and the infrared control signal is sent to the display device 200 through the infrared sending module. For example, when the RF signal interface is used, the user input instruction needs to be converted into a digital signal, and then the digital signal is modulated according to the modulation protocol of the RF control signal and then transmitted to the display device 200 through the RF transmission terminal.
In some embodiments, the control device 100 includes at least one of a communication interface 130 and an input-output interface 140. The control device 100 is configured with a communication interface 130, such as a WiFi, bluetooth, NFC, etc. module, and may send a user input instruction to the display device 200 through a WiFi protocol, or a bluetooth protocol, or an NFC protocol code.
A memory 190 for storing various operating programs, data, and applications that drive and control the control apparatus 100 under the control of the controller. The memory 190 may store various control signal instructions input by a user.
And a power supply 180 for providing operation power support for each element of the control device 100 under the control of the controller. May be a battery and associated control circuitry.
In some embodiments, the system may include a kernel, a command parser (shell), a file system, and applications. The kernel, shell, and file system together form the basic operating system architecture that allows users to manage files, run programs, and use the system. After power-up, the kernel is started, the kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, the scheduler, signals, and inter-process communication (IPC) are operated and maintained. After the kernel is started, the shell and user applications are then loaded. An application is compiled into machine code after being started, forming a process.
Referring to FIG. 4, in some embodiments, the system is divided into four layers, from top to bottom: an application layer (referred to as the "application layer"), an application framework layer (Application Framework, referred to as the "framework layer"), an Android runtime and system library layer (referred to as the "system runtime layer"), and a kernel layer.
In some embodiments, at least one application program is running in the application program layer, and the application programs may be a Window (Window) program, a system setting program, a clock program, a camera application, etc. of an operating system, or may be an application program developed by a third party developer, such as a hi-see program, a K-song program, a magic mirror program, etc. In particular implementations, the application packages in the application layer are not limited to the above examples, and may actually include other application packages, which the embodiments of the present application do not limit.
The framework layer provides an application programming interface (application programming interface, API) and programming framework for the application programs of the application layer. The application framework layer includes a number of predefined functions. The application framework layer corresponds to a processing center that decides to let the applications in the application layer act. Through the API interface, the application program can access the resources in the system and acquire the services of the system in the execution.
As shown in FIG. 4, the application framework layer in an embodiment of the present application includes a manager (Managers), which includes at least one of: an Activity Manager, used to interact with all activities running in the system; a Location Manager, used to provide access to the system location service for system services or applications; a Package Manager, used to retrieve various information about the application packages currently installed on the device; a Notification Manager, used to control the display and removal of notification messages; and a Window Manager, used to manage windows, toolbars, wallpaper, and desktop components on the user interface.
In some embodiments, the activity manager is used to manage lifecycle of individual applications and general navigational rollback functions, such as controlling exit of an application (including switching a currently displayed user interface in a display window to a system desktop), opening, backing (including switching a currently displayed user interface in a display window to a previous level user interface of a currently displayed user interface), and so forth.
In some embodiments, the window manager is configured to manage all window procedures, such as obtaining a display screen size, determining whether there is a status bar, locking the screen, intercepting the screen, controlling display window changes (e.g., scaling the display window down, dithering, distorting, etc.), and so on.
In some embodiments, the system runtime layer provides support for the upper layer, the framework layer, and when the framework layer is in use, the android operating system runs the C/C++ libraries contained in the system runtime layer to implement the functions to be implemented by the framework layer.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer at least includes at least one of an audio driver, a display driver, a bluetooth driver, a camera driver, a WIFI driver, a USB driver, an HDMI driver, a sensor driver (such as a fingerprint sensor, a temperature sensor, a touch sensor, a pressure sensor, etc.), and the like.
In some embodiments, the kernel layer further includes a power driver module for power management.
In some embodiments, the software programs and/or modules corresponding to the software architecture in fig. 4 are stored in the first memory or the second memory shown in fig. 2 or fig. 3.
In some embodiments, for a display device with a touch function, taking a split-screen operation as an example: the display device receives an input operation (such as a split-screen operation) performed by a user on the display screen, and the kernel layer may generate a corresponding input event according to the input operation and report the event to the application framework layer. The window mode (e.g., multi-window mode) and the window position and size corresponding to the input operation are set by the activity manager of the application framework layer. The window manager of the application framework layer draws a window according to the settings of the activity manager, then sends the drawn window data to the display driver of the kernel layer, and the display driver displays the application interfaces corresponding to the window data in different display areas of the display screen.
In some embodiments, as shown in FIG. 5, the application layer contains at least one application that can display a corresponding icon control in a display, such as a live television application icon control, a video on demand application icon control, a media center application icon control, an application center icon control, a game application icon control, and the like.
In some embodiments, the live television application may provide live television via different signal sources. For example, a live television application may provide television signals using inputs from cable television, radio broadcast, satellite services, or other types of live television services. And, the live television application may display video of the live television signal on the display device 200.
In some embodiments, the video on demand application may provide video from different storage sources. Unlike live television applications, video-on-demand provides video displays from some storage sources. For example, video-on-demand may come from the server side of cloud storage, from a local hard disk storage containing stored video programs.
In some embodiments, the media center application may provide various multimedia content playing applications. For example, a media center may be a different service than live television or video on demand, and a user may access various images or audio through a media center application.
In some embodiments, an application center may be provided to store various applications. The application may be a game, an application, or some other application associated with a computer system or other device but which may be run in a smart television. The application center may obtain these applications from different sources, store them in local storage, and then be run on the display device 200.
The hardware or software architecture in some embodiments may be based on the description in the foregoing embodiments, and in some embodiments may be based on other similar hardware or software architectures, so long as the technical solution of the present application may be implemented.
In some embodiments, the application center may be provided with a voice assistant application to implement intelligent voice services, such as searching for media assets and adjusting the volume. The user may wake up the voice assistant application by sending a voice command, which may be some preset wake-up word, to the display device; after the voice assistant application wakes up, the user may interact with it to control the display device by voice. After receiving the user's voice command, the intelligent voice assistant needs to perform voice recognition on the command to obtain a recognized text, and the recognized text has a certain probability of errors because many characters are easily confused.
In order to solve this technical problem, an embodiment of the application provides an overall flow of text error correction, shown in fig. 6. First, the input natural language text, i.e., the text to be corrected, is corrected by an end-to-end error correction model to obtain a first error correction result; the end-to-end model sequentially performs Bert vector characterization of the text, sound-shape confusion graph representation of the characters, multi-layer graph neural network processing, and hidden-vector classification to generate characters. Then, an ElasticSearch engine performs candidate recall on the first error correction result according to an error correction word stock to obtain a recall result; candidate recall includes processing such as ElasticSearch retrieval and an inverted index over the error correction word stock. Finally, the recall result is ranked to obtain a sorting result, and a final error correction result corresponding to the natural language text is generated according to the sorting result; candidate ranking includes processing such as edit distance computation and threshold filtering.
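For illustration, the following is a minimal sketch of the recall-and-rank stage in Python. The index name, document field, query form, and distance threshold are assumptions; it presumes an Elasticsearch index has already been built over the error correction word stock and uses the official elasticsearch client:

```python
from elasticsearch import Elasticsearch  # assumed: an ES index over the word stock

def levenshtein(a: str, b: str) -> int:
    """Plain edit distance, used for candidate ranking."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def recall_and_rank(text, first_correction, es: Elasticsearch,
                    index="correction_lexicon", max_distance=2):
    # candidate recall: match both the raw text and the model output
    # against the error correction word stock (inverted index)
    candidates = set()
    for query in (text, first_correction):
        hits = es.search(index=index,
                         query={"match": {"phrase": query}})["hits"]["hits"]
        candidates.update(h["_source"]["phrase"] for h in hits)
    # candidate ranking: order by edit distance to the input;
    # threshold filtering drops far-away candidates
    ranked = sorted(candidates, key=lambda c: levenshtein(text, c))
    ranked = [c for c in ranked if levenshtein(text, c) <= max_distance]
    return ranked[0] if ranked else first_correction
```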
To further describe the text error correction method in fig. 6, the embodiment of the present application further provides a flowchart of the text error correction method, referring to fig. 7, where the method may be used in a display device, and includes the following steps:
And step S10, correcting the text to be corrected based on the sound-shape similar confusion set and the graph attention mechanism to obtain an initial correction text.
In some embodiments, the voice assistant application of the display device may receive the user's voice command after waking up. The controller of the display device obtains the voice command received by the voice assistant application, performs voice conversion on the voice command to obtain a text to be corrected, and the text to be corrected and an actual text corresponding to the voice command may have some errors, so that the actual text needs to be obtained through correction, and the actual text may be called a final corrected text.
In some embodiments, it may take a certain time, such as 1 second, for the display device to perform the correction. If the final corrected text were displayed only after error correction, the user would experience the display device as slow to respond. To avoid making the user wait for the display device to respond, after the text to be corrected is obtained, the display may be controlled to display the text to be corrected first while the correction is performed in the background.
In some embodiments, the display device may construct an end-to-end error correction model to initially correct the text to be corrected. Referring to fig. 8, a schematic diagram of an end-to-end error detection and correction model according to some embodiments: the text to be corrected, such as "encounter against the error" (a recognition error), is input to a Bert Extractor, which outputs an initial characterization matrix H comprising H_0, H_1, ..., H_{t+1}, where Trm represents the encoded output of a Transformer layer, EMB represents the word embedding of a character, Trm takes EMB as input, and t represents the character length of the user's request.
The initial characterization matrix is input into the GCN Network (Graph Convolutional Neural Network), and the pronunciation-similar confusion set knowledge graph and the shape-similar confusion set knowledge graph of the text to be corrected are respectively input into the GCN Network to update the initial characterization matrix; the GCN Network has three layers, layer_1, layer_2, and layer_3.
The GCN Network feeds its output to a classifier, and the correction result output by the classifier of the end-to-end error detection and correction model, such as "encountering adverse circumstances", can be used as the initial correction text. The classifier may be a hidden-vector classifier capable of hidden-vector classification, and each dotted box in the classifier represents the predicted probability distribution of a character, such as 80%, 70%, 85%.
The above-mentioned end-to-end error detection and correction model parsing method can refer to fig. 9, which is a flow chart of the text parsing method to be corrected according to some embodiments of the present application, as shown in fig. 9, and the parsing method can include steps S101-S105.
And step S101, extracting features of the text to be corrected to obtain an initial characterization matrix.
In some embodiments, feature extraction may be performed on text to be corrected by a Bert model.
The Bert model uses a bi-directional Transformer as the encoder and uses the Masked LM (masked language model) and Next Sentence Prediction methods to capture word-level and sentence-level representations, respectively. After the text to be corrected is input into the Bert model, the initial characterization matrix H is output.
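As an illustration, here is a minimal sketch of this feature-extraction step using the Hugging Face transformers library. The bert-base-chinese checkpoint and the inclusion of the [CLS]/[SEP] rows are assumptions; any Chinese Bert encoder would serve the same role.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def initial_characterization(text: str) -> torch.Tensor:
    """Returns H = (H_0, ..., H_{t+1}): one row per token,
    including the [CLS]/[SEP] positions, as in fig. 8."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state.squeeze(0)  # shape (t + 2, 768)

H = initial_characterization("待纠错文本")  # the text to be corrected
```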
And step S102, creating an adjacency matrix for each character in the text to be corrected according to the sound-shape similar confusion set.
The sound-shape similar confusion set comprises a preset pronunciation-similar confusion set and a preset shape-similar confusion set: the pronunciation-similar confusion set is a preset set of characters that are easily confused because of similar pronunciation, and the shape-similar confusion set is a preset set of characters that are easily confused because of similar shapes. In some embodiments, the sound-shape similar confusion set may be obtained by data analysis of user data, which may include user input data on the display device.
In some embodiments, for the character "competition" in the text to be corrected, its pronunciation-similar confusion set is {gold, silence, border, well, mailing, competition}, and its shape-similar confusion set is {Beijing, mirror, competition, landscape, border} (the set members are Chinese characters, rendered here by their literal English glosses).
In some embodiments, the confusion set of the "competition" character may include more than the Chinese characters listed above. In the confusion set dictionary, each entry consists of a character followed by a colon; the confusion set of that character is the set of characters after the colon.
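For illustration, a loader for such a dictionary might look as follows; the file layout (one "character:confusable characters" entry per line, as the colon description above suggests) and the file names are assumptions:

```python
def load_confusion_set(path: str) -> dict[str, set[str]]:
    """Parses lines of the form '<char>:<confusable chars>' into a dict,
    e.g. mapping a character to its pronunciation-similar set."""
    confusion = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or ":" not in line:
                continue
            char, similars = line.split(":", 1)
            confusion[char] = set(similars)
    return confusion

pron_confusion = load_confusion_set("pronunciation_similar.txt")  # hypothetical file
shape_confusion = load_confusion_set("shape_similar.txt")         # hypothetical file
```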
The method of creating the adjacency matrix can be seen in fig. 10, comprising steps S1021-S1023.
And step S1021, acquiring, from the sound-shape similar confusion set, the pronunciation-similar characters and the shape-similar characters of each character in the text to be corrected.
That is, the pronunciation-similar characters and the shape-similar characters of each character are extracted from the pronunciation-similar confusion set and the shape-similar confusion set, respectively.
And step S1022, taking the characters in the text to be corrected, the similar pronunciation characters and the characters in the word stock as nodes, and taking the relation among the characters as edges to establish a similar pronunciation adjacency matrix.
In some embodiments, a word stock of Chinese characters commonly used in daily life may be selected to provide candidate characters for the text to be corrected.
Each character in the text to be corrected is taken as a central node; its pronunciation-similar characters, together with the characters in the word stock other than the characters of the text to be corrected and their pronunciation-similar characters, are taken as edge nodes; and the relations between characters are taken as edges, so as to establish the knowledge graph of the pronunciation-similarity confusion set. Each edge takes the value 0 or 1: 1 indicates that the two nodes of the edge are similar, and 0 indicates that they are not. For example, an edge connecting a character in the text to be corrected with a character in its pronunciation-similarity confusion set may be represented as 1, and an edge connecting it with a character not belonging to its pronunciation-similarity confusion set may be represented as 0.
The knowledge graph of the pronunciation-similarity confusion set can be expressed as an N×N adjacency matrix, where N represents the number of commonly used Chinese characters, i.e., the number of characters in the word stock, e.g., 5000.
Step S1023: take the characters in the text to be corrected, the shape-similar characters and the characters in the word stock as nodes, and the relations between the characters as edges, to establish a shape-similarity adjacency matrix.
Each character in the text to be corrected is taken as a central node; its shape-similar characters and the characters in the word stock are taken as edge nodes; and the relations between characters are taken as edges, so as to establish the knowledge graph of the shape-similarity confusion set, where each edge takes the value 0 or 1: 1 indicates that the two nodes of the edge are similar, and 0 indicates that they are not. For example, an edge connecting a character in the text to be corrected with a character in its shape-similarity confusion set may be represented as 1, and an edge connecting it with a character in the word stock that does not belong to its shape-similarity confusion set may be represented as 0. The knowledge graph of the shape-similarity confusion set may also be represented as an N×N adjacency matrix.
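A sketch of building the two N×N adjacency matrices from confusion sets follows; the toy word stock and confusion set contents are stand-ins for the real resources described above.

```python
# Sketch of steps S1022-S1023: nodes are characters, an edge is 1 iff two
# characters are listed as similar in the corresponding confusion set.
import numpy as np

word_stock = ["竞", "境", "镜", "京", "井", "金", "遇", "到", "逆"]  # toy stock
char_to_idx = {ch: i for i, ch in enumerate(word_stock)}
N = len(word_stock)

pronunciation_confusion = {"竞": ["境", "京", "井", "金"]}  # toy entries
shape_confusion = {"竞": ["境", "镜", "京"]}

def build_adjacency(confusion_set):
    """Build an N x N 0/1 adjacency matrix over the word stock."""
    A = np.zeros((N, N), dtype=np.float32)
    for center, similars in confusion_set.items():
        i = char_to_idx[center]
        for s in similars:
            j = char_to_idx[s]
            A[i, j] = A[j, i] = 1.0  # similarity is mutual
    return A

A_pronunciation = build_adjacency(pronunciation_confusion)
A_shape = build_adjacency(shape_confusion)
```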
Step S103: input the initial characterization matrix and the adjacency matrices into a multi-layer graph convolutional neural network to obtain the next-layer characterization matrix.
A 3-layer graph convolutional neural network is constructed, shown as layer1-layer3 in fig. 8. The input of layer1 is the encoded output of the Bert Extractor, and the inputs of layer2 and layer3 are the outputs of their preceding layers; H and the adjacency matrix A serve as the inputs of the multi-layer graph convolutional neural network to extract deeper semantic information. The adjacency matrix A comprises the adjacency matrix corresponding to the knowledge graph of the pronunciation-similarity confusion set and the adjacency matrix corresponding to the knowledge graph of the shape-similarity confusion set.
H and the adjacency matrix A are taken as the input of the multi-layer graph convolutional neural network to obtain the characterization matrix of the second layer; the second-layer characterization matrix and A are then taken as input to obtain the third-layer characterization matrix, and so on, yielding the characterization matrix $H^{l}$ of each graph convolution layer according to:

$$H^{l}=\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^{l-1}W^{l}\qquad(1)$$

where $l$ denotes the $l$-th layer; $\hat{A}=A+I$, with $I$ the identity matrix corresponding to $A$, is the matrix after adding node self-connections and may be referred to as the adjacency estimation matrix; $\hat{D}$, with $\hat{D}_{ii}=\sum_{j}\hat{A}_{ij}$, is the diagonal matrix corresponding to $\hat{A}$ and may be referred to as the diagonal estimation matrix, the values at its diagonal positions being the degrees of the corresponding nodes; $i$ and $j$ are both between 0 and $N$; $H^{l-1}$ is the characterization matrix of the layer above $H^{l}$; and $W^{l}$ denotes the training parameters of the $l$-th layer.
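A minimal numpy sketch of formula (1) follows; the (N, D) shape of the characterization matrix and the (D, D) shape of the layer weights are illustrative assumptions.

```python
# Sketch of one graph convolution layer per formula (1):
# H^l = D-hat^{-1/2} A-hat D-hat^{-1/2} H^{l-1} W^l
import numpy as np

def gcn_layer(A, H_prev, W):
    """A: (N, N) adjacency matrix; H_prev: (N, D); W: (D, D)."""
    A_hat = A + np.eye(A.shape[0])       # adjacency estimation matrix A + I
    deg = A_hat.sum(axis=1)              # diagonal values: node degrees
    D_inv_sqrt = np.diag(deg ** -0.5)    # D-hat^{-1/2}; deg >= 1 (self-loops)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ H_prev @ W
```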
Step S104: obtain the last-layer characterization matrix of the multi-layer graph convolutional neural network according to a graph attention mechanism.
In some embodiments, a graph attention mechanism may be introduced to fuse the pronunciation-similarity and shape-similarity knowledge and obtain the last-layer characterization matrix $H^{l+1}$.
The knowledge-fusion attention characterization matrix $C^{l}$ is calculated by an attention mechanism:

$$C^{l}=\sum_{k\in\{s,p\}}a_{i}^{k}\,f_{k}(A^{k},H^{l})_{i}\qquad(2)$$

where $C^{l}$ is a matrix of dimension N×D, and D represents the vector dimension after Bert encoding; $f_{k}(A^{k},H^{l})_{i}$ is the $i$-th row of the convolution output for graph $k$, and graph $k$ is the adjacency matrix of the $k$-th kind of similarity, also denoted $A^{k}$; $s$ denotes shape similarity and $p$ denotes pronunciation similarity. $a_{i}^{k}$ denotes the scalar weight of the $i$-th character for graph $k$; it is normalized over the graphs with the attention parameter $w_{a}$ of the $l$-layer network, and may be computed as

$$a_{i}^{k}=\frac{\exp\left(w_{a}\,f_{k}(A^{k},H^{l})_{i}/\beta\right)}{\sum_{k'}\exp\left(w_{a}\,f_{k'}(A^{k'},H^{l})_{i}/\beta\right)}$$

where $\beta$ is a hyperparameter and may be a constant, e.g., 3.
The last-layer characterization matrix $H^{l+1}$ is calculated according to:

$$H^{l+1}=C^{l}+\sum_{j=0}^{l}H^{j}\qquad(3)$$

that is, the sum of the attention characterization matrix and the characterization matrices of the layers, where $H^{0}$ is the initial characterization matrix output by Bert.
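A sketch of the attentive combination in formulas (2)-(3) follows; the exact parameterization of $a_{i}^{k}$ reconstructed above, and the shapes used here, are assumptions.

```python
# Sketch of formulas (2)-(3): per-character scalar weights over the shape
# ("s") and pronunciation ("p") graphs, softmax-normalized across graphs
# with shared vector w_a and temperature beta, then layer accumulation.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_combine(f_outputs, w_a, beta=3.0):
    """f_outputs: dict graph-name -> (N, D) output f_k(A^k, H^l); w_a: (D,)."""
    stacked = np.stack(list(f_outputs.values()))  # (K, N, D), K graphs
    scores = stacked @ w_a / beta                 # (K, N) scalar scores
    a = softmax(scores, axis=0)                   # weights across the K graphs
    return (a[..., None] * stacked).sum(axis=0)   # (N, D) fused matrix C^l

# Usage, with H_layers = [H^0, ..., H^l] the accumulated layer outputs:
# C = attentive_combine({"s": f_s, "p": f_p}, w_a)
# H_next = C + sum(H_layers)   # formula (3)
```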
Step S105: generate characters through the fully connected layer and a probability normalization function.
In some embodiments, the characters may be generated according to the probability normalization function:

$$P(y_{i}=y\mid X)=\operatorname{softmax}\left(W\,h_{i}^{l+1}\right)\qquad(4)$$

where $X$ represents the entire user request, such as "encounter against competition"; $y$ represents the correct character at the $i$-th position; $P(y_{i}=y\mid X)$ represents the probability that, given input $X$, the character at the $i$-th position is $y$; $W$ represents the training weight parameters of the fully connected layer; and $h_{i}^{l+1}$ is the $i$-th row of the last-layer characterization matrix $H^{l+1}$.
According to formula (4), the probability that each character position takes a given character can be obtained. If a character position has several candidate characters, such as "context" and "competition", the character corresponding to the maximum probability is selected as the character of that position, where the candidates of each position can be obtained according to the last-layer characterization matrix $H^{l+1}$. In some embodiments, after the text to be corrected "encounter against competition" is input into the end-to-end error detection and correction model, the corrected text "encountering adverse circumstances" may be generated and output, which may be referred to as the initial error correction text.
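A sketch of the character generation in formula (4) follows; the weight shapes and vocabulary are illustrative.

```python
# Sketch of formula (4): a fully connected layer over each position's
# hidden vector, per-position softmax, then argmax over the vocabulary.
import numpy as np

def generate_characters(hidden, W, vocab):
    """hidden: (L, D) vectors of the L positions; W: (D, N); vocab: N chars."""
    logits = hidden @ W                               # (L, N)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)         # per-position softmax
    return "".join(vocab[i] for i in probs.argmax(axis=1))
```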
Step S20: perform candidate recall on the text to be corrected and the initial error correction text to obtain recall texts.
In some embodiments, the initial error correction text may be subjected to an Elasticsearch query, resulting in a first recall text.
Elasticsearch (ES) is a distributed full-text search server, and may also be used as a NoSQL database to store documents and data in any format. The ES full-text search engine is an open-source search engine built on Lucene (a full-text search framework) and can be used for full-text search and geographic information search.
The initial error correction text is used as the query for the Elasticsearch query. Various query modes are available, including match search, prefix search, suffix search and fuzzy search, and several modes may be combined in one query to obtain the first recall text. For example, the match search may be an exact search that requires the words to be exactly the same: searching "Peppa Pig" returns "Peppa Pig".
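A hedged sketch of such a combined query with the Python Elasticsearch client follows; the index name, field names and local endpoint are assumptions, not values from the embodiment.

```python
# Sketch of the first recall: query a candidate index with the initial
# error correction text, combining match, prefix and fuzzy clauses.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local ES instance

def first_recall(initial_correction, size=10):
    query = {
        "bool": {
            "should": [
                {"match": {"title": initial_correction}},
                {"match_phrase_prefix": {"title": initial_correction}},
                {"fuzzy": {"title.keyword": {"value": initial_correction}}},
            ]
        }
    }
    resp = es.search(index="correction_candidates", query=query, size=size)
    return [hit["_source"]["title"] for hit in resp["hits"]["hits"]]
```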
In some embodiments, an inverted-index data structure may be constructed based on the error correction word stock, and the initial error correction text and the text to be corrected are respectively used as queries to obtain a second recall text.
An inverted index, also commonly referred to as a postings file or inverted file, is an index method that, under full-text search, stores a mapping from a word to its storage locations in a document or a group of documents. It is the most commonly used data structure in document retrieval systems: through the inverted index, the list of documents containing a word can be obtained quickly from the word. The inverted index mainly consists of two parts, the word dictionary and the inverted file. Based on the error correction word stock, a mapping from Chinese characters to words is created and the inverted index is built; the error correction result and the original text are used as queries, and similar words in the word stock are retrieved in combination with rule conditions such as the number of similar characters or the number of similar pinyin.
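A sketch of the second recall follows; the shared-character threshold is an illustrative stand-in for the rule conditions mentioned above.

```python
# Sketch: an inverted index mapping each Chinese character to the word-stock
# entries containing it, queried by the number of shared characters.
from collections import defaultdict

def build_inverted_index(word_stock):
    index = defaultdict(set)
    for word in word_stock:
        for ch in word:
            index[ch].add(word)  # character -> words containing it
    return index

def second_recall(query_text, index, min_shared=2):
    counts = defaultdict(int)
    for ch in set(query_text):
        for word in index.get(ch, ()):
            counts[word] += 1    # characters shared with the query
    return [w for w, c in counts.items() if c >= min_shared]
```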
In some embodiments, the recall text may include a first recall text and a second recall text.
Step S30: perform candidate ranking on the recall texts, and obtain the final error correction text corresponding to the text to be corrected according to the ranking result.
The Levenshtein edit distance is a measure of the degree of difference between two strings: the minimum number of single-character edits (e.g., modification, insertion, deletion) required to change one string into the other. The greater the Levenshtein edit distance, the weaker the correlation between the two strings.
In some embodiments, the edit distance between each recall text and the initial error correction text may be calculated separately, and the edit distance is divided by the length of the longest text among the recall text and the initial error correction text to obtain the difference degree of the recall text, where the longest text refers to the text with the largest number of characters.
Further, a difference degree threshold may be set; the recall texts are ranked according to difference degree, and recall texts whose difference degree is higher than the threshold are filtered out. The difference degree threshold may be set to a constant, such as 0.75.
The recall text with the smallest difference degree is taken as the final error correction text of the text to be corrected.
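A sketch of this ranking step follows, using the 0.75 threshold from the text; the fallback to the initial error correction text when every candidate is filtered out is an assumption.

```python
# Sketch of candidate ranking: Levenshtein distance divided by the length
# of the longer text gives the difference degree; candidates above the
# threshold are filtered out and the smallest-difference one is returned.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def rank_candidates(recalls, initial_correction, threshold=0.75):
    scored = []
    for text in recalls:
        diff = levenshtein(text, initial_correction) / max(
            len(text), len(initial_correction))
        if diff <= threshold:
            scored.append((diff, text))
    scored.sort()                     # smallest difference degree first
    return scored[0][1] if scored else initial_correction
```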
The embodiment of the present application further provides a server, which may be configured to execute the text error correction method shown in fig. 7 to correct Chinese text.
In some embodiments, the server may be communicatively coupled to a display device; the display device may send the text to be corrected to the server, and after obtaining the final error correction text according to the text error correction method shown in fig. 7, the server sends the final error correction text to the display device, so that the display device displays the final error correction text.
Referring to fig. 11-13, which are schematic diagrams of a voice interaction interface according to some embodiments: as shown in fig. 11, the wake-up word of the voice assistant application may be "hai jungle", and after the voice assistant application wakes up, a recording prompt may be displayed, for example "listening to the voice command", to prompt the user to issue a voice command. As shown in fig. 12, after the user issues a voice command, the display device may perform voice conversion on the voice command and display the result in real time; the text after voice conversion may be the text to be corrected, for example "encounter against competition". After displaying the text to be corrected, the display device may perform text error correction in a background process according to the method shown in fig. 7 to obtain the final error correction text, or the display device uploads the text to be corrected to the server and the server returns the final error correction text to the display device. As shown in fig. 13, after obtaining the final error correction text, the display device may refresh the text to be corrected into the final error correction text. Further, the display device may also respond according to the final error correction text, such as being controlled accordingly, or playing the corresponding audio/video media assets.
As can be seen from the above embodiments, the embodiment of the present application generates a pronunciation-similarity knowledge graph and a shape-similarity knowledge graph according to the confusion sets corresponding to the text to be corrected, blends the pinyin and glyph-related knowledge of Chinese characters into the graph neural network, and extracts deep semantic information between similar characters, so that the knowledge of sound-shape similarity can be effectively utilized and the accuracy and recall rate of error detection and correction can be improved.
Since the foregoing embodiments refer to one another for parts described elsewhere, the embodiments in this specification share identical and similar parts, which are not described in detail again here.
It should be noted that, in this specification, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a circuit structure, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such circuit structure, article, or apparatus. Without further limitation, an element introduced by "comprises a" or "comprising" does not exclude the presence of additional identical elements in the circuit structure, article, or apparatus that comprises the element.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure of the application herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims. The above embodiments of the present application do not limit the scope of the present application.

Claims (8)

1. A display device, comprising: a display; and a controller connected to the display, the controller being configured to: in response to receiving a voice command input by a user, perform voice conversion on the voice command to obtain a text to be corrected; control the display to display the text to be corrected; correct the text to be corrected based on a sound-shape similarity confusion set and a graph attention mechanism to obtain an initial error correction text; perform candidate recall of the characters in the initial error correction text and the text to be corrected, respectively, through an inverted-index word stock and preset rule conditions to obtain recall texts, wherein the inverted-index word stock is created based on an error correction word stock and is used to build an inverted index of the mapping from Chinese characters to words, and the preset rule conditions comprise the number of characters or the number of pinyin similar to the initial error correction text or the text to be corrected; calculate the edit distance between each recall text and the initial error correction text, divide the edit distance by the length of the longest text among the recall text and the initial error correction text to obtain the difference degree of the recall text, sort the recall texts according to the difference degree, and take the recall text with the smallest difference degree as the final error correction text of the text to be corrected; and control the display to refresh the text to be corrected into the final error correction text.

2. The display device according to claim 1, wherein correcting the text to be corrected based on the sound-shape similarity confusion set and the graph attention mechanism comprises: performing feature extraction on the text to be corrected to obtain an initial characterization matrix; creating an adjacency matrix for each character in the text to be corrected according to the sound-shape similarity confusion set; inputting the initial characterization matrix and the adjacency matrix into a multi-layer graph convolutional neural network to obtain a next-layer characterization matrix; obtaining a last-layer characterization matrix of the multi-layer graph convolutional neural network according to the graph attention mechanism; and generating characters through a fully connected layer and a probability normalization function.

3. The display device according to claim 2, wherein creating the adjacency matrix for each character in the text to be corrected according to the sound-shape similarity confusion set comprises: acquiring, for each character in the text to be corrected, the pronunciation-similar characters and the shape-similar characters in the sound-shape similarity confusion set; taking the characters in the text to be corrected, the pronunciation-similar characters, and the characters in a word stock as nodes, and the relations between the characters as edges, to establish a pronunciation-similarity adjacency matrix; and taking the characters in the text to be corrected, the shape-similar characters, and the characters in the word stock as nodes, and the relations between the characters as edges, to establish a shape-similarity adjacency matrix.

4. The display device according to claim 2, wherein inputting the initial characterization matrix and the adjacency matrix into the multi-layer graph convolutional neural network to obtain the next-layer characterization matrix comprises: adding the adjacency matrix to an identity matrix to obtain an adjacency estimation matrix; calculating the diagonal matrix corresponding to the adjacency estimation matrix to obtain a diagonal estimation matrix; and obtaining the next-layer characterization matrix according to the adjacency estimation matrix, the diagonal estimation matrix, and the initial characterization matrix.

5. The display device according to claim 2, wherein obtaining the last-layer characterization matrix of the multi-layer graph convolutional neural network according to the graph attention mechanism comprises: calculating a knowledge-fusion attention characterization matrix using an attention mechanism; and obtaining the last-layer characterization matrix according to the sum of the attention characterization matrix and the characterization matrix of each layer.

6. The display device according to claim 1, wherein the controller is further configured to: filter out recall texts whose difference degree is higher than a difference degree threshold.

7. A text error correction method for a display device, comprising: correcting a text to be corrected based on a sound-shape similarity confusion set and a graph attention mechanism to obtain an initial error correction text; performing candidate recall of the characters in the initial error correction text and the text to be corrected, respectively, through an inverted-index word stock and preset rule conditions to obtain recall texts, wherein the inverted-index word stock is created based on an error correction word stock and is used to map Chinese characters to words, and the preset rule conditions comprise the number of characters or the number of pinyin similar to the initial error correction text or the text to be corrected; and calculating the edit distance between each recall text and the initial error correction text, dividing the edit distance by the length of the longest text among the recall text and the initial error correction text to obtain the difference degree of the recall text, sorting the recall texts according to the difference degree, and taking the recall text with the smallest difference degree as the final error correction text of the text to be corrected.

8. A server, wherein the server is configured to: receive a text to be corrected from a display device; correct the text to be corrected based on a sound-shape similarity confusion set and a graph attention mechanism to obtain an initial error correction text; perform candidate recall of the characters in the initial error correction text and the text to be corrected, respectively, through an inverted-index word stock and preset rule conditions to obtain recall texts, wherein the inverted-index word stock is created based on an error correction word stock and is used to map Chinese characters to words, and the preset rule conditions comprise the number of characters or the number of pinyin similar to the initial error correction text or the text to be corrected; and calculate the edit distance between each recall text and the initial error correction text, divide the edit distance by the length of the longest text among the recall text and the initial error correction text to obtain the difference degree of the recall text, sort the recall texts according to the difference degree, and take the recall text with the smallest difference degree as the final error correction text of the text to be corrected.