
A multifunctional intelligent electronic seat card device, system, equipment and storage medium

Info

Publication number
CN112836620A
CN112836620A
Authority
CN
China
Prior art keywords
module
information
user
recognition
intelligent electronic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110124665.3A
Other languages
Chinese (zh)
Inventor
张通
刘炳秀
贾雪
陈俊龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202110124665.3A
Publication of CN112836620A
Legal status: Pending

Abstract

Translated from Chinese


The invention discloses a multifunctional intelligent electronic seat card device, comprising: a data acquisition module for acquiring relevant information of users; a background control module for controlling the terminals, processing and transmitting data, and handling communication between different terminals; a storage module for saving the record files, audio files, video files and expression analysis result files generated during use; and an output module for outputting the user's identity information and voice information. Integrating multiple functions in one device, it can meet the needs of various occasions: it can not only serve as a seat system to display user information, but can also replace devices such as microphones and notebooks. The built-in microphone of the seat system provides sound amplification, voice recognition converts speech into text to record the conference content, and a function for downloading the conference record is provided. In addition, the invention adds a facial expression recognition function, which can monitor the user's emotional state during the meeting and prevent accidents.


Description

Multifunctional intelligent electronic seat card device, system, equipment and storage medium
Technical Field
The invention relates to the field of seat card research, and in particular to a multifunctional intelligent electronic seat card device, system, equipment and storage medium.
Background
The existing seat system is often used in scenes such as meetings and banks. Traditional conference seat cards are printed in advance on paper of a fixed size and then inserted into a plastic display card, so the cards are useless after a single use, which is neither environmentally friendly nor convenient. To address this problem, many types of electronic seat systems have been developed. Although existing electronic seat systems solve the problem of paper waste and add functions such as electronic display and sound amplification, many aspects still need improvement, such as how to acquire the electronic display content more conveniently and how to realize the background functions more conveniently, so there is still great room for development.
In bank seat systems, the electronic seat system facilitates interaction between staff and customers, and can also judge whether the staff are at their seats or verify user identity through face recognition by capturing video with the camera.
In reality, the electronic seat system used by enterprises or other organizations for holding the conference can uniformly control and display the basic information of the participants through the background.
Existing electronic seats also generally have audio and video acquisition functions: the speaker's audio and the participants' audio and video data are collected through equipment such as the seat system's built-in microphone and miniature camera. The microphone can realize sound amplification with a built-in audio amplification circuit or an external multi-channel audio power amplifier.
The existing electronic seat system realizes communication and data transmission between the terminals and the background through wireless communication. Wireless communication is a method of exchanging information by exploiting the fact that electromagnetic wave signals can propagate in free space. Existing electronic seat systems implement data transmission between the background and the terminals wirelessly, using technologies such as Wi-Fi, WLAN and EDGE. After the audio, video and other data of each terminal are converted, they are transmitted wirelessly to a background bus to be stored or otherwise processed. Some electronic seat systems have voting or service buttons on the base, and the response to these button events also relies on wireless communication.
At present, few electronic seat systems are equipped with a camera; where a shooting function exists, it is used to judge the presence or absence of a participant from the captured video or to authenticate the user by face recognition.
Objective disadvantages of the prior art:
the functions of the existing electronic seat card system are limited. Although the display screens allow the background to uniformly input and modify the displayed content, this leads to the following situations: seats are fixed in advance and the user cannot choose a seat by himself; and when information such as the name is wrong, the user cannot enter or correct it himself.
Existing audio modules only provide a public-address function. Beyond that, an audio module can also be provided with a recording function, with an administrator deciding whether the session is public and thereby whether the recording function is used: if the seat system is used on a public occasion, recording can be used when the audio needs to be reused afterwards; if the scenario involves personal privacy or business or other confidential matters, the recording function may not be available.
The existing seat system needs keys on the base, such as voting keys and service-call keys, to realize part of the interaction between the seat system and the background; these keys occupy considerable space and affect the appearance.
Existing electronic seat systems capture video of the user only to judge whether the user is present or absent, whether the user is fatigued, or to analyse the sitting posture; they do not further use the captured video to analyse the user's specific emotional state.
Existing voice recognition algorithms have not been applied to electronic seat systems in meeting, psychological consultation and intelligent classroom scenarios; existing facial expression recognition algorithms do not combine the facial muscle action units when generating the training model; and existing electronic seat systems do not support file downloading: files can only be transmitted to the background storage module for management, and terminal users cannot download them directly at the terminal.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provides a multifunctional intelligent electronic seat card device, system, equipment and storage medium: an intelligent electronic seat card usable for meetings, psychological consultation and intelligent classrooms that integrates expression recognition, voice recognition, sound amplification, real-time conference file downloading and other functions. User information can be acquired through a mobile phone APP, and the displayed content of the seat system can be modified in time when personnel change. In addition, the invention can not only serve as a seat system to display user information, but can also replace equipment such as a microphone and a notebook computer: the microphone built into the seat system can amplify the sound, voice recognition can convert speech into text to record the conference content, and a function for downloading the conference record is provided. Meanwhile, a facial expression recognition function is added, which can monitor the users' emotional states during the meeting and prevent accidents.
The first object of the invention is to provide a multifunctional intelligent electronic seat card device.
The second object of the invention is to provide a multifunctional intelligent electronic seat card system.
A third object of the invention is to provide an apparatus.
A fourth object of the present invention is to provide a storage medium.
The first purpose of the invention is realized by the following technical scheme:
the utility model provides a multi-functional intelligent electron agent tablet device which characterized in that includes:
the data acquisition module is used for acquiring relevant information of a user, wherein the relevant information comprises personal information, character information, sound information and video information;
the background control module is used for realizing control over the terminal, processing and transmission of data and communication control over different terminals;
the storage module is used for storing a recording file, an audio file, a video file and an expression analysis result file which are generated in the using process;
and the output module is used for outputting the identity information and the sound information of the user.
Furthermore, the data acquisition module comprises a character acquisition module, a sound acquisition module, a video acquisition module and a seat terminal, wherein the character acquisition module acquires the character information uploaded by the user terminal, and the character information comprises identity information and meeting records; the sound acquisition module is used for acquiring the sound information of the user, and the video acquisition module is used for acquiring the video information of the user.
Furthermore, the background control module comprises a control module, a transmission module, an expression recognition module and a voice recognition module; the expression recognition module is used for recognizing facial expression information in the video information and obtaining expression analysis results; the voice recognition module is used for recognizing voice information and converting the voice information into a character recording file; the control module is used for controlling the seat card system and sending control instructions to other modules; the transmission module is used for transmitting data.
Furthermore, after the expression recognition module receives the video from the transmission module, it first preprocesses the video images: face detection is performed with an artificial neural network, face alignment is carried out according to the facial landmark points obtained from detection, and after data enhancement the images are gray-scale and geometrically normalized. After preprocessing, frame aggregation is performed, features are extracted, multiple frames are combined, the facial images are used as input data, and the classification result of a certain type of expression is output after recognition. A general architecture that integrates privileged information in a deep network is used to recognize facial expressions: the input of basic facial action unit information is added during model training, and the privileged information is added on the basis of the original facial images as an auxiliary output to supervise feature learning, so as to obtain facial expression information and produce the expression analysis results.
Further, the voice recognition module preprocesses the input speech, the preprocessing comprising framing, windowing and pre-emphasis; feature extraction is then performed; during actual recognition, a template is generated for the test speech according to the training process, and recognition is finally carried out according to a distortion decision criterion.
Further, the output module comprises an information display module and a sound amplifying module; the information display module is used for displaying user information, and the sound amplification module is used for amplifying sound of a user.
Further, the device comprises a voting module and a service button module; the voting module is used for voting during a conference, and the service button module is used for calling for service.
The second purpose of the invention is realized by the following technical scheme:
a multifunctional intelligent electronic seat board system comprises a management end, a user terminal and a multifunctional intelligent electronic seat board device, wherein a user communicates with the seat board device through the user terminal and obtains an information file according to the authority of the user terminal, and the management end is used for managing the seat board device and the user terminal.
The third purpose of the invention is realized by the following technical scheme:
an apparatus comprising a processor and a memory for storing a processor executable program, the processor, when executing the program stored in the memory, implementing expression recognition and speech recognition of a multi-functional intelligent electronic agent board.
The fourth purpose of the invention is realized by the following technical scheme:
a storage medium stores a program, and when the program is executed by a processor, the expression recognition and the voice recognition of a multifunctional intelligent electronic agent board are realized.
The working process of the invention is as follows:
1. The device of this patent is placed directly in front of the user. Before use, an administrator turns on all devices through the background and sets the session property, i.e. public or private, which controls whether the recording function is used.
2. Before the seat system is used, the background makes the display screen show the user's name through unified control, or the user enters identity information himself through the mobile phone APP; after receiving the transmitted data, the background forwards the information to the display screen for display.
3. The background controls switching the camera on and off. When the camera is on, it continuously collects video images of the user's face while the seat system is in use, the user's emotion is monitored by the expression recognition algorithm in the background, and the video and the emotion analysis results are transmitted to the storage module.
4. The background controls the microphone so that a designated speaker speaks within a designated time, or the user controls the microphone switch himself, pressing the speaking button to turn on the microphone and amplify the sound when he needs to speak. If the audio may be disclosed, the spoken content is recorded and the audio data are stored in the storage module.
5. The background performs speech recognition on the speeches of different users, distinguished by terminal number, converts the speech into text, arranges it into a record and stores the record in the storage module.
6. After the meeting ends, an ordinary user downloads his own recording, expression recognition video and results, and the overall meeting record data through the mobile phone APP; the administrator can check all files generated during the use of the seat system in the background.
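Steps 1-6 above can be pictured as a small backend routine. The following Python sketch is illustrative only; recognize_speech, analyze_expression and store_audio are hypothetical stand-ins for the speech recognition, expression recognition and storage modules, not interfaces defined by the patent.

```python
# Illustrative sketch of the working process above (steps 1-6); not the patent's implementation.
from collections import defaultdict
from datetime import datetime

def recognize_speech(audio: bytes) -> str:        # stand-in for the speech recognition module
    return "<transcribed text>"

def analyze_expression(frame) -> str:             # stand-in for the expression recognition module
    return "<emotion label>"

def store_audio(terminal_id: int, audio: bytes):  # stand-in for the storage module
    pass

class MeetingBackend:
    def __init__(self, public_session: bool):
        self.public_session = public_session       # set by the administrator (step 1)
        self.display = {}                          # terminal number -> displayed name (step 2)
        self.transcripts = defaultdict(list)       # terminal number -> utterances (step 5)
        self.emotions = defaultdict(list)          # terminal number -> expression results (step 3)

    def register_user(self, terminal_id: int, name: str):
        """Identity information entered in the phone APP is pushed to that terminal's display."""
        self.display[terminal_id] = name

    def on_speech(self, terminal_id: int, audio: bytes):
        """Speech is transcribed and attributed to the speaker by terminal number (steps 4-5)."""
        self.transcripts[terminal_id].append((datetime.now(), recognize_speech(audio)))
        if self.public_session:                    # recording kept only when the session is public
            store_audio(terminal_id, audio)

    def on_video_frame(self, terminal_id: int, frame):
        """Each facial frame is analysed and the emotion result is logged (step 3)."""
        self.emotions[terminal_id].append(analyze_expression(frame))

    def meeting_minutes(self) -> list:
        """Merge per-terminal transcripts into one time-ordered meeting record (step 6)."""
        entries = [(t, tid, txt) for tid, items in self.transcripts.items() for t, txt in items]
        return sorted(entries)
```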
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention provides an intelligent electronic seat card usable for meetings, psychological consultation and intelligent classrooms that integrates expression recognition, voice recognition, sound amplification, real-time downloading of meeting files and other functions. It integrates multiple functions and can meet the requirements of various occasions. User information can be obtained through the mobile phone APP, and when personnel change, the displayed content of the seat system can be modified in time. In addition, the invention can not only serve as a seat system to display user information, but can also replace equipment such as a microphone and a notebook computer: the microphone built into the seat system can amplify the sound, voice recognition can convert speech into text to record the conference content, and a function for downloading the conference record is provided. Meanwhile, the facial expression recognition function is added, which can monitor the users' emotional states during the meeting and prevent accidents.
2. The invention can capture facial images of the user through the camera, recognize facial expressions and analyse the user's emotional state in real time. It is suitable for meetings, interviews, psychological consultations, intelligent classrooms and the like; it can analyse the user's mental state and also allows preventive measures to be taken in case of emergency.
3. The invention uses a general architecture that integrates privileged information in a deep network to recognize facial expressions, and the recognition algorithm learns and trains with the facial muscle action units as privileged information. This improves the algorithm: integrating privileged information into facial expression recognition improves the accuracy of expression recognition, captures the user's emotional changes more accurately, and reduces the inconvenience caused by incorrect emotion analysis.
4. The invention uses the voice recognition function to convert the content of the user's speech into a text file and store it. The background records the speeches of different users, identified by terminal number, in the order in which they speak, arranges them into a record during the use of the seat system, automatically generates a text file from the recorded content and stores it in the storage module; users can download the text file in the mobile phone APP.
5. The invention uses the mobile phone APP to obtain the information of the current user, and the administrator can directly control the system through the APP. Users can choose their seats independently; a user only needs to enter his name on the mobile phone and the content is transmitted to the display screen, which reduces the heavy preparation before meetings and classes begin and avoids unnecessary waste. The administrator can directly control the various parameter configurations of the seat system through the mobile terminal APP.
6. The data processing and corresponding algorithm modules are executable program code that is read and executed by the background. No complex hardware is needed for data processing, which saves product space, keeps the desktop simple and improves the user experience.
7. The users of the invention are divided into administrators and ordinary users, with different authorities for the use of the product. The unified switching of the product, the content of the display, the setting of the session properties, the use of the microphone and camera, and so on are controlled by the administrator, which facilitates management of the product's use. At the same time, different users can perform different operations on the various text, audio and video files generated during use, which makes file management convenient, respects user privacy and is user-friendly.
8. This patent places the voting and service buttons in the mobile phone APP, changing the existing design in which electronic seat card keys are arranged on the base and saving space on the seat card.
9. The whole device supports wireless communication and can transmit data to the storage module wirelessly, which makes data storage, processing and analysis convenient. Data is stored quickly and will not be lost, which enhances the safety and reliability of data storage, and the data can subsequently be modified and processed.
10. The user can download the required files in the terminal APP. After the conference ends, the files can be obtained without downloading and checking them in the background; downloading them to the mobile phone makes them more convenient to view.
Drawings
FIG. 1 is a block diagram of a multifunctional intelligent electronic seat card device according to the present invention;
FIG. 2 is a front view of the seat card apparatus according to embodiment 1 of the present invention;
FIG. 3 is a rear view of the seat card device according to embodiment 1 of the present invention;
FIG. 4 is a block diagram of data acquisition in embodiment 1 of the present invention;
FIG. 5 is a diagram of the background control module in embodiment 1 of the present invention;
FIG. 6 is a transmission diagram of the transmission module in embodiment 1 of the present invention;
FIG. 7 is a flow chart of the recognition performed by the speech recognition module in embodiment 1 of the present invention;
FIG. 8 is a flow chart of the recognition performed by the expression recognition module in embodiment 1 of the present invention;
FIG. 9 is a diagram of the implementation steps of the AOAU algorithm in embodiment 1 of the present invention;
FIG. 10 is a block diagram of the seat card system according to embodiment 2 of the present invention;
FIG. 11 is an overall functional structure diagram of the seat card system in embodiment 2 of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example 1:
A multifunctional intelligent electronic seat card device, as shown in fig. 1, comprising:
the data acquisition module is used for acquiring relevant information of a user, wherein the relevant information comprises personal information, character information, sound information and video information;
the background control module is used for realizing control over the terminal, processing and transmission of data and communication control over different terminals;
the storage module is used for storing a recording file, an audio file, a video file and an expression analysis result file which are generated in the using process;
the output module is used for outputting the identity information and the sound information of the user; the front of the seat card device is shown in fig. 2, and the back of the seat card device is shown in fig. 3.
The data acquisition module, as shown in fig. 4, comprises a character acquisition module, a sound acquisition module, a video acquisition module and a seat terminal. The character acquisition module acquires the text uploaded by the user terminal, the sound acquisition module acquires the user's sound information, and the video acquisition module acquires the user's video information; that is, the data acquisition module acquires the user's data such as identity, mental state and sound, so that the background can conveniently analyse and process the data. Specifically, the hardware of the data acquisition module consists of a camera, a seat terminal, the user's mobile phone and a hidden miniature microphone. The microphone of the sound acquisition module is built into the seat system terminal. The background can control this module to manage the speaking order, speaking time and so on; the user can also press the speaking button to turn on the microphone switch and amplify the sound, and the button is reset afterwards. The background control module can also decide whether to keep the recording according to the occasion of use. The user enters information such as his own name in the mobile phone APP to provide the text information data; the miniature microphone built into the terminal device is switched on after the speaking button is pressed to collect the user's audio data; and facial video images are acquired through the miniature camera. The miniature microphone is built into the terminal, and the camera is located at the upper left corner of the back of the terminal.
The background control module realizes the control of the terminal, the processing and transmission of various data and the information exchange and transmission with different terminals.
The storage module stores a recording file, an audio file, a video file and an expression analysis result file which are generated in the using process of the patent.
The output module consists of a microphone and a display screen. The microphone has a built-in audio amplification circuit for amplifying the speaker's voice, so it can provide sound amplification in environments without dedicated amplification equipment. The display screen displays the user identity information entered through the APP after receiving it. The electronic display screen can be uniformly allocated by the background and receives and displays the data transmitted by the background, which can be participant identity information sent directly by the background or identity information such as the user's name entered through the mobile phone APP. The electronic display screen is preferably an electronic ink screen.
A storage module:
The storage module receives the various files generated after background speech recognition and facial expression recognition, such as text record files and audio recording files, for subsequent processing.
Character file
After a user speaks while using the seat system, the audio file is transmitted to the background for speech recognition, and a record file is then generated. Such files may only be read, modified or downloaded by the user himself or an administrator.
Audio file
If the conference uses the recording function, the recorded audio files are saved in the storage module, and the files can be read and downloaded by all users.
Video and its analysis file
The facial images of the user captured by the camera throughout the conference are stored together with the analysis file produced by facial expression recognition. Such files can be read only by the user himself and by upper-level managers; others have no authority to read them, and no one can modify the analysis files.
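The access rules described for the three file types can be summarised in a short sketch; the role names and the rule table below simply restate the text above and are illustrative, not an interface defined by the patent.

```python
# Minimal sketch of the access rules described for the storage module; all identifiers are illustrative.
from enum import Enum

class Role(Enum):
    OWNER = "owner"          # the user who produced the file
    ADMIN = "admin"          # administrator / upper-level manager
    OTHER = "other"          # any other participant

# file type -> roles allowed to read or download it
READ_RULES = {
    "transcript": {Role.OWNER, Role.ADMIN},               # text record files
    "audio":      {Role.OWNER, Role.ADMIN, Role.OTHER},   # recordings of public sessions
    "video":      {Role.OWNER, Role.ADMIN},               # facial video and expression analysis
}

def can_read(file_type: str, role: Role) -> bool:
    return role in READ_RULES.get(file_type, set())

def can_modify(file_type: str, role: Role) -> bool:
    # Expression analysis files cannot be modified by anyone, per the description above.
    if file_type == "video":
        return False
    return role in {Role.OWNER, Role.ADMIN}
```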
The background control module is shown in fig. 5 and comprises a control module, a transmission module, an expression recognition module and a voice recognition module;
the background controls the on-off of the electronic seat system through the control module. After the whole terminal is started, the background also controls the camera and the microphone to work. The background can uniformly allocate the display content of the display screen and can also modify the display of a single seat card. The background realizes the speaking of the appointed object by controlling the microphone and can control the speaking time. And the background controls whether the sound recorder is used or not according to the attributes. Only an administrator can use the mobile phone APP to regulate and control the control module to complete the functions.
The transmission module receives the identity information entered in the mobile phone APP and forwards it to the display screen for display; it receives the audio data collected by the microphone, transmits it to the speech recognition module, transmits the text file generated after recognition to the storage module, and transmits the audio file to the storage module if necessary; and it receives the user's facial video captured by the camera, transmits the video to the facial expression recognition module, receives the analysis result after recognition, and transmits the video and the analysis result to the storage module together, as shown in fig. 6.
The speech recognition module first preprocesses the input speech, as shown in fig. 7; the preprocessing includes framing, windowing and pre-emphasis. Feature extraction is then performed, so choosing suitable feature parameters is particularly important. Commonly used feature parameters include: pitch period, formants, short-term average energy or amplitude, linear prediction coefficients (LPC), perceptual linear prediction coefficients (PLP), short-term average zero-crossing rate, linear prediction cepstral coefficients (LPCC), autocorrelation functions, mel-frequency cepstral coefficients (MFCC), wavelet transform coefficients, empirical mode decomposition coefficients (EMD), gammatone frequency cepstral coefficients (GFCC) and so on. During actual recognition, a template is generated for the test speech according to the training process, and recognition is finally performed according to a distortion decision criterion.
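As a concrete illustration of this front end, the following Python sketch computes pre-emphasis and MFCC features with librosa; the sampling rate, frame length (25 ms), hop (10 ms) and pre-emphasis coefficient 0.97 are common defaults and are assumptions, not values taken from the patent.

```python
# Illustrative preprocessing + MFCC extraction for the speech recognition front end.
import numpy as np
import librosa

def speech_features(path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)              # read audio at 16 kHz
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])        # pre-emphasis (coefficient 0.97)
    # Framing (25 ms frames, 10 ms hop) and Hamming windowing are applied inside the
    # MFCC computation via n_fft, hop_length and window.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160, window="hamming")
    return mfcc.T                                     # shape: (num_frames, n_mfcc)
```

Other parameters listed above (formants, LPCC, GFCC and so on) would be extracted analogously from the same preprocessed signal.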
As shown in fig. 8, after receiving the video from the transmission module, the facial expression recognition module preprocesses the video images: face detection is performed with an artificial neural network, face alignment is performed according to the facial landmark points obtained from detection, and after data enhancement the images are gray-scale and geometrically normalized. After preprocessing, frame aggregation is performed, features are extracted, multiple frames are combined, the facial images are input to the expression recognition network, and the classification result of a certain type of expression is output.
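The preprocessing and frame-aggregation steps just described can be sketched as follows. This is a minimal illustration under stated assumptions: OpenCV's Haar cascade is used only as a stand-in for the artificial-neural-network face detector mentioned in the text, alignment is reduced to a crop-and-resize rather than a landmark-based warp, and the 48x48 size and 16-frame aggregation are arbitrary choices.

```python
# Sketch of the video preprocessing pipeline described above; not the patent's implementation.
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess_frame(frame: np.ndarray, size: int = 48):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)           # grayscale normalization
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                          # no face in this frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])       # keep the largest detected face
    face = cv2.resize(gray[y:y + h, x:x + w], (size, size))  # geometric normalization
    return face.astype(np.float32) / 255.0

def aggregate_frames(frames, num: int = 16) -> np.ndarray:
    """Frame aggregation: sample a fixed number of preprocessed faces as network input."""
    faces = [f for f in (preprocess_frame(fr) for fr in frames) if f is not None]
    if not faces:
        raise ValueError("no face detected in the clip")
    idx = np.linspace(0, len(faces) - 1, num).astype(int)    # sample evenly across the clip
    return np.stack([faces[i] for i in idx])                 # shape: (num, size, size)
```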
A general architecture that integrates privileged information in a deep network is used here for facial expression recognition. The facial expression recognition algorithm of this method differs from other algorithms in that the input of basic facial action unit information is added during model training: privileged information is added on the basis of the original facial images and used as an auxiliary output to supervise feature learning. Here, AOAU (Automatic Intermediate Output of Action Unit) is used as the privileged information, as shown in fig. 9.
When the model is trained, the inputs are a face image, the real AU label vector of the upper half of the face, the real AU label vector of the lower half of the face and the real emotion label. Facial expression features are then extracted for the whole face, AU recognition is performed on the upper and lower halves of the face and their features are extracted, the three sets of features are cascaded with a further network layer, and the emotion is predicted.
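The three-branch training setup described here (whole-face expression features plus upper-half and lower-half AU recognition, cascaded for emotion prediction) might look roughly like the following PyTorch sketch; the backbone, feature sizes and AU counts are assumptions, since the patent does not specify the network architecture.

```python
# Illustrative three-branch structure for the AOAU training setup described above.
# Backbone choice, feature sizes and the AU counts (5 upper / 7 lower) are assumptions.
import torch
import torch.nn as nn

def small_cnn(out_dim: int) -> nn.Sequential:
    """Tiny convolutional feature extractor used by all three branches."""
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(64, out_dim), nn.ReLU())

class AOAUNet(nn.Module):
    def __init__(self, n_emotions=7, n_au_upper=5, n_au_lower=7, feat=128):
        super().__init__()
        self.face_branch = small_cnn(feat)      # whole-face expression features
        self.upper_branch = small_cnn(feat)     # upper half of the face -> AU recognition
        self.lower_branch = small_cnn(feat)     # lower half of the face -> AU recognition
        self.au_upper_head = nn.Linear(feat, n_au_upper)      # auxiliary output (privileged info)
        self.au_lower_head = nn.Linear(feat, n_au_lower)      # auxiliary output (privileged info)
        self.emotion_head = nn.Linear(feat * 3, n_emotions)   # cascade of the three feature sets

    def forward(self, face, upper, lower):
        f_face = self.face_branch(face)
        f_up, f_low = self.upper_branch(upper), self.lower_branch(lower)
        au_up, au_low = self.au_upper_head(f_up), self.au_lower_head(f_low)
        emotion = self.emotion_head(torch.cat([f_face, f_up, f_low], dim=1))
        return emotion, au_up, au_low
```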
The loss function of AOAU includes the facial expression classification loss and the recognition losses of the upper-half and lower-half facial action units. The parameters are updated through back-propagation of the loss to obtain the trained model.
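Written out, the combined objective described above can be expressed as follows, where the expression term is the facial expression classification loss and the two AU terms are the upper-half and lower-half action-unit recognition losses; the weighting coefficients are assumptions, since the patent does not state how the terms are balanced.

```latex
% L_expr: facial expression classification loss (e.g. cross-entropy over emotion labels);
% L_AU:   action-unit recognition losses for the upper and lower halves of the face.
% The weights \lambda_1, \lambda_2 are assumed, not given in the patent.
\mathcal{L}_{\mathrm{AOAU}}
  = \mathcal{L}_{\mathrm{expr}}\bigl(\hat{y},\, y\bigr)
  + \lambda_1 \, \mathcal{L}_{\mathrm{AU}}^{\mathrm{upper}}\bigl(\hat{a}_{u},\, a_{u}\bigr)
  + \lambda_2 \, \mathcal{L}_{\mathrm{AU}}^{\mathrm{lower}}\bigl(\hat{a}_{l},\, a_{l}\bigr)
```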
Finally, the emotion is predicted with the trained AOAU network, whose emotion prediction accuracy is higher than that of recent methods such as DeRL and ALFW.
Example 2
A multifunctional intelligent electronic seat card system, as shown in fig. 10, comprises a management end, user terminals and the multifunctional intelligent electronic seat card device; a user communicates with the seat card device through the user terminal and obtains information files according to the user's authority, and the management end manages the seat card device and the user terminals. Ordinary users log in through the user terminal, and the administrator logs in through the management end. Specifically:
the data acquisition module is used for acquiring relevant information of a user, wherein the relevant information comprises personal information, character information, sound information and video information;
the background control module is used for realizing control over the terminal, processing and transmission of data and communication control over different terminals;
the storage module is used for storing a recording file, an audio file, a video file and an expression analysis result file which are generated in the using process;
the output module is used for outputting the identity information and the sound information of the user; specific communication is shown in fig. 11.
Example 3
An apparatus comprising a processor and a memory for storing a program executable by the processor; when the processor executes the program stored in the memory, the expression recognition and speech recognition of the multifunctional intelligent electronic seat card are realized.
Wherein, the expression recognition is as follows:
after receiving the video from the transmission module, the facial expression recognition module firstly preprocesses the video image, adopts an artificial neural network to carry out face detection, carries out face alignment according to a face positioning point (landmark) detected by the face, and carries out gray scale and geometric normalization on the image after data enhancement. And after the preprocessing, performing frame aggregation, extracting features, combining multiple frames, inputting the facial image into an expression recognition network, and outputting a classification result of a certain type of expressions.
A general architecture that integrates privileged information in a deep network is used here for facial expression recognition. The facial expression recognition algorithm of this method differs from other algorithms in that the input of basic facial action unit information is added during model training: privileged information is added on the basis of the original facial images and used as an auxiliary output to supervise feature learning. This patent uses AOAU (Automatic Intermediate Output of Action Unit) as the privileged information.
When the model is trained, the inputs are a face image, the real AU label vector of the upper half of the face, the real AU label vector of the lower half of the face and the real emotion label. Facial expression features are then extracted for the whole face, AU recognition is performed on the upper and lower halves of the face and their features are extracted, the three sets of features are cascaded with a further network layer, and the emotion is predicted.
The loss function of AOAU includes the facial expression classification loss and the recognition losses of the upper-half and lower-half facial action units. The parameters are updated through back-propagation of the loss to obtain the trained model.
Finally, the emotion is predicted with the trained AOAU network, whose emotion prediction accuracy is higher than that of recent methods such as DeRL and ALFW.
The speech recognition is as follows:
input speech is first pre-processed, where the pre-processing includes framing, windowing, pre-emphasis, and so on. Secondly, feature extraction is carried out, so that the selection of proper feature parameters is particularly important. Commonly used characteristic parameters include: pitch period, formants, short-term average energy or amplitude, Linear Prediction Coefficients (LPC), perceptual weighted prediction coefficients (PLP), short-term average zero-crossing rate, Linear Prediction Cepstral Coefficients (LPCC), autocorrelation functions, mel-frequency cepstral coefficients (MFCC), wavelet transform coefficients, empirical mode decomposition coefficients (EMD), gamma-pass filter coefficients (GFCC), and the like. When actual recognition is carried out, a template is generated for the test voice according to a training process, and finally recognition is carried out according to a distortion judgment criterion.
Example 4:
a storage medium stores a program, and when the program is executed by a processor, the expression recognition and the voice recognition of a multifunctional intelligent electronic agent board are realized. The following were used: the expression recognition is used for recognizing facial expression information in the video information and obtaining expression analysis results; the voice recognition is to recognize voice information and convert the voice information into a character recording file;
it should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

Translated from Chinese

1. A multifunctional intelligent electronic seat card device, characterized by comprising: a data acquisition module for acquiring relevant information of a user, including personal information, text information, sound information and video information; a background control module for realizing control of the terminal, processing and transmission of data, and communication control of different terminals; a storage module for saving the record files, audio files, video files and expression analysis result files generated during use; and an output module for outputting the user's identity information and sound information.
2. The multifunctional intelligent electronic seat card device according to claim 1, characterized in that the data acquisition module comprises a text acquisition module, a sound acquisition module, a video acquisition module and a seat terminal; the text acquisition module acquires the text information uploaded by the user terminal, the text information including identity information and meeting records; the sound acquisition module is used for acquiring the user's sound information, and the video acquisition module is used for acquiring the user's video information.
3. The multifunctional intelligent electronic seat card device according to claim 1, characterized in that the background control module comprises a control module, a transmission module, an expression recognition module and a speech recognition module; the expression recognition module is used for recognizing facial expression information in the video information and obtaining expression analysis results; the speech recognition module is used for recognizing sound information and converting it into a text record file; the control module is used for controlling the seat card system and sending control instructions to the other modules; and the transmission module is used for data transmission.
4. The multifunctional intelligent electronic seat card device according to claim 3, characterized in that after the expression recognition module receives the video from the transmission module, it first preprocesses the video images: face detection is performed with an artificial neural network, face alignment is performed according to the facial landmark points obtained from detection, and after data enhancement the images are gray-scale and geometrically normalized; after preprocessing, frame aggregation is performed, features are extracted, multiple frames are combined, the facial images are used as input data, and the classification result of a certain type of expression is finally output; a general architecture that integrates privileged information in a deep network is used for facial expression recognition, the input of basic facial action unit information is added during model training, and the privileged information is added on the basis of the original facial images as an auxiliary output to supervise feature learning, so as to obtain facial expression information and produce the expression analysis results.
5. The multifunctional intelligent electronic seat card device according to claim 3, characterized in that the speech recognition module preprocesses the input speech, the preprocessing including framing, windowing and pre-emphasis; feature extraction is then performed; during actual recognition, a template is generated for the test speech according to the training process, and recognition is finally performed according to a distortion decision criterion.
6. The multifunctional intelligent electronic seat card device according to claim 1, characterized in that the output module comprises an information display module and a sound amplification module; the information display module is used for displaying user information, and the sound amplification module is used for amplifying the user's voice.
7. The multifunctional intelligent electronic seat card device according to claim 1, characterized by further comprising a voting module and a service button module, the voting module being used for voting during a meeting and the service button module being used for calling for service.
8. A multifunctional intelligent electronic seat card system, characterized by comprising a management end, a user terminal and the multifunctional intelligent electronic seat card device according to any one of claims 1-7; the user communicates with the seat card device through the user terminal and obtains information files according to the user's own authority, and the management end is used for managing the seat card device and the user terminal.
9. A device, comprising a processor and a memory for storing a program executable by the processor, characterized in that when the processor executes the program stored in the memory, the expression recognition and speech recognition of the multifunctional intelligent electronic seat card according to any one of claims 1-7 are realized.
10. A storage medium storing a program, characterized in that when the program is executed by a processor, the expression recognition and speech recognition of the multifunctional intelligent electronic seat card according to any one of claims 1-7 are realized.
CN202110124665.3A, filed 2021-01-29, priority 2021-01-29: A multifunctional intelligent electronic seat card device, system, equipment and storage medium. Status: Pending. Publication: CN112836620A (en).

Priority Applications (1)

Application Number: CN202110124665.3A (CN112836620A, en) | Priority Date: 2021-01-29 | Filing Date: 2021-01-29 | Title: A multifunctional intelligent electronic seat card device, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number: CN202110124665.3A (CN112836620A, en) | Priority Date: 2021-01-29 | Filing Date: 2021-01-29 | Title: A multifunctional intelligent electronic seat card device, system, equipment and storage medium

Publications (1)

Publication Number: CN112836620A | Publication Date: 2021-05-25

Family

ID=75931012

Family Applications (1)

Application Number: CN202110124665.3A (CN112836620A, en, Pending) | Priority Date: 2021-01-29 | Filing Date: 2021-01-29 | Title: A multifunctional intelligent electronic seat card device, system, equipment and storage medium

Country Status (1)

Country: CN (1) | Link: CN112836620A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105679192A (en) * | 2016-02-25 | 2016-06-15 | 广东康城物业管理服务有限公司 | An electronic nameplate and its setting system based on Bluetooth technology
CN105679121A (en) * | 2016-03-16 | 2016-06-15 | 华东师范大学 | Intelligent teaching system
CN106847116A (en) * | 2016-12-28 | 2017-06-13 | 重庆金鑫科技产业发展有限公司 | An intelligent electronic table card and a conference system
CN208316929U (en) * | 2018-03-07 | 2019-01-01 | 科大讯飞股份有限公司 | Seat card, host equipment and seat card control system
CN109413366A (en) * | 2018-12-24 | 2019-03-01 | 杭州欣禾工程管理咨询有限公司 | A paperless intelligent video conference system based on state management

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114979549A (en) * | 2022-05-12 | 2022-08-30 | 咪咕动漫有限公司 | Privacy protection method, system, equipment and storage medium for online conference
CN114979549B (en) * | 2022-05-12 | 2025-09-23 | 咪咕动漫有限公司 | Privacy protection method, system, device and storage medium for online conferences
CN115831116A (en) * | 2022-11-17 | 2023-03-21 | 上海蓁康电子有限公司 | A smart voice interactive system
CN120298559A (en) * | 2025-06-11 | 2025-07-11 | 良胜数字创意设计(杭州)有限公司 | A multi-modal driven virtual digital human facial animation generation method and system

Similar Documents

Publication | Publication Date | Title
US10249304B2 (en) | Method and system for using conversational biometrics and speaker identification/verification to filter voice streams
CN112836620A (en) | A multifunctional intelligent electronic seat card device, system, equipment and storage medium
Morrison et al. | Protocol for the collection of databases of recordings for forensic-voice-comparison research and practice
CN111818294A (en) | Method, medium and electronic device for multi-person conference real-time display combined with audio and video
CN110767290A (en) | Interactive emotion persuasion and psychological counseling robot
US20200349941A1 (en) | Method and system for recording audio content in a group conversation
CN109560941A (en) | Conference minutes method, apparatus, intelligent terminal and storage medium
JP2006190296A (en) | Context extraction in multimedia communication system and information providing apparatus and method using the same
CN107918771A (en) | Character recognition method and wearable person recognition system
CN109040723A (en) | A control method for conference scenes
CN117135305B (en) | Teleconference implementation method, device and system
CN109829691B (en) | C/S card punching method and device based on position and deep learning multiple biological features
CN113299309A (en) | Voice translation method and device, computer readable medium and electronic equipment
CN113779234A (en) | Method, apparatus, device and medium for generating speech minutes of conference speakers
Furui | Speech recognition technology in the ubiquitous/wearable computing environment
CN208316929U (en) | Seat card, host equipment and seat card control system
CN111667837A (en) | Conference record acquisition method, intelligent terminal and device with storage function
KR102463243B1 (en) | Tinnitus counseling system based on user voice analysis
CN116610717A (en) | Data processing method, device, electronic device, and storage medium
KR20200123054A (en) | Voice recognition device
JP6823367B2 (en) | Image display system, image display method, and image display program
CN106899625A (en) | Method and device for adjusting device environment configuration information according to the user's mood state
JP7000547B1 (en) | Programs, methods, information processing equipment, systems
CN218585398U (en) | Intelligent voice interaction system for comprehensive energy filling station
CN216749300U (en) | Voiceprint acquisition system

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2021-05-25)

