Multifunctional intelligent electronic seat card device, system, equipment and storage medium
Technical Field
The invention relates to the field of seat card systems, and in particular to a multifunctional intelligent electronic seat card device, system, equipment and storage medium.
Background
Existing seat card systems are often used in settings such as meetings and banks. Traditional conference seat cards are printed in advance on fixed-size paper and inserted into a plastic display holder; after a single use they become useless, which is neither environmentally friendly nor convenient. To address this problem, many types of electronic seat card systems have been developed. Although existing electronic seat card systems solve the problem of paper waste and add functions such as electronic display and sound amplification, many aspects still need improvement, such as how to acquire the display screen information more conveniently and how to realize background functions more conveniently, so considerable room for development remains.
In bank seat card systems, the electronic seat card facilitates interaction between staff and customers; by capturing video through a camera, it can also judge whether staff are at their seats or verify users' identities through face recognition.
In practice, the electronic seat card system used by an enterprise or other organization to hold a conference can uniformly control and display the basic information of the participants through the background.
Existing electronic seat cards also generally provide audio and video acquisition. The audio of a speaker and the audio and video data of participants are collected through devices such as a built-in microphone and a miniature camera of the seat card system. The microphone can realize sound amplification using either a built-in audio amplification circuit or an external multi-channel audio power amplifier.
The existing electronic seat card system realizes communication and data transmission between the terminals and the background wirelessly. Wireless communication exchanges information by exploiting the ability of electromagnetic wave signals to propagate in free space, using technologies such as Wi-Fi, WLAN and EDGE. After the audio, video and other data of each terminal are converted, they are transmitted wirelessly to a background bus to be stored or otherwise processed. Some electronic seat card systems have voting or service buttons on the base, and responses to button events likewise rely on wireless communication.
Currently, few electronic seat card systems have a camera; in those that do, the shooting function is used only to judge whether a participant is present by capturing video, or to authenticate a user through face recognition.
Objective disadvantages of the prior art:
The functions of the existing electronic seat card system are limited. Although the background can uniformly input and modify the display contents of the display screens, this leads to the following situations: seats are fixed in advance, so the user cannot choose a seat himself; and when information such as a name is wrong, the user cannot enter or correct it himself.
Existing audio modules provide only a public-address function. In this invention, besides that function, the audio module also provides a recording function, and the administrator determines whether the occasion is public and thus whether recording may be used. If the seat card system is used in a public setting, the recording function can be enabled when the audio needs to be reused afterwards; if the usage scenario involves personal privacy or business or other confidential matters, the recording function can be disabled.
In the existing seat card system, part of the interaction with the background is realized through keys on the base, such as voting keys and service-call keys (e.g., for requesting tea); these keys occupy considerable space and detract from the appearance.
The existing electronic seat card system captures video of the user only to judge whether the user is present, absent or fatigued and to analyze sitting posture; it does not use the captured video to further analyze the user's specific emotional state.
Existing voice recognition algorithms have not been applied to electronic seat card systems for meeting, psychological consultation and intelligent classroom scenarios; existing facial expression recognition algorithms do not incorporate facial muscle action units when generating the training model; and existing electronic seat card systems do not support file downloading: files can only be transmitted to a background storage module for management, and a terminal user cannot download them directly at the terminal.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art, and provides a multifunctional intelligent electronic seat card device, system, equipment and storage medium that can be used for meetings, psychological consultation and intelligent classrooms, integrating functions such as expression recognition, voice recognition, sound amplification and real-time downloading of conference files. User information can be acquired through a mobile phone APP, and the display content of the seat card system can be modified in time when personnel change. In addition, the invention not only displays user information as a seat card system but can also replace equipment such as a microphone and a notebook computer: the microphone built into the seat card system amplifies sound, voice recognition converts speech into text to record the conference content, and the conference record can be downloaded. A facial expression recognition function is also added, so that the emotional states of participants can be monitored and accidents prevented.
The first object of the invention is to provide a multifunctional intelligent electronic seat card device.
The second object of the invention is to provide a multifunctional intelligent electronic seat card system.
The third object of the invention is to provide an apparatus.
The fourth object of the invention is to provide a storage medium.
The first purpose of the invention is realized by the following technical scheme:
the utility model provides a multi-functional intelligent electron agent tablet device which characterized in that includes:
the data acquisition module is used for acquiring relevant information of a user, wherein the relevant information comprises personal information, text information, sound information and video information;
the background control module is used for realizing control over the terminal, processing and transmission of data and communication control over different terminals;
the storage module is used for storing a recording file, an audio file, a video file and an expression analysis result file which are generated in the using process;
and the output module is used for outputting the identity information and the sound information of the user.
Furthermore, the data acquisition module comprises a text acquisition module, a sound acquisition module, a video acquisition module and a seat card terminal, wherein the text acquisition module acquires text information uploaded by the user terminal, and the text information comprises identity information and meeting records; the sound acquisition module is used for acquiring sound information of the user, and the video acquisition module is used for acquiring video information of the user.
Furthermore, the background control module comprises a control module, a transmission module, an expression recognition module and a voice recognition module; the expression recognition module is used for recognizing facial expression information in the video information and obtaining expression analysis results; the voice recognition module is used for recognizing voice information and converting the voice information into a character recording file; the control module is used for controlling the seat card system and sending control instructions to other modules; the transmission module is used for transmitting data.
Furthermore, after the expression recognition module receives the video from the transmission module, it first preprocesses the video images: face detection is performed with an artificial neural network, face alignment is performed according to the detected facial landmark points, and the images undergo grayscale and geometric normalization after data augmentation. After preprocessing, frame aggregation is performed, features are extracted, multiple frames are combined, the facial image is taken as input data, and a classification result for a certain type of expression is output after recognition. A general architecture that integrates privileged information in a deep network is used to recognize facial expressions: during model training, basic facial action unit information is added as input, and this privileged information, added on top of the original facial image, serves as auxiliary output to supervise feature learning, yielding the facial expression information and the expression analysis results.
Further, the voice recognition module first preprocesses the input speech, wherein the preprocessing comprises framing, windowing and pre-emphasis; feature extraction is then performed, and during actual recognition a template is generated for the test speech according to the training process, with final recognition carried out according to a distortion decision criterion.
Further, the output module comprises an information display module and a sound amplifying module; the information display module is used for displaying user information, and the sound amplification module is used for amplifying sound of a user.
The system further comprises a voting module and a service button module, wherein the voting module is used for voting in the conference, and the service button module is used for calling the service.
The second purpose of the invention is realized by the following technical scheme:
a multifunctional intelligent electronic seat card system comprises a management end, a user terminal and a multifunctional intelligent electronic seat card device, wherein a user communicates with the seat card device through the user terminal and obtains information files according to the authority of the user terminal, and the management end is used for managing the seat card device and the user terminal.
The third purpose of the invention is realized by the following technical scheme:
an apparatus comprising a processor and a memory for storing a processor-executable program; when executing the program stored in the memory, the processor implements the expression recognition and speech recognition of the multifunctional intelligent electronic seat card.
The fourth purpose of the invention is realized by the following technical scheme:
a storage medium stores a program; when the program is executed by a processor, the expression recognition and speech recognition of the multifunctional intelligent electronic seat card are realized.
The working process of the invention is as follows:
1. The device is placed directly in front of the user. Before use, an administrator turns on all devices through the background, sets the occasion attribute (public or private), and thereby controls whether the recording function is used.
2. Before the seat card system is used, the background makes the display screen show the user's name through unified control, or the user enters identity information himself through the mobile phone APP; after the background receives the transmitted data, the information is forwarded to the display screen for display.
3. The background controls switching the camera on and off. When the camera is on, it continuously collects video images of the user's face while the seat card system is in use; the user's emotion is monitored by the expression recognition algorithm in the background, and the video and the emotion analysis result are transmitted to the storage module.
4. The background controls the microphone so that a designated speaker speaks within a specified time, or the user controls the microphone switch himself, pressing the speaking button to turn on the microphone and amplify the sound when he needs to speak. If the audio may be disclosed, the speech content is recorded and the audio data is saved in the storage module.
5. The background performs voice recognition on the speeches of different users, identified by terminal number, converts the speech into text, compiles the records, and saves them in the storage module.
6. After the meeting ends, an ordinary user downloads his own recording, expression recognition video and results, and the overall meeting record through the mobile phone APP, while the administrator can view in the background all files generated during use of the seat card system.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention provides an intelligent electronic seat card that can be used for meetings, psychological consultation and intelligent classrooms, integrating functions such as expression recognition, voice recognition, sound amplification and real-time downloading of conference files. It combines multiple functions and can meet the requirements of various occasions. User information can be obtained through the mobile phone APP, and the display content of the seat card system can be modified in time when personnel change. In addition, the invention not only displays user information as a seat card system but can also replace equipment such as a microphone and a notebook computer: the built-in microphone amplifies sound, voice recognition converts speech into text to record the conference content, and the conference record can be downloaded. The added facial expression recognition function allows the emotional states of participants to be monitored and accidents to be prevented.
2. The invention can capture facial images of the user through the camera, recognize facial expressions and analyze the user's emotional state. It is suitable for meetings, interviews, psychological consultations, intelligent classrooms and the like; it can analyze the user's mental state in real time and allows preventive measures to be taken in case of emergency.
3. The invention uses a general architecture that integrates privileged information in a deep network to recognize facial expressions; the recognition algorithm takes facial muscle action units as the privileged information for learning and training. This improved algorithm integrates privileged information into facial expression recognition, improves recognition accuracy, captures the user's emotional changes more precisely, and reduces the inconvenience caused by incorrect emotion analysis.
4. The invention uses the voice recognition function to convert the users' speech into text files and store them. The background records the speeches of different users in speaking order by terminal number, compiles them into a record during use of the seat card system, automatically generates text files from the recorded content, and saves them in the storage module, from which users can download them via the mobile phone APP.
5. The invention uses the mobile phone APP to obtain the current user's information, and the administrator can control the system directly through the APP. Users can choose their seats independently; a user only needs to enter his name on the mobile phone and the content is transmitted to the display screen, eliminating the heavy preparation before meetings and classes begin and reducing unnecessary waste. The administrator can directly configure the various parameters of the seat card system through the mobile APP.
6. The data processing and the corresponding algorithm modules are executable program code that is read and executed by the background. No complex hardware is needed for data processing at the terminal, which saves product space, keeps the desktop uncluttered, and improves the user experience.
7. Users of the invention are divided into administrators and ordinary users with different permissions. The unified switching of the product, the display contents, the attribute settings, and the use of the microphone and camera are controlled by an administrator, which facilitates management of the product. Meanwhile, different users can perform different operations on the various text, audio and video files generated during use, which makes file management convenient, respects user privacy, and humanizes operation.
8. The voting and service buttons are placed in the mobile phone APP, changing the existing design in which electronic seat card keys are arranged on the base and saving space on the seat card.
9. The whole device supports wireless communication and can transmit data to the storage module wirelessly, facilitating data storage, processing and analysis. Data is stored quickly and is not easily lost, enhancing the safety and reliability of data storage, and the data can subsequently be modified and processed.
10. The user can download the required files in the terminal APP. After the conference ends, the files can be obtained without downloading and checking them in the background; downloading them to the mobile phone makes them more convenient to view.
Drawings
FIG. 1 is a block diagram of the multifunctional intelligent electronic seat card device according to the present invention;
FIG. 2 is a front view of the seat card device according to embodiment 1 of the present invention;
FIG. 3 is a rear view of the seat card device according to embodiment 1 of the present invention;
FIG. 4 is a block diagram of data acquisition in embodiment 1 of the present invention;
FIG. 5 is a diagram of the background control module in embodiment 1 of the present invention;
FIG. 6 is a transmission diagram of the transmission module in embodiment 1 of the present invention;
FIG. 7 is a flow chart of recognition by the speech recognition module in embodiment 1 of the present invention;
FIG. 8 is a flow chart of recognition by the expression recognition module in embodiment 1 of the present invention;
FIG. 9 is a diagram of the implementation steps of the AOAU algorithm in embodiment 1 of the present invention;
FIG. 10 is a block diagram of the seat card system according to embodiment 2 of the present invention;
FIG. 11 is an overall functional structure diagram of the seat card system in embodiment 2 of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example 1:
A multifunctional intelligent electronic seat card device, as shown in fig. 1, comprising:
the data acquisition module is used for acquiring relevant information of a user, wherein the relevant information comprises personal information, text information, sound information and video information;
the background control module is used for realizing control over the terminal, processing and transmission of data and communication control over different terminals;
the storage module is used for storing a recording file, an audio file, a video file and an expression analysis result file which are generated in the using process;
the output module is used for outputting the identity information and the sound information of the user; the front of the seat card device is shown in fig. 2, and the back of the seat card device is shown in fig. 3.
The data acquisition module is shown in fig. 4 and comprises a text acquisition module, a sound acquisition module, a video acquisition module and a seat card terminal. The text acquisition module acquires text uploaded by the user terminal, the sound acquisition module acquires the user's sound information, and the video acquisition module acquires the user's video information; that is, the data acquisition module collects user data such as identity, mental state and sound so that the background can analyze and process it. Specifically, the data acquisition module consists of a camera, a seat card terminal, the user's mobile phone and a hidden miniature microphone. The microphone of the sound acquisition module is built into the seat card terminal. The background can control this module to regulate the users' speaking order, speaking time and so on; a user can also press the speaking button to turn on the microphone and amplify the sound, with the button reset afterwards. The background control module can also decide whether to record depending on the occasion. A user enters information such as his name in the mobile phone APP to produce the text information data; the miniature microphone built into the terminal equipment is activated after the speaking button is pressed to collect the user's audio data; and facial video images are acquired through the miniature camera. The miniature microphone is built into the terminal, and the camera is located at the upper left corner of the back of the terminal.
The background control module realizes control of the terminals, processing and transmission of various data, and information exchange with different terminals.
The storage module stores the recording files, audio files, video files and expression analysis result files generated during use.
The output module consists of a microphone and a display screen. The microphone has a built-in audio amplification circuit for amplifying the speaker's voice, so it can provide sound amplification in environments without dedicated amplification equipment. The display screen displays the user identity information received from the APP. The electronic display screen can be uniformly allocated by the background and displays the data transmitted from it, which may be participant identity information sent directly by the background or identity information such as a user name entered through the mobile phone APP. The electronic display screen is preferably an electronic ink screen.
Storage module:
The storage module receives the various files generated after background voice recognition and facial expression recognition, including text files and record files, for subsequent processing.
Text record file
After a user speaks while using the seat card system, the audio file is transmitted to the background for voice recognition, and a text record file is then generated. Such files may only be read, modified or downloaded by the user himself or an administrator.
Audio file
If recording is used in the conference, the recorded audio files are saved in the storage module and can be read and downloaded by all users.
Video and its analysis files
The facial images of the user captured by the camera throughout the conference are stored together with the analysis files produced by facial expression recognition. Such files can be read only by the user himself and by administrators; others have no read authority, and no one may modify the analysis files.
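The access rules for the three file types above can be summarized as a small permission table; the role names, file-type keys and table layout below are illustrative assumptions, not part of the invention:

```python
# Hypothetical permission table for the file types described above.
# "owner" is the user the file belongs to, "admin" an administrator,
# "user" any other ordinary user (illustrative role names).
PERMISSIONS = {
    "text_record": {"read": {"owner", "admin"},
                    "modify": {"owner", "admin"},
                    "download": {"owner", "admin"}},
    "audio":       {"read": {"owner", "admin", "user"},
                    "modify": set(),                      # not specified for others
                    "download": {"owner", "admin", "user"}},
    "video":       {"read": {"owner", "admin"},
                    "modify": set(),                      # no one may modify analyses
                    "download": {"owner", "admin"}},
}

def allowed(file_type, action, role):
    """Return True if `role` may perform `action` on a file of `file_type`."""
    return role in PERMISSIONS[file_type].get(action, set())
```

For example, `allowed("audio", "read", "user")` is true, while `allowed("video", "modify", "admin")` is false, matching the rule that no one may modify the analysis files.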
The background control module is shown in fig. 5 and comprises a control module, a transmission module, an expression recognition module and a voice recognition module;
the background controls the on-off of the electronic seat system through the control module. After the whole terminal is started, the background also controls the camera and the microphone to work. The background can uniformly allocate the display content of the display screen and can also modify the display of a single seat card. The background realizes the speaking of the appointed object by controlling the microphone and can control the speaking time. And the background controls whether the sound recorder is used or not according to the attributes. Only an administrator can use the mobile phone APP to regulate and control the control module to complete the functions.
The transmission module receives the identity information entered in the mobile phone APP and forwards it to the display screen for display; it receives the audio data collected by the microphone, passes it to the voice recognition module, transmits the text file generated after recognition to the storage module, and transmits the audio file to the storage module if required; and it receives the user's facial video captured by the camera, passes it to the facial expression recognition module, receives the analysis result after recognition, and transmits the video together with the result to the storage module, as shown in fig. 6.
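The routing behavior of the transmission module can be sketched as a small dispatcher; the function names, payload keys and stub recognizers below are illustrative stand-ins, not the invention's actual implementation:

```python
def recognize_speech(audio):
    # Stand-in for the speech recognition module (illustrative stub)
    return f"<transcript of {len(audio)} samples>"

def recognize_expression(video):
    # Stand-in for the expression recognition module (illustrative stub)
    return "neutral"

def route(payload):
    """Route one incoming payload as the transmission module does:
    identity text goes to the display screen, audio is transcribed and
    the text sent to storage, and video is analyzed and stored together
    with its expression analysis result."""
    kind = payload["type"]
    if kind == "identity":
        return ("display", payload["data"])
    if kind == "audio":
        return ("storage", recognize_speech(payload["data"]))
    if kind == "video":
        return ("storage", (payload["data"], recognize_expression(payload["data"])))
    raise ValueError(f"unknown payload type: {kind}")
```

For example, `route({"type": "identity", "data": "Alice"})` returns `("display", "Alice")`.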
The speech recognition module first preprocesses the input speech, as shown in fig. 7; the preprocessing includes framing, windowing and pre-emphasis. Feature extraction is then performed, so the choice of suitable feature parameters is particularly important. Commonly used feature parameters include: pitch period, formants, short-term average energy or amplitude, linear prediction coefficients (LPC), perceptual linear prediction coefficients (PLP), short-term average zero-crossing rate, linear prediction cepstral coefficients (LPCC), autocorrelation functions, mel-frequency cepstral coefficients (MFCC), wavelet transform coefficients, empirical mode decomposition coefficients (EMD), gammatone filter cepstral coefficients (GFCC) and the like. During actual recognition, a template is generated for the test speech according to the training process, and final recognition is carried out according to a distortion decision criterion.
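The preprocessing steps named above (pre-emphasis, framing, windowing) can be sketched as follows; the frame length, hop size and pre-emphasis coefficient are common assumed values (25 ms and 10 ms at 16 kHz, alpha = 0.97), not taken from the source:

```python
import numpy as np

def preprocess_speech(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis, framing and Hamming windowing of a raw waveform,
    a minimal sketch of the speech front end described above."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames of frame_len samples every hop samples
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # Apply a Hamming window to each frame to reduce spectral leakage
    return frames * np.hamming(frame_len)

windowed = preprocess_speech(np.random.randn(16000))  # 1 s of audio at 16 kHz
```

Feature parameters such as MFCCs would then be computed per windowed frame.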
As shown in fig. 8, after receiving the video from the transmission module, the facial expression recognition module first preprocesses the video images: face detection is performed with an artificial neural network, face alignment is performed according to the detected facial landmark points, and the images undergo grayscale and geometric normalization after data augmentation. After preprocessing, frame aggregation is performed, features are extracted, multiple frames are combined, the facial image is input to the expression recognition network, and a classification result for a certain type of expression is output.
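A minimal sketch of the grayscale and geometric normalization step, assuming face detection and landmark localization have already produced landmark coordinates; a real pipeline would use the neural detector mentioned above and proper interpolation rather than this nearest-neighbour resize:

```python
import numpy as np

def normalize_face(frame, landmarks, out_size=48):
    """Grayscale + geometric normalization of one detected face.
    `frame` is an (H, W, 3) RGB image; `landmarks` is an (N, 2) array of
    (x, y) facial landmark points (assumed already detected)."""
    # Grayscale conversion using ITU-R BT.601 luma weights
    gray = frame @ np.array([0.299, 0.587, 0.114])
    # Crop the bounding box of the landmarks (a crude "alignment")
    x0, y0 = landmarks.min(axis=0).astype(int)
    x1, y1 = landmarks.max(axis=0).astype(int)
    face = gray[y0:y1, x0:x1]
    # Nearest-neighbour resize to a fixed network input size
    rows = (np.arange(out_size) * face.shape[0] / out_size).astype(int)
    cols = (np.arange(out_size) * face.shape[1] / out_size).astype(int)
    face = face[rows][:, cols]
    # Intensity normalization to zero mean, unit variance
    return (face - face.mean()) / (face.std() + 1e-8)
```

The normalized crops from consecutive frames would then be aggregated and fed to the expression recognition network.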
A general architecture that integrates privileged information in a deep network is used here for facial expression recognition. This facial expression recognition algorithm differs from others in that basic facial action unit information is added as input during model training; this privileged information, added on top of the original facial image, serves as auxiliary output to supervise feature learning. Here AOAU (Automatic Intermediate Output of Action Unit) is used as the privileged information, as shown in fig. 9.
During model training, the inputs are a face image, the true AU label vector of the upper half of the face, the true AU label vector of the lower half of the face, and the true emotion label. Facial expression features of the whole face are then extracted, action unit (AU) recognition is performed on the upper and lower halves of the face and their features extracted, the three sets of features are concatenated by one network layer, and the emotion is predicted.
The AOAU loss function includes the facial expression classification loss and the recognition losses of the upper- and lower-half facial action units. The parameters are updated through backpropagation of the loss to obtain the trained model.
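The combined loss described above can be sketched as follows, assuming softmax cross-entropy for the expression classes and binary cross-entropy for each AU label vector; the weighting factor `lam` is an assumption, not taken from the source:

```python
import numpy as np

def aoau_loss(expr_logits, expr_label, au_up_pred, au_up_true,
              au_low_pred, au_low_true, lam=0.5):
    """Joint loss: expression classification cross-entropy plus
    upper- and lower-half facial AU recognition losses, in the spirit
    of the AOAU training objective described above."""
    # Numerically stable softmax cross-entropy over expression classes
    z = expr_logits - expr_logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[expr_label]

    # Binary cross-entropy over an AU label vector (auxiliary supervision)
    def bce(p, t):
        p = np.clip(p, 1e-7, 1 - 1e-7)
        return -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))

    return ce + lam * (bce(au_up_pred, au_up_true) + bce(au_low_pred, au_low_true))
```

During training, the gradient of this scalar with respect to the network parameters would be computed by backpropagation, as stated above.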
Finally, the trained AOAU network is used to predict the emotion; its emotion prediction accuracy is higher than that of recent methods such as DeRL and ALFW.
Example 2
A multifunctional intelligent electronic seat card system, shown in fig. 10, comprises a management end, a user terminal and a multifunctional intelligent electronic seat card device, wherein a user communicates with the seat card device through the user terminal and obtains information files according to his authority, and the management end is used for managing the seat card device and the user terminal. Ordinary users log in through the user terminal, and the administrator logs in through the management end. Specifically:
the data acquisition module is used for acquiring relevant information of a user, wherein the relevant information comprises personal information, text information, sound information and video information;
the background control module is used for realizing control over the terminal, processing and transmission of data and communication control over different terminals;
the storage module is used for storing a recording file, an audio file, a video file and an expression analysis result file which are generated in the using process;
the output module is used for outputting the identity information and the sound information of the user; specific communication is shown in fig. 11.
Example 3
An apparatus comprising a processor and a memory for storing a processor-executable program; when executing the program stored in the memory, the processor implements the expression recognition and speech recognition of the multifunctional intelligent electronic seat card.
Wherein, the expression recognition is as follows:
after receiving the video from the transmission module, the facial expression recognition module firstly preprocesses the video image, adopts an artificial neural network to carry out face detection, carries out face alignment according to a face positioning point (landmark) detected by the face, and carries out gray scale and geometric normalization on the image after data enhancement. And after the preprocessing, performing frame aggregation, extracting features, combining multiple frames, inputting the facial image into an expression recognition network, and outputting a classification result of a certain type of expressions.
A general architecture is used herein that integrates privilege information for facial expression recognition in a deep network. The algorithm of facial expression recognition of the method is different from other algorithms in that the input of a basic facial recognition model is added during model training, privilege information is added on the basis of learning an original facial image, and the privilege information is used as auxiliary output to supervise feature learning. This patent uses AOAU (automatic Intermediate Output of Action Unit) as privilege information.
During training, the inputs are a face image, the ground-truth AU label vector of the upper half of the face, the ground-truth AU label vector of the lower half of the face, and the ground-truth emotion label. Facial expression features are then extracted from the whole face, AU recognition is performed on the upper and lower half faces and their features are extracted, the three feature vectors are concatenated by one network layer, and the emotion is predicted.
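The three-branch forward pass described above can be sketched as below. This is a toy stand-in, not the patented network: each branch is a single fully connected layer with ReLU, and the 48x48 input size, 7 emotion classes, and feature widths are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w):
    # One fully connected layer with ReLU, standing in for a feature extractor.
    return np.maximum(x @ w, 0.0)

# Hypothetical sizes: 48x48 whole face, 24x48 half faces, 7 emotion classes.
w_face = rng.normal(size=(48 * 48, 64)) * 0.01
w_up = rng.normal(size=(24 * 48, 32)) * 0.01
w_low = rng.normal(size=(24 * 48, 32)) * 0.01
w_fuse = rng.normal(size=(64 + 32 + 32, 7)) * 0.01

def forward(face):
    upper, lower = face[:24], face[24:]
    f_face = branch(face.reshape(-1), w_face)
    f_up = branch(upper.reshape(-1), w_up)    # branch also supervised by upper-face AU labels
    f_low = branch(lower.reshape(-1), w_low)  # branch also supervised by lower-face AU labels
    # Concatenate the three feature vectors and predict the emotion.
    logits = np.concatenate([f_face, f_up, f_low]) @ w_fuse
    e = np.exp(logits - logits.max())
    return e / e.sum()  # emotion class probabilities
```

The half-face features double as AU-recognition outputs during training, which is how the privileged AU labels supervise the shared representation.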
The AOAU loss function comprises the facial expression classification loss and the recognition losses of the upper- and lower-half facial action units. The network parameters are updated by backpropagating this loss, yielding the trained model.
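The combined loss can be written as a sketch like the following. The weighting factor `lam` between the expression loss and the AU losses is an assumption (the patent does not specify one), and cross-entropy / binary cross-entropy are the usual, but here assumed, choices for the two loss types.

```python
import numpy as np

def cross_entropy(probs, label):
    # Multi-class loss for the emotion prediction.
    return -np.log(probs[label] + 1e-12)

def bce(pred, target):
    # Multi-label binary cross-entropy for an AU activation vector.
    pred = np.clip(pred, 1e-12, 1 - 1e-12)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def aoau_loss(emotion_probs, emotion_label,
              au_up_pred, au_up_true,
              au_low_pred, au_low_true, lam=0.5):
    # Expression classification loss + weighted upper/lower AU recognition losses.
    return (cross_entropy(emotion_probs, emotion_label)
            + lam * (bce(au_up_pred, au_up_true) + bce(au_low_pred, au_low_true)))
```

Backpropagating this scalar through the three branches updates both the shared features and the fusion layer.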
Finally, the trained AOAU network is used to predict the emotion; its prediction accuracy exceeds that of recent methods such as DeRL and ALFW.
The speech recognition is as follows:
The input speech is first preprocessed; the preprocessing includes framing, windowing, pre-emphasis, and so on. Feature extraction is then performed, so choosing suitable feature parameters is particularly important. Commonly used feature parameters include: pitch period, formants, short-time average energy or amplitude, linear prediction coefficients (LPC), perceptual linear prediction (PLP) coefficients, short-time average zero-crossing rate, linear prediction cepstral coefficients (LPCC), autocorrelation functions, mel-frequency cepstral coefficients (MFCC), wavelet transform coefficients, empirical mode decomposition (EMD) coefficients, gammatone frequency cepstral coefficients (GFCC), and the like. During actual recognition, a template is generated from the test speech by the same procedure used in training, and recognition is finally performed according to a distortion decision criterion.
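The preprocessing steps (pre-emphasis, framing, windowing) followed by a simple short-time feature can be sketched as below. The 25 ms / 10 ms frame and hop lengths at a 16 kHz sampling rate and the 0.97 pre-emphasis coefficient are conventional but assumed values, and log energy stands in for a full MFCC front end.

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1], boosting high frequencies.
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_signal(x, frame_len=400, hop=160):
    # Split the signal into overlapping frames (400 samples = 25 ms at 16 kHz).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def log_energy_features(x):
    frames = frame_signal(preemphasis(x)) * np.hamming(400)  # windowing
    # Short-time log energy per frame; a real MFCC front end would continue
    # with FFT, mel filterbank, log, and DCT.
    return np.log((frames ** 2).sum(axis=1) + 1e-10)
```

A template-matching recognizer would then compare such per-frame feature sequences from the test utterance against stored training templates under a distortion measure.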
Example 4
A storage medium stores a program which, when executed by a processor, implements the expression recognition and speech recognition of the multifunctional intelligent electronic seat board, as follows: the expression recognition identifies facial expression information in the video information and produces an expression analysis result; the speech recognition identifies speech information and converts it into a text transcript file;
it should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.