Disclosure of Invention
The invention aims to provide a method for constructing a personalized language model for an intelligent voice assistant, so as to overcome the defects of the prior art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The application discloses a method for constructing a personalized language model for an intelligent voice assistant, comprising the following steps:
S1, collecting data from the user's daily use;
S2, preprocessing the collected data and screening out interfering content;
S3, extracting features from the data and dividing the data according to the features;
S4, training a personalized language model on the feature data using a deep learning algorithm, and dynamically adjusting the model according to the user's usage;
S5, verifying the accuracy and effectiveness of the model;
S6, deploying the trained model into the intelligent voice assistant to realize real-time interaction.
Preferably, step S1 comprises: collecting the user's voice data, behavior data, and feedback data through daily interaction with the user, and collecting image data matched to the voice, behavior, and feedback data.
Preferably, step S2 comprises the following sub-steps:
S21, cleaning the data collected in S1 to remove interfering content;
S22, identifying and labeling the data according to the acquired information and the data content to facilitate subsequent processing;
S23, normalizing the data to facilitate subsequent extraction and processing and to ensure data quality.
Preferably, step S3 comprises: extracting features from the data according to its content in combination with the preprocessed information, the extracted features comprising voice features, emotion features, behavior features, and image features.
Preferably, step S5 comprises: verifying the accuracy of the model through cross-validation, collecting real-time user feedback, and evaluating the effectiveness of the model according to the user's evaluations.
Preferably, image acquisition comprises the following steps:
A1, acquiring image data during the user's use through a camera;
A2, cleaning, labeling, and normalizing the acquired image data;
A3, extracting information features from the image, including portrait features and object features;
A4, recognizing the content of the image, determining the identity of the speaker, and jointly labeling the results;
A5, storing the processed image data in a database.
Preferably, the intelligent voice assistant further comprises a content recommendation system, wherein the content recommendation system determines the user's interests and usage stage according to the collected data and the content actively input by the user, and performs intelligent content recommendation and pushing.
Preferably, the recommended interaction modes comprise voice dialogue, storytelling, game interaction, and the like, so as to increase the fun and appeal of the interaction.
Preferably, the intelligent voice assistant comprises a data acquisition mechanism for acquiring data for model training, a loudspeaker for playing voice and music, a memory for temporarily storing data, a storage device for long-term data storage, and a processor for data processing and model operation, wherein the data acquisition mechanism comprises a microphone for acquiring voice data and a camera for acquiring image data.
The application also discloses a device for constructing a personalized language model for an intelligent voice assistant, comprising a memory and one or more processors, wherein executable code is stored in the memory, and the one or more processors, when executing the executable code, implement the above method for constructing a personalized language model for an intelligent voice assistant.
The application also discloses a computer-readable storage medium storing a program which, when executed by a processor, implements the above method for constructing a personalized language model for an intelligent voice assistant.
The beneficial effects of the invention are as follows:
(1) Through the personalized language model, the intelligent voice assistant of the embodiments of the invention can better understand and respond to children's needs and provide more personalized services;
(2) The content recommendation system of the embodiments can recommend suitable learning content and entertainment according to a child's interests and growth stage, enriching the child's life;
(3) The embodiments adopt a variety of interaction modes, increasing the fun and appeal of interaction and stimulating children's interest in learning;
(4) The model of the embodiments can be dynamically adjusted as a child grows and develops, continuously providing high-quality service;
(5) Through the collection and processing of image data, the embodiments can comprehensively record a child's growth process and provide richer growth records.
The features and advantages of the present invention will be described in detail by way of example with reference to the accompanying drawings.
Detailed Description
The present invention will be further described in detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
Referring to fig. 1, an embodiment of the present invention provides a method for constructing a personalized language model for an intelligent voice assistant.
The personalized language model construction method comprises the following steps:
S1, collecting data from the user's daily use;
S2, preprocessing the collected data and screening out interfering content;
S3, extracting features from the data and dividing the data according to the features;
S4, training a personalized language model on the feature data using a deep learning algorithm, and dynamically adjusting the model according to the user's usage;
S5, verifying the accuracy and effectiveness of the model;
S6, deploying the trained model into the intelligent voice assistant to realize real-time interaction.
Step S1 comprises: collecting the user's voice data, behavior data, and feedback data through daily interaction with the user, and collecting image data matched to the voice, behavior, and feedback data.
Step S2 comprises the following sub-steps:
S21, cleaning the data collected in S1 to remove interfering content;
S22, identifying and labeling the data according to the acquired information and the data content to facilitate subsequent processing;
S23, normalizing the data to facilitate subsequent extraction and processing and to ensure data quality.
Step S3 comprises: extracting features from the data according to its content in combination with the preprocessed information, the extracted features comprising voice features, emotion features, behavior features, and image features.
Step S5 comprises: verifying the accuracy of the model through cross-validation, collecting real-time user feedback, and evaluating the effectiveness of the model according to the user's evaluations.
Image acquisition comprises the following steps:
A1, acquiring image data during the user's use through a camera;
A2, cleaning, labeling, and normalizing the acquired image data;
A3, extracting information features from the image, including portrait features and object features;
A4, recognizing the content of the image, determining the identity of the speaker, and jointly labeling the results;
A5, storing the processed image data in a database.
The content recommendation system determines the user's interests and usage stage according to the collected data and the content actively input by the user, and performs intelligent content recommendation and pushing.
The recommended interaction modes comprise voice dialogue, storytelling, game interaction, and the like, so as to increase the fun and appeal of the interaction.
The intelligent voice assistant comprises a data acquisition mechanism for acquiring data for model training, a loudspeaker for playing voice and music, a memory for temporarily storing data, a storage device for long-term data storage, and a processor for data processing and model operation, wherein the data acquisition mechanism comprises a microphone for acquiring voice data and a camera for acquiring image data.
Example:
In this embodiment, the intelligent voice assistant serves as a growth companion for children and intelligently manages the child's growth and daily life.
The intelligent voice assistant hardware includes the following:
A microphone for collecting voice data.
A loudspeaker for playing voice and music.
A processor for data processing and model operation.
A memory for temporarily storing data.
A storage device for long-term data storage.
A camera for collecting image data.
The personalized language model construction method comprises the following steps:
Data collection: collecting voice data, behavior data, and feedback data through daily interaction with the child, and collecting image data through the camera.
The data collection includes the following:
1. When the child interacts with the assistant, voice data are collected through the microphone and uploaded to the cloud in real time;
2. The camera captures a scene image at a preset fixed interval and stores it, after encryption, in a local database; in addition, when the child interacts with the assistant, the camera acquires images in real time so that they can be synchronized with the interaction data.
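By way of illustration only, the periodic, encrypted image capture described in item 2 could be implemented roughly as sketched below in Python. OpenCV, the cryptography package, and SQLite are assumed here for the camera, the encryption, and the local database, and the capture interval, key handling, and table layout are hypothetical choices rather than requirements of the method.

    import sqlite3
    import time
    import cv2                                  # assumed: OpenCV for camera access
    from cryptography.fernet import Fernet      # assumed: symmetric encryption

    def capture_loop(db_path="images.db", interval_s=600, key=None):
        """Capture a scene image at a fixed interval, encrypt it, and store it locally."""
        key = key or Fernet.generate_key()      # in practice the key would be managed securely
        fernet = Fernet(key)
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS frames (ts REAL, data BLOB)")
        cam = cv2.VideoCapture(0)
        try:
            while True:
                ok, frame = cam.read()
                if ok:
                    _, jpeg = cv2.imencode(".jpg", frame)   # compress before encrypting
                    token = fernet.encrypt(jpeg.tobytes())  # encrypt the JPEG bytes
                    conn.execute("INSERT INTO frames VALUES (?, ?)", (time.time(), token))
                    conn.commit()
                time.sleep(interval_s)                      # wait for the preset interval
        finally:
            cam.release()
            conn.close()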
Data preprocessing: cleaning, labeling, and normalizing the collected voice and image data to ensure data quality.
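A minimal sketch of the preprocessing of one voice recording is given below; it assumes the audio is already loaded as a NumPy array and uses simple energy-based silence trimming and peak normalization as stand-ins for the cleaning and normalization described above. The label fields are hypothetical examples of the labeling step.

    import numpy as np

    def preprocess_audio(samples: np.ndarray, sample_rate: int, speaker: str):
        """Clean, normalize, and label one recorded utterance (illustrative only)."""
        # Cleaning: drop leading/trailing 20 ms frames whose energy is near zero
        frame = int(0.02 * sample_rate)
        n_frames = len(samples) // frame
        if n_frames > 0:
            frames = samples[: n_frames * frame].reshape(n_frames, frame)
            energy = (frames ** 2).mean(axis=1)
            keep = energy > 1e-4 * energy.max()
            if keep.any():
                first = int(np.argmax(keep))
                last = len(keep) - int(np.argmax(keep[::-1]))
                samples = frames[first:last].reshape(-1)
        # Normalization: scale to the [-1, 1] range for consistent downstream processing
        peak = float(np.abs(samples).max()) or 1.0
        samples = samples / peak
        # Labeling: attach metadata used for later identification and splitting
        label = {"speaker": speaker,
                 "sample_rate": sample_rate,
                 "duration_s": len(samples) / sample_rate}
        return samples, label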
Feature extraction: extracting voice, emotion, behavior, and image features for subsequent modeling.
Model training: training the personalized language model using a deep learning algorithm (such as an LSTM or a Transformer), where the model can be dynamically adjusted according to the child's age, interests, and growth stage.
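By way of illustration only, a small LSTM-based language model of the kind mentioned above could be defined and periodically fine-tuned in PyTorch as sketched below; the vocabulary size, layer sizes, and the idea of fine-tuning on the child's recent utterances are assumptions made for this sketch, not fixed parameters of the method, and a Transformer could be substituted for the LSTM.

    import torch
    import torch.nn as nn

    class PersonalizedLM(nn.Module):
        """Tiny next-token language model (illustrative)."""
        def __init__(self, vocab_size=8000, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
            self.head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, token_ids):                  # token_ids: (batch, seq_len)
            hidden, _ = self.lstm(self.embed(token_ids))
            return self.head(hidden)                   # next-token logits

    def fine_tune(model, batches, epochs=1, lr=1e-3):
        """Dynamic adjustment: fine-tune on batches of the child's recent utterances."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for tokens in batches:                     # tokens: LongTensor (batch, seq_len)
                logits = model(tokens[:, :-1])
                loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                               tokens[:, 1:].reshape(-1))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model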
Model evaluation: evaluating the accuracy and effectiveness of the model through cross-validation and user feedback.
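The evaluation step could combine k-fold cross-validation with averaged user feedback roughly as follows; scikit-learn's KFold is assumed, the train_fn and score_fn callables are hypothetical stand-ins for the training and scoring routines, and the 1-5 feedback scale is an illustrative assumption.

    import numpy as np
    from sklearn.model_selection import KFold       # assumed dependency

    def evaluate(train_fn, score_fn, samples, feedback_scores, n_splits=5):
        """Return (mean cross-validated score, mean user feedback on a 1-5 scale)."""
        samples = np.asarray(samples, dtype=object)
        fold_scores = []
        for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True).split(samples):
            model = train_fn(samples[train_idx])     # train on the fold's training split
            fold_scores.append(score_fn(model, samples[test_idx]))
        cv_score = float(np.mean(fold_scores))
        user_score = float(np.mean(feedback_scores)) if len(feedback_scores) else float("nan")
        return cv_score, user_score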
Model deployment: deploying the trained model into the intelligent voice assistant to realize real-time interaction.
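As a rough sketch of the deployment step, the trained model could be wrapped in a simple real-time interaction loop such as the one below; the listen, transcribe, generate_reply, and speak callables are hypothetical placeholders for the microphone input, speech recognition, model inference, and loudspeaker output described elsewhere in this embodiment.

    def interaction_loop(model, listen, transcribe, generate_reply, speak):
        """Real-time loop: capture speech, run the personalized model, play the reply."""
        # All four callables are injected so the loop stays independent of the
        # concrete microphone, ASR, and TTS components (hypothetical interfaces).
        while True:
            audio = listen()                       # block until the child speaks
            if audio is None:                      # e.g. a shutdown signal
                break
            text = transcribe(audio)               # speech-to-text
            reply = generate_reply(model, text)    # personalized language model inference
            speak(reply)                           # text-to-speech via the loudspeaker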
Content recommendation system: recommending suitable learning content and entertainment according to the child's interests and growth stage.
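A minimal sketch of one possible recommendation strategy, user-based collaborative filtering (one of the options listed in a later embodiment), is shown below; the interaction-matrix layout and cosine-similarity weighting are illustrative assumptions rather than the prescribed algorithm.

    import numpy as np

    def recommend(interactions: np.ndarray, child: int, top_k: int = 3):
        """interactions[u, i] = how often user u consumed content item i (illustrative)."""
        # Cosine similarity between the target child and every other user
        norms = np.linalg.norm(interactions, axis=1, keepdims=True) + 1e-9
        unit = interactions / norms
        sims = unit @ unit[child]                  # similarity of each user to the child
        sims[child] = 0.0                          # ignore self-similarity
        # Score items by similarity-weighted consumption of the other users
        scores = sims @ interactions
        scores[interactions[child] > 0] = -np.inf  # do not re-recommend known items
        return np.argsort(scores)[::-1][:top_k]    # indices of the top-k items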
Interaction mode design: designing a variety of interaction modes, including voice dialogue, storytelling, and game interaction, to increase the fun and appeal of the interaction.
An image data processing module:
Image acquisition: acquiring image data of the child in daily life through the camera.
Image preprocessing: cleaning, labeling, and normalizing the acquired image data.
Image feature extraction: extracting key features from the image, such as faces and objects.
Image recognition: recognizing the content of the image using an image recognition algorithm to assist in determining the speaker's identity.
Image recording: storing the processed image data in a database to record the child's growth process.
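By way of illustration, the image feature extraction and recognition steps could rely on a pretrained convolutional network, as sketched below with torchvision; the choice of ResNet-18, the use of its penultimate-layer activations as image features, and the cosine comparison against stored family-member embeddings are assumptions made for this sketch, not the prescribed recognition pipeline.

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Pretrained backbone with the classification head removed -> 512-d feature vectors
    _backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    _backbone.fc = torch.nn.Identity()
    _backbone.eval()

    _prep = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                       T.Normalize(mean=[0.485, 0.456, 0.406],
                                   std=[0.229, 0.224, 0.225])])

    @torch.no_grad()
    def image_features(path: str) -> torch.Tensor:
        """Extract a feature vector describing the faces/objects in the image."""
        img = Image.open(path).convert("RGB")
        return _backbone(_prep(img).unsqueeze(0)).squeeze(0)

    @torch.no_grad()
    def identify_speaker(path: str, known: dict) -> str:
        """Match the image against stored embeddings of known family members (sketch)."""
        feat = image_features(path)
        scores = {name: torch.cosine_similarity(feat, ref, dim=0).item()
                  for name, ref in known.items()}
        return max(scores, key=scores.get)         # most similar known identity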
In one possible embodiment, different data collection methods are used, such as parent-assisted collection and third-party data platforms, and the collected data are then comprehensively computed and integrated.
In one possible embodiment, the feature extraction algorithms employed include MFCC, CNN, and the like.
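For the MFCC option mentioned here, a minimal sketch using the librosa library (an assumed dependency, not one mandated by the embodiment) might look as follows; the number of coefficients and the pooling into a fixed-length vector are illustrative choices.

    import numpy as np
    import librosa                                   # assumed dependency for audio features

    def mfcc_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
        """Return a fixed-length voice feature vector (mean and std of MFCCs over time)."""
        y, sr = librosa.load(wav_path, sr=16000)     # resample to 16 kHz
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, frames)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])   # length 2 * n_mfcc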
In one possible embodiment, the deep learning models used include GRU, BERT, and the like.
In one possible embodiment, the content recommendation algorithms employed include collaborative filtering, deep-learning-based recommendation systems, and the like.
In one possible embodiment, different interaction modes are designed, such as virtual role playing, interactive games, etc.
In one possible embodiment, the image recognition algorithms used include convolutional neural networks (CNN), deep-learning image recognition models, and the like.
The embodiment of the device for constructing a personalized language model for an intelligent voice assistant can be applied to any apparatus with data processing capability, such as a computer. The device embodiment may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the device in a logical sense is formed by the processor of the apparatus with data processing capability reading the corresponding computer program instructions from nonvolatile memory into memory and running them. In terms of hardware, fig. 2 shows a hardware structure diagram of the apparatus with data processing capability in which the device for constructing a personalized language model for an intelligent voice assistant of the present invention is located. In addition to the processor, memory, network interface, and nonvolatile memory shown in fig. 2, the apparatus in which the device of the embodiment is located generally includes other hardware according to its actual function, which is not described herein again. The implementation process of the functions and roles of each unit in the above device is described in the implementation process of the corresponding steps in the above method and is not repeated here.
Since the device embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art can understand and implement the invention without creative effort.
The embodiment of the invention also provides a computer-readable storage medium on which a program is stored; when executed by a processor, the program implements the method for constructing a personalized language model for an intelligent voice assistant in the above embodiments.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any apparatus with data processing capability described in any of the foregoing embodiments. The computer-readable storage medium may also be an external storage device of any apparatus with data processing capability, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash card provided on the apparatus. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the apparatus with data processing capability. The computer-readable storage medium is used to store the computer program and other programs and data required by the apparatus with data processing capability, and may also be used to temporarily store data that has been output or is to be output.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, or alternatives falling within the spirit and principles of the invention.