Disclosure of Invention
The invention aims to provide a method for constructing a personalized language model for an intelligent voice assistant, so as to overcome the defects of the prior art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The application discloses a method for constructing a personalized language model for an intelligent voice assistant, comprising the following steps:
S1, collecting data from the user's daily use;
S2, preprocessing the collected data and screening out interfering content;
S3, extracting features from the data and dividing the data according to the features;
S4, training a personalized language model on the feature data using a deep learning algorithm, and dynamically adjusting the model according to the user's usage;
S5, verifying the accuracy and effectiveness of the model;
S6, deploying the trained model into the intelligent voice assistant to realize real-time interaction.
Preferably, step S1 comprises: collecting the user's voice data, behavior data, and feedback data through daily interaction with the user, and collecting image data matched to the voice, behavior, and feedback data.
Preferably, step S2 comprises the following sub-steps:
S21, cleaning the data collected in S1 to remove interfering content;
S22, identifying and labeling the data according to the acquired information and the data content to facilitate subsequent processing;
S23, normalizing the data to facilitate subsequent extraction and processing and to ensure data quality.
Preferably, step S3 comprises: extracting features from the data according to its content in combination with the preprocessed information, the extracted features comprising voice features, emotion features, behavior features, and image features.
Preferably, step S5 comprises: verifying the accuracy of the model through cross-validation, collecting real-time user feedback, and evaluating the effectiveness of the model according to the user's evaluations.
Preferably, image acquisition comprises the following steps:
A1, acquiring image data during the user's use through a camera;
A2, cleaning, labeling, and normalizing the acquired image data;
A3, extracting information features from the image, including portrait features and object features;
A4, recognizing the content of the image, determining the identity of the speaker, and jointly labeling the results;
A5, storing the processed image data in a database.
Preferably, the intelligent voice assistant further comprises a content recommendation system, wherein the content recommendation system determines the user's interests and usage stage according to the collected data and the content actively input by the user, and performs intelligent content recommendation and pushing.
Preferably, the recommended interaction modes comprise voice dialogue, storytelling, game interaction, and the like, so as to increase the fun and appeal of the interaction.
Preferably, the intelligent voice assistant comprises a data acquisition mechanism for acquiring data for model training, a loudspeaker for playing voice and music, a memory for temporarily storing data, a storage device for long-term data storage, and a processor for data processing and model operation, wherein the data acquisition mechanism comprises a microphone for acquiring voice data and a camera for acquiring image data.
The application also discloses a device for constructing a personalized language model for an intelligent voice assistant, comprising a memory and one or more processors, wherein executable code is stored in the memory, and the one or more processors, when executing the executable code, implement the above method for constructing a personalized language model for an intelligent voice assistant.
The application also discloses a computer-readable storage medium storing a program which, when executed by a processor, implements the above method for constructing a personalized language model for an intelligent voice assistant.
The beneficial effects of the invention are as follows:
(1) Through the personalized language model, the intelligent voice assistant of the embodiments of the invention can better understand and respond to children's needs and provide more personalized services;
(2) The content recommendation system of the embodiments can recommend suitable learning content and entertainment according to a child's interests and growth stage, enriching the child's life;
(3) The embodiments adopt a variety of interaction modes, increasing the fun and appeal of interaction and stimulating children's interest in learning;
(4) The model of the embodiments can be dynamically adjusted as a child grows and develops, continuously providing high-quality service;
(5) Through the collection and processing of image data, the embodiments can comprehensively record a child's growth process and provide richer growth records.
The features and advantages of the present invention will be described in detail by way of example with reference to the accompanying drawings.
Detailed Description
The present invention will be further described in detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
Referring to fig. 1, an embodiment of the present invention provides a method for constructing a personalized language model for an intelligent voice assistant.
The personalized language model construction method comprises the following steps:
S1, collecting data from the user's daily use;
S2, preprocessing the collected data and screening out interfering content;
S3, extracting features from the data and dividing the data according to the features;
S4, training a personalized language model on the feature data using a deep learning algorithm, and dynamically adjusting the model according to the user's usage;
S5, verifying the accuracy and effectiveness of the model;
S6, deploying the trained model into the intelligent voice assistant to realize real-time interaction.
Step S1 comprises: collecting the user's voice data, behavior data, and feedback data through daily interaction with the user, and collecting image data matched to the voice, behavior, and feedback data.
Step S2 comprises the following sub-steps:
S21, cleaning the data collected in S1 to remove interfering content;
S22, identifying and labeling the data according to the acquired information and the data content to facilitate subsequent processing;
S23, normalizing the data to facilitate subsequent extraction and processing and to ensure data quality.
Step S3 comprises: extracting features from the data according to its content in combination with the preprocessed information, the extracted features comprising voice features, emotion features, behavior features, and image features.
Step S5 comprises: verifying the accuracy of the model through cross-validation, collecting real-time user feedback, and evaluating the effectiveness of the model according to the user's evaluations.
Image acquisition comprises the following steps:
A1, acquiring image data during the user's use through a camera;
A2, cleaning, labeling, and normalizing the acquired image data;
A3, extracting information features from the image, including portrait features and object features;
A4, recognizing the content of the image, determining the identity of the speaker, and jointly labeling the results;
A5, storing the processed image data in a database.
The content recommendation system determines the user's interests and usage stage according to the collected data and the content actively input by the user, and performs intelligent content recommendation and pushing.
The recommended interaction modes comprise voice dialogue, storytelling, game interaction, and the like, so as to increase the fun and appeal of the interaction.
The intelligent voice assistant comprises a data acquisition mechanism for acquiring data for model training, a loudspeaker for playing voice and music, a memory for temporarily storing data, a storage device for long-term data storage, and a processor for data processing and model operation, wherein the data acquisition mechanism comprises a microphone for acquiring voice data and a camera for acquiring image data.
Example:
In this embodiment, the intelligent voice assistant serves as a growth companion for children and intelligently manages the child's growth and daily life.
The intelligent voice assistant hardware includes the following:
A microphone for collecting voice data.
A loudspeaker for playing voice and music.
A processor for data processing and model operation.
A memory for temporarily storing data.
A storage device for long-term data storage.
A camera for collecting image data.
The personalized language model construction method comprises the following steps:
Data collection: collecting voice data, behavior data, and feedback data through daily interaction with the child, and collecting image data through the camera.
The data collection includes the following:
1. When the child interacts with the assistant, voice data are collected through the microphone and uploaded to the cloud in real time;
2. The camera captures a scene image at a preset fixed interval and stores it, after encryption, in a local database; in addition, when the child interacts with the assistant, the camera acquires images in real time so that they can be synchronized with the interaction data.
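By way of illustration only, the periodic, encrypted image capture described in item 2 could be implemented roughly as sketched below in Python. OpenCV, the cryptography package, and SQLite are assumed here for the camera, the encryption, and the local database, and the capture interval, key handling, and table layout are hypothetical choices rather than requirements of the method.

    import sqlite3
    import time
    import cv2                                  # assumed: OpenCV for camera access
    from cryptography.fernet import Fernet      # assumed: symmetric encryption

    def capture_loop(db_path="images.db", interval_s=600, key=None):
        """Capture a scene image at a fixed interval, encrypt it, and store it locally."""
        key = key or Fernet.generate_key()      # in practice the key would be managed securely
        fernet = Fernet(key)
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS frames (ts REAL, data BLOB)")
        cam = cv2.VideoCapture(0)
        try:
            while True:
                ok, frame = cam.read()
                if ok:
                    _, jpeg = cv2.imencode(".jpg", frame)   # compress before encrypting
                    token = fernet.encrypt(jpeg.tobytes())  # encrypt the JPEG bytes
                    conn.execute("INSERT INTO frames VALUES (?, ?)", (time.time(), token))
                    conn.commit()
                time.sleep(interval_s)                      # wait for the preset interval
        finally:
            cam.release()
            conn.close()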
Data preprocessing: cleaning, labeling, and normalizing the collected voice and image data to ensure data quality.
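A minimal sketch of the preprocessing of one voice recording is given below; it assumes the audio is already loaded as a NumPy array and uses simple energy-based silence trimming and peak normalization as stand-ins for the cleaning and normalization described above. The label fields are hypothetical examples of the labeling step.

    import numpy as np

    def preprocess_audio(samples: np.ndarray, sample_rate: int, speaker: str):
        """Clean, normalize, and label one recorded utterance (illustrative only)."""
        # Cleaning: drop leading/trailing 20 ms frames whose energy is near zero
        frame = int(0.02 * sample_rate)
        n_frames = len(samples) // frame
        if n_frames > 0:
            frames = samples[: n_frames * frame].reshape(n_frames, frame)
            energy = (frames ** 2).mean(axis=1)
            keep = energy > 1e-4 * energy.max()
            if keep.any():
                first = int(np.argmax(keep))
                last = len(keep) - int(np.argmax(keep[::-1]))
                samples = frames[first:last].reshape(-1)
        # Normalization: scale to the [-1, 1] range for consistent downstream processing
        peak = float(np.abs(samples).max()) or 1.0
        samples = samples / peak
        # Labeling: attach metadata used for later identification and splitting
        label = {"speaker": speaker,
                 "sample_rate": sample_rate,
                 "duration_s": len(samples) / sample_rate}
        return samples, label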
Feature extraction: extracting voice, emotion, behavior, and image features for subsequent modeling.
Model training: training the personalized language model using a deep learning algorithm (such as an LSTM or a Transformer), where the model can be dynamically adjusted according to the child's age, interests, and growth stage.
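By way of illustration only, a small LSTM-based language model of the kind mentioned above could be defined and periodically fine-tuned in PyTorch as sketched below; the vocabulary size, layer sizes, and the idea of fine-tuning on the child's recent utterances are assumptions made for this sketch, not fixed parameters of the method, and a Transformer could be substituted for the LSTM.

    import torch
    import torch.nn as nn

    class PersonalizedLM(nn.Module):
        """Tiny next-token language model (illustrative)."""
        def __init__(self, vocab_size=8000, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
            self.head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, token_ids):                  # token_ids: (batch, seq_len)
            hidden, _ = self.lstm(self.embed(token_ids))
            return self.head(hidden)                   # next-token logits

    def fine_tune(model, batches, epochs=1, lr=1e-3):
        """Dynamic adjustment: fine-tune on batches of the child's recent utterances."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for tokens in batches:                     # tokens: LongTensor (batch, seq_len)
                logits = model(tokens[:, :-1])
                loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                               tokens[:, 1:].reshape(-1))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model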
Model evaluation: evaluating the accuracy and effectiveness of the model through cross-validation and user feedback.
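The evaluation step could combine k-fold cross-validation with averaged user feedback roughly as follows; scikit-learn's KFold is assumed, the train_fn and score_fn callables are hypothetical stand-ins for the training and scoring routines, and the 1-5 feedback scale is an illustrative assumption.

    import numpy as np
    from sklearn.model_selection import KFold       # assumed dependency

    def evaluate(train_fn, score_fn, samples, feedback_scores, n_splits=5):
        """Return (mean cross-validated score, mean user feedback on a 1-5 scale)."""
        samples = np.asarray(samples, dtype=object)
        fold_scores = []
        for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True).split(samples):
            model = train_fn(samples[train_idx])     # train on the fold's training split
            fold_scores.append(score_fn(model, samples[test_idx]))
        cv_score = float(np.mean(fold_scores))
        user_score = float(np.mean(feedback_scores)) if len(feedback_scores) else float("nan")
        return cv_score, user_score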
Model deployment: deploying the trained model into the intelligent voice assistant to realize real-time interaction.
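As a rough sketch of the deployment step, the trained model could be wrapped in a simple real-time interaction loop such as the one below; the listen, transcribe, generate_reply, and speak callables are hypothetical placeholders for the microphone input, speech recognition, model inference, and loudspeaker output described elsewhere in this embodiment.

    def interaction_loop(model, listen, transcribe, generate_reply, speak):
        """Real-time loop: capture speech, run the personalized model, play the reply."""
        # All four callables are injected so the loop stays independent of the
        # concrete microphone, ASR, and TTS components (hypothetical interfaces).
        while True:
            audio = listen()                       # block until the child speaks
            if audio is None:                      # e.g. a shutdown signal
                break
            text = transcribe(audio)               # speech-to-text
            reply = generate_reply(model, text)    # personalized language model inference
            speak(reply)                           # text-to-speech via the loudspeaker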
Content recommendation system: recommending suitable learning content and entertainment according to the child's interests and growth stage.
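A minimal sketch of one possible recommendation strategy, user-based collaborative filtering (one of the options listed in a later embodiment), is shown below; the interaction-matrix layout and cosine-similarity weighting are illustrative assumptions rather than the prescribed algorithm.

    import numpy as np

    def recommend(interactions: np.ndarray, child: int, top_k: int = 3):
        """interactions[u, i] = how often user u consumed content item i (illustrative)."""
        # Cosine similarity between the target child and every other user
        norms = np.linalg.norm(interactions, axis=1, keepdims=True) + 1e-9
        unit = interactions / norms
        sims = unit @ unit[child]                  # similarity of each user to the child
        sims[child] = 0.0                          # ignore self-similarity
        # Score items by similarity-weighted consumption of the other users
        scores = sims @ interactions
        scores[interactions[child] > 0] = -np.inf  # do not re-recommend known items
        return np.argsort(scores)[::-1][:top_k]    # indices of the top-k items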
Interaction mode design: designing a variety of interaction modes, including voice dialogue, storytelling, and game interaction, to increase the fun and appeal of the interaction.
An image data processing module:
Image acquisition: acquiring image data of the child in daily life through the camera.
Image preprocessing: cleaning, labeling, and normalizing the acquired image data.
Image feature extraction: extracting key features from the image, such as faces and objects.
Image recognition: recognizing the content of the image using an image recognition algorithm to assist in determining the speaker's identity.
Image recording: storing the processed image data in a database to record the child's growth process.
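By way of illustration, the image feature extraction and recognition steps could rely on a pretrained convolutional network, as sketched below with torchvision; the choice of ResNet-18, the use of its penultimate-layer activations as image features, and the cosine comparison against stored family-member embeddings are assumptions made for this sketch, not the prescribed recognition pipeline.

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Pretrained backbone with the classification head removed -> 512-d feature vectors
    _backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    _backbone.fc = torch.nn.Identity()
    _backbone.eval()

    _prep = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                       T.Normalize(mean=[0.485, 0.456, 0.406],
                                   std=[0.229, 0.224, 0.225])])

    @torch.no_grad()
    def image_features(path: str) -> torch.Tensor:
        """Extract a feature vector describing the faces/objects in the image."""
        img = Image.open(path).convert("RGB")
        return _backbone(_prep(img).unsqueeze(0)).squeeze(0)

    @torch.no_grad()
    def identify_speaker(path: str, known: dict) -> str:
        """Match the image against stored embeddings of known family members (sketch)."""
        feat = image_features(path)
        scores = {name: torch.cosine_similarity(feat, ref, dim=0).item()
                  for name, ref in known.items()}
        return max(scores, key=scores.get)         # most similar known identity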
In one possible embodiment, different data collection methods are used, such as parent-assisted collection and third-party data platforms, and the collected data are then comprehensively computed and integrated.
In one possible embodiment, the feature extraction algorithms employed include MFCC, CNN, and the like.
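For the MFCC option mentioned here, a minimal sketch using the librosa library (an assumed dependency, not one mandated by the embodiment) might look as follows; the number of coefficients and the pooling into a fixed-length vector are illustrative choices.

    import numpy as np
    import librosa                                   # assumed dependency for audio features

    def mfcc_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
        """Return a fixed-length voice feature vector (mean and std of MFCCs over time)."""
        y, sr = librosa.load(wav_path, sr=16000)     # resample to 16 kHz
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, frames)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])   # length 2 * n_mfcc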
In one possible embodiment, the deep learning models used include GRU, BERT, and the like.
In one possible embodiment, the content recommendation algorithms employed include collaborative filtering, deep-learning-based recommendation systems, and the like.
In one possible embodiment, different interaction modes are designed, such as virtual role playing, interactive games, etc.
In one possible embodiment, the image recognition algorithms used include convolutional neural networks (CNN), deep-learning image recognition models, and the like.
The embodiment of the device for constructing a personalized language model for an intelligent voice assistant can be applied to any apparatus with data processing capability, such as a computer. The device embodiment may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the device in a logical sense is formed by the processor of the apparatus with data processing capability reading the corresponding computer program instructions from nonvolatile memory into memory and running them. In terms of hardware, fig. 2 shows a hardware structure diagram of the apparatus with data processing capability in which the device for constructing a personalized language model for an intelligent voice assistant of the present invention is located. In addition to the processor, memory, network interface, and nonvolatile memory shown in fig. 2, the apparatus in which the device of the embodiment is located generally includes other hardware according to its actual function, which is not described herein again. The implementation process of the functions and roles of each unit in the above device is described in the implementation process of the corresponding steps in the above method and is not repeated here.
Since the device embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art can understand and implement the invention without creative effort.
The embodiment of the invention also provides a computer-readable storage medium on which a program is stored; when executed by a processor, the program implements the method for constructing a personalized language model for an intelligent voice assistant in the above embodiments.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any apparatus with data processing capability described in any of the foregoing embodiments. The computer-readable storage medium may also be an external storage device of any apparatus with data processing capability, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash card provided on the apparatus. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the apparatus with data processing capability. The computer-readable storage medium is used to store the computer program and other programs and data required by the apparatus with data processing capability, and may also be used to temporarily store data that has been output or is to be output.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, or alternatives falling within the spirit and principles of the invention.