CN120672401A - A multimodal VR advertising push method and system based on face recognition - Google Patents

A multimodal VR advertising push method and system based on face recognition

Info

Publication number
CN120672401A
Authority
CN
China
Prior art keywords
data
multimodal
user
advertisement
advertising
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510663345.3A
Other languages
Chinese (zh)
Inventor
王争辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Institute of Technology
Original Assignee
East China Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Institute of Technology
Priority to CN202510663345.3A
Publication of CN120672401A
Legal status: Pending

Abstract

The invention relates to a method and system for pushing multi-mode VR advertisements based on face recognition. The method comprises: collecting multi-mode data and preprocessing it; extracting features from the preprocessed multi-mode data with a deep learning network model; performing dynamic weighted fusion of the extracted image features, voice features and touch data; constructing and training a multi-mode VR advertisement prediction model, inputting the fused multi-mode feature vector into the model, and outputting user interest labels and behavior prediction results; constructing an advertisement database and selecting corresponding advertisements to push based on the user's current interest labels and historical behavior data; and acquiring the user's interaction data on the pushed advertisements and dynamically adjusting the advertisement matching algorithm weights according to that feedback. By collecting multi-mode data and predicting accurately with the model, the invention achieves accurate identification of user behavior and preferences and provides personalized advertisement pushing.

Description

Multi-mode VR advertisement pushing method and system based on face recognition
Technical Field
The invention relates to the technical field of VR advertisements, in particular to a multi-mode VR advertisement pushing method and system based on face recognition.
Background
With the rapid development of Virtual Reality (VR) technology, the VR technology is increasingly widely applied in a plurality of fields such as entertainment, education, business and the like, and an immersive brand new experience is brought to users. In the commercial area, advertising is also increasingly incorporated into VR environments as one of the important marketing tools.
Conventional VR advertisement pushing is mostly single-mode: advertisements are usually placed according to simple user attributes (such as age and gender) or preset scenes. This lack of personalization results in a poor match between advertisements and users' actual needs and interests, and users may grow tired of large numbers of irrelevant or uninteresting advertisements, which degrades the user experience. At the same time, updating and adjusting advertisement content is inflexible: when user behavior and preferences change, the system cannot adjust its pushing strategy in time, so the match between advertisements and the user's current state deteriorates further.
Therefore, there is a need to provide new face recognition based multi-modality VR advertisement push methods and systems.
Disclosure of Invention
Based on the above problems existing in the prior art, an object of the embodiment of the present invention is to provide a method and a system for pushing a multi-modal VR advertisement based on face recognition.
In order to achieve the above purpose, the technical scheme adopted by the invention is that the multi-mode VR advertisement pushing method based on face recognition comprises the following steps:
S1, acquiring multi-mode data of a user in a virtual reality environment, and preprocessing the multi-mode data;
S2, extracting features from the preprocessed multi-mode data by adopting a deep learning network model;
S3, carrying out dynamic weighted fusion on the image features, the voice features and the touch data after feature extraction to generate uniform multi-modal feature vectors;
S4, constructing and training a multi-mode VR advertisement prediction model, inputting the fused multi-mode feature vector into the multi-mode VR advertisement prediction model, and outputting user interest labels and behavior prediction results;
S5, constructing an advertisement database, and selecting corresponding advertisement content to be embedded into a VR scene for pushing based on the current interest tag and the historical behavior data of the user;
S6, acquiring interaction data of the user on the pushed advertisements, and dynamically adjusting the advertisement matching algorithm weights according to feedback from the user's interaction data.
Further, the multi-mode data comprises image data, voice data and touch data. The image data acquisition uses at least two high-resolution RGB cameras, mounted on the two front sides of the VR headset, to capture the user's facial expressions, eye movements and head posture in real time; the touch data acquisition configures a piezoelectric touch sensor on the VR handle to detect the user's grip strength and operation gestures.
Further, the preprocessing of the multi-modal data includes compression processing of the image data, noise reduction processing of the voice data, and normalization processing of the haptic data.
Further, the feature extraction of the network model adopting deep learning from the preprocessed multi-modal data includes:
Step S21, performing feature extraction on the image data by adopting a deep learning convolutional neural network, wherein the convolutional neural network adopts ResNet-50;
Step S22, performing feature extraction on the voice data by adopting a deep learning recurrent neural network, wherein the recurrent neural network adopts a long short-term memory network;
Step S23, performing feature extraction on the tactile data by adopting a one-dimensional convolutional neural network.
Further, performing weighted fusion of the image features, the voice features and the touch data after feature extraction to generate a unified multi-modal feature vector includes:
Step S31, judging according to the current scene of the user, and dynamically updating the weight coefficient by adopting an exponential smoothing method;
The exponential smoothing formula for dynamically updating the weight coefficients is:
w_m^t = a·p_m + (1 − a)·w_m^(t−1)
wherein w_m^t represents the weight coefficient of the m-th modality (e.g. visual, auditory, tactile) at time t, w_m^(t−1) is the weight coefficient of the m-th modality at time t−1, a is the smoothing coefficient, and p_m is the preset scene weight of the m-th modality;
And S32, mapping the visual, auditory and tactile feature vectors to a unified dimension through a deep learning full-connection layer, and then screening key features through an attention mechanism to generate a fused multi-modal feature vector.
Further, the constructing and training a multi-mode VR advertisement prediction model, inputting the fused multi-mode feature vector into the multi-mode VR advertisement prediction model, and outputting a user interest tag and a behavior prediction result, including:
Step S41, the multi-mode VR advertisement prediction model adopts a hybrid model architecture combining a support vector machine and a deep neural network;
Step S42, the multi-mode VR advertisement prediction model is trained with a large amount of labeled multi-modal data;
Step S43, the fused multi-modal feature vector, containing the visual, auditory and tactile information, is taken as the input of the model and integrated through a fully connected layer, ensuring that each feature can influence the model's prediction, and the user interest labels and behavior prediction results are output;
Step S44, an online learning mechanism is adopted so that the multi-mode VR advertisement prediction model can receive new user data in real time and dynamically update its parameters to adapt to changes in user preferences.
Further, acquiring the user's interaction data on the pushed advertisements and dynamically adjusting the advertisement matching algorithm weights according to that feedback includes:
Step S61, acquiring the user's interaction data on the pushed advertisements, including click-through rate, dwell time, haptic feedback intensity and voice evaluation;
Step S62, performing in-depth analysis on the collected interaction data and extracting key features;
Step S63, adopting a reinforcement learning framework and dynamically adjusting the advertisement matching algorithm weights according to user feedback.
An embodiment of the invention further provides a multi-mode VR advertisement push system based on face recognition, applied to the above multi-mode VR advertisement pushing method based on face recognition, the system comprising:
the data acquisition module is used for acquiring multi-mode data of a user in a virtual reality environment and preprocessing the multi-mode data;
The feature extraction module is used for extracting features from the preprocessed multi-mode data by adopting a deep learning network model;
the weighting fusion module is used for carrying out dynamic weighting fusion on the image features, the voice features and the touch data after the feature extraction to generate a unified multi-mode feature vector;
the modeling training module is used for constructing and training a multi-mode VR advertisement prediction model;
The behavior prediction module is used for inputting the fused multi-mode feature vector into a multi-mode VR advertisement prediction model and outputting a user interest tag and a behavior prediction result;
the advertisement pushing module is used for constructing an advertisement database, and selecting corresponding advertisement content to be embedded into the VR scene for pushing based on the current interest tag and the historical behavior data of the user;
and the algorithm optimization module is used for acquiring the interaction data of the user on the push advertisement and dynamically adjusting the weight of the advertisement matching algorithm according to the feedback of the interaction data of the user.
The embodiment of the invention also provides a network-side server, comprising:
at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above multi-mode VR advertisement pushing method based on face recognition.
The embodiment of the invention also provides a computer readable storage medium which stores a computer program, and the computer program realizes the multi-mode VR advertisement pushing method based on face recognition when being executed by a processor.
The multi-modal VR advertisement pushing method based on face recognition has the following beneficial effects. Multi-modal data of a user in a virtual reality environment are collected and preprocessed; a deep learning network model extracts features from the preprocessed multi-modal data; the extracted image features, voice features and touch data are dynamically weighted and fused to generate a unified multi-modal feature vector; a multi-modal VR advertisement prediction model is constructed and trained, the fused multi-modal feature vector is input into the model, and user interest labels and behavior prediction results are output; an advertisement database is constructed, and corresponding advertisement content is selected and embedded into the VR scene for pushing based on the user's current interest labels and historical behavior data; the user's interaction data on the pushed advertisements are acquired, and the advertisement matching algorithm weights are dynamically adjusted according to that feedback. By collecting and preprocessing multi-modal data, performing dynamic weighted fusion after accurate feature extraction with deep learning models, outputting user interest and behavior predictions in real time through the prediction model and an online learning mechanism, pushing personalized advertisements with a multi-dimensional tag advertisement library and a dynamic matching algorithm, and finally closing the loop with feedback-based optimization from user interactions, the method achieves more accurate and personalized advertisement pushing, improves advertisement conversion rates, enhances the user experience, and provides advertisers with a more efficient advertisement delivery platform.
Drawings
The invention is further described below with reference to the drawings and examples.
In the figure:
fig. 1 is a flowchart of a multi-mode VR advertisement pushing method based on face recognition according to a first embodiment of the present invention;
Fig. 2 is a schematic block diagram of a multi-mode VR advertisement push system based on face recognition according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a network-side server according to a third embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First embodiment:
The first embodiment of the invention provides a multi-modal VR advertisement pushing method based on face recognition. The method collects multi-modal data of a user in a virtual reality environment and preprocesses it; extracts features from the preprocessed multi-modal data with a deep learning network model; dynamically weights and fuses the extracted image features, voice features and touch data to generate a unified multi-modal feature vector; constructs and trains a multi-modal VR advertisement prediction model, inputs the fused multi-modal feature vector into the model, and outputs user interest labels and behavior prediction results; constructs an advertisement database and, based on the user's current interest labels and historical behavior data, selects corresponding advertisement content to embed into the VR scene for pushing; and acquires the user's interaction data on the pushed advertisements, dynamically adjusting the advertisement matching algorithm weights according to that feedback. By collecting and preprocessing multi-modal data, performing dynamic weighted fusion after accurate feature extraction with deep learning models, outputting user interest and behavior predictions in real time through the prediction model and an online learning mechanism, pushing personalized advertisements with a multi-dimensional tag advertisement library and a dynamic matching algorithm, and finally closing the loop with feedback-based optimization from user interactions, the method achieves more accurate and personalized advertisement pushing, improves advertisement conversion rates, enhances the user experience, and provides advertisers with a more efficient advertisement delivery platform.
The implementation details of the multi-mode VR advertisement pushing method based on face recognition in the present embodiment are specifically described below, and the following details are provided only for easy understanding, but are not required to implement the present embodiment, and a specific flow in the present embodiment is shown in fig. 1.
Step S1, acquiring multi-mode data of a user in a virtual reality environment, and preprocessing the multi-mode data.
Specifically, the multimodal data includes image data, voice data, and haptic data.
The image data acquisition comprises the adoption of at least two high-resolution RGB cameras which are respectively arranged on two sides of the front part of the VR helmet and used for capturing facial expressions, eyeball movements and head gestures of a user in real time.
The voice data collection includes configuring four high sensitivity microphones distributed at the edge of the VR headset for capturing the user's voice instructions and ambient sounds.
The collection of haptic data includes configuring a piezoelectric haptic sensor on the VR handle for detecting a user's grip and operation gestures.
Preprocessing the multimodal data includes compressing the image data, denoising the speech data, and normalizing the haptic data.
Specifically, because the collected image data is large in volume, processing it directly would lead to high computational cost and transmission delay. The H.264 coding standard is therefore used for inter-frame compression of the video stream, which significantly reduces the required transmission bandwidth while preserving video quality. During compression, an image denoising algorithm is applied; as an example, non-local means filtering analyzes similar regions in the image to effectively remove motion blur and noise while retaining detail and edge information, providing clearer image data for subsequent feature extraction and ensuring that the system can accurately capture the user's facial expressions and head posture.
Spectral subtraction is adopted to eliminate environmental noise: by analyzing the spectral characteristics of the voice signal, it effectively removes background noise, improves the accuracy of speech recognition, and provides high-quality data for subsequent voice feature extraction.
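A minimal sketch of magnitude spectral subtraction, not code from the patent: NumPy is assumed, and the framing and the separate noise-only segment used to estimate the noise spectrum are assumptions of the example.

```python
import numpy as np

def spectral_subtraction(frames: np.ndarray, noise_frames: np.ndarray) -> np.ndarray:
    """frames: (n_frames, frame_len) windowed speech; noise_frames: noise-only frames."""
    spec = np.fft.rfft(frames, axis=1)                                   # complex spectrum per frame
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)   # average noise magnitude
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)                      # subtract noise floor, clip at zero
    cleaned = mag * np.exp(1j * np.angle(spec))                          # keep the original phase
    return np.fft.irfft(cleaned, n=frames.shape[1], axis=1)              # back to time domain
```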
Because the sensitivity of the tactile sensors may differ between VR handles, normalizing the tactile data eliminates sensitivity differences between sensors. The tactile signals are normalized to produce standardized tactile intensity vectors: by mapping the signal values into a uniform range, normalization ensures that data collected by different sensors are comparable, providing standardized data support for subsequent tactile feature extraction and behavior analysis.
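A minimal sketch of such normalization, assuming a per-channel min-max scheme with calibration bounds (the exact scheme is not specified in the text):

```python
import numpy as np

def normalize_haptics(raw: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    """raw: (T, channels) grip-force / gesture signals; lo, hi: per-channel calibration bounds."""
    span = np.maximum(hi - lo, 1e-8)              # avoid division by zero for dead channels
    return np.clip((raw - lo) / span, 0.0, 1.0)   # map every channel into [0, 1]
```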
And S2, extracting features from the preprocessed multi-mode data by adopting a deep learning network model.
Specifically, the specific steps of extracting features from the preprocessed multi-modal data by adopting the deep learning network model include:
And S21, performing feature extraction on the image data by adopting a deep learning convolutional neural network, wherein the convolutional neural network adopts ResNet-50.
Specifically, key facial expression points of the user, such as the curvature of the mouth corners and the eye contours, are extracted; these can reflect the user's emotional state and focus of attention. As an example, rising mouth corners may indicate that the user is in a pleasant state, while changes in the eye contour may indicate the user's degree of concentration. In addition, the convolutional neural network extracts the user's head movement trajectory features, which can reflect the user's visual focus and behavioral habits. For example, frequently moving the head around may indicate curiosity about the surrounding environment, while a long gaze in one direction may indicate interest in that direction.
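An illustrative sketch of this step (torchvision is an assumption; the patent only names ResNet-50): the classification head is dropped so each frame yields a 2048-dimensional feature vector for the later fusion stage.

```python
import torch
from torchvision import models, transforms

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()          # keep the 2048-d pooled features, drop the classifier
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def image_features(pil_frame):
    x = preprocess(pil_frame).unsqueeze(0)   # (1, 3, 224, 224)
    return resnet(x).squeeze(0)              # (2048,) feature vector for this frame
```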
And S22, performing feature extraction on the voice data by adopting a deep learning cyclic neural network, wherein the cyclic neural network adopts a long-term and short-term memory network.
Specifically, a long short-term memory network is used to extract temporal emotional features, such as intonation and speech rate, from the voice signal; these features can reflect the user's emotional state. For example, a rising intonation may indicate that the user is excited, while a faster speech rate may indicate that the user is tense.
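A sketch of the recurrent encoder (PyTorch assumed): the input is a sequence of per-frame acoustic features, which are an illustrative stand-in for the intonation and speech-rate cues described above.

```python
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    def __init__(self, feat_dim: int = 40, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim) per-frame acoustic features
        _, (h_n, _) = self.lstm(frames)
        return h_n[-1]   # (batch, hidden): final state of the last layer as the utterance feature
```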
And S23, performing feature extraction on the tactile data by adopting a one-dimensional convolutional neural network.
Temporal features in the touch data, such as click frequency and sliding direction, are learned automatically by the one-dimensional convolutional neural network and reflect the user's operating habits and preferences.
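A sketch of the one-dimensional convolutional encoder (PyTorch assumed; channel counts and layer sizes are illustrative). The input is a window of normalized grip and gesture channels.

```python
import torch
import torch.nn as nn

class HapticEncoder(nn.Module):
    def __init__(self, channels: int = 4, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),         # pool over time: one vector per window
        )
        self.proj = nn.Linear(64, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        return self.proj(self.net(x).squeeze(-1))   # (batch, out_dim) tactile feature
```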
And step S3, carrying out dynamic weighted fusion on the image features, the voice features and the touch data after feature extraction to generate a unified multi-mode feature vector.
Specifically, the specific step of performing weighted fusion on the image features, the voice features and the touch data after feature extraction to generate a unified multi-modal feature vector includes:
and S31, judging according to the current scene of the user, and dynamically updating the weight coefficient by adopting an exponential smoothing method.
Specifically, the current interaction scene is judged through head gesture and gaze point detection in the image data. As an example, when a user gives a voice instruction, a voice interaction scene is determined, at which time the auditory mode weight is 50%, the visual weight is 40%, and the tactile weight is 10%. When the fixation point in the image data is fixed on the virtual object (stay >1 s), the visual focus scene is judged, the visual weight is 60%, the auditory weight is 25%, and the tactile weight is 15%. When a slide/pinch gesture in the haptic data is detected, a gesture operation scene is determined, at which time the haptic weight is 40%, the visual weight is 35%, and the auditory weight is 25%.
The exponential smoothing formula for dynamically updating the weight coefficients is:
w_m^t = a·p_m + (1 − a)·w_m^(t−1)
wherein w_m^t represents the weight coefficient of the m-th modality (e.g. visual, auditory, tactile) at time t, w_m^(t−1) is the weight coefficient of the m-th modality at time t−1, a is the smoothing coefficient, and p_m is the preset scene weight of the m-th modality.
By combining the historical weight w_m^(t−1) and the scene preset weight p_m through exponential smoothing, the current modal weight w_m^t is updated dynamically, so that the weight distribution fits both the real-time scene and the historical pattern.
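A small sketch of this update in pure Python: the scene presets mirror the percentages given above, while the smoothing coefficient a = 0.3 is an illustrative choice.

```python
SCENE_PRESETS = {
    "voice":   {"visual": 0.40, "auditory": 0.50, "tactile": 0.10},
    "focus":   {"visual": 0.60, "auditory": 0.25, "tactile": 0.15},
    "gesture": {"visual": 0.35, "auditory": 0.25, "tactile": 0.40},
}

def update_weights(prev: dict, scene: str, a: float = 0.3) -> dict:
    """prev: weights at t-1; returns w_m^t = a * p_m + (1 - a) * w_m^(t-1) per modality."""
    preset = SCENE_PRESETS[scene]
    w = {m: a * preset[m] + (1.0 - a) * prev[m] for m in prev}
    total = sum(w.values())                        # renormalize so the weights still sum to 1
    return {m: v / total for m, v in w.items()}

# Usage: weights = update_weights({"visual": 0.5, "auditory": 0.3, "tactile": 0.2}, "gesture")
```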
And S32, mapping the visual, auditory and tactile feature vectors to a unified dimension through a deep learning full-connection layer, and then screening key features through an attention mechanism to generate a fused multi-modal feature vector.
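A compact sketch of this mapping-and-attention step (PyTorch assumed; dimensions are illustrative): each modality feature is projected to a common dimension by a fully connected layer, scaled by the dynamic modality weights from step S31, and a simple attention layer scores the three modality tokens before they are summed into one fused vector.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, dims=(2048, 128, 64), d: int = 256):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(k, d) for k in dims])   # visual, auditory, tactile
        self.attn = nn.Linear(d, 1)

    def forward(self, feats, weights):
        # feats: list of three (batch, dim_m) tensors; weights: (batch, 3) dynamic modality weights
        tokens = torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)   # (batch, 3, d)
        tokens = tokens * weights.unsqueeze(-1)                                 # apply dynamic weights
        scores = torch.softmax(self.attn(tokens).squeeze(-1), dim=1)            # (batch, 3) attention
        return (tokens * scores.unsqueeze(-1)).sum(dim=1)                       # (batch, d) fused vector
```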
And S4, constructing and training a multi-mode VR advertisement prediction model, inputting the fused multi-mode feature vector into the multi-mode VR advertisement prediction model, and outputting user interest labels and behavior prediction results.
Specifically, the steps of constructing and training a multi-mode VR advertisement prediction model, inputting the fused multi-mode feature vector into the multi-mode VR advertisement prediction model, and outputting the user interest tag and the behavior prediction result include:
in step S41, the multi-mode VR advertisement prediction model adopts a hybrid model architecture combining a Support Vector Machine (SVM) and a Deep Neural Network (DNN).
The SVM part is mainly used for processing linear separable characteristics and can effectively classify user interests, and the DNN part is used for capturing complex nonlinear relations and predicting user behaviors.
Step S42, training the multi-mode VR advertisement prediction model by using a large amount of marked multi-mode data.
Specifically, the training data includes behavior characteristics of the user in different scenes and corresponding interest tags and behavior results. Model parameters are adjusted by an optimization algorithm (such as an Adam optimizer) so that the model can accurately predict the interests and behaviors of the user.
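As an illustration of this hybrid setup under stated assumptions (PyTorch for the DNN part, scikit-learn's SVC for the SVM part; label formats, dimensions and the training split are invented for the example), a minimal training sketch could look like this:

```python
import torch
import torch.nn as nn
from sklearn.svm import SVC

class BehaviorHead(nn.Module):
    def __init__(self, d: int = 256, n_behaviors: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, n_behaviors))

    def forward(self, fused):                # fused: (batch, d) multimodal feature vectors
        return self.net(fused)               # behavior logits

def train(fused, behavior_labels, interest_labels, epochs: int = 10):
    head, loss_fn = BehaviorHead(fused.shape[1]), nn.CrossEntropyLoss()
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(epochs):                  # DNN part: nonlinear behavior prediction
        opt.zero_grad()
        loss = loss_fn(head(fused), behavior_labels)
        loss.backward()
        opt.step()
    svm = SVC(kernel="rbf", probability=True)            # SVM part: interest classification
    svm.fit(fused.detach().numpy(), interest_labels)
    return head, svm
```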
Step S43, the fused multi-modal feature vector, containing the visual, auditory and tactile information, is taken as the input of the model and integrated through a fully connected layer, ensuring that each feature can influence the model's prediction result, and the user interest labels and behavior prediction results are output.
Step S44, adopting an online learning mechanism, the multi-mode VR advertisement prediction model can receive new user data in real time and dynamically update model parameters to adapt to user preference changes.
Specifically, when a user operates or interacts in a VR environment, multi-mode data is collected in real time, preprocessed and extracted in characteristics, and then input into a multi-mode VR advertisement prediction model. The multi-mode VR advertisement prediction model adopts an online learning mechanism, can receive new user data in real time and dynamically update model parameters, ensures that the model can always accurately reflect the latest behaviors and preferences of the user, and further improves the prediction accuracy and the advertisement pushing correlation.
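One possible realization of the online learning step (an assumption; the patent does not specify the update rule) is to take a single small optimizer step per new labeled interaction, so the model drifts toward the user's latest behavior:

```python
def online_update(head, opt, loss_fn, fused_new, label_new):
    """One incremental step on the DNN head for a freshly observed, labeled sample."""
    head.train()
    opt.zero_grad()
    loss = loss_fn(head(fused_new), label_new)   # fused_new: (1, d) fresh multimodal vector
    loss.backward()
    opt.step()
    return loss.item()
```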
And S5, constructing an advertisement database, and selecting corresponding advertisement content to be embedded into the VR scene for pushing based on the current interest tag and the historical behavior data of the user.
In particular, the advertisement database stores advertisement content in a multi-dimensional tagged manner. Each advertisement entry corresponds to user behaviors and user interest tags, which facilitates rapid matching of the model to appropriate advertisement content. Based on the user's current interest tags and historical behavior data, a dynamic matching algorithm selects the most suitable advertisement content: the algorithm computes a matching score between each advertisement and the user by comprehensively considering the user's interest preferences and behavior patterns together with the relevance and priority weight of the advertisement content. An advertisement pushing queue is generated according to the matching scores, and the top-scoring advertisements are selected for pushing, ensuring that the advertisement content the user receives is best aligned with the user's interests and preferences.
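The exact matching formula is not given in the text; the sketch below assumes a cosine-similarity blend of interest match, behavior match and ad priority (the 0.6/0.3/0.1 mix is illustrative) to produce the scored push queue:

```python
import numpy as np

def rank_ads(user_interest, user_behavior, ads, k: int = 3):
    """ads: list of dicts with 'tags' (np.ndarray), 'behavior_affinity' (np.ndarray), 'priority' (float)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    scored = [
        (0.6 * cos(user_interest, ad["tags"])
         + 0.3 * cos(user_behavior, ad["behavior_affinity"])
         + 0.1 * ad["priority"], ad)
        for ad in ads
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [ad for _, ad in scored[:k]]          # push queue: top-k matches by score
```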
Furthermore, the advertisement content is seamlessly embedded into the VR scene, so that the immersion of the user is enhanced, the advertisement content and the virtual environment are naturally fused, and the advertisement acceptability and the interactive willingness of the user are improved.
And S6, acquiring interaction data of the user on the pushed advertisements, and feeding back and dynamically adjusting advertisement matching algorithm weights according to the interaction data of the user.
Specifically, the specific step of obtaining the interaction data of the user on the push advertisement and feeding back the interaction data of the user to dynamically adjust the weight of the advertisement matching algorithm comprises the following steps:
step S61, the interactive data of the user on the push advertisement is obtained, wherein the interactive data comprise click rate, stay time, haptic feedback intensity and voice evaluation.
Among these, the click-through rate is the ratio of the number of times the user clicks the advertisement to the number of times the advertisement is displayed, reflecting the user's direct interest in the advertisement; the dwell time is how long the user stays on the advertisement, reflecting the user's attention and potential interest; the haptic feedback intensity is the strength and frequency of the user's haptic operations on the advertisement content through the VR handle, indicating the user's degree of participation and preference; and the voice evaluation is the user's spoken evaluation of the advertisement, whose emotional response is analyzed through a sentiment analysis model to judge how much the user likes the advertisement.
And S62, carrying out deep analysis on the collected interaction data, and extracting key features.
Specifically, the preference of the user to the advertisement of the specific type is identified by analyzing the click behavior pattern of the user, the interest concentration degree of the user to different advertisement contents is known by analyzing the distribution of the stay time, and the analysis result provides data support for the optimization of the advertisement matching algorithm.
Step S63, adopting a reinforcement learning framework, and dynamically adjusting the weight of the advertisement matching algorithm according to user feedback.
Specifically, the advertisement matching algorithm is dynamically adjusted by adopting a reinforcement learning framework. In the reinforcement learning process, the interactive data of the user is regarded as an environment feedback signal, and the weight of the advertisement matching algorithm is adjusted according to the positive and negative conditions of the feedback signal.
As an example, according to the click rate and the stay time of the user, the weight of the advertisement matching algorithm is dynamically adjusted to improve the priority of the advertisement with high click rate, so as to ensure that the most suitable advertisement content can be selected for pushing under different scenes, thereby improving the conversion rate of the advertisement and the satisfaction degree of the user.
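As a hedged illustration only (the patent names a reinforcement learning framework without specifying it), a simple reward-weighted update over the matching weights could look like this: the interaction signals are combined into a reward, and the weights move toward the components that earned it.

```python
import numpy as np

def reward(click: float, dwell_s: float, haptic: float, voice_sentiment: float) -> float:
    """Combine interaction signals into one scalar feedback value (coefficients are illustrative)."""
    return 0.4 * click + 0.3 * min(dwell_s / 30.0, 1.0) + 0.2 * haptic + 0.1 * voice_sentiment

def adjust_weights(w: np.ndarray, component_scores: np.ndarray, r: float, lr: float = 0.05) -> np.ndarray:
    """w: current matching weights; component_scores: per-component contribution for the pushed ad."""
    w = w + lr * r * (component_scores - component_scores.mean())   # reinforce the components behind the reward
    w = np.clip(w, 1e-3, None)
    return w / w.sum()                                              # keep weights positive and normalized
```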
In summary, the multi-modal VR advertisement pushing method based on face recognition collects multi-modal data of a user in a virtual reality environment and preprocesses it; extracts features from the preprocessed multi-modal data with a deep learning network model; dynamically weights and fuses the extracted image features, voice features and touch data to generate a unified multi-modal feature vector; constructs and trains a multi-modal VR advertisement prediction model, inputs the fused multi-modal feature vector into the model, and outputs user interest labels and behavior prediction results; constructs an advertisement database and, based on the user's current interest labels and historical behavior data, selects corresponding advertisement content to embed into the VR scene for pushing; and acquires the user's interaction data on the pushed advertisements, dynamically adjusting the advertisement matching algorithm weights according to that feedback. By collecting and preprocessing multi-modal data, performing dynamic weighted fusion after accurate feature extraction with deep learning models, outputting user interest and behavior predictions in real time through the prediction model and an online learning mechanism, pushing personalized advertisements with a multi-dimensional tag advertisement library and a dynamic matching algorithm, and finally closing the loop with feedback-based optimization from user interactions, the method achieves more accurate and personalized advertisement pushing, improves advertisement conversion rates, enhances the user experience, and provides advertisers with a more efficient advertisement delivery platform.
Second embodiment:
As shown in fig. 2, a second embodiment of the present invention provides a multi-modal VR advertisement push system based on face recognition, which includes a data acquisition module 201, a feature extraction module 202, a weighted fusion module 203, a modeling training module 204, a behavior prediction module 205, an advertisement push module 206, and an algorithm optimization module 207.
Specifically, the system comprises a data acquisition module 201 for acquiring multi-modal data of a user in a virtual reality environment and preprocessing the multi-modal data, a feature extraction module 202 for extracting features from the preprocessed multi-modal data by adopting a network model for deep learning, a weighted fusion module 203 for dynamically weighting and fusing image features, voice features and touch data after feature extraction to generate unified multi-modal feature vectors, a modeling training module 204 for constructing and training a multi-modal VR advertisement prediction model, a behavior prediction module 205 for inputting the fused multi-modal feature vectors into the multi-modal VR advertisement prediction model and outputting user interest tags and behavior prediction results, an advertisement pushing module 206 for constructing an advertisement database and selecting corresponding advertisement content to be embedded into a VR scene for pushing based on the current interest tags and historical behavior data of the user, and an algorithm optimization module 207 for acquiring interactive data of the pushed advertisement by the user and feeding back and dynamically adjusting advertisement matching algorithm weights according to the interactive data of the user.
It is to be noted that this embodiment is a system example corresponding to the first embodiment, and can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and in order to reduce repetition, a detailed description is omitted here. Accordingly, the related art details mentioned in the present embodiment can also be applied to the first embodiment.
It should be noted that each module in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, units that are not so close to solving the technical problem presented by the present invention are not introduced in the present embodiment, but this does not indicate that other units are not present in the present embodiment.
The third embodiment of the present invention relates to a network-side server, as shown in fig. 3, including at least one processor 302, and a memory 301 communicatively connected to the at least one processor 302, where the memory 301 stores instructions executable by the at least one processor 302, and the instructions are executed by the at least one processor 302, so that the at least one processor 302 can perform the above-mentioned data processing method.
Where the memory 301 and the processor 302 are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors 302 and the memory 301 together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 302 is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor 302.
The processor 302 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 301 may be used to store data used by processor 302 in performing operations.
A fourth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program, when executed by the processor, implements the face recognition based multi-modal VR advertisement push method in the first embodiment.
That is, it will be understood by those skilled in the art that all or part of the steps in implementing the methods of the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps in the methods of the embodiments of the application. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The foregoing is merely an embodiment of the present application. Specific structures and features that are common knowledge in the art are not described here; a person of ordinary skill, who is deemed to know all of the prior art before the application date or the priority date and to be capable of applying routine experimental means, can implement this embodiment in light of the present application, and well-known structures or methods should not be an obstacle to doing so. It should be noted that modifications and improvements made without departing from the structure of the present application shall also be regarded as falling within its protection scope, and they do not affect the effect of implementing the application or the practicality of the patent. The protection scope of the present application is subject to the content of the claims, and the specific embodiments in the description may be used to interpret the content of the claims.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

Translated from Chinese

1. A multi-modal VR advertisement pushing method based on face recognition, characterized by comprising:
S1, collecting multi-modal data of a user in a virtual reality environment, and preprocessing the multi-modal data;
S2, using a deep learning network model to extract features from the preprocessed multi-modal data;
S3, performing dynamic weighted fusion of the extracted image features, voice features and tactile data to generate a unified multi-modal feature vector;
S4, constructing and training a multi-modal VR advertisement prediction model, inputting the fused multi-modal feature vector into the multi-modal VR advertisement prediction model, and outputting user interest labels and behavior prediction results;
S5, constructing an advertisement database and, based on the user's current interest labels and historical behavior data, selecting corresponding advertisement content to embed into the VR scene for pushing;
S6, acquiring the user's interaction data on the pushed advertisements, and dynamically adjusting the advertisement matching algorithm weights according to feedback from the user's interaction data.

2. The multi-modal VR advertisement pushing method based on face recognition according to claim 1, characterized in that the multi-modal data comprises image data, voice data and tactile data; the image data is collected using at least two high-resolution RGB cameras, mounted on the two front sides of the VR headset, to capture the user's facial expressions, eye movements and head posture in real time; the tactile data is collected by configuring a piezoelectric tactile sensor on the VR handle to detect the user's grip strength and operation gestures.

3. The multi-modal VR advertisement pushing method based on face recognition according to claim 1, characterized in that preprocessing the multi-modal data comprises compressing the image data, denoising the voice data, and normalizing the tactile data.

4. The multi-modal VR advertisement pushing method based on face recognition according to claim 3, characterized in that using a deep learning network model to extract features from the preprocessed multi-modal data comprises:
Step S21, extracting features from the image data with a deep learning convolutional neural network, the convolutional neural network being ResNet-50;
Step S22, extracting features from the voice data with a deep learning recurrent neural network, the recurrent neural network being a long short-term memory network;
Step S23, extracting features from the tactile data with a one-dimensional convolutional neural network.

5. The multi-modal VR advertisement pushing method based on face recognition according to claim 1, characterized in that performing weighted fusion of the extracted image features, voice features and tactile data to generate a unified multi-modal feature vector comprises:
Step S31, judging according to the user's current scene, and dynamically updating the weight coefficients using the exponential smoothing method;
the exponential smoothing formula for dynamically updating the weight coefficients being:
w_m^t = a·p_m + (1 − a)·w_m^(t−1)
wherein w_m^t is the weight coefficient of the m-th modality (e.g. visual, auditory, tactile) at time t, w_m^(t−1) is the weight coefficient of the m-th modality at time t−1, a is the smoothing coefficient, and p_m is the preset scene weight of the m-th modality;
Step S32, mapping the visual, auditory and tactile feature vectors to a unified dimension through a deep learning fully connected layer, then screening key features through an attention mechanism to generate the fused multi-modal feature vector.

6. The multi-modal VR advertisement pushing method based on face recognition according to claim 1, characterized in that constructing and training the multi-modal VR advertisement prediction model, inputting the fused multi-modal feature vector into the multi-modal VR advertisement prediction model, and outputting user interest labels and behavior prediction results comprises:
Step S41, the multi-modal VR advertisement prediction model adopting a hybrid model architecture combining a support vector machine and a deep neural network;
Step S42, training the multi-modal VR advertisement prediction model with a large amount of labeled multi-modal data;
Step S43, taking the fused multi-modal feature vector, containing the visual, auditory and tactile information, as the input of the model, integrating the multi-modal feature vector through a fully connected layer to ensure that every feature can influence the model's prediction, and outputting the user interest labels and behavior prediction results;
Step S44, adopting an online learning mechanism so that the multi-modal VR advertisement prediction model can receive new user data in real time and dynamically update the model parameters to adapt to changes in user preferences.

7. The multi-modal VR advertisement pushing method based on face recognition according to claim 1, characterized in that acquiring the user's interaction data on the pushed advertisements and dynamically adjusting the advertisement matching algorithm weights according to feedback from the user's interaction data comprises:
Step S61, acquiring the user's interaction data on the pushed advertisements, including click-through rate, dwell time, haptic feedback intensity and voice evaluation;
Step S62, performing in-depth analysis on the collected interaction data and extracting key features;
Step S63, adopting a reinforcement learning framework and dynamically adjusting the advertisement matching algorithm weights according to user feedback.

8. A multi-modal VR advertisement pushing system based on face recognition, characterized by being applied to the multi-modal VR advertisement pushing method based on face recognition according to claims 1-7, the system comprising:
a data acquisition module, for collecting multi-modal data of a user in a virtual reality environment and preprocessing the multi-modal data;
a feature extraction module, for extracting features from the preprocessed multi-modal data using a deep learning network model;
a weighted fusion module, for performing dynamic weighted fusion of the extracted image features, voice features and tactile data to generate a unified multi-modal feature vector;
a modeling and training module, for constructing and training a multi-modal VR advertisement prediction model;
a behavior prediction module, for inputting the fused multi-modal feature vector into the multi-modal VR advertisement prediction model and outputting user interest labels and behavior prediction results;
an advertisement pushing module, for constructing an advertisement database and, based on the user's current interest labels and historical behavior data, selecting corresponding advertisement content to embed into the VR scene for pushing;
an algorithm optimization module, for acquiring the user's interaction data on the pushed advertisements and dynamically adjusting the advertisement matching algorithm weights according to feedback from the user's interaction data.

9. A network-side server, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the multi-modal VR advertisement pushing method based on face recognition according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the multi-modal VR advertisement pushing method based on face recognition according to any one of claims 1 to 7.
CN202510663345.3A — priority 2025-05-22, filed 2025-05-22 — A multimodal VR advertising push method and system based on face recognition — Pending — CN120672401A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510663345.3A (CN120672401A, en) | 2025-05-22 | 2025-05-22 | A multimodal VR advertising push method and system based on face recognition

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510663345.3A (CN120672401A, en) | 2025-05-22 | 2025-05-22 | A multimodal VR advertising push method and system based on face recognition

Publications (1)

Publication Number | Publication Date
CN120672401A | 2025-09-19

Family

ID=97054124

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202510663345.3A (Pending, CN120672401A (en)) | A multimodal VR advertising push method and system based on face recognition | 2025-05-22 | 2025-05-22

Country Status (1)

Country | Link
CN (1) | CN120672401A (en)

Similar Documents

Publication | Publication Date | Title
US11206450B2 (en)System, apparatus and method for providing services based on preferences
EP3884426B1 (en)Action classification in video clips using attention-based neural networks
US8577087B2 (en)Adjusting a consumer experience based on a 3D captured image stream of a consumer response
CN112016367A (en)Emotion recognition system and method and electronic equipment
JP2021517287A (en) Computerized systems and methods for determining authenticity using microrepresentations
JP7118697B2 (en) Point-of-regard estimation processing device, point-of-regard estimation model generation device, point-of-regard estimation processing system, point-of-regard estimation processing method, program, and point-of-regard estimation model
CN116484318A (en) A speech training feedback method, device and storage medium
CN112183408B (en)Customer portrait system and method based on field image
US20180373705A1 (en)User device and computer program for translating recognized speech
CN112308006A (en)Sight line area prediction model generation method and device, storage medium and electronic equipment
CN112528140A (en)Information recommendation method, device, equipment, system and storage medium
CN120560514A (en) Method and device for waking up digital human
CN118859743A (en) Console adjustment method, device, electronic device, system and storage medium
CN118365509A (en)Face image generation method and related device
CN119671830A (en) Rail transit passenger energy information service method and system based on multi-source data fusion
KR20230015272A (en)Unmanned information terminal using artificial intelligence, order management server, and method for providing order information
CN119397072A (en) Method and system for personalized recommendation based on multi-feature fusion face recognition
CN116932788A (en) Cover image extraction method, device, equipment and computer storage medium
CN119004195A (en)Intelligent projection method, system, medium and program product based on gesture recognition
CN119150911A (en)Multi-mode anthropomorphic ecological reconstruction system and method thereof
WO2021019311A1 (en)Hybrid recommender system equipped with facial expression recognition and machine learning
CN117854504A (en)Internet of things application method, system, equipment and storage medium based on large model
CN120672401A (en) A multimodal VR advertising push method and system based on face recognition
Saini et al. — Artificial intelligence inspired fog-cloud-based visual-assistance framework for blind and visually-impaired people
CN114973419A (en)Customer satisfaction recognition method, apparatus, device and medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
