Disclosure of Invention
The invention aims to provide an artificial-intelligence-based virtual fitting personalized clothing recommendation method that fuses multi-modal semantic graph modeling, a cross-modal contrastive learning network, virtual fitting image generation technology and a behavior-feedback-driven optimization algorithm, and details how to realize a closed-loop flow of personalized clothing recommendation and multi-angle fitting image generation based on a user's image data, text descriptions, body type parameters and interaction behaviors.
According to the embodiment of the invention, the virtual fitting personalized clothing recommendation method based on artificial intelligence comprises the following steps of:
S1, acquiring image data, text description data and historical interaction behavior data of a user to generate a multi-mode user feature set;
S2, constructing a multi-modal heterogeneous semantic graph, and performing structural modeling and embedded representation of the multi-modal heterogeneous semantic graph;
S3, extracting structured embedded vectors of user nodes and clothing nodes from the multi-modal heterogeneous semantic graph, inputting the structured embedded vectors and the multi-modal user feature set into a cross-modal contrast learning network model, and carrying out semantic alignment training in a shared embedding space by constructing positive and negative sample pairs;
S4, optimizing the structural parameters and training hyperparameters of the cross-modal contrast learning network model through a raccoon optimization algorithm to generate an optimized cross-modal contrast learning network model;
S5, applying the optimized cross-modal contrast learning network model to a recommendation task, calculating a semantic matching degree score between the user and each garment, generating a recommendation candidate set, inputting the recommendation candidate set together with the user's body type parameters into a virtual fitting image generation unit, generating, in combination with the clothing images, fitting images of the user wearing the recommended clothing, and outputting interactable multi-angle fitting views;
And S6, collecting the user's behavior feedback data on the try-on images, generating a user behavior feedback feature vector, updating the edge weights in the multi-modal heterogeneous semantic graph and the training sample composition of the cross-modal contrast learning network, and periodically executing steps S2 to S5 to form an adaptively and iteratively optimized closed-loop personalized recommendation flow.
Optionally, the multi-mode user feature set is generated by fusing a visual feature vector, a semantic feature vector and a behavior feature vector, wherein the visual feature vector is extracted by inputting image data of a user to a convolutional neural network, the semantic feature vector is extracted by inputting text description data to a language understanding unit, and the behavior feature vector is generated by encoding historical behavior data of the user.
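As a purely illustrative, non-limiting sketch (not part of the claimed method), the fusion of the three modality vectors described above can be outlined as follows; the upstream extractors (the convolutional neural network, the language understanding unit and the behavior encoder) are assumed to have already produced fixed-length vectors, and per-modality L2 normalization before concatenation is an assumed design choice so that no single modality dominates:

```python
import numpy as np

def fuse_user_features(visual_vec, semantic_vec, behavior_vec):
    """Fuse three modality vectors into one multi-modal user feature vector.

    Each modality vector is L2-normalised, then the three are concatenated.
    The extractor outputs are assumed precomputed fixed-length vectors.
    """
    def l2(v):
        v = np.asarray(v, dtype=float)
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2(visual_vec), l2(semantic_vec), l2(behavior_vec)])

# toy example: 2-d visual, 3-d semantic, 2-d behavior vectors
fused = fuse_user_features([3.0, 4.0], [1.0, 0.0, 0.0], [0.5, 0.5])
# fused has length 2 + 3 + 2 = 7, and each modality slice has unit norm
```

A learned fusion (e.g. attention over modalities) could replace the plain concatenation; the sketch only fixes the interface implied by S1.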
Optionally, the step S2 specifically includes:
S21, constructing a node set, wherein the node set comprises user nodes, clothing nodes and clothing attribute nodes, the user nodes are used for representing user objects with individual identifications, the clothing nodes are used for representing recommended target clothing objects, and the clothing attribute nodes are used for representing label information of styles, colors, seasons and brands of clothing;
S22, establishing edge relations between user nodes and clothing nodes based on the user's historical interaction behavior data, wherein the edge relations are used for representing the click, collection, purchase and fitting behavior associations between users and clothing;
S23, establishing edge relations between the clothing nodes and the clothing attribute nodes based on the clothing metadata and tag information, wherein the edge relations are used for representing the attribute associations of the styles, colors and brands of the clothing;
S24, setting initial edge weights for the constructed edge relations, wherein the edge weights are set according to the user's interaction frequency, the behavior type and the association strength between similar tags, and are used for adjusting the recommendation paths and the graph neural propagation weights;
S25, carrying out structural modeling on the multi-modal heterogeneous semantic graph by adopting a heterogeneous graph modeling method, so that the heterogeneous relation information among the various nodes is retained while a complete graph structure representation is established;
S26, embedding and representing the multi-modal heterogeneous semantic graph based on the graph neural network structure, and encoding various nodes in the multi-modal heterogeneous semantic graph into a structured embedded vector through a multi-layer information aggregation mechanism, wherein the embedded vector is used for inputting a cross-modal contrast learning network model.
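As an illustrative, non-limiting sketch of the graph construction in S21-S24 (not the claimed implementation), a toy heterogeneous graph can be held as an adjacency dictionary; the per-behavior base weights below are invented placeholder values, standing in for the "interaction frequency and behavior type" rule:

```python
from collections import defaultdict

# Assumed illustrative base weights: stronger actions contribute more
# initial edge weight (the actual weighting scheme is not specified).
BEHAVIOR_WEIGHT = {"click": 1.0, "collect": 2.0, "try_on": 3.0, "purchase": 5.0}

def build_semantic_graph(interactions, garment_attributes):
    """Build a toy heterogeneous graph as a weighted-edge dict.

    interactions: list of (user_id, garment_id, behavior, count) tuples
    garment_attributes: dict garment_id -> list of attribute labels
    User-garment edge weight = sum over behaviors of base_weight * frequency.
    """
    edges = defaultdict(float)
    for user, garment, behavior, count in interactions:
        key = (("user", user), ("garment", garment))
        edges[key] += BEHAVIOR_WEIGHT[behavior] * count
    for garment, attrs in garment_attributes.items():
        for attr in attrs:
            edges[(("garment", garment), ("attr", attr))] += 1.0
    return dict(edges)

g = build_semantic_graph(
    [("u1", "g1", "click", 3), ("u1", "g1", "purchase", 1)],
    {"g1": ["style:casual", "color:blue"]},
)
# user-garment edge weight: 1.0*3 + 5.0*1 = 8.0
```

In practice a heterogeneous-graph library (e.g. a GNN framework's typed-edge graph) would replace the dictionary, and S26's multi-layer aggregation would produce the node embeddings.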
Optionally, the step S3 specifically includes:
S31, extracting the structured embedded vectors of the user nodes and the clothing nodes from the multi-modal heterogeneous semantic graph, denoted as e_u and e_c respectively, and extracting a fusion representation vector f_u from the multi-modal user feature set;
S32, introducing a graph fusion gating coefficient g and constructing a nonlinear gated fusion mechanism that fuses the graph-structure semantics with the modality features to generate a user representation vector z_u:
g = σ(MLP([e_u; f_u])), z_u = g ⊙ e_u + (1 − g) ⊙ f_u;
wherein σ is the Sigmoid function, ⊙ denotes element-wise multiplication, MLP is a multi-layer perceptron network, and g is the graph fusion gating coefficient;
S33, pairing the user representation vector z_u with the clothing structure embedding vector e_c to construct cross-modal sample pairs (z_u, e_c), and constructing positive and negative sample pairs according to the historical interaction information between users and clothing;
S34, introducing a semantic multi-layer contrast mechanism, setting the number of semantic contrast layers L, and constructing contrast tasks of multiple semantic granularities, including overall matching contrast, style attribute contrast and color semantic contrast, with an independent loss calculation performed at each layer;
S35, introducing a multi-factor-driven dynamic temperature control mechanism into each contrast layer, and defining the temperature parameter of the t-th training iteration as:
τ_t = τ₀ · (1 + α·t + β·Var_t + γ·log(1 + L_t));
wherein τ₀ is the base temperature hyperparameter, α, β and γ are respectively the time adjustment factor, the similarity variance adjustment factor and the loss sensitivity adjustment factor, Var_t denotes the variance of the positive- and negative-sample pair similarities of the current batch, L_t denotes the contrast loss value of the current batch, and log is the logarithmic function;
S36, training the cross-modal contrast learning network model to semantically align the user representation vectors and the clothing representation vectors in the shared embedding space by minimizing a weighted multi-layer semantic contrast loss function.
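The gated fusion, dynamic temperature and contrast loss of S32-S35 can be sketched minimally as follows. This is an illustrative, non-limiting sketch: the gate matrix `W` stands in for a learned MLP, the additive form of the temperature schedule is one plausible reading of the multi-factor rule, and the loss is a standard single-pair InfoNCE rather than the full multi-layer weighted loss of S36:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(e_u, f_u, W):
    """Gated fusion of a graph embedding e_u and a modality feature f_u.
    The gate g = sigmoid(W @ [e_u; f_u]) decides, per dimension, how much
    graph-structure semantics enter the user representation. W stands in
    for a learned MLP and is passed in for illustration only."""
    g = sigmoid(W @ np.concatenate([e_u, f_u]))
    return g * e_u + (1.0 - g) * f_u

def dynamic_temperature(tau0, t, sim_var, loss_val, alpha, beta, gamma):
    """One plausible form of the multi-factor schedule: the base temperature
    tau0 is modulated by training progress t, the variance of pair
    similarities, and the log of the current contrastive loss."""
    return tau0 * (1.0 + alpha * t + beta * sim_var + gamma * np.log(1.0 + loss_val))

def info_nce(z_u, pos, negs, tau):
    """Standard InfoNCE loss for one user representation against one positive
    garment embedding and a list of negatives."""
    sims = np.array([z_u @ pos] + [z_u @ n for n in negs]) / tau
    sims -= sims.max()  # numerical stability before exponentiation
    return -np.log(np.exp(sims[0]) / np.exp(sims).sum())

# toy usage: a zero gate matrix yields a gate of 0.5 everywhere
z = gated_fusion(np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.zeros((2, 4)))
tau = dynamic_temperature(0.07, t=0, sim_var=0.0, loss_val=0.0,
                          alpha=0.01, beta=0.1, gamma=0.05)
loss = info_nce(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                [np.array([0.0, 1.0])], tau=0.5)
```

The full method would compute one such loss per semantic layer (overall, style, color) and minimize their weighted sum, each layer with its own dynamically scheduled temperature.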
Optionally, the step S4 specifically includes:
S41, selecting the structural parameters and training hyperparameters to be optimized in the cross-modal contrast learning network model to form an optimization target set, comprising the graph fusion gating coefficient g, the number of semantic contrast layers L, the time adjustment factor α, the similarity variance adjustment factor β and the loss sensitivity adjustment factor γ;
S42, dividing the optimization target set, according to parameter function attributes, into a structure fusion subspace P_s and a training control subspace P_c, and initializing two raccoon sub-populations R_1 and R_2, each raccoon individual representing a set of parameter combinations to be optimized;
S43, executing the local memory driving mechanism, the environmental disturbance mechanism and the collaborative guidance updating mechanism of the raccoon optimization algorithm within each sub-population, performing multi-round iterative optimization on the raccoon individuals, and storing the optimal raccoon individual of each round into the corresponding sub-population memory bank;
S44, introducing a dynamic memory window mechanism, and setting the memory window length of the current iteration as:
w_t = w₀ · (1 + η · |F_t − F_{t−1}| / (F_{t−1} + ε));
wherein w₀ is the initial window length, η is the feedback sensitivity adjustment factor, F_t is the fitness score of the current round, F_{t−1} is the fitness score of the previous round, w_t is the memory window length of the current optimization round, ε is a small positive constant, and the window length is used to control the number of optimal solutions retained in the sub-population memory bank;
S45, truncating the memory bank of each sub-population to the memory window length w_t after each optimization round, retaining locally optimal individuals and replacing outdated historical solutions;
S46, introducing a graph-semantics-aware attention migration mechanism: in every migration period of T rounds, calculating the inter-population migration attention coefficient based on the edge-density change between the user nodes and the attribute nodes in the multi-modal heterogeneous semantic graph:
a_{1→2} = Norm(ω_u · ω_c · (1 + λ_e · Δw));
wherein ω_u and ω_c denote the influence weights of the sub-population on the user-node and clothing-node embeddings, Δw denotes the change value of the user-to-attribute weight intensity in the graph, a_{1→2} denotes the attention weighting coefficient for migration from sub-population R_1 to sub-population R_2, λ_e is the weight adjustment factor for graph-edge structure change, and Norm denotes the normalization operation; the inter-population migration operation is finally performed as:
x_j^(2) ← x_j^(2) + a_{1→2} · (x*^(1) − x_j^(2));
wherein x_j^(2) is the parameter representation vector of the j-th raccoon individual in sub-population R_2, and x*^(1) is the parameter vector of the raccoon individual with the best current-round fitness in sub-population R_1;
S47, respectively selecting the current-round optimal individual parameters θ_s* and θ_c* from the two sub-populations and merging them into a globally optimal parameter combination θ*;
S48, defining a composite fitness function F of triple consistency evaluation as follows:
F = λ₁ · (1/L) Σ_{l=1..L} Sim_l − λ₂ · E_rec + λ₃ · NDCG;
wherein Sim_l denotes the average similarity of positive sample pairs in the l-th layer semantic space, E_rec is the reconstruction error of the multi-angle virtual try-on images of the recommended clothing, NDCG is the ranking quality index between user behavior feedback and the recommendation ranking, which is calculated as the normalized discounted cumulative gain by comparing the user's actual click, collection and purchase behaviors with the recommendation list and is used to measure the degree of match between the recommendation ranking results and the user's actual preferences, and λ₁, λ₂ and λ₃ are balance coefficients;
S49, sorting all raccoon individuals according to the fitness function calculation results to determine the current globally optimal parameter combination θ*;
S410, applying the optimal parameter combination θ* to the cross-modal contrast learning network model, updating the graph fusion mechanism, the semantic contrast hierarchy and the temperature regulation strategy, and outputting the finally optimized cross-modal contrast learning network model.
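A heavily simplified, non-limiting sketch of the population search in S41-S49 follows. The raccoon-specific mechanisms (local memory drive, environmental perturbation, collaborative guidance, inter-population migration) are collapsed into a single loop: perturb each individual toward a remembered elite, and keep a memory bank whose length follows the dynamic window rule of S44; the contraction factor 0.5 and noise scale 0.1 are assumed illustrative values:

```python
import random

def optimize(fitness, dim, pop_size=10, rounds=30, w0=5, eta=2.0, eps=1e-8, seed=0):
    """Toy population search with the dynamic memory window of S44:
    w_t = w0 * (1 + eta * |F_t - F_prev| / (|F_prev| + eps)).
    fitness is maximised; returns the best remembered parameter vector."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    memory, prev_fit = [], None
    for _ in range(rounds):
        scored = sorted(pop, key=fitness, reverse=True)
        best, best_fit = scored[0], fitness(scored[0])
        # dynamic memory window length (S44): grows when fitness jumps
        if prev_fit is None:
            w = w0
        else:
            w = int(w0 * (1 + eta * abs(best_fit - prev_fit) / (abs(prev_fit) + eps)))
        memory = ([best] + memory)[:max(1, w)]   # S45: truncate memory bank
        prev_fit = best_fit
        # simplified collaborative guidance: drift toward a remembered elite
        elite = rng.choice(memory)
        pop = [[x + 0.5 * (e - x) + rng.gauss(0, 0.1)
                for x, e in zip(ind, elite)] for ind in scored]
    return max(memory, key=fitness)

# maximise a simple concave fitness; the optimum is at the origin
best = optimize(lambda v: -sum(x * x for x in v), dim=2)
```

The real method runs two such sub-populations over the structure fusion and training control subspaces, periodically migrating individuals between them with the graph-aware attention coefficient of S46, and scores candidates with the triple-consistency fitness of S48 instead of this toy objective.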
Optionally, the step S5 specifically includes:
S51, deploying the optimized cross-modal contrast learning network model into the recommendation task module, inputting the user's fusion feature vector and the clothing embedding vectors, and executing semantic similarity calculation;
S52, according to the semantic similarity calculation result, matching degree scoring and sorting are carried out on candidate clothes, and a plurality of clothes with highest matching degree are selected from the candidate clothes to form a recommendation candidate set;
S53, acquiring body type parameter information of a user, including body type dimension characteristics of height, weight, shoulder width, waistline and hip circumference, and carrying out standardized modeling on the body type parameter information of the user;
S54, inputting the image characteristics of each piece of clothing in the recommended candidate set, the body type parameters and the fusion characteristics of the user into a virtual fitting image generation unit together, and executing image synthesis operation of clothing fitting effects;
S55, in the image synthesis process, respectively setting a plurality of observation visual angles, and generating corresponding multi-angle fitting images aiming at each piece of candidate clothes;
S56, organizing the generated multi-angle image sets into an interactive display interface, on which the user can visually preview each recommended garment, including interactive operations of image switching, rotation, zooming and body-type fitting-effect comparison.
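The scoring and candidate selection of S51-S52 can be sketched as follows; cosine similarity is an assumed choice of semantic matching score (the method only specifies "semantic similarity"), and the garment IDs are hypothetical:

```python
import numpy as np

def top_k_candidates(user_vec, garment_vecs, k=3):
    """Score every garment by cosine similarity to the fused user vector and
    return the k best (garment_id, score) pairs, highest first."""
    u = np.asarray(user_vec, dtype=float)
    u = u / np.linalg.norm(u)
    scored = []
    for gid, v in garment_vecs.items():
        v = np.asarray(v, dtype=float)
        scored.append((gid, float(u @ (v / np.linalg.norm(v)))))
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:k]

cands = top_k_candidates(
    [1.0, 0.0],
    {"g1": [1.0, 0.1], "g2": [0.0, 1.0], "g3": [1.0, 1.0]},
    k=2,
)
# g1 aligns best with the user vector, g3 second
```

The resulting candidate set, together with the standardized body type parameters of S53, would then be handed to the virtual fitting image generation unit of S54.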
Optionally, the body type parameter information of the user specifically includes body type dimension characteristics of height, weight, shoulder width, waistline and hip circumference, and the body type dimension characteristics are used for driving the virtual fitting image generating unit to generate a fitting image conforming to the body type characteristics of the user.
Optionally, the step S6 specifically includes:
S61, after the user finishes the try-on image browsing of the recommended clothing, collecting behavior feedback data of the user;
S62, preprocessing and encoding the collected behavior feedback data to generate a user behavior feedback feature vector for describing the current preference change of the user;
S63, updating edge weights in the multi-mode heterogeneous semantic graphs based on the user behavior feedback feature vectors, wherein the updating comprises adjustment of connection strength between user nodes and clothing nodes and between user nodes and attribute nodes, and reflects user interest reconstruction trend;
S64, re-extracting the structure embedded representations of the user and clothing nodes according to the updated multi-modal heterogeneous semantic graph structure, for generating a new semantic alignment training sample composition, including updating the relations of positive and negative sample pairs;
S65, periodically re-executing multi-mode semantic map modeling, cross-mode contrast learning training and raccoon optimization parameter updating, recommending and fitting image generation processes to form a dynamic self-updated recommended iteration closed loop;
And S66, after the closed loop optimization period of each round is finished, adjusting a semantic graph structure, a matching strategy and a visual synthesis mode in a recommendation process according to the accumulated user behavior data change condition, and improving adaptability and feedback response capability of personalized recommendation.
Optionally, the user's behavior feedback data specifically includes click, browsing duration, image switching, rating and collection interaction information, which is used for dynamically adjusting the semantic graph structure and the training sample composition, thereby optimizing the personalized recommendation effect.
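A minimal sketch of the edge-weight update in S63, assuming feedback has already been normalized to a score in [0, 1]; the exponential-moving-average form and the learning rate `lr` are illustrative assumptions, not the claimed update rule:

```python
def update_edge_weights(edges, feedback, lr=0.3):
    """Blend behaviour feedback into user-garment edge weights.

    edges: dict mapping an edge key to its current weight
    feedback: dict mapping an edge key to a feedback score in [0, 1]
              (e.g. normalised dwell time, click or purchase signal)
    lr: assumed learning-rate hyper-parameter of the moving average
    """
    updated = dict(edges)
    for key, score in feedback.items():
        old = updated.get(key, 0.0)
        updated[key] = (1 - lr) * old + lr * score
    return updated

edges = {("u1", "g1"): 0.8}
new_edges = update_edge_weights(edges, {("u1", "g1"): 0.0, ("u1", "g2"): 1.0})
# ("u1","g1") decays toward 0: 0.7*0.8 = 0.56; new edge ("u1","g2") = 0.3
```

Repeating this after each feedback cycle, then re-running S2 to S5 on the reweighted graph, realizes the closed loop described in S65-S66.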
The beneficial effects of the invention are as follows:
According to the artificial intelligence-based virtual fitting personalized clothing recommendation method, a complete closed-loop system spanning semantic understanding, interest matching, visual verification and feedback learning is constructed by introducing key technical components such as the multi-modal heterogeneous semantic graph, the cross-modal contrast learning network model, the raccoon optimization algorithm and the virtual fitting image generation unit on the basis of the prior art. Compared with traditional recommendation systems that rely only on image retrieval or collaborative filtering, the invention realizes a multi-modal end-to-end processing flow from user data acquisition to try-on image output, and truly breaks down the barriers between the three stages of recommendation, fitting and feedback.
The multi-modal heterogeneous semantic graph constructed by the invention fully integrates user images, text descriptions, behavior records and clothing attribute data, and is embedded and represented through a graph neural network structure, so that the complex relationships between user interest preferences and clothing semantic tags are effectively captured. The structured embeddings extracted from the graph are fused with the user's multi-modal features and then fed into the cross-modal contrast learning network, where a layered semantic alignment mechanism improves the matching accuracy of user-clothing pairs in the semantic space. By introducing the raccoon optimization algorithm to search and optimize the model structure and hyperparameters, the generalization capability and recommendation stability of the model are significantly enhanced; in particular, the method still performs well in scenarios with large personality differences and severe cold start.
In addition, the recommendation candidate results are innovatively combined with the user's body type parameters, and the virtual fitting image generation unit is introduced to generate interactive multi-angle fitting images, so that the user can not only know what is recommended but also see the wearing effect, greatly improving user engagement and recommendation trust. The try-on images do not merely exist as display results; they are linked with the user's click, rating, collection and other behaviors, forming quantifiable behavior feedback features. The system dynamically updates the semantic graph edge weights and the training sample structure through this feedback information, and periodically retrains the recommendation model, thereby realizing adaptive capture of user interests and real-time personalized recommendation capability.
In conclusion, the invention breaks through the technical bottleneck of the traditional recommendation system in the aspects of personal suitability, visual interactivity and self-optimizing capability, achieves the comprehensive optimizing effects of more accurate recommendation, more realistic fitting, more intelligent feedback and more autonomous system, and has higher practicability and popularization value.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings. The drawings are simplified schematic representations which merely illustrate the basic structure of the invention and therefore show only the structures which are relevant to the invention.
Referring to fig. 1 and 2, an artificial intelligence based virtual fitting personalized clothing recommendation method includes the steps of:
S1, acquiring image data, text description data and historical interaction behavior data of a user to generate a multi-mode user feature set;
S2, constructing a multi-modal heterogeneous semantic graph, and performing structural modeling and embedded representation of the multi-modal heterogeneous semantic graph;
S3, extracting structured embedded vectors of user nodes and clothing nodes from the multi-modal heterogeneous semantic graph, inputting the structured embedded vectors and the multi-modal user feature set into a cross-modal contrast learning network model, and carrying out semantic alignment training in a shared embedding space by constructing positive and negative sample pairs;
S4, optimizing the structural parameters and training hyperparameters of the cross-modal contrast learning network model through a raccoon optimization algorithm to generate an optimized cross-modal contrast learning network model;
S5, applying the optimized cross-modal contrast learning network model to a recommendation task, calculating a semantic matching degree score between the user and each garment, generating a recommendation candidate set, inputting the recommendation candidate set together with the user's body type parameters into a virtual fitting image generation unit, generating, in combination with the clothing images, fitting images of the user wearing the recommended clothing, and outputting interactable multi-angle fitting views;
And S6, collecting the user's behavior feedback data on the try-on images, generating a user behavior feedback feature vector, updating the edge weights in the multi-modal heterogeneous semantic graph and the training sample composition of the cross-modal contrast learning network, and periodically executing steps S2 to S5 to form an adaptively and iteratively optimized closed-loop personalized recommendation flow.
According to the virtual fitting personalized clothing recommendation method provided by the invention, a closed-loop recommendation system integrating data acquisition, multi-mode modeling, semantic alignment, intelligent optimization, visual generation and feedback learning is constructed through the steps S1 to S6, so that the virtual fitting personalized clothing recommendation method has remarkable beneficial effects. The method not only integrates the multi-mode user characteristics such as images, texts, behaviors and the like, but also constructs a heterogeneous semantic map to comprehensively express the multi-dimensional relationship between the user interests and the clothing semantics. Through the training of the cross-modal contrast learning network model and the parameter tuning of the raccoon optimization algorithm, the semantic matching accuracy and the model generalization capability are effectively improved. The system links the recommended result and the user body type parameter to generate a multi-angle try-on image with high sense of reality, and realizes direct landing from 'recommendation' to 'visual wearing'. More importantly, the behavior feedback of the user on the fitting image is acquired in real time and used for updating the semantic graph structure and training sample composition, so that a self-adaptive optimization closed loop of recommendation, fitting, feedback and recommendation is formed. The method remarkably improves the individuation level, recommendation credibility and user experience of clothing recommendation, and has good practicability and popularization value.
In this embodiment, the multi-modal user feature set is generated by fusing a visual feature vector, a semantic feature vector and a behavior feature vector, wherein the visual feature vector is generated by inputting image data of a user to a convolutional neural network for extraction, the semantic feature vector is generated by inputting text description data to a language understanding unit for extraction, and the behavior feature vector is generated by encoding historical behavior data of the user.
In this embodiment, the step S2 specifically includes:
S21, constructing a node set, wherein the node set comprises user nodes, clothing nodes and clothing attribute nodes, the user nodes are used for representing user objects with individual identifications, the clothing nodes are used for representing recommended target clothing objects, and the clothing attribute nodes are used for representing label information of styles, colors, seasons and brands of clothing;
S22, establishing edge relations between user nodes and clothing nodes based on the user's historical interaction behavior data, wherein the edge relations are used for representing the click, collection, purchase and fitting behavior associations between users and clothing;
S23, establishing edge relations between the clothing nodes and the clothing attribute nodes based on the clothing metadata and tag information, wherein the edge relations are used for representing the attribute associations of the styles, colors and brands of the clothing;
S24, setting initial edge weights for the constructed edge relations, wherein the edge weights are set according to the user's interaction frequency, the behavior type and the association strength between similar tags, and are used for adjusting the recommendation paths and the graph neural propagation weights;
S25, carrying out structural modeling on the multi-modal heterogeneous semantic graph by adopting a heterogeneous graph modeling method, so that the heterogeneous relation information among the various nodes is retained while a complete graph structure representation is established;
S26, embedding and representing the multi-modal heterogeneous semantic graph based on the graph neural network structure, and encoding various nodes in the multi-modal heterogeneous semantic graph into a structured embedded vector through a multi-layer information aggregation mechanism, wherein the embedded vector is used for inputting a cross-modal contrast learning network model.
The multi-modal heterogeneous semantic graph is systematically constructed, has the capability of accurately expressing the multi-dimensional relationships between users and clothing, and has obvious beneficial effects. By introducing user nodes, clothing nodes and clothing attribute nodes, and combining the user's historical click, collection, purchase and try-on behaviors, an interaction graph structure with rich expression is constructed, so that the recommendation system can not only understand user behavior but also identify the semantic causes behind user preferences. Meanwhile, a clothing attribute relation network is constructed from the clothing metadata and tag information, enhancing the interpretability and classification capability of the clothing semantic information. By setting the edge weights, the system can dynamically adjust the connection weights between nodes according to interaction frequency and behavior intensity, providing data support for the propagation paths and attention mechanisms in the graph neural network. The introduction of the heterogeneous graph structure ensures the independence and expandability of the different types of node relations, and the embedded representation of the graph neural network uniformly encodes the node semantic features into trainable vector representations, facilitating the subsequent cross-modal contrast learning. Overall, this graph modeling method greatly improves the expression capability of the recommendation system in user interest modeling and semantic association learning, and provides a high-quality structural basis for subsequent recommendation effect improvement and personalized semantic alignment.
In this embodiment, the step S3 specifically includes:
S31, extracting the structured embedded vectors of the user nodes and the clothing nodes from the multi-modal heterogeneous semantic graph, denoted as e_u and e_c respectively, and extracting a fusion representation vector f_u from the multi-modal user feature set;
S32, introducing a graph fusion gating coefficient g and constructing a nonlinear gated fusion mechanism that fuses the graph-structure semantics with the modality features to generate a user representation vector z_u:
g = σ(MLP([e_u; f_u])), z_u = g ⊙ e_u + (1 − g) ⊙ f_u;
wherein σ is the Sigmoid function, ⊙ denotes element-wise multiplication, MLP is a multi-layer perceptron network, and g is the graph fusion gating coefficient;
S33, pairing the user representation vector z_u with the clothing structure embedding vector e_c to construct cross-modal sample pairs (z_u, e_c), and constructing positive and negative sample pairs according to the historical interaction information between users and clothing;
S34, introducing a semantic multi-layer contrast mechanism, setting the number of semantic contrast layers L, and constructing contrast tasks of multiple semantic granularities, including overall matching contrast, style attribute contrast and color semantic contrast, with an independent loss calculation performed at each layer;
S35, introducing a multi-factor-driven dynamic temperature control mechanism into each contrast layer, and defining the temperature parameter of the t-th training iteration as:
τ_t = τ₀ · (1 + α·t + β·Var_t + γ·log(1 + L_t));
wherein τ₀ is the base temperature hyperparameter, α, β and γ are respectively the time adjustment factor, the similarity variance adjustment factor and the loss sensitivity adjustment factor, Var_t denotes the variance of the positive- and negative-sample pair similarities of the current batch, L_t denotes the contrast loss value of the current batch, and log is the logarithmic function;
S36, training the cross-modal contrast learning network model to semantically align the user representation vectors and the clothing representation vectors in the shared embedding space by minimizing a weighted multi-layer semantic contrast loss function.
The invention provides a cross-modal contrast learning method that fuses graph-structure semantics with multi-modal feature information, with the remarkable beneficial effects of improving the semantic matching precision of personalized recommendation and the robustness of the model. By extracting the graph-structure embeddings of users and clothing and combining them with the multi-modal user features, the system uses a gating mechanism to flexibly control graph semantic enhancement, so that the final user representation has both structural information and modality-aware capability. By constructing positive and negative sample pairs and introducing a semantic multi-layer contrast mechanism, the system can learn the semantic similarity between users and clothing at multiple granularity levels, significantly improving the capture of the implicit semantic hierarchy in user preferences. Meanwhile, the multi-factor-driven dynamic temperature control mechanism dynamically adjusts the temperature parameter in combination with training progress, similarity distribution and training loss, making the gradient of the contrast loss more stable and the model training process more adaptive and generalizable. Finally, the network model optimizes the representation consistency of users and clothing in the shared embedding space by weighted fusion of the multi-layer semantic contrast losses, achieving a recommendation modeling capability with richer semantic hierarchy, more accurate recommendation and faster convergence, and providing a high-quality matching-score basis for the generation of subsequent recommendation candidates.
In this embodiment, the S4 specifically includes:
S41, setting optimized structural parameters and training superparameters in a cross-modal contrast learning network model to form an optimized target set comprising map fusion gating coefficientsSemantic contrast layer progressionTime adjustment factorSimilarity variance adjustment factorAnd loss sensitivity adjustment factor;
S42, dividing the optimization target set into structure fusion subspaces based on parameter function attributesWith training control subspacesAnd initializing two raccoon sub-populations、Each raccoon individual represents a set of parameter combinations to be optimized;
S43, executing a local memory driving mechanism, an environment disturbance mechanism and a collaborative guiding updating mechanism of a raccoon optimization algorithm in each sub-population, performing multi-round iterative optimization on the individual raccoon population, and storing each round of optimal raccoon individual into a corresponding sub-population memory bank;
S44, introducing a dynamic memory window mechanism, and setting the length of the current iterative memory window as:
;
Wherein, theFor the initial window length to be the same,For the feedback sensitivity adjustment factor to be used,For the current round fitness score, the window length is used to control the number of optimal solutions retained by the subgroup memory,For the length of the memory window in the current round of optimization,For the fitness score of the previous round,Is a very small normal number constant;
S45, after each round of optimization, trimming each sub-population's memory bank according to the current memory window length, retaining the locally optimal individuals and replacing outdated historical solutions;
S46, introducing a semantic-aware attention migration mechanism: every T rounds (the migration period), calculating an inter-population migration attention coefficient based on the change of edge density between user nodes and attribute nodes in the multi-modal heterogeneous semantic graph:
A_{i→j} = Norm(λ · Δw · (α_u + α_g));
wherein α_u and α_g denote the sub-population's embedding impact weights on user nodes and garment nodes respectively, Δw denotes the change in user-to-attribute edge weight intensity in the graph, A_{i→j} denotes the attention weighting coefficient for migration from sub-population i to sub-population j, λ is the adjustment factor for graph edge structure change, and Norm(·) is a normalization operation; the inter-population migration operation is then executed:
x_k^{(i)} ← x_k^{(i)} + A_{i→j} · (x*^{(j)} − x_k^{(i)});
wherein x_k^{(i)} is the parameter vector of the k-th raccoon individual in sub-population i, and x*^{(j)} is the parameter vector of the raccoon individual with the best fitness in sub-population j in the current round;
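The migration step of S46 can be sketched as below. The sigmoid normalization and the exact combination of the impact weights with the edge-weight change are illustrative assumptions consistent with the factors the disclosure names:

```python
import numpy as np

def migration_attention(delta_w, alpha_user, alpha_item, lam=1.0):
    """Scalar migration attention in (0, 1) for one source->target pair,
    driven by the graph edge-weight change delta_w and the sub-population's
    impact weights on user and garment nodes (sigmoid normalization assumed)."""
    return 1.0 / (1.0 + np.exp(-lam * delta_w * (alpha_user + alpha_item)))

def migrate(pop_i, best_j, attention):
    """Move every individual of sub-population i toward the current best
    individual of sub-population j, scaled by the attention coefficient."""
    return pop_i + attention * (best_j - pop_i)
```

With attention strictly between 0 and 1 the update is a convex step, so every migrated individual ends up strictly closer to the other sub-population's best without collapsing onto it.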
S47, selecting the current round's optimal individual parameters from each of the two sub-populations respectively and merging them into a globally optimal parameter combination;
S48, defining a composite fitness function F for triple-consistency evaluation as follows:
F = λ1 · (1/L) Σ_{l=1}^{L} S_l − λ2 · E_img + λ3 · Q_rank;
wherein S_l denotes the average similarity of positive sample pairs in the l-th layer semantic space, E_img is the reconstruction error of the multi-angle virtual try-on images of the recommended garments, Q_rank is a ranking quality index between user behavior feedback and the recommendation ranking, computed as a discounted cumulative gain by comparing the user's actual click, favourite and purchase behaviors against the recommendation list, measuring how well the recommendation ranking matches the user's actual preferences, and λ1, λ2 and λ3 are balance coefficients;
S49, sorting all raccoon individuals according to the fitness function results and determining the current globally optimal parameter combination;
S410, applying the optimal parameter combination to the cross-modal contrast learning network model, updating the graph fusion mechanism, the semantic contrast hierarchy and the temperature regulation strategy, and outputting the finally optimized cross-modal contrast learning network model.
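The triple-consistency fitness of S48 can be sketched as a weighted combination of layer-wise positive-pair similarity, image reconstruction error and a discounted-cumulative-gain ranking score; the default weights and exact form here are illustrative assumptions:

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    r = np.asarray(relevances, dtype=float)
    return float((r / np.log2(np.arange(2, r.size + 2))).sum())

def ndcg(ranked_rel):
    """DCG normalized by the ideal (descending) ordering."""
    ideal = dcg(sorted(ranked_rel, reverse=True))
    return dcg(ranked_rel) / ideal if ideal > 0 else 0.0

def triple_fitness(layer_sims, recon_error, ranked_rel,
                   l1=0.5, l2=0.3, l3=0.2):
    """Composite fitness: mean positive-pair similarity across semantic
    layers, minus the try-on image reconstruction error, plus the NDCG of
    the recommendation ranking against observed behavior (weights assumed)."""
    return (l1 * float(np.mean(layer_sims))
            - l2 * recon_error
            + l3 * ndcg(ranked_rel))
```

`ranked_rel` holds the behavioral relevance (e.g. purchase > favourite > click) of each recommended garment in its displayed order, so a perfectly ordered list contributes the full λ3 term.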
The invention provides a parameter optimization method for the cross-modal contrast learning network based on an improved raccoon optimization algorithm, which mainly integrates a dynamic memory window mechanism and a graph-semantics-aware sub-population attention migration mechanism to achieve efficient joint optimization of structural parameters and training hyperparameters. First, the model parameters are divided into two subspaces, structure fusion and training control, and a raccoon sub-population is initialized for each, so that each class of parameters can evolve adaptively in its own space. By introducing the dynamic memory window mechanism, the system can adjust the depth of the memory bank according to fitness fluctuations during optimization, avoiding the risk of premature convergence to an early optimal solution. Further, the graph-semantics-aware sub-population attention migration mechanism guides the weighted migration of parameter knowledge between sub-populations based on real-time changes in edge weight density between user and attribute nodes, effectively improving the semantic relevance of the optimization direction and cross-task generalization. Finally, the triple-consistency fitness function jointly evaluates semantic matching precision, image generation quality and user-feedback ranking quality, ensuring that the selected optimal parameter combination performs well in all respects in real recommendation scenarios. Overall, the method effectively improves model optimization efficiency, the adaptability of the recommendation system and the stability of the training process, and has high practicality and novelty.
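The overall two-sub-population scheme (S42–S47) can be condensed into a skeleton like the following. The Gaussian perturbation, greedy acceptance rule and fixed-length memory window are simplifying assumptions standing in for the full raccoon update and dynamic-window mechanisms:

```python
import numpy as np

def optimise(fitness, dims, pop_size=8, rounds=20, window=5, seed=0):
    """Co-evolve a structure-fusion and a training-control sub-population.
    dims = (d_struct, d_train); fitness scores the concatenated vector.
    Each group is evaluated with the other group's current best held fixed."""
    rng = np.random.default_rng(seed)
    pops = [rng.uniform(0.0, 1.0, (pop_size, d)) for d in dims]
    bests = [p[0].copy() for p in pops]          # current best of each group
    memories = [[], []]                          # bounded memory banks

    def score(g, x):
        full = (np.concatenate([x, bests[1]]) if g == 0
                else np.concatenate([bests[0], x]))
        return fitness(full)

    for _ in range(rounds):
        for g in range(2):
            # environmental disturbance: perturb and keep improvements
            trial = np.clip(pops[g] + rng.normal(0.0, 0.1, pops[g].shape), 0.0, 1.0)
            for k in range(pop_size):
                if score(g, trial[k]) > score(g, pops[g][k]):
                    pops[g][k] = trial[k]
            best_k = max(range(pop_size), key=lambda k: score(g, pops[g][k]))
            bests[g] = pops[g][best_k].copy()
            # fixed window here; the disclosure makes this length dynamic
            memories[g] = (memories[g] + [bests[g]])[-window:]
    return np.concatenate(bests)                 # merged global combination
```

On a separable objective this skeleton converges to the joint optimum because each group's internal ranking is unaffected by the other group's fixed best.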
In this embodiment, the step S5 specifically includes:
S51, deploying the optimized cross-modal contrast learning network model in the recommendation task module, inputting the user's fusion feature vector and the garment embedding vectors, and executing the semantic similarity calculation;
S52, scoring and ranking the candidate garments by matching degree according to the semantic similarity results, and selecting the several garments with the highest matching degree to form a recommendation candidate set;
S53, acquiring the user's body shape parameter information, including the body dimension characteristics of height, weight, shoulder width, waist circumference and hip circumference, and performing standardized modeling on this information;
S54, inputting the image features of each garment in the recommendation candidate set, together with the user's body shape parameters and fusion features, into the virtual fitting image generation unit, and executing the image synthesis operation for the garment try-on effect;
S55, during image synthesis, setting a plurality of observation viewing angles and generating corresponding multi-angle fitting images for each candidate garment;
S56, organizing the generated multi-angle image sets into an interactive display interface on which the user can visually preview each recommended garment, including the interactive operations of image switching, rotation, zooming and body-shape fitting comparison.
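Steps S51–S53 can be illustrated with a short sketch of cosine-similarity candidate ranking and z-score standardization of the body parameters; the function names and the small stabilizing constants are assumptions:

```python
import numpy as np

def top_k_candidates(user_vec, garment_vecs, k=3):
    """Rank garments by cosine similarity to the fused user vector (S51-S52);
    returns the indices and scores of the k best matches."""
    u = user_vec / (np.linalg.norm(user_vec) + 1e-8)
    g = garment_vecs / (np.linalg.norm(garment_vecs, axis=1, keepdims=True) + 1e-8)
    sims = g @ u
    order = np.argsort(-sims)[:k]
    return order.tolist(), sims[order].tolist()

def standardise_body(params, mean, std):
    """Z-score standardization of (height, weight, shoulder width, waist,
    hip) before they drive the fitting image generator (S53)."""
    return (np.asarray(params, dtype=float) - mean) / (np.asarray(std, dtype=float) + 1e-8)
```

The population means and standard deviations would in practice come from the platform's user statistics; here they are plain inputs.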
The invention constructs a personalized recommendation execution flow that integrates semantic recommendation with visual try-on, with notable practicality and user experience benefits. By deploying the optimized cross-modal contrast learning network model in the recommendation task module, the system can accurately compute the semantic similarity between the user's fusion features and the garment embeddings, and complete the screening and ranking of candidate garments by matching degree, ensuring that the recommendation results closely fit the user's latent interests. On this basis, the user's body shape parameters are introduced and standardized, so that the recommendation is not confined to the semantic level but also accurately characterizes the user's body shape. The system then jointly inputs the recommendation results and the user's body shape information into the virtual fitting image generation unit to generate realistic multi-angle wearing images, markedly improving the visual presentation of the recommended content. Through the multi-view output and the interactive display interface, the user can judge the realism of the recommended garments and make personalized decisions at the visual level, enhancing user trust and engagement. Overall, the method not only improves recommendation accuracy but also closes the recommendation loop from interest identification to try-on verification, with notable user friendliness and commercial value.
In this embodiment, the user's body shape parameter information specifically includes the body dimension characteristics of height, weight, shoulder width, waist circumference and hip circumference, and is used to drive the virtual fitting image generation unit to generate fitting images conforming to the user's body shape characteristics.
In this embodiment, the step S6 specifically includes:
S61, after the user finishes browsing the try-on images of the recommended garments, collecting the user's behavior feedback data;
S62, preprocessing and encoding the collected behavior feedback data to generate a user behavior feedback feature vector describing the user's current preference changes;
S63, updating the edge weights in the multi-modal heterogeneous semantic graph based on the user behavior feedback feature vector, including adjusting the connection strength between user nodes and garment nodes and between user nodes and attribute nodes, so as to reflect the restructuring trend of user interests;
S64, re-extracting the structure embedding representations of the user and garment nodes from the updated multi-modal heterogeneous semantic graph, for generating new semantic alignment training samples, including updating the relations of positive and negative sample pairs;
S65, periodically re-executing the processes of multi-modal semantic graph modeling, cross-modal contrast learning training, raccoon-optimized parameter updating, recommendation and fitting image generation, forming a dynamically self-updating recommendation iteration closed loop;
S66, after each closed-loop optimization cycle ends, adjusting the semantic graph structure, the matching strategy and the visual synthesis mode in the recommendation process according to the accumulated changes in user behavior data, improving the adaptability and feedback responsiveness of personalized recommendation.
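The feedback-driven edge-weight update of S63 can be sketched as follows; the linear reinforcement rule, the slow decay of untouched edges and the clipping to [0, 1] are illustrative assumptions:

```python
def update_edge_weights(edges, feedback, lr=0.1, decay=0.01):
    """edges: {(user, node): weight}; feedback: {(user, node): signal in [-1, 1]}
    derived from clicks, dwell time, ratings and favourites.  Edges with a
    feedback signal are reinforced (or weakened if negative); edges the user
    ignored decay slowly, modelling interest drift.  Weights stay in [0, 1]."""
    updated = {}
    for key, w in edges.items():
        signal = feedback.get(key, 0.0)
        new_w = w + lr * signal if signal else w * (1.0 - decay)
        updated[key] = max(0.0, min(1.0, new_w))
    return updated
```

Re-running graph embedding on the updated weights then yields the refreshed positive/negative sample pairs described in S64.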
The invention constructs a dynamic adaptive optimization mechanism driven by user behavior feedback, markedly improving the learning ability and long-term performance of the personalized recommendation system. After the user finishes browsing the try-on images of the recommended garments, behavior data such as clicks, dwell time, ratings and favourites are collected in time and encoded into a behavior feedback feature vector describing changes in user preference. Based on these feedback features, the system adjusts the edge weights in the multi-modal heterogeneous semantic graph, structurally restructuring user interests so that the connection strength between user nodes and garment or attribute nodes evolves dynamically. Meanwhile, based on the updated graph structure, the system automatically refreshes the semantic alignment training samples, ensuring that training continuously fits the user's latest preferences. By periodically re-executing graph modeling, model training, parameter optimization and recommendation, the system builds a closed-loop architecture of recommendation, feedback, optimization and re-recommendation with continuous learning capability. Finally, the system can also adjust the semantic propagation paths and the image synthesis strategy according to historical behavior changes, further improving personalization adaptability and response speed. Overall, the method realizes genuine feedback awareness and self-evolution, with the beneficial effects of long-term personal fitting and sustained accurate recommendation.
In this embodiment, the user's behavior feedback data specifically includes click, browsing duration, image switching, rating and favourite interaction information, which is used to dynamically adjust the semantic graph structure and the composition of training samples and to optimize the personalized recommendation effect.
Example 1:
To verify the feasibility of the invention in practice, it was applied on a large e-commerce platform. The platform selected 200 new users who had not made a purchase within 30 days: 100 used the platform's existing conventional clothing recommendation system as a control group, and 100 used an intelligent recommendation system equipped with the method of the invention as an experimental group. The system completed the full flow from clothing recommendation to simulated image generation to feedback learning in an automated, personalized manner.
When first accessing the platform, each experimental-group user uploaded a clear frontal upper-body photograph and filled in dressing preferences and a brief body-type description; the platform automatically identified image features and generated multi-modal user feature vectors by combining the user's browsing history, behavior trajectory and text information. The system then modeled the semantic relationships between users and garments based on the constructed multi-modal heterogeneous semantic graph and fed them into the cross-modal contrast learning network model for semantic alignment training. The model, tuned by the raccoon optimization algorithm, generated personalized clothing recommendation candidate sets, and after combining body shape parameters such as the user's height, weight, shoulder width, waist circumference and hip circumference, the system generated multi-angle fitting images for the recommended garments, allowing the user to browse and try them on interactively online.
Compared with the conventional system, the system of the invention shows marked improvements in recommendation quality and user experience. According to the experimental monitoring data, the Top-3 recommendation accuracy in the experimental group reached 85.1%, about 16 percentage points higher than the conventional system, and the click rate on virtual fitting images reached 75.2%, far higher than the control group's 42.6%. The users' average browsing time on try-on image pages increased from 36 seconds to 59 seconds, showing high attention to the try-on content; the purchase conversion rate rose from 32.8% to 42.1%; and the body-type satisfaction score rose from 3.9 to 4.7 points, with users commonly reporting that the try-on effect was realistic and well matched. In addition, experimental-group users generated an average of 22 behavior feedback events each, 1.7 times that of the control group, providing ample data for subsequent adaptive optimization of the system.
Table 1 Comparison of key indicators between the system of the invention and a conventional recommendation system

| Indicator | System of the invention | Conventional system |
| --- | --- | --- |
| Top-3 recommendation accuracy | 85.1% | 69.1% |
| Click rate on virtual try-on images | 75.2% | 42.6% |
| Average browsing time on try-on pages | 59 s | 36 s |
| Purchase conversion rate | 42.1% | 32.8% |
| Body-type satisfaction score (out of 5) | 4.7 | 3.9 |
| Average behavior feedback events per user | 22 | 13 |
According to the data in Table 1, the personalized virtual fitting recommendation system of the invention clearly outperforms the conventional recommendation system across several core indicators, showing notable performance advantages and user experience gains. First, in Top-3 recommendation accuracy, the system reaches 85.1% versus only 69.1% for the conventional system, an improvement of 16 percentage points. This shows that the recommendation model trained through graph modeling, cross-modal contrast learning and the raccoon optimization algorithm can more accurately identify user interests and match them to the most suitable garments, markedly improving recommendation accuracy.
In terms of the click rate on virtual fitting images, the experimental group reached 75.2% while the control group reached only 42.6%. This gap shows that the virtual fitting images generated by the invention are more attractive and interactive, effectively guiding users to explore the recommendations in depth, and directly reflects the realism and personal fit of the image generation unit. The users' average browsing time on the recommended try-on pages also increased from the conventional system's 36 seconds to 59 seconds, showing that the system holds users' attention on the recommended content for longer, improving engagement and stickiness. This immersive recommendation experience is particularly critical to driving conversion.
The system of the invention also shows a clear advantage in purchase conversion, reaching 42.1%, nearly 10 percentage points higher than the conventional system's 32.8%. This means that more accurate recommendations and visual try-on effects help users make more confident purchasing decisions, and greatly raise sales potential from a commercial perspective.
In terms of subjective experience, users rated their body-type satisfaction with the try-on images at 4.7 points (out of 5), versus 3.9 for the conventional system, showing that wearing images generated with the user's body shape parameters fit the user's real figure more closely, raising acceptance and satisfaction.
Finally, in the number of behavior feedback events, users of the system of the invention generated an average of 22 interactions each, versus only 13 for the conventional system, proving that the system not only enhances user interaction activity but also accumulates more high-quality training data for subsequent recommendation optimization, strengthening the model's continuous learning capability.
In summary, by introducing multi-modal feature fusion, semantic graph modeling, an intelligent optimization algorithm and a visual try-on experience, the system improves the accuracy and personalization of recommendation results, enhances the user's interactive experience and commercial conversion value, and fully verifies the technical feasibility and market potential of the method in real application scenarios.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification that a person skilled in the art could readily conceive within the technical scope disclosed herein, according to the technical scheme of the present invention and its inventive concept, shall fall within the scope of protection of the present invention.