Disclosure of Invention
The technical problem to be solved by the invention is to provide a commodity image background replacement method, device, equipment and medium based on Lora model training, which improve the efficiency of producing commodity images and reduce the cost of producing them.
In a first aspect, the invention provides a commodity image background replacement method based on Lora model training, which comprises the following steps:
Step 1, acquiring first commodity images of various commodities, wherein each first commodity image is a picture taken at a set shooting angle, and the first commodity images include a plurality of pictures in which the same commodity occupies different proportions of the picture, so as to obtain training data;
Step 2, labeling each first commodity image through a multimodal model according to a background label with a set format, to form a corresponding label file;
Step 3, setting the number of training rounds, the learning rate and the training image size in pixels, inputting the training data and the label files, and performing model training to obtain the required background Lora model;
Step 4, performing image matting on a second commodity image whose background is to be replaced, to obtain a commodity main body mask;
Step 5, loading the background Lora model in an image generation algorithm, inputting edge information of the second commodity image, background prompt words and the commodity main body mask into the image generation algorithm, and generating a commodity replacement image, wherein the background prompt words correspond to the background label with the set format.
In a second aspect, the invention provides a commodity image background replacement device based on Lora model training, comprising:
a training data acquisition module, configured to acquire first commodity images of various commodities, wherein each first commodity image is a picture taken at a set shooting angle, and the first commodity images include a plurality of pictures in which the same commodity occupies different proportions of the picture, so as to obtain training data;
a labeling module, configured to label each first commodity image through a multimodal model according to a background label with a set format, to form a corresponding label file;
a model training module, configured to set the number of training rounds, the learning rate and the training image size in pixels, input the training data and the label files, and perform model training to obtain the required background Lora model;
a picture acquisition module, configured to perform image matting on a second commodity image whose background is to be replaced, to obtain a commodity main body mask;
and a picture generation module, configured to load the background Lora model in an image generation algorithm, input edge information of the second commodity image, background prompt words and the commodity main body mask into the image generation algorithm, and generate a commodity replacement image, wherein the background prompt words correspond to the background label with the set format.
In a third aspect, the invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect when executing the program.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of the first aspect.
The one or more technical solutions provided by the invention have at least the following technical effects or advantages:
The invention meets commodity sellers' need to replace the background of commodity images. A seller can generate polished commodity images from simple photographs taken by the seller, which greatly improves the efficiency of producing commodity images and reduces the seller's operating cost, with no need for a professional photography team, base-image editing by an art designer, or the like;
the light and shadow optimization method can greatly improve commodity synthesis efficiency and can effectively solve the problem of missing light and shadow that easily occurs in traditional light and shadow synthesis algorithms, so that the generated commodity image has higher realism and a better fusion effect and illumination effect.
The foregoing is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments are set out below.
Detailed Description
The overall approach of the technical solution in the embodiments of the application is as follows:
(1) Data preprocessing: a variety of commodity images is required, covering upward, horizontal and downward shooting angles, a variety of backgrounds, and commodities at different size proportions (that is, the proportion of the whole picture occupied by the commodity region). The commodity images are scaled (reduced or enlarged) by bicubic sharpening interpolation so that all images are 1024 x 1024, and are further optimized for sharpness and the like. This step requires screening a variety of commodity images that include a variety of different backgrounds.
(2) Data labeling: a detailed structured description is given of the main body and the background of each processed commodity image. The format comprises the shooting angle, the background element details, the commodity main body color and the commodity main body name. Formatted labeling can greatly improve the effectiveness of the prompt words when the Lora model is used.
(3) Data set construction: the text labels are put in one-to-one correspondence with the commodity images.
(4) Model training: the Lora model is trained with a Lora training script. The model training parameters are set and adjusted before training: the total number of training rounds is set to 15000, the learning rate to 0.0001 and the training image size to 1024. The script is then run to start model training with one click.
(5) The trained commodity image Lora is loaded in the image generation algorithm, with the loading weight of the Lora set to 0.75. The canny algorithm is called with its weight parameter weight adjusted to 0.9; the main purpose of this parameter is to control the strength of the canny algorithm's influence on edge information extraction. The mask image and the prompt words containing the scene and illumination descriptions are passed together into img2img generation for mask redrawing, to obtain the commodity replacement image.
(6) The commodity replacement image is passed into Ic-light, and prompt words are used again to optimize the illumination of the image. Alternatively, a background-removed image and a background image are obtained from the commodity replacement image and processed as follows. The background-removed image is enhanced through Gamma transformation, raising the brightness of the commodity main body in the background-removed image by a first set value to obtain an enhanced image. The background image is mapped to HSV space, the V channel in HSV space is reduced by a second set value, and the result is mapped back to RGB space to obtain a darkened background image. The enhanced image and the darkened background image are synthesized to obtain a first synthesized image, and first latent information is extracted from the first synthesized image through VAE decoding. The background-removed image and the background image are synthesized to obtain a second synthesized image, and a ControlNet model is used to extract first setting information from the second synthesized image. The first latent information and the first setting information are sent to a diffusion model to generate a first light-and-shadow guide image. Second latent information is extracted from the first light-and-shadow guide image, and the second latent information and the first setting information are sent to the diffusion model to generate a second light-and-shadow guide image. An intermediate image is then obtained from the second light-and-shadow guide image through a high-contrast retention algorithm, and a migration algorithm is applied to the intermediate image to obtain the optimized commodity image.
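Items (2) and (3) above, the structured label format and the one-to-one pairing of labels with images, can be illustrated with a minimal sketch. The comma-separated caption, the use of a .txt file sharing the image's file stem, the .png extension and all names below are assumptions made for illustration; the application itself only fixes the four label fields.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass(frozen=True)
class BackgroundLabel:
    """One structured label; the four fields follow the format in item (2)."""
    shooting_angle: str
    background_details: str
    subject_color: str
    subject_name: str

    def to_caption(self) -> str:
        # Assumed serialisation: the four fields joined by ", " in a fixed order.
        return ", ".join([
            self.shooting_angle,
            self.background_details,
            self.subject_color,
            self.subject_name,
        ])


def write_label_file(image_path: Path, label: BackgroundLabel) -> Path:
    """Write the caption to a .txt file that shares the image's file stem."""
    label_path = image_path.with_suffix(".txt")
    label_path.write_text(label.to_caption(), encoding="utf-8")
    return label_path


def build_dataset(folder: Path) -> list:
    """Pair every image with its caption; fail loudly if a label is missing."""
    pairs = []
    for image_path in sorted(folder.glob("*.png")):
        label_path = image_path.with_suffix(".txt")
        if not label_path.exists():
            raise FileNotFoundError(f"no label file for {image_path.name}")
        pairs.append((image_path.name, label_path.read_text(encoding="utf-8")))
    return pairs


def demo() -> list:
    """Create one placeholder image file, label it, and build the data set."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        image_path = folder / "mug_01.png"
        image_path.write_bytes(b"")  # placeholder, not a real picture
        label = BackgroundLabel(
            "horizontal shot", "wooden table, window light", "white", "ceramic mug"
        )
        write_label_file(image_path, label)
        return build_dataset(folder)
```

Keeping the field order fixed is what lets a later background prompt word be written in the same shape as the training labels.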
Model training refers to the training process of the Lora model. Lora is a model fine-tuning technique: by inserting low-rank matrices into a large pre-trained model, it reduces the number of parameters that need to be fine-tuned, which improves training efficiency and helps avoid overfitting. In the commodity image application scenario, Lora training is used to generate or optimize e-commerce commodity images so that they adapt better to a specific background.
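The parameter saving from the low-rank matrices can be made concrete with a small sketch. It assumes the usual low-rank formulation, in which a weight matrix W of size d_out x d_in is left frozen and an update B·A with B of size d_out x r and A of size r x d_in is learned. Treating a loading weight such as 0.75 as a multiplier on that update is also an assumption about how an image generation tool applies it, not something the application states.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (parameters in the full matrix, parameters in the B and A factors)."""
    full = d_out * d_in
    low_rank = rank * (d_out + d_in)
    return full, low_rank


def matmul(a, b):
    """Plain nested-list matrix product, enough for a tiny illustration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w, b, a, weight=1.0):
    """Return W + weight * (B @ A) without modifying W."""
    delta = matmul(b, a)
    return [
        [w_ij + weight * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# A 1024 x 1024 layer at rank 8: 1,048,576 full parameters against 16,384.
full, low_rank = lora_param_counts(1024, 1024, 8)

# Rank-1 update applied to a 2 x 2 identity matrix with loading weight 0.75.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[3.0, 4.0]]
merged = apply_lora(w, b, a, weight=0.75)
```

Because only B and A are trained, the base model stays untouched and the background-specific behaviour can be loaded or removed at generation time.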
Mask extraction: the mask image of the commodity image whose background is to be replaced is extracted and used as the mask of the original image, to ensure that the commodity main body is unchanged during redrawing.
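The role of the mask can be sketched as follows, under the assumption that a mask value of 1 marks the commodity main body and 0 marks the background. The function names are hypothetical, and the images are small nested lists so the example stays self-contained.

```python
def invert_mask(mask):
    """Region to redraw: everything that is not the commodity main body."""
    return [[0 if m else 1 for m in row] for row in mask]


def composite_with_mask(original, generated, mask):
    """Keep original pixels where mask is 1 and generated pixels where it is 0."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("original, generated and mask must have the same height")
    return [
        [o if m else g for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


original = [["P", "P"], ["x", "P"]]   # "P" marks product pixels, "x" old background
generated = [["n", "n"], ["n", "n"]]  # "n" marks newly generated background
mask = [[1, 1], [0, 1]]

result = composite_with_mask(original, generated, mask)
redraw_region = invert_mask(mask)
```

Only the pixel at the lower left is replaced here, which is the sense in which the commodity main body is unchanged during redrawing.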
The Canny algorithm is an image processing technique mainly used for edge detection, that is, identifying the edges in an image; it can extract clear and accurate edge information from the image. Combined with img2img generation, it constrains the edges of the commodity main body, ensuring that the edge information of the commodity main body after background replacement is consistent with the edges in the commodity image to be replaced.
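As a rough illustration of edge extraction and of checking edge consistency before and after background replacement, the sketch below thresholds a Sobel gradient magnitude. It is a simplified stand-in and not the full Canny algorithm, which also applies Gaussian smoothing, non-maximum suppression and hysteresis thresholding. The threshold value and the agreement measure are assumptions for illustration.

```python
def sobel_edges(gray, threshold):
    """Mark interior pixels whose Sobel gradient magnitude reaches the threshold."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


def edge_agreement(edges_a, edges_b):
    """Fraction of pixels on which two edge maps of equal size agree."""
    total = sum(len(row) for row in edges_a)
    same = sum(
        1
        for row_a, row_b in zip(edges_a, edges_b)
        for a, b in zip(row_a, row_b)
        if a == b
    )
    return same / total


# A 5 x 5 grayscale picture with a vertical step from dark (0) to bright (255).
picture = [[0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(picture, threshold=100)
```

Comparing the edge map of the original commodity image with that of the generated image gives a simple numeric check on whether the commodity outline was preserved.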
Ic-light can control the illumination of an image through prompt words, so that the illumination of the foreground main body is consistent with that of the background environment and the two blend together. Giving appropriate illumination prompt words influences the illumination of the image and thereby improves the fusion of the commodity main body with the background.
Example 1
As shown in fig. 1, this embodiment provides a commodity image background replacement method based on Lora model training, which comprises the following steps:
Step 1, acquiring first commodity images of various commodities, wherein each first commodity image is a picture taken at a set shooting angle, and the first commodity images include a plurality of pictures in which the same commodity occupies different proportions of the picture, so as to obtain training data;
Step 2, labeling each first commodity image through a multimodal model according to a background label with a set format, to form a corresponding label file;
Step 3, setting the number of training rounds, the learning rate and the training image size in pixels, inputting the training data and the label files, and performing model training to obtain the required background Lora model;
Step 4, performing image matting on a second commodity image whose background is to be replaced, to obtain a commodity main body mask;
Step 5, loading the background Lora model in an image generation algorithm, inputting edge information of the second commodity image, background prompt words and the commodity main body mask into the image generation algorithm, and generating a commodity replacement image, wherein the background prompt words correspond to the background label with the set format.
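The three training settings named in step 3 can be gathered into one validated object before a training script is launched. The sketch below uses the values given in the overall approach (15000 rounds, learning rate 0.0001, image size 1024); the class name, field names and validation rules are hypothetical and do not correspond to the options of any particular training script.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LoraTrainingConfig:
    """Hypothetical container for the training settings of step 3."""
    total_rounds: int = 15000
    learning_rate: float = 0.0001
    image_size: int = 1024  # training images are image_size x image_size pixels

    def __post_init__(self):
        # Reject values that cannot describe a meaningful training run.
        if self.total_rounds <= 0:
            raise ValueError("total_rounds must be positive")
        if not 0 < self.learning_rate < 1:
            raise ValueError("learning_rate must lie strictly between 0 and 1")
        if self.image_size <= 0:
            raise ValueError("image_size must be positive")


config = LoraTrainingConfig()
settings = asdict(config)  # plain dict, convenient for logging or serialisation
```

Validating once at construction means an invalid setting is caught before the training data and label files are read.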
In this embodiment, preferably, the method further includes step 6: obtaining a background-removed image and a background image from the commodity replacement image, wherein the background-removed image is the picture of the commodity main body portion that remains after the background is deleted from the commodity replacement image; enhancing the background-removed image through Gamma transformation, raising the brightness of the commodity main body in the background-removed image by a first set value to obtain an enhanced image; mapping the background image to HSV space, reducing the V channel in HSV space by a second set value, and mapping the result back to RGB space to obtain a darkened background image; synthesizing the enhanced image and the darkened background image to obtain a first synthesized image, and extracting first latent information from the first synthesized image through VAE decoding; synthesizing the background-removed image and the background image to obtain a second synthesized image, and extracting first setting information from the second synthesized image by using a ControlNet model; sending the first latent information and the first setting information to a diffusion model to generate a first light-and-shadow guide image; extracting second latent information from the first light-and-shadow guide image, and sending the second latent information and the first setting information to the diffusion model to generate a second light-and-shadow guide image; and then obtaining an intermediate image from the second light-and-shadow guide image through a high-contrast retention algorithm, and applying a migration algorithm to the intermediate image to obtain the optimized commodity image.
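The first part of step 6, brightening the commodity main body by Gamma transformation and darkening the background through the V channel of HSV space, can be sketched with the standard library alone. The gamma value, the amount subtracted from V, and the reading of "reduced by a second set value" as a subtraction are all assumptions; the application leaves both set values open. Pixels are (R, G, B) tuples in the range 0 to 255, held in flat lists.

```python
import colorsys


def gamma_brighten(rgb, gamma):
    """Gamma transformation on one pixel; gamma below 1 raises brightness."""
    return tuple(round(255 * (c / 255) ** gamma) for c in rgb)


def darken_value(rgb, reduction):
    """Map to HSV, subtract `reduction` from V (clamped at 0), map back to RGB."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    v = max(0.0, v - reduction)
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def first_synthesized_image(subject, background, mask, gamma=0.8, v_reduction=0.2):
    """Enhanced subject where mask is 1, darkened background elsewhere."""
    enhanced = [gamma_brighten(p, gamma) for p in subject]
    darkened = [darken_value(p, v_reduction) for p in background]
    return [e if m else d for e, d, m in zip(enhanced, darkened, mask)]


# Two-pixel example: the first pixel belongs to the product, the second does not.
subject = [(64, 64, 64), (64, 64, 64)]
background = [(200, 200, 200), (200, 200, 200)]
synthesized = first_synthesized_image(subject, background, [1, 0],
                                      gamma=0.5, v_reduction=0.2)
```

The later stages of step 6 (latent extraction, ControlNet and the diffusion model) depend on specific model implementations and are not sketched here.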
In this embodiment, step 1 specifically comprises: acquiring first commodity images of various commodities, wherein each first commodity image is a picture taken at a set shooting angle, and the first commodity images include a plurality of pictures in which the same commodity occupies different proportions of the picture; and scaling all the first commodity images so that each first commodity image is 1024 x 1024 in size, to obtain the training data.
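Two quantities in step 1 are easy to compute when screening training pictures: the proportion of the picture occupied by the commodity, and the scaling needed to reach 1024 x 1024. The sketch below assumes a binary mask of the commodity is available for each picture and that each axis is scaled independently; both are assumptions, and the actual resampling would be done by an image library.

```python
def subject_ratio(mask):
    """Fraction of the picture covered by the commodity (mask value 1)."""
    total = sum(len(row) for row in mask)
    covered = sum(1 for row in mask for value in row if value)
    return covered / total


def resize_plan(width, height, target=1024):
    """Per-axis scale factors needed to reach target x target pixels."""
    scale_x, scale_y = target / width, target / height
    if scale_x == 1 and scale_y == 1:
        operation = "unchanged"
    elif scale_x >= 1 and scale_y >= 1:
        operation = "enlarge"
    elif scale_x <= 1 and scale_y <= 1:
        operation = "shrink"
    else:
        operation = "mixed"  # one axis grows while the other shrinks
    return {"scale_x": scale_x, "scale_y": scale_y, "operation": operation}


ratio = subject_ratio([[0, 1], [1, 1]])
plan = resize_plan(2048, 2048)
```

Collecting the ratio for every picture of one commodity makes it straightforward to confirm that several different proportions are represented in the training data.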
In this embodiment, step 4 specifically comprises: performing image matting on the second commodity image whose background is to be replaced to obtain its commodity main body mask, and calling the canny algorithm to extract edge information of the second commodity image, wherein the weight parameter weight of the canny algorithm is 0.9;
step 5 specifically comprises: loading the background Lora model in the image generation algorithm, with the loading weight of the background Lora model set to 0.75, inputting the edge information, the background prompt words and the commodity main body mask into the image generation algorithm, and generating the commodity replacement image, wherein the background prompt words correspond to the background label with the set format.
Based on the same inventive concept, the application also provides a device corresponding to the method of the first embodiment; see the second embodiment for details.
Example 2
As shown in fig. 2, this embodiment provides a commodity image background replacement device based on Lora model training, comprising:
a training data acquisition module, configured to acquire first commodity images of various commodities, wherein each first commodity image is a picture taken at a set shooting angle, and the first commodity images include a plurality of pictures in which the same commodity occupies different proportions of the picture, so as to obtain training data;
a labeling module, configured to label each first commodity image through a multimodal model according to a background label with a set format, to form a corresponding label file;
a model training module, configured to set the number of training rounds, the learning rate and the training image size in pixels, input the training data and the label files, and perform model training to obtain the required background Lora model;
a picture acquisition module, configured to perform image matting on a second commodity image whose background is to be replaced, to obtain a commodity main body mask;
and a picture generation module, configured to load the background Lora model in an image generation algorithm, input edge information of the second commodity image, background prompt words and the commodity main body mask into the image generation algorithm, and generate a commodity replacement image, wherein the background prompt words correspond to the background label with the set format.
In this embodiment, preferably, the device further includes a light and shadow optimization module, configured to: obtain a background-removed image and a background image from the commodity replacement image, wherein the background-removed image is the picture of the commodity main body portion that remains after the background is deleted from the commodity replacement image; enhance the background-removed image through Gamma transformation, raising the brightness of the commodity main body in the background-removed image by a first set value to obtain an enhanced image; map the background image to HSV space, reduce the V channel by a second set value, and map the result back to RGB space to obtain a darkened background image; synthesize the enhanced image and the darkened background image to obtain a first synthesized image, and extract first latent information from the first synthesized image through VAE decoding; synthesize the background-removed image and the background image to obtain a second synthesized image, and extract first setting information from the second synthesized image by using a ControlNet model; send the first latent information and the first setting information to a diffusion model to generate a first light-and-shadow guide image; extract second latent information from the first light-and-shadow guide image, and send the second latent information and the first setting information to the diffusion model to generate a second light-and-shadow guide image; and then obtain an intermediate image from the second light-and-shadow guide image through a high-contrast retention algorithm, and apply a migration algorithm to the intermediate image to obtain the optimized commodity image.
In this embodiment, the training data acquisition module is preferably configured to acquire first commodity images of various commodities, wherein each first commodity image is a picture taken at a set shooting angle, and the first commodity images include a plurality of pictures in which the same commodity occupies different proportions of the picture, and to scale all the first commodity images so that each first commodity image is 1024 x 1024 in size, to obtain the training data.
In this embodiment, the picture acquisition module is specifically configured to perform image matting on the second commodity image whose background is to be replaced to obtain its commodity main body mask, and to call the canny algorithm to extract edge information of the second commodity image, wherein the weight parameter weight of the canny algorithm is 0.9;
the picture generation module is specifically configured to load the background Lora model in the image generation algorithm, with the loading weight of the background Lora model set to 0.75, input the edge information, the background prompt words and the commodity main body mask into the image generation algorithm, and generate the commodity replacement image, wherein the background prompt words correspond to the background label with the set format.
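How the five modules of the device hand data to one another can be sketched as a small composition of callables. This is only an architectural illustration: every name is hypothetical, and each module is a stub that would be backed by the real multimodal labeling model, training script, matting step and image generation algorithm.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class BackgroundSwapDevice:
    """Hypothetical wiring of the modules described in this embodiment."""
    acquire_training_data: Callable[[], list]
    label_images: Callable[[list], list]
    train_model: Callable[[list, list], Any]
    extract_mask_and_edges: Callable[[Any], tuple]
    generate_picture: Callable[[Any, Any, Any, str], Any]

    def train(self):
        """Training path: acquire images, label them, train the background Lora."""
        images = self.acquire_training_data()
        labels = self.label_images(images)
        return self.train_model(images, labels)

    def replace_background(self, lora, image, prompt):
        """Generation path: mask and edges first, then picture generation."""
        mask, edges = self.extract_mask_and_edges(image)
        return self.generate_picture(lora, edges, mask, prompt)


# Stub modules that only record what they received.
device = BackgroundSwapDevice(
    acquire_training_data=lambda: ["img_a", "img_b"],
    label_images=lambda images: [f"label for {name}" for name in images],
    train_model=lambda images, labels: {"trained_on": len(images)},
    extract_mask_and_edges=lambda image: (f"mask({image})", f"edges({image})"),
    generate_picture=lambda lora, edges, mask, prompt: (lora, edges, mask, prompt),
)
lora = device.train()
output = device.replace_background(lora, "photo", "beach at sunset")
```

Separating the training path from the generation path mirrors the split between the first three modules and the last two.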
Since the device described in the second embodiment of the invention is a device for implementing the method of the first embodiment of the invention, a person skilled in the art can, based on the method described in the first embodiment, understand the specific structure of the device and its variations, so a detailed description is omitted here. All devices used to carry out the method of the first embodiment of the invention fall within the intended scope of protection of the invention.
Based on the same inventive concept, the application provides an electronic device embodiment corresponding to the first embodiment; see the third embodiment for details.
Example 3
The present embodiment provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where any implementation of the first embodiment may be implemented when the processor executes the computer program.
Since the electronic device described in this embodiment is a device for implementing the method in the first embodiment of the present application, those skilled in the art will be able to understand the specific implementation of the electronic device and various modifications thereof based on the method described in the first embodiment of the present application, so how the electronic device implements the method in the embodiment of the present application will not be described in detail herein. The apparatus used to implement the methods of embodiments of the present application will be within the scope of the intended protection of the present application.
Based on the same inventive concept, the application provides a storage medium corresponding to the first embodiment; see the fourth embodiment for details.
Example 4
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, can implement any implementation of the first embodiment.
The technical solutions provided by the embodiments of the application have at least the following technical effects or advantages:
the embodiments meet commodity sellers' need to replace the background of commodity images. The seller only needs to take a simple photograph of the commodity and then use the technical solution to generate the required synthesized image from it, which greatly improves the efficiency of producing commodity images and reduces the seller's operating cost, with no need for a professional photography team, base-image editing by an art designer, or the like;
the light and shadow optimization method can greatly improve commodity synthesis efficiency and can effectively solve the problem of missing light and shadow that easily occurs in traditional light and shadow synthesis algorithms, so that the generated commodity image has higher realism and a better fusion effect and illumination effect.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that the specific embodiments described are illustrative only and not intended to limit the scope of the invention, and that equivalent modifications and variations of the invention in light of the spirit of the invention will be covered by the claims of the present invention.