Disclosure of Invention
The invention provides an AI model generation method, an AI model generation system and a storage medium based on artificial intelligence, which are used for solving the technical problem in the prior art that the model generation effect is not ideal.
In order to achieve the above object, an AI model generation method based on artificial intelligence is provided, which mainly comprises the following steps:
step S1: setting a generation mode of an image, inputting a target bottom layer picture if the generation mode based on the imported bottom layer picture is selected, and inputting a description text if the generation mode based on the text description is selected, wherein the description text consists of keywords which are derived from a preset keyword word stock;
step S2: setting style words, wherein the style words are obtained based on different types of style fragments, each type of style fragment corresponds to a category of word stock, each category of word stock comprises a plurality of affix intervals, each affix interval comprises a plurality of words with similar semantics, different numbers of style fragments correspond to different affix intervals, and when the number of style fragments of the same type is determined, one word is extracted, as the style word, from the affix interval corresponding to that number within the word stock corresponding to that type of style fragment;
step S3: setting an abstract threshold, and calculating an abstract value of a generated image based on a matching formula after finishing setting all the style words, wherein the matching formula is as follows:
M = Σ C(A_m, A_n), summed over every pair of selected style words (1 ≤ m < n ≤ N),
wherein A_n is the affix interval in which the n-th extracted style word is located, and C(A_m, A_n) is the conflict value between affix interval A_m and affix interval A_n; if the abstract value of the generated image is larger than the abstract threshold, judging whether to continue to generate the image, if yes, continuing to execute step S4, and if not, resetting the style words based on step S2;
step S4: if the generation mode based on the imported bottom layer picture is selected, forming a first sentence pattern from the plurality of style words selected by the various style fragments, generating a style picture according to the first sentence pattern, analyzing the bottom layer picture, marking pixel points of the bottom layer picture, splitting the bottom layer picture into a plurality of independent first pixel areas according to the marks, splitting the style picture into second pixel areas corresponding to the first pixel areas, determining the first pixel areas of the bottom layer picture that need to be changed, and migrating and transforming the corresponding second pixel areas of the style picture into the bottom layer picture according to an image generation model to generate a target image; if the generation mode based on the text description is selected, inserting the style words into the description text to generate a second sentence pattern, and generating the target image from the second sentence pattern by the image generation model;
step S5: the target image generated in step S4 is acquired, and the target image is converted into a vector image.
Further, in step S4, migrating the style picture to the bottom layer picture includes the following steps:
determining first pixel areas of a bottom layer picture to be changed, setting weights occupied by all the first pixel areas in a target image, if all the first pixel areas of the bottom layer picture are required to be changed, transferring second pixel areas of the style picture to the bottom layer picture based on the weights set by all the first pixel areas, if only part of the first pixel areas of the bottom layer picture are required to be changed, extracting the first pixel areas to be changed from the bottom layer picture, generating an original picture and a plurality of independent first sub-pictures, extracting the second pixel areas corresponding to the first pixel areas from the style picture, forming a plurality of independent second sub-pictures, transferring the second sub-pictures to the first sub-pictures based on the weights set by all the first sub-pictures, forming a plurality of sub-target images, splicing the original picture with all the sub-target images, and generating a target image.
Further, generating the vector image includes the steps of:
step S51: obtaining a target image, setting a super-resolution conversion model, inputting the target image into the super-resolution conversion model to obtain a super-resolution target image, converting the super-resolution target image into a first image based on a vector conversion model, and converting the first image into a second image based on a raster conversion model, wherein the first image is a vector image and the second image is a raster image; the image generation model identifies the target image and the second image to obtain a first identification result and a second identification result, and whether the first identification result is identical to the second identification result is judged: if yes, the first image is output as the target image; if not, step S52 is executed;
step S52: the image generation model changes the noise added in the generation process and regenerates the target image, step S51 is executed again, and if the first image regenerated based on step S51 cannot still be output as the target image, the image generation model changes the noise again and regenerates the target image, and steps S51 and S52 are executed repeatedly until the first image is output as the target image.
Further, a noise learning model is arranged in the image generation model; the noise learning model takes the target image, the conversion algorithm of the vector conversion model and the second recognition result of the first image as inputs and the noise generated by the image generation model as output, and learns the relation between the conversion algorithm of the vector conversion model, the second recognition result and the noise; when the image generation model needs to regenerate the target image, the target image, the conversion algorithm of the vector conversion model and the second recognition result are input into the noise learning model to obtain target noise, and the image generation model regenerates the target image based on the target noise.
Further, a super-resolution conversion model is established based on the following steps:
obtaining a high-resolution learning image set, wherein images in the learning image set are high-resolution images, converting the images in the learning image set into low-resolution images, obtaining a low-resolution learning image set, inputting the images in the low-resolution learning image set into a super-resolution conversion model, generating a super-resolution image, obtaining differences between the super-resolution image and the corresponding high-resolution images in the learning image set based on a contrast algorithm, and adjusting parameters of the super-resolution conversion model based on the differences until the differences are within an allowable range.
The invention also provides an AI model generation system based on artificial intelligence, which is used for realizing the model generation method as set forth in any one of the above, and comprises: an input module, which is used for importing a bottom layer picture or inputting a description text;
a style word generation module, which is internally provided with a plurality of style fragment slots, wherein each style fragment slot can be filled with a plurality of style fragments of at most one type, and after the style fragments in all the style fragment slots are filled, the corresponding style words are extracted from the word stocks;
the judging module is used for calculating an abstract value of an image generated based on the style words based on the matching formula and judging whether the style words are required to be regenerated by using the style word generating module;
the semantic generation module is used for generating a first sentence pattern based on the selected style words or importing the style words into the descriptive text to generate a second sentence pattern;
the image generation module is used for generating the style picture based on the first sentence pattern or generating a target image based on the second sentence pattern;
the style migration module migrates the style picture to the bottom picture to generate a target image;
and the format conversion module is used for converting the target image into a vector image.
The invention also provides a computer storage medium comprising computer readable instructions which, when read and executed by a computer, cause the computer to perform the model generation method as claimed in any one of the preceding claims.
The beneficial effects of the invention are as follows:
the invention has two model generation modes, so that a user can select the descriptive text input by the user to directly generate the model, and can also perform style transformation on the basis of importing pictures by the user, thereby obtaining the model required by the user, and realizing the diversification of the model generation modes. And by means of a style word selection mechanism of style fragments, a user can supplement own imagination by means of the style fragments and bind imaging styles wanted by the user with corresponding words, so that descriptive texts of images can be rapidly and accurately generated, and imaging quality of the images is guaranteed.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another element. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of the present application.
As shown in fig. 1, the AI model generation method based on artificial intelligence includes:
step S1: setting a generation mode of an image, inputting a target bottom layer picture if the generation mode based on the imported bottom layer picture is selected, and inputting a description text if the generation mode based on the text description is selected, wherein the description text consists of keywords which are derived from a preset keyword word stock;
in this embodiment, when the bottom layer picture is imported, the local picture stored in the importing storage may be selected, or the picture may be obtained from the network to be imported.
Step S2: setting style words, wherein the style words are obtained based on different types of style fragments, each type of style fragment corresponds to a category of word stock, each category of word stock comprises a plurality of affix intervals, each affix interval comprises a plurality of words with similar semantics, different numbers of style fragments correspond to different affix intervals, and when the number of style fragments of the same type is determined, one word is extracted, as the style word, from the affix interval corresponding to that number within the word stock corresponding to that type of style fragment;
As an example, the present embodiment provides six different types of style fragments, limits the number of selected types of style fragments to at most 3, and limits the selected number of each type of style fragment to at most 9; the word stocks corresponding to the six types of style fragments are respectively shape, adjective, place, subject, platform name and artistic style, and each word stock comprises nine affix intervals; for example, the first affix interval of the place word stock comprises words such as grassland and wild field, and the second affix interval comprises words such as ocean and sea. In actual operation, if style fragments representing a place are selected and their number is determined to be 2, one word is extracted from the second affix interval as the style word; for the extraction, the words contained in the corresponding affix interval may be displayed to the user for the user to select actively, or the style word may be obtained through random extraction. Through this step, the imaging style wanted by the user is bound with the corresponding words, so that the descriptive text of the image can be generated rapidly and accurately and the imaging quality of the image is guaranteed.
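The lookup described above can be illustrated with a minimal Python sketch; the word stock contents, category names and the pick_style_word helper below are illustrative assumptions rather than the embodiment's actual data.

```python
import random

# Hypothetical word stock: one entry per style-fragment category, each holding
# affix intervals (lists of semantically similar words); only a few intervals
# are shown for brevity.
WORD_STOCK = {
    "place": [
        ["grassland", "wild field"],   # affix interval 1
        ["ocean", "sea"],              # affix interval 2
    ],
    "adjective": [
        ["brilliant", "splendid"],
        ["gloomy", "dim"],
    ],
}

def pick_style_word(category: str, fragment_count: int, interactive: bool = False):
    """Return one style word for `category`, drawn from the affix interval
    indexed by the number of style fragments of that category."""
    interval = WORD_STOCK[category][fragment_count - 1]   # count 2 -> interval 2
    if interactive:
        # In the embodiment the interval's words may be shown to the user
        # for manual selection; here the whole interval is returned instead.
        return interval
    return random.choice(interval)                        # random extraction

# Example: two "place" style fragments were selected -> draw from interval 2.
print(pick_style_word("place", 2))
```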
Step S3: setting an abstract threshold, and calculating an abstract value of a generated image based on a matching formula after finishing setting all style words, wherein the matching formula is as follows:
M = Σ C(A_m, A_n), summed over every pair of selected style words (1 ≤ m < n ≤ N),
wherein A_n is the affix interval in which the n-th extracted style word is located, and C(A_m, A_n) is the conflict value between affix interval A_m and affix interval A_n; if the abstract value of the generated image is larger than the abstract threshold, judging whether to continue to generate the image, if yes, continuing to execute step S4, and if not, resetting the style words based on step S2;
Because the style words are all selected from the word stocks, words from different word stocks can be combined in advance for generation tests, and the matching degree between different words can be obtained from the final imaging results. The calculation principle of the matching formula is illustrated below: in this embodiment the abstract threshold is set to 10, and one adjective style fragment, two place style fragments and three subject style fragments are selected, so that words are drawn from the affix interval containing "brilliant", the affix interval containing "grassland" and the affix interval containing "cyberpunk"; the score calculated by the matching formula is then:
M = C(A_brilliant, A_grassland) + C(A_grassland, A_cyberpunk) + C(A_brilliant, A_cyberpunk),
wherein C(A_brilliant, A_grassland) is the conflict value between "brilliant" and "grassland", C(A_grassland, A_cyberpunk) is the conflict value between "grassland" and "cyberpunk", and C(A_brilliant, A_cyberpunk) is the conflict value between "brilliant" and "cyberpunk"; since "cyberpunk" indicates a mechanized environment with a darker background, its conflict with "grassland" and "brilliant" is higher, so the corresponding conflict values are higher. Through this step, the user can know the probability that the currently selected style words will generate an abstract image.
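As a minimal sketch of this calculation, the following Python snippet sums pairwise conflict values and compares the result with the abstract threshold; the CONFLICT table and its values are invented for illustration only.

```python
from itertools import combinations

# Hypothetical pre-tested conflict values C(A_m, A_n) between affix intervals,
# keyed here by the representative words; the numbers are illustrative only.
CONFLICT = {
    frozenset({"brilliant", "grassland"}): 2.0,
    frozenset({"grassland", "cyberpunk"}): 6.0,
    frozenset({"brilliant", "cyberpunk"}): 5.0,
}

def abstract_value(selected_words):
    """Sum the pairwise conflict values of the affix intervals holding the
    selected style words (the matching formula above)."""
    return sum(CONFLICT[frozenset(pair)]
               for pair in combinations(selected_words, 2))

ABSTRACT_THRESHOLD = 10
words = ["brilliant", "grassland", "cyberpunk"]
m = abstract_value(words)                 # 2.0 + 6.0 + 5.0 = 13.0
if m > ABSTRACT_THRESHOLD:
    # Ask the user whether to continue (step S4) or reset the style words (step S2).
    print(f"abstract value {m} exceeds threshold {ABSTRACT_THRESHOLD}")
```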
Step S4: if the generation mode based on the imported bottom layer picture is selected, a first sentence pattern is formed from the plurality of style words selected by the plurality of style fragments, a style picture is generated based on the first sentence pattern, the bottom layer picture is analyzed, pixel points of the bottom layer picture are marked, the bottom layer picture is split into a plurality of independent first pixel areas based on the marks, the style picture is split into second pixel areas corresponding to the first pixel areas, the first pixel areas of the bottom layer picture that need to be changed are determined, and the corresponding second pixel areas of the style picture are migrated and transformed into the bottom layer picture based on an image generation model to generate a target image; if the generation mode based on the text description is selected, the style words are inserted into the description text to generate a second sentence pattern, and the image generation model generates the target image based on the second sentence pattern;
in particular embodiments, the image generation model may be built based on a variety of generation countermeasure networks, GAN, DCGAN, styleGAN, and the like. The GAN network consists of a generator and a discriminator, wherein the generator is used for generating false images in the training process, the discriminator is used for identifying the true and false of the images, the two networks are mutually opposed and mutually trained until the false images generated by the generator cannot be identified to be true or false by the discriminator.
In this embodiment, a model is built based on a CNN (convolutional neural network) to split the bottom layer picture. For example, if the bottom layer picture contains a portrait, a beach and an umbrella, the bottom layer picture is input into the convolutional neural network and split into three first pixel areas, which are defined as portrait, background and accessory respectively; the style picture is then also split into three second pixel areas of the categories portrait, background and accessory, and finally the style of the second pixel areas is migrated into the first pixel areas based on the image generation model to generate the target image. Through this step, the style picture can accurately perform style transformation on the corresponding areas of the bottom layer picture, so that the generation accuracy of the target model is improved.
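One possible reading of this region-wise migration is sketched below; segment_fn and transfer_fn are hypothetical stand-ins for the CNN splitter and the image generation model, passed in as callables so the sketch stays self-contained.

```python
import numpy as np

def split_into_regions(image, segment_fn):
    """Return {label: boolean mask} from a segmentation network's label map,
    e.g. 0 = portrait, 1 = background, 2 = accessory."""
    label_map = segment_fn(image)
    return {label: label_map == label for label in np.unique(label_map)}

def migrate_style(bottom, style, segment_fn, transfer_fn, regions_to_change):
    """Replace only the selected first pixel areas of `bottom` with the
    style-transformed content of the corresponding second pixel areas.
    Assumes both pictures segment into the same set of labels."""
    target = bottom.copy()
    bottom_regions = split_into_regions(bottom, segment_fn)
    style_regions = split_into_regions(style, segment_fn)
    for label in regions_to_change:
        mask = bottom_regions[label]
        styled = transfer_fn(bottom, style, style_regions[label])  # full-size styled image
        target[mask] = styled[mask]                                # write back only this area
    return target
```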
Step S5: the target image generated in step S4 is acquired, and the target image is converted into a vector image.
At present, the resolution of images generated by adversarial networks is not high, and the generated image is a raster image, which becomes distorted when enlarged; a vector image, by contrast, does not distort when scaled. Since the objective of the metaverse is to realize realistic display of virtual characters and environments, converting the target image into a vector image can ensure the imaging effect of the image.
The invention has two model generation modes, so that a user can directly generate a model from descriptive text input by the user, or can perform style transformation on the basis of a picture imported by the user, thereby obtaining the model required by the user and realizing diversification of the model generation modes. In addition, through the style word selection mechanism based on style fragments, the user can supplement his or her own imagination by means of the style fragments and bind the desired imaging style with the corresponding words, so that the descriptive text of the image can be generated rapidly and accurately and the imaging quality of the image is guaranteed.
After the target image is generated, a user may be dissatisfied with a partial area of the target image and want to modify that area while keeping the other areas unchanged. At present there is no suitable way to do this, because metaverse model generation is usually combined with the blockchain: once the generated image is exported, it is recorded by the blockchain, so the user cannot arbitrarily change the image outside the system based on his or her own preference. In other words, after the target image is exported and recorded by the blockchain, modifying and splicing the target image with software such as Photoshop is no longer applicable. The invention therefore proposes the following method:
in step S4, the migration of the style picture to the bottom picture includes the following steps:
determining first pixel areas of the bottom layer picture to be changed, setting weights occupied by all the first pixel areas in the target image, if all the first pixel areas of the bottom layer picture are required to be changed, moving second pixel areas of the style picture to the bottom layer picture based on the weights set by all the first pixel areas, if only part of the first pixel areas of the bottom layer picture are required to be changed, extracting the first pixel areas to be changed from the bottom layer picture, generating an original picture and a plurality of independent first sub-pictures, extracting the second pixel areas corresponding to the first pixel areas from the style picture, forming a plurality of independent second sub-pictures, moving the second sub-pictures to the first sub-pictures based on the weights set by all the first sub-pictures, forming a plurality of sub-target images, splicing the original picture with all the sub-target images, and generating the target image.
Specifically, the splitting and splicing of the images are realized based on the CNN network, and setting a weight for each first pixel area provides a certain degree of control over the generation of the target image: the higher the weight of a first pixel area, the less that area is changed by the style picture. On the basis of step S4, through the above steps the area of the original bottom layer picture to be changed can be selected directly; moreover, after the target image is generated, it can be imported into the image generation model again as a bottom layer picture without being exported, the partial area can be changed, and a new target image can be generated, thereby completing the modification of the partial area of the target image.
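One possible interpretation of the weighted migration and splicing, in which a higher weight preserves more of the original first pixel area, is sketched below; the blending rule, the function name and the data layout are assumptions made for illustration.

```python
import numpy as np

def apply_weighted_regions(bottom, transformed, masks, weights):
    """Blend each first pixel area of `bottom` with its style-transformed
    counterpart and splice the results back into one target image.
    bottom:      (H, W, 3) array, the bottom layer picture
    transformed: {label: (H, W, 3) array} style-transformed content per area
    masks:       {label: (H, W) boolean mask} first pixel areas
    weights:     {label: float in [0, 1]}, higher = less change to that area
    """
    target = bottom.astype(np.float32)
    for label, mask in masks.items():
        w = weights[label]
        target[mask] = w * bottom[mask] + (1.0 - w) * transformed[label][mask]
    return target.astype(bottom.dtype)
```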
In the prior art, conversion of a raster image into a vector image generally uses vectorization conversion software, which can be integrated directly into the system; however, the vector image converted by such software may have blurred partial image areas, so that part of the information is lost and the conversion effect is poor. The invention therefore provides the following steps:
Step S51: obtaining a target image, setting a super-resolution conversion model, inputting the target image into the super-resolution conversion model to obtain a super-resolution target image, converting the super-resolution target image into a first image based on a vector conversion model, and converting the first image into a second image based on a raster conversion model, wherein the first image is a vector image and the second image is a raster image; the image generation model identifies the target image and the second image to obtain a first identification result and a second identification result, and whether the first identification result is the same as the second identification result is judged: if yes, the first image is output as the target image; if not, step S52 is executed;
the super-resolution conversion model can enable the target image to be converted from lower resolution to higher resolution, and when the resolution of the target image is too low, the converted vector image cannot become clear, so that the situation can be avoided by setting the super-resolution conversion model. If the target image and the first image are directly identified, a new generation countermeasure network needs to be reestablished and trained due to the difference of the image formats of the target image and the first image, and therefore the second image and the target image are both raster images through converting the first image into the second image again, and the identification of the second image and the target image can be completed based on the original image generation module; further, since the target image is the image output by the image generation model, the first recognition result is necessarily true, if the second recognition result is the same as the first recognition result, the second image is proved to be close to the target image, and the first image based on the second image generation source has no obvious loss in the conversion process and can be output as the target image; on the contrary, the first image proves that the image information is seriously damaged in the conversion process, and needs to be regenerated.
Step S52: the image generation model changes the noise added in the generation process and regenerates the target image, step S51 is executed again, and if the first image regenerated based on step S51 cannot still be output as the target image, the image generation model changes the noise again and regenerates the target image, and steps S51 and S52 are executed repeatedly until the first image is output as the target image.
Because the generation effect is poor in this case, the target image needs to be adjusted. Noise is an indispensable factor for a generative adversarial network; the noise can correspond to elements such as freckles and hairlines on a face. The target image is regenerated after the noise is changed, and when the noise randomly generated in the image generation model better matches the conversion algorithm of the conversion software, the conversion effect of the target image in the conversion process is improved, so that information damage in the conversion process is avoided. Through the above steps, the target image, which is a raster image, can be converted into a vector image while preventing the picture information from being damaged in the conversion process.
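Steps S51 and S52 together form a verification loop, which might be organized as in the following Python sketch; every component here is a hypothetical stand-in, passed in as a callable, for the super-resolution, vector, raster, discrimination and regeneration models named above.

```python
def convert_with_verification(target, super_resolve, to_vector, to_raster,
                              discriminate, regenerate, max_attempts=10):
    """Steps S51/S52: keep regenerating the target image with new noise until
    the vector round-trip preserves the discriminator's verdict."""
    for _ in range(max_attempts):
        first = to_vector(super_resolve(target))      # first image: vector image
        second = to_raster(first)                     # second image: raster again
        if discriminate(target) == discriminate(second):
            return first                              # no obvious loss: output the vector image
        target = regenerate(target)                   # change the noise and regenerate (step S52)
    raise RuntimeError("vector conversion kept losing information")
```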
The image generation model is provided with a noise learning model; the noise learning model takes the target image, the conversion algorithm of the vector conversion model and the second recognition result of the first image as inputs and the noise generated by the image generation model as output, and learns the relation between the conversion algorithm of the vector conversion model, the second recognition result and the noise; when the image generation model needs to regenerate the target image, the target image, the conversion algorithm of the vector conversion model and the second recognition result are input into the noise learning model to obtain target noise, and the image generation model regenerates the target image based on the target noise.
In this embodiment, the noise learning model may be built based on a BP neural network. By setting the noise learning model, the relationship between the conversion algorithm of the vector conversion model and the noise in the image generation model can be obtained, so that after the target image is obtained, the noise learning model can quickly obtain suitable noise based on the conversion algorithm of the vector conversion model, thereby reducing the number of regenerations, accelerating the convergence speed of the image generation model and shortening the time for obtaining the target image.
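A minimal sketch of such a noise learning model as a BP (fully connected) network is given below; the input feature layout (image features, a code for the conversion algorithm, and the second recognition result), the dimensions and the training target are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class NoiseLearningModel(nn.Module):
    """Toy BP network mapping (image features, conversion-algorithm code,
    second recognition result) to the noise vector to feed the generator."""
    def __init__(self, feat_dim=128, algo_dim=8, noise_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + algo_dim + 1, 256),  # +1 for the second recognition result
            nn.ReLU(),
            nn.Linear(256, noise_dim),                # predicted target noise
        )

    def forward(self, image_feat, algo_code, second_result):
        x = torch.cat([image_feat, algo_code, second_result], dim=-1)
        return self.net(x)

model = NoiseLearningModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(image_feat, algo_code, second_result, used_noise):
    """Fit the model to the noise the image generation model actually used."""
    pred = model(image_feat, algo_code, second_result)
    loss = loss_fn(pred, used_noise)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```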
Establishing a super-resolution conversion model based on the following steps:
the method comprises the steps of obtaining a high-resolution learning image set, converting an image in the learning image set into a low-resolution image, obtaining the low-resolution learning image set, inputting the image in the low-resolution learning image set into a super-resolution conversion model, generating a super-resolution image, obtaining a difference between the super-resolution image and a corresponding high-resolution image in the learning image set based on a contrast algorithm, and adjusting parameters of the super-resolution conversion model based on the difference until the difference is within an allowable range.
Specifically, the comparison algorithm may be the SSIM algorithm. Through the above steps, the super-resolution conversion model can be established with only one set of learning images, which reduces the time for acquiring the training set.
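A compact sketch of this training procedure follows; the toy upscaler, the x2 scale factor and the use of MSE as a differentiable surrogate for the difference are assumptions (the embodiment names SSIM as the comparison algorithm, which could equally be used as the stopping test on held-out images).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy x2 super-resolution network: convolutions followed by pixel shuffle.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3 * 4, 3, padding=1),
    nn.PixelShuffle(2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(high_res_batch, allowed_gap=0.05):
    """high_res_batch: (B, 3, H, W) tensor of high-resolution learning images.
    The low-resolution set is obtained by downsampling the same images."""
    low_res = F.interpolate(high_res_batch, scale_factor=0.5, mode="bicubic")
    sr = model(low_res)                         # generated super-resolution image
    loss = F.mse_loss(sr, high_res_batch)       # difference from the high-res original
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item() < allowed_gap            # stop once within the allowable range
```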
The invention also provides an AI model generation system based on artificial intelligence, which is used for realizing the model generation method of any one of the above, and comprises: an input module, which is used for importing a bottom layer picture or inputting a description text;
a style word generation module, which is internally provided with a plurality of style fragment slots, wherein each style fragment slot can be filled with a plurality of style fragments of at most one type, and after the style fragments in all the style fragment slots are filled, the corresponding style words are extracted from the word stocks;
the judging module is used for calculating abstract values of the generated images based on the style words based on the matching formulas and judging whether the style words are required to be regenerated by the style word generating module;
the semantic generation module is used for generating a first sentence pattern based on the selected style words or importing the style words into the descriptive text to generate a second sentence pattern;
the image generation module is used for generating a style picture based on the first sentence pattern or generating a target image based on the second sentence pattern;
the style migration module migrates the style picture to the bottom picture to generate a target image;
and the format conversion module is used for converting the target image into a vector image.
The present invention also provides a computer storage medium comprising computer readable instructions which, when read and executed by a computer, cause the computer to perform the model generation method of any one of the above.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in various embodiments may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially either, and may be performed in turns or alternately with at least a portion of other steps or of the sub-steps or stages of other steps.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of computer programs, which may be stored on a non-transitory computer readable storage medium, and which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be arbitrarily combined, and for brevity, all of the possible combinations of the technical features of the above embodiments are not described, however, they should be considered as the scope of the description of the present specification as long as there is no contradiction between the combinations of the technical features.
The foregoing examples have been presented to illustrate only a few embodiments of the invention and are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.