Disclosure of Invention
The disclosure provides an image style migration method, a model, an apparatus, an electronic device, a computer-readable storage medium and a computer program product, so as to at least solve the problem in the related art that, when converting an image style, the local information and the global information of the image cannot be captured at the same time to ensure the quality of the image content. The technical scheme of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided an image style migration method, including: acquiring an original image and determining image content characteristics corresponding to the original image; the image content features are generated based on hidden state features and conditional state parameters of the original image; the hidden state characteristics are image characteristics of the original image in a hidden space, and the condition state parameters are hidden state parameters corresponding to the target style; extracting local features from the image content features to obtain local image features corresponding to the image content features, wherein the local image features comprise edge region features of the original image; extracting global features of the image content features to obtain global image features corresponding to the image content features; performing feature fusion processing on the local image features and the global image features to obtain hidden state fusion features; and decoding the hidden state fusion characteristic to obtain a target style image corresponding to the original image.
In an exemplary embodiment of the disclosure, the determining the image content feature corresponding to the original image includes: acquiring pre-configured image sampling parameters, the image sampling parameters comprising a downsampling ratio and a number of hidden state channels; performing downsampling processing on the original image based on the downsampling ratio and the number of hidden state channels to obtain hidden state features corresponding to the original image; acquiring a target style for performing style migration on the original image, and determining a condition state parameter corresponding to the target style; the conditional state parameters are determined based on the hidden state features; the image content feature is generated based on the hidden state feature and the conditional state parameter.
In an exemplary embodiment of the present disclosure, the hidden state feature corresponding to the original image is obtained by performing downsampling processing on the original image based on a preconfigured encoder; the training method of the encoder comprises the following steps: acquiring a pre-constructed initial network model; the initial network model comprises an initial encoder and an initial decoder; determining a training sample image, and performing downsampling processing on the training sample image by the initial encoder to obtain hidden state sample characteristics corresponding to the training sample image; reconstructing the hidden state sample features by the initial decoder to obtain a reconstructed sample image corresponding to the training sample image; determining a first model loss function based on a comparison between the training sample image and the reconstructed sample image; and respectively carrying out parameter adjustment processing on the initial encoder and the initial decoder based on the first model loss function to obtain a trained encoder and a trained decoder.
In an exemplary embodiment of the present disclosure, the determining a training sample image includes: acquiring an original sample image and a target style sample image corresponding to the original sample image; performing image scaling processing on the original sample image and the target style sample image to obtain a corresponding scaled image pair; performing center clipping processing on the scaled image pair to obtain a clipping image with a preset size; and carrying out translation processing on the clipping image to obtain the training sample image subjected to normalization processing.
In an exemplary embodiment of the present disclosure, the above method further includes: acquiring a pre-trained image style migration model; and carrying out feature extraction and feature fusion on the image content features through the image style migration model to obtain the hidden state fusion features.
In an exemplary embodiment of the present disclosure, the image style migration model is trained by: determining a training sample image, wherein the training sample image comprises an original sample image and a target style sample image; extracting features of the original sample image to generate sample image features corresponding to the original sample image; the sample image features include a first sample feature and a second sample feature; the first sample feature is used for local feature extraction, and the second sample feature is used for global feature extraction; acquiring a pre-constructed initial neural network model; the initial neural network model includes an initial linear layer; based on the initial neural network model, respectively carrying out feature extraction and feature fusion processing on the first sample features and the second sample features to obtain fusion image features corresponding to the sample image features; determining a target style hidden state characteristic corresponding to the target style image, and carrying out hidden state residual prediction by the initial linear layer based on the fusion image characteristic and the target style hidden state characteristic to obtain a residual prediction result; and training the initial neural network model based on the residual prediction result to obtain the image style migration model.
In one exemplary embodiment of the present disclosure, the initial neural network model includes an initial convolution branch and an initial transformer branch; the step of respectively carrying out feature extraction and feature fusion processing on the first sample features and the second sample features based on the initial neural network model to obtain fused image features corresponding to the sample image features comprises the following steps: extracting local features of the first sample feature through the initial convolution branch to obtain a corresponding sample local feature; carrying out global feature extraction on the second sample feature through the initial transformer branch to obtain a corresponding sample global feature; and carrying out feature fusion processing on the sample local features and the sample global features to obtain fusion image features.
In an exemplary embodiment of the present disclosure, the feature extracting the original sample image to generate a sample image feature corresponding to the original sample image includes: downsampling the original sample image to obtain hidden state sample characteristics corresponding to the original sample image; determining a conditional state sample parameter corresponding to the hidden state sample feature; the conditional state sample parameters are determined based on the hidden state sample features; generating a first sample feature based on the hidden state sample feature and the conditional state sample parameter; and performing length and width splicing processing on the first sample characteristics to obtain the second sample characteristics.
In an exemplary embodiment of the present disclosure, the residual prediction result includes a hidden state residual and a hidden state prediction result, and the training the initial neural network model based on the residual prediction result to obtain the image style migration model includes: acquiring hidden state sample characteristics corresponding to the original sample image and target style hidden state characteristics corresponding to the original sample image; obtaining the pre-configured model iteration times and initial conditions; training the initial neural network model based on the model iteration times and the initial conditions to obtain a hidden state prediction result and a hidden state residual error which are output by each model iteration; the hidden state residual error is the difference value between the hidden state prediction result and the target style hidden state; generating a second model loss function according to the target style hidden state characteristics, the hidden state residual error output each time and the hidden state prediction result; and training the initial neural network model based on the second model loss function to obtain the image style migration model.
In an exemplary embodiment of the disclosure, the generating a second model loss function according to the target style hidden state feature and the hidden state residual and hidden state prediction result output each time includes: determining a current iteration number of model training, and determining a current hidden state residual corresponding to the current iteration number; determining the previous hidden state prediction result corresponding to the iteration immediately preceding the current iteration; determining the current hidden state prediction result according to the current hidden state residual and the previous hidden state prediction result; and generating the second model loss function according to the current hidden state prediction result and the target style hidden state feature.
In an exemplary embodiment of the present disclosure, the decoding the hidden state fusion feature to obtain a target style image corresponding to the original image includes: acquiring a pre-trained decoder; and decoding the hidden state fusion characteristic through the decoder to obtain a target style image of the original image under a target style.
According to a second aspect of embodiments of the present disclosure, there is provided an image style migration model, comprising: the convolution branch is used for extracting local features of the model input data to obtain local image features; the model input data is generated based on hidden state features and conditional state parameters corresponding to the original image; the hidden state features are obtained by extracting features of the original image, and the condition state parameters are hidden state parameters corresponding to the target style; the transformer branch is used for carrying out global feature extraction on the model input data to obtain global image features corresponding to the image content features; the linear layer is used for outputting hidden state fusion features of the original image under the target style based on the fusion image features; the fusion image features are obtained by fusion processing of the local image features and the global image features.
According to a third aspect of the embodiments of the present disclosure, there is provided an image style migration apparatus, including: the image characteristic determining module is used for acquiring an original image and determining image content characteristics corresponding to the original image; the image content features are generated based on hidden state features and conditional state parameters of the original image; the hidden state characteristics are obtained by extracting characteristics of the original image, and the condition state parameters are hidden state parameters corresponding to the target style; the local feature extraction module is used for carrying out local feature extraction on the image content features to obtain local image features corresponding to the image content features, wherein the local image features comprise edge region features of the original image; the global feature extraction module is used for carrying out global feature extraction on the image content features to obtain global image features corresponding to the image content features; the feature fusion module is used for carrying out feature fusion processing on the local image features and the global image features to obtain hidden state fusion features; and the image generation module is used for decoding the hidden state fusion characteristic to obtain a target style image corresponding to the original image.
In an exemplary embodiment of the present disclosure, the image feature determining module includes an image feature determining unit for acquiring preconfigured image sampling parameters, the image sampling parameters comprising a downsampling ratio and a number of hidden state channels; performing downsampling processing on the original image based on the downsampling ratio and the number of hidden state channels to obtain hidden state features corresponding to the original image; acquiring a target style for performing style migration on the original image, and determining a condition state parameter corresponding to the target style; the conditional state parameters are determined based on the hidden state features; the image content feature is generated based on the hidden state feature and the conditional state parameter.
In an exemplary embodiment of the present disclosure, the hidden state feature corresponding to the original image is obtained by performing downsampling processing on the original image based on a preconfigured encoder; the image feature determining module comprises an encoder training unit, a data processing unit and a data processing unit, wherein the encoder training unit is used for acquiring a pre-constructed initial network model; the initial network model comprises an initial encoder and an initial decoder; determining a training sample image, and performing downsampling processing on the training sample image by the initial encoder to obtain hidden state sample characteristics corresponding to the training sample image; reconstructing the hidden state sample features by the initial decoder to obtain a reconstructed sample image corresponding to the training sample image; determining a first model loss function based on a comparison between the training sample image and the reconstructed sample image; and respectively carrying out parameter adjustment processing on the initial encoder and the initial decoder based on the first model loss function to obtain a trained encoder and a trained decoder.
In one exemplary embodiment of the present disclosure, the encoder training unit includes a sample image generation subunit for acquiring an original sample image, and a target style sample image corresponding to the original sample image; performing image scaling processing on the original sample image and the target style sample image to obtain a corresponding scaled image pair; performing center clipping processing on the scaled image pair to obtain a clipping image with a preset size; and carrying out translation processing on the clipping image to obtain the training sample image subjected to normalization processing.
In an exemplary embodiment of the present disclosure, the image style migration apparatus further includes an image feature extraction module for acquiring a pre-trained image style migration model; and carrying out feature extraction and feature fusion on the image content features through the image style migration model to obtain the hidden state fusion features.
In one exemplary embodiment of the present disclosure, the image style migration apparatus further includes a stylized model training module for determining a training sample image, the training sample image including an original sample image and a target style sample image; extracting features of the original sample image to generate sample image features corresponding to the original sample image; the sample image features include a first sample feature and a second sample feature; the first sample feature is used for local feature extraction, and the second sample feature is used for global feature extraction; acquiring a pre-constructed initial neural network model; the initial neural network model includes an initial linear layer; based on the initial neural network model, respectively carrying out feature extraction and feature fusion processing on the first sample features and the second sample features to obtain fusion image features corresponding to the sample image features; determining a target style hidden state characteristic corresponding to the target style image, and carrying out hidden state residual prediction by the initial linear layer based on the fusion image characteristic and the target style hidden state characteristic to obtain a residual prediction result; and training the initial neural network model based on the residual prediction result to obtain the image style migration model.
In one exemplary embodiment of the present disclosure, the initial neural network model includes an initial convolution branch and an initial transformer branch; the stylized model training module comprises a fusion feature determining unit, a first sample feature extraction unit and a second sample feature extraction unit, wherein the fusion feature determining unit is used for extracting local features of the first sample feature through the initial convolution branch to obtain corresponding sample local features; carrying out global feature extraction on the second sample feature through the initial transformer branch to obtain a corresponding sample global feature; and carrying out feature fusion processing on the sample local features and the sample global features to obtain fusion image features.
In an exemplary embodiment of the present disclosure, the stylized model training module includes an input sample data generating unit, configured to perform downsampling processing on the original sample image to obtain a hidden state sample feature corresponding to the original sample image; determining a conditional state sample parameter corresponding to the hidden state sample feature; the conditional state sample parameters are determined based on the hidden state sample features; generating a first sample feature based on the hidden state sample feature and the conditional state sample parameter; and performing length and width splicing processing on the first sample characteristics to obtain the second sample characteristics.
In an exemplary embodiment of the disclosure, the residual prediction result includes a hidden state residual and a hidden state prediction result, and the stylized model training module includes a stylized model training unit, configured to obtain a hidden state sample feature corresponding to the original sample image and a target style hidden state feature corresponding to the original sample image; obtaining the pre-configured model iteration times and initial conditions; training the initial neural network model based on the model iteration times and the initial conditions to obtain a hidden state prediction result and a hidden state residual error which are output by each model iteration; the hidden state residual error is the difference value between the hidden state prediction result and the target style hidden state; generating a second model loss function according to the target style hidden state characteristics, the hidden state residual error output each time and the hidden state prediction result; and training the initial neural network model based on the second model loss function to obtain the image style migration model.
In an exemplary embodiment of the present disclosure, the stylized model training unit includes a loss function generating subunit, configured to determine a current iteration number of model training and determine the current hidden state residual corresponding to the current iteration number; determine the previous hidden state prediction result corresponding to the iteration immediately preceding the current iteration; determine the current hidden state prediction result according to the current hidden state residual and the previous hidden state prediction result; and generate the second model loss function according to the current hidden state prediction result and the target style hidden state feature.
In an exemplary embodiment of the present disclosure, the image generation module includes an image generation unit for acquiring a pre-trained decoder; and decoding the hidden state fusion characteristic through the decoder to obtain a target style image of the original image under a target style.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute instructions to implement the image style migration method in the first aspect described above.
According to a fifth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the image style migration method in the first aspect described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the image style migration method of the first aspect described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects: on the one hand, when capturing the content characteristics of the image, the local characteristic information and the global characteristic information can be acquired at the same time, so that the content quality of the stylized image can be ensured to a greater extent. On the other hand, the style conversion is performed based on the hidden state of the image, so that the style conversion processing can be performed by using less data volume, the data processing volume in the style conversion is reduced, and the conversion processing efficiency is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
In some image style transformation schemes, the style is separated from the image content by different network layers of a deep convolutional neural network, the complete "style distribution" of the reference style image is captured through the neural network, and the reference style is transferred onto the input image. Such a method relies solely on a convolutional network, and the receptive field of the convolution operation is limited, so capturing long-range dependencies of the picture requires increasing the network depth; this in turn reduces the resolution of the picture features and loses details, thereby degrading the conversion effect.
In other image style transformation schemes, an input picture is encoded into a hidden state by an encoder, a Transformer model based on the self-attention mechanism learns, through a back propagation algorithm, the mapping from the hidden state of the content picture to the hidden state of the target style picture, and finally the style-mapped hidden state is restored into a stylized picture by a decoder. The Transformer can capture the global information of the input by learning long-range dependencies, but such a method deprives the model of the ability to capture local features and position information, so the conversion result performs poorly on colors and details.
In addition, in some image style transformation schemes, a convolution model transforms the hidden state of an encoded picture in a residual iteration mode: the hidden state produced by the previous iteration is input as a condition into the residual prediction of the next round, and the error between the current hidden state and the target style hidden state is continuously reduced through multiple residual predictions. Although the residual iteration mode can improve the conversion accuracy to a certain extent, a single convolution model is still used, so the receptive field is limited and much of the global information cannot be captured.
Based on this, according to an embodiment of the present disclosure, an image style migration method, an image style migration model, an image style migration apparatus, an electronic device, a computer-readable storage medium, and a computer program product are proposed.
Fig. 1 is a flowchart illustrating an image style migration method according to an exemplary embodiment, and as shown in fig. 1, the image style migration method may be used in a computer device, where the computer device described in the present disclosure may include a mobile terminal device such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer or a personal digital assistant (Personal Digital Assistant, PDA), and a fixed terminal device such as a desktop computer. The present exemplary embodiment is illustrated with the method applied to a computer device; it is understood that the method may also be applied to a server, or to a system including a computer device and a server, implemented through interaction of the computer device and the server. The method specifically comprises the following steps.
In step S110, an original image is acquired, and image content features corresponding to the original image are determined; the image content features are generated based on hidden state features and conditional state parameters of the original image; the hidden state features are the image features of the original image in the hidden space, and the condition state parameters are hidden state parameters corresponding to the target style.
In one exemplary embodiment of the present disclosure, the original image may be an image to be style-converted. The image content features may be the feature data used in converting the original image into the target style image, and may be generated based on the hidden state features and conditional state parameters corresponding to the original image. The hidden state feature may be the image feature of the original image in the hidden space; for example, it may be the image feature obtained by performing downsampling processing on the original image. The condition state parameter may be the feature parameter, in the hidden space, of the image corresponding to the target style, where the condition state parameter has the same row-column size as the hidden state feature. The target style may be the style of the original image after the style migration processing; for example, the target style may include a cartoon style, a comic style, a chibi (Q-version) style, and the like.
In the image style conversion scene, an original image may be acquired first; for example, the original image may be acquired in real time by an image acquisition apparatus, or an image stored in advance at a specific location may be used as the original image. After the original image is acquired, in order to keep the data processing amount small during style conversion, the image features of the original image in the hidden space can be acquired. The hidden space (Latent Space) is a compressed representation of the data; its function is to learn the data features so as to find patterns and simplify the data representation. For example, downsampling is performed on the original image to obtain the hidden state features corresponding to the original image in the hidden space, and the feature size corresponding to the hidden state features is determined.
After determining the feature size, determining a condition state parameter with the same feature size as the hidden state feature, wherein the condition state parameter can be a feature parameter corresponding to the target style. And taking the acquired condition state parameters and hidden state features together as image content features corresponding to the original image.
In step S120, local feature extraction is performed on the image content features, so as to obtain local image features corresponding to the image content features, where the local image features include edge region features of the original image.
In one exemplary embodiment of the present disclosure, local feature extraction may be a process of extracting local features from image content features. The local image features may be local features contained in the image content features, for example, the detail features and feature location information of the image edges in the original image may be preserved by local feature extraction. The edge region features may be features of some edge regions in the original image where the image subject interfaces with the image background.
After the image content features are obtained, local feature extraction can be performed on the image content features, and contents such as detail features, feature position information and the like corresponding to the original image and the target style are extracted to serve as local image features corresponding to the image content features.
In step S130, global feature extraction is performed on the image content features, so as to obtain global image features corresponding to the image content features.
In one exemplary embodiment of the present disclosure, global feature extraction may be a process of extracting global features from image content features. The global image feature may be feature information contained in the entirety of the image content feature.
When capturing the image content information, in order to simultaneously retain the local features and the global features of the image, global feature extraction can be performed on the image content features to obtain global image features corresponding to the image content features.
In step S140, feature fusion processing is performed on the local image features and the global image features, so as to obtain hidden state fusion features.
In one exemplary embodiment of the present disclosure, the feature fusion process may be a process of fusing local image features with global image features. The hidden state fusion feature can fuse the local image feature and the global image feature to obtain the hidden state feature corresponding to the target style image.
After the local image features and the global image features are obtained, feature fusion processing can be carried out on the local image features and the global image features, and hidden state features, namely hidden state fusion features, of the original image in the target style can be obtained after the feature fusion processing.
In step S150, decoding is performed on the hidden state fusion feature to obtain a target style image corresponding to the original image.
In one exemplary embodiment of the present disclosure, the decoding process may be a process of upsampling the hidden state fusion feature. The target style image may be a style-shifted image of the original image in a target style, the target style image having the same size as the original image.
After the hidden state fusion feature is obtained, because the hidden state fusion feature is a feature fusion result obtained based on the image content feature, reconstruction processing can be performed on the hidden state fusion feature, for example, upsampling processing is performed on the hidden state fusion feature, and the hidden state fusion feature is reconstructed into a target style image of the original image in the target style.
According to the image style migration method in the present exemplary embodiment, on one hand, when capturing the content features of the image, the local feature information and the global feature information can be obtained at the same time, so that the content quality of the stylized image can be ensured to a greater extent. On the other hand, the style conversion is performed based on the hidden state of the image, so that the style conversion processing can be performed by using less data volume, the data processing volume in the style conversion is reduced, and the conversion processing efficiency is improved.
Next, an image style migration method in the present exemplary embodiment will be further described.
In an exemplary embodiment of the present disclosure, for step S110, determining the image content features corresponding to the original image includes: acquiring pre-configured image sampling parameters, the image sampling parameters comprising a downsampling ratio and a number of hidden state channels; performing downsampling processing on the original image based on the downsampling ratio and the number of hidden state channels to obtain the hidden state features corresponding to the original image; acquiring a target style for performing style migration on the original image, and determining the condition state parameter corresponding to the target style, the conditional state parameter being determined based on the hidden state features; and generating the image content features based on the hidden state features and the conditional state parameter.
The image sampling parameters may be the parameters used for performing downsampling processing on the original image. The downsampling ratio may be the ratio by which the original image is downsampled; for example, the downsampling ratio may be 4, 8, or the like. The number of hidden state channels may be the number of channels employed when downsampling the original image; for example, the number of hidden state channels may be configured to be 2. The encoder may be a network structure that downsamples the original image.
In order to reduce the amount of inference calculation in image style conversion, the original image can be converted into a lower-dimensional hidden space in which the style conversion processing is performed. The image sampling parameters employed for downsampling the original image are obtained, and may include, for example, the downsampling ratio and the number of hidden state channels. The downsampling of the original image may be performed by a pre-trained encoder (Encoder), which may be trained based on a convolutional neural network and implements the mapping between the original image and the corresponding hidden state features.
Denote the obtained downsampling ratio as sf and the number of hidden state channels as ch. The encoder performs downsampling processing on the normalized original image based on the downsampling ratio sf and the number of hidden state channels ch to obtain the hidden state features corresponding to the original image, which may be expressed as a tensor of shape (ch, res/sf, res/sf), where res is the image size of the normalized original image. The larger the downsampling ratio sf, the smaller the size of the corresponding hidden state feature and the less image content information is retained; the downsampling ratio can therefore be configured according to the specific sampling requirements of the actual scene and should not be excessively large.
After the hidden state feature corresponding to the original image is obtained, the condition state parameter corresponding to the target style of the style migration can be further determined; the condition state parameter has the same size as the hidden state feature, namely (ch, res/sf, res/sf). After the condition state parameter is obtained, the hidden state feature and the condition state parameter can be spliced to generate the image content features corresponding to the original image, which serve as the data basis for feature extraction, so as to reduce the amount of calculation in the style conversion process.
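By way of illustration only, the construction of the image content features described above may be sketched as follows. This is a minimal sketch assuming a PyTorch setting; the encoder module, the style-to-parameter lookup style_cond and the target_style key are hypothetical names, not part of the disclosed implementation. With the example values res = 256, sf = 8 and ch = 2, the hidden state feature has shape (ch, res/sf, res/sf) = (2, 32, 32).

```python
import torch

def build_image_content_features(x, encoder, style_cond, target_style):
    """Sketch: x is a normalized original image of shape (3, res, res);
    encoder downsamples it into the hidden space (names are assumptions)."""
    z = encoder(x)                      # hidden state feature, shape (ch, res//sf, res//sf)
    cond = style_cond[target_style]     # condition state parameter, same shape as z
    assert cond.shape == z.shape        # equal row-column sizes, as described above
    # Splice the hidden state feature and the condition state parameter along
    # the channel axis to obtain the image content features.
    return torch.cat([z, cond], dim=0)  # shape (2*ch, res//sf, res//sf)
```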
In one exemplary embodiment of the present disclosure, the encoder is trained by: acquiring a pre-constructed initial network model; the initial network model comprises an initial encoder and an initial decoder; determining a training sample image, and performing downsampling processing on the training sample image by an initial encoder to obtain hidden state sample characteristics corresponding to the training sample image; reconstructing the hidden state sample characteristics by an initial decoder to obtain a reconstructed sample image corresponding to the training sample image; determining a first model loss function based on a comparison result between the training sample image and the reconstructed sample image; and respectively performing parameter adjustment processing on the initial encoder and the initial decoder based on the first model loss function to obtain a trained encoder and trained decoder.
The initial network model may be a pre-built network model, for example, a convolutional network model. The initial encoder may be a network branch included in the initial network model for training the encoder. The initial decoder may be a network branch included in the initial network model for training the decoder. The training sample image may be a training sample image used for model training of the initial network model, and the training sample image may be a sample image after normalization processing. The hidden state sample feature may be an image feature obtained by performing downsampling processing on the training sample image. The reconstructed sample image may be a sample image after reconstruction of the latent state sample feature. The first model loss function may be a loss function for training an initial network model. The decoder may be a network structure that reconstructs the latent state image features.
The encoder used for downsampling the original image is obtained by training a pre-constructed initial network model. Specifically, a pre-constructed initial network model can be obtained; the initial network model may include two network branches, an initial encoder and an initial decoder. A training sample image for training the initial network model is obtained, and the training sample image may be a sample image after normalization processing.
The initial network model can be trained through a back propagation algorithm, which trains the convolution kernel parameters of the initial encoder and the initial decoder. Taking the training sample image input as the model input data, the initial encoder performs downsampling processing on the training sample image to obtain the hidden state sample feature z corresponding to the training sample image; the initial decoder then performs reconstruction processing on the hidden state sample feature z to obtain the reconstructed sample image recon corresponding to the training sample image. A comparison result between the reconstructed sample image recon and the training sample image input is determined, the first model loss function is generated from this comparison result, and the obtained first model loss function is used to perform back propagation training on the initial network model to obtain the trained model. Specifically, the first model loss function can be obtained by formula 1.
loss = abs(input - recon) + lambda_p * LPIPS(input, recon) (formula 1)
Wherein input may be the training sample image and recon may be the reconstructed sample image; abs(input - recon) is the absolute pixel error between the input picture and the reconstructed picture, representing the difference between the reconstructed picture and the original picture at the pixel level. lambda_p * LPIPS(input, recon) is the perceptual error: the deep features of the images may be extracted using a Visual Geometry Group (VGG) network, representing the difference between the reconstructed sample image and the training sample image at the content perception level; lambda_p is the weight of the perceptual error; LPIPS() is the learned perceptual image patch similarity measure function.
After determining the first model loss function, parameter adjustment processing can be performed on the convolution kernel parameters of the initial encoder and the initial decoder based on the first model loss function until the first model loss function converges or a preset number of iterations is reached, so as to obtain the trained encoder and decoder.
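A minimal training-loop sketch of this encoder/decoder training with the first model loss function (formula 1) is given below. The use of the lpips package (which computes the LPIPS measure with a VGG backbone) and the values lambda_p = 0.1 and lr = 1e-4 are assumptions of this sketch, not the disclosed implementation.

```python
import torch
import lpips  # pip install lpips; perceptual metric built on VGG features

def train_autoencoder(encoder, decoder, loader, lambda_p=0.1, epochs=10):
    """Sketch of the first-model-loss training loop (formula 1); images in
    `loader` are assumed normalized to [-1, 1] as in formula 2."""
    perceptual = lpips.LPIPS(net='vgg')
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)
    for _ in range(epochs):
        for inp in loader:
            z = encoder(inp)      # downsample to hidden state sample features
            recon = decoder(z)    # reconstructed sample image
            # Formula 1: absolute pixel error + weighted perceptual (LPIPS) error.
            loss = (inp - recon).abs().mean() + lambda_p * perceptual(inp, recon).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder, decoder
```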
In one exemplary embodiment of the present disclosure, a training sample image is obtained by: acquiring an original sample image and a target style sample image corresponding to the original sample image; performing image scaling processing on the original sample image and the target style sample image to obtain a corresponding scaled image pair; performing center clipping processing on the scaled image pair to obtain a clipping image with a preset size; and carrying out translation processing on the clipping image to obtain a training sample image subjected to normalization processing.
Wherein the original sample image may be a sample image that has not been normalized. The target style sample image may be a style migration image to which the original sample image corresponds under the target style. The image scaling process may be a process of scaling an image by a preset scaling ratio. The scaled image pair may be an image pair formed by scaling an original sample image and a target style sample image. The center clipping process may be a process of clipping the scaled image pair based on the center of the corresponding long side of the scaled image pair.
The training sample image may be derived from a set of original sample images ctx and a corresponding set of stylized target style sample images sty. In this embodiment, in order to facilitate model parameter learning and accelerate model convergence, some preprocessing may be performed on the original sample image and the target style sample image used for model training. Image scaling processing is performed on the original sample image and the target style sample image according to a predefined scaling ratio, so that the short sides of the two images reach a set size res, giving a scaled image pair; after the scaled image pair is obtained, center clipping is performed on it to obtain a group of clipping images of size res x res. After the clipping images are obtained, scaling and translation processing can be carried out on their pixel values to obtain the normalized training sample images. For example, the clipping image is scaled and translated according to formula 2, where pixel denotes a pixel value of the clipping image.
pixel = pixel / 127.5 - 1 (formula 2)
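For illustration, the preprocessing described above (short-side scaling, center clipping and the formula 2 normalization) may be expressed as a standard torchvision pipeline; res = 256 and the interpolation defaults are assumptions of this sketch. For a ToTensor output in [0, 1], (x - 0.5) / 0.5 equals pixel / 127.5 - 1 for 8-bit pixel values, so the last step reproduces formula 2.

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),        # scale so the short side reaches res
    transforms.CenterCrop(256),    # center clipping to a res x res patch
    transforms.ToTensor(),         # uint8 [0, 255] -> float [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # scale + translate to [-1, 1]
])
```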
In one exemplary embodiment of the present disclosure, a pre-trained image style migration model is obtained; and carrying out feature extraction and feature fusion on the image content features through an image style migration model to obtain hidden state fusion features.
The image style migration model may be a network model for performing style conversion on an original image to obtain a target style image.
For the original image, style conversion is performed through a pre-trained image style migration model to obtain a target style image of the original image in the target style. Referring to fig. 2, fig. 2 is an overall flowchart illustrating generating a target style image based on an original image according to an exemplary embodiment. For the original image 210, the encoder 220 may perform downsampling processing on the original image to obtain the hidden state feature z = Encoder(x) corresponding to the original image, and a conditional state parameter having the same size as z is obtained. Together they form the image content features 230 corresponding to the original image, which are input into the image style migration model 240; feature extraction and feature fusion are performed on the image content features 230 to obtain the hidden state fusion features 250 of the original image in the target style.
In an exemplary embodiment of the present disclosure, for step S150, decoding the hidden state fusion feature to obtain a target style image corresponding to the original image includes: acquiring a pre-trained decoder; and decoding the hidden state fusion feature through the decoder to obtain a target style image of the original image under the target style.
With continued reference to fig. 2, after obtaining the hidden state fusion feature 250 of the original image, a pre-trained decoder 260 is obtained; the resulting hidden state fusion feature 250 is reconstructed by the pre-trained decoder 260 to obtain a target style image 270 of the original image 210 in the target style. Through the above processing steps, style conversion of the original image through the image style migration model can be realized.
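The overall flow of fig. 2 may be summarized as the following sketch; all module and variable names are hypothetical, and the final de-normalization step assumes the images were normalized according to formula 2.

```python
import torch

@torch.no_grad()
def stylize(x, encoder, migration_model, decoder, style_cond, target_style):
    """Sketch of fig. 2: original image -> hidden state -> fused hidden state -> image."""
    z = encoder(x)                            # hidden state feature z = Encoder(x)
    cond = style_cond[target_style]           # condition state parameter, same size as z
    content = torch.cat([z, cond], dim=0)     # image content features (230)
    z_fused = migration_model(content)        # hidden state fusion features (250)
    y = decoder(z_fused)                      # target style image (270), same size as x
    return (y.clamp(-1, 1) + 1) * 127.5       # invert the formula 2 normalization
```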
In one exemplary embodiment of the present disclosure, an image style migration model is trained by: determining a training sample image, wherein the training sample image comprises an original sample image and a target style sample image; extracting features of the original sample image to generate sample image features corresponding to the original sample image; the sample image features include a first sample feature and a second sample feature; the first sample feature is used for local feature extraction, and the second sample feature is used for global feature extraction; acquiring a pre-constructed initial neural network model; the initial neural network model includes an initial linear layer; based on an initial neural network model, respectively carrying out feature extraction and feature fusion processing on the first sample features and the second sample features to obtain fusion image features corresponding to the sample image features; determining a target style hidden state characteristic corresponding to the target style image, and carrying out hidden state residual prediction by an initial linear layer based on the fusion image characteristic and the target style hidden state characteristic to obtain a residual prediction result; and training the initial neural network model based on the residual prediction result to obtain an image style migration model.
The input sample data may be sample data input to the initial neural network model for model training. The first sample feature may be a sample data feature input to the initial convolution branch. The second sample feature may be a sample data feature input to the initial transformer branch. The initial convolution branches may be model branches composed of a convolution network structure. The initial transformer branch may be a model branch consisting of a transformer network structure. The initial linear layer may be a linear layer for feature fusion. The sample local feature may be a local feature corresponding to the input sample data. The sample global feature may be a global feature corresponding to the input sample data. The fused image features can be features obtained by performing feature fusion processing on the sample local features and the sample global features. The target style hidden state feature may be a hidden state feature corresponding to the target style sample image. The residual prediction result may be a result of predicting a residual between the fused image feature and the target style hidden state feature.
According to the training sample images and the encoder obtained in the previous steps, a group of original sample images and target style sample images can be encoded into corresponding hidden state features for training the image style conversion model; the hidden state features of a group of original sample images and of the corresponding target style sample images may be denoted z_ctx and z_sty, respectively.
Input sample data may be generated based on the original sample image; the input sample data includes a first sample feature and a second sample feature, wherein the first sample feature can be used as the input data of the initial convolution branch for local feature extraction, and the second sample feature can be used as the input data of the initial transformer branch for global feature extraction.
In order to take into account both the local feature information and the global feature information of the image and obtain higher image conversion quality, this embodiment combines a deep convolution model and a Transformer model to design a novel hybrid model for style-conversion inference. Referring to fig. 3, fig. 3 is a model structure diagram of an image style migration model according to an exemplary embodiment. The constructed initial neural network model can comprise network structures such as an initial convolution branch, an initial transformer branch and an initial linear layer.
And taking the first sample characteristic as input data of the initial convolution branch, carrying out local characteristic extraction processing to obtain a sample local characteristic corresponding to the original sample image, and taking the second sample characteristic as input data of the initial converter branch, carrying out global characteristic extraction to obtain a sample global characteristic corresponding to the original sample image. And finally, fusing the local information and the global information extracted from the two parallel branches, and accurately predicting an output residual error, namely an error between the current hidden state and the stylized target hidden state, through a linear layer.
In one exemplary embodiment of the present disclosure, the initial neural network model includes an initial convolution branch and an initial transformer branch; based on the initial neural network model, respectively carrying out feature extraction and feature fusion processing on the first sample feature and the second sample feature to obtain the fusion image feature corresponding to the sample image features includes the following steps: extracting local features of the first sample feature through the initial convolution branch to obtain the corresponding sample local feature; carrying out global feature extraction on the second sample feature through the initial transformer branch to obtain the corresponding sample global feature; and carrying out feature fusion processing on the sample local features and the sample global features to obtain the fusion image features.
With continued reference to fig. 3, the initial convolution branch may be composed of a plurality of convolution blocks (Convolution Block); the number of convolution blocks may be set according to hardware conditions, and the larger the video memory, the more convolution blocks may be used. Referring to fig. 5, fig. 5 is a network structure diagram illustrating a convolution block in an image style migration model according to an exemplary embodiment. Each convolution block may consist of a convolution layer, a batch regularization layer, an activation function and a skip connection; convolution blocks built on this structure can effectively extract the local features contained in the first sample feature, namely the sample local features.
The initial transformer branch may be composed of a plurality of transformer blocks (Transformer Block), the number of which is likewise set according to hardware conditions. Referring to fig. 4, fig. 4 is a network structure diagram illustrating a transformer block in an image style migration model according to an exemplary embodiment. Each transformer block adopts the same residual block (ResBlock) structure and may comprise a regularization layer, a multi-head self-attention block, a multi-layer perceptron and a skip connection; transformer blocks built on this structure can effectively extract the global features contained in the second sample feature, namely the sample global features.
By performing the corresponding convolution processing and transformer processing on the two different sample features through the initial convolution branch and the initial transformer branch respectively, the local features and global features corresponding to the original sample image can be obtained and used as the data basis of the fusion features.
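A minimal sketch of the two-branch hybrid structure of fig. 3 follows, assuming a PyTorch setting; the block counts, widths and the use of torch.nn.TransformerEncoder are illustrative assumptions rather than the disclosed architecture.

```python
import torch
import torch.nn as nn

class HybridStyleModel(nn.Module):
    """Sketch of the two-branch model of fig. 3; sizes are assumptions."""
    def __init__(self, ch=2, dim=64, n_blocks=4, n_heads=4):
        super().__init__()
        c_in = 2 * ch                                # hidden state + condition channels
        # Convolution branch: stacked convolution blocks for local features.
        self.conv_branch = nn.Sequential(
            nn.Conv2d(c_in, dim, 3, padding=1),
            *[nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1),
                            nn.BatchNorm2d(dim), nn.ReLU()) for _ in range(n_blocks)],
        )
        # Transformer branch: stacked self-attention blocks for global features.
        self.proj = nn.Linear(c_in, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.transformer_branch = nn.TransformerEncoder(layer, num_layers=n_blocks)
        # Linear layer fusing both branches and predicting the hidden state residual.
        self.fuse = nn.Linear(2 * dim, ch)

    def forward(self, content):                      # content: (B, 2*ch, H, W)
        b, _, h, w = content.shape
        local = self.conv_branch(content)            # (B, dim, H, W)
        tokens = content.flatten(2).transpose(1, 2)  # (B, H*W, 2*ch), flattened H and W
        glob = self.transformer_branch(self.proj(tokens))          # (B, H*W, dim)
        fused = torch.cat([local.flatten(2).transpose(1, 2), glob], dim=-1)
        out = self.fuse(fused)                       # per-position residual prediction
        return out.transpose(1, 2).reshape(b, -1, h, w)            # (B, ch, H, W)
```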
In an exemplary embodiment of the present disclosure, feature extraction is performed on an original sample image to generate sample image features corresponding to the original sample image, including: downsampling the original sample image to obtain hidden state sample characteristics corresponding to the original sample image; determining a conditional state sample parameter corresponding to the hidden state sample feature; the conditional state sample parameters are determined based on the hidden state sample features; generating a first sample feature based on the hidden state sample feature and the conditional state sample parameter; and performing length and width splicing processing on the first sample characteristics to obtain second sample characteristics.
The hidden state sample feature may be an image feature obtained by performing downsampling processing on an original sample image. The conditional state sample parameter may be a conditional state parameter having the same rank size as the implicit state sample feature.
After the original sample image is obtained, the original sample image can be subjected to downsampling processing to obtain hidden state sample characteristics corresponding to the original sample image, then condition state sample parameters corresponding to the hidden state sample characteristics are determined, and the condition state sample parameters and the hidden state sample characteristics have the same size. And splicing the hidden state sample characteristics with the conditional state sample parameters of the same shape to obtain first sample characteristics, so that the number of channels corresponding to the first sample characteristics is twice as large as that of the hidden state sample characteristics. After the first sample feature is obtained, the first sample feature can be directly input into the initial convolution branch for reasoning.
For the transformer branch parallel to the initial convolution branch, this embodiment rearranges the obtained first sample feature; for example, the length and width dimensions of the first sample feature are flattened to obtain the second sample feature, and the obtained second sample feature is input into the initial transformer branch for inference. Through this data processing mode, the input data corresponding to the initial convolution branch and the initial transformer branch can be respectively determined for model training, as sketched below.
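The sketch below illustrates this construction of both branch inputs for a batch of samples; the function and tensor names, and the batch-first shapes, are assumptions.

```python
import torch

def build_branch_inputs(z, cond):
    """Sketch: derive the two branch inputs from a batch of hidden state
    sample features z and condition state sample parameters cond, both of
    shape (B, ch, H, W)."""
    first = torch.cat([z, cond], dim=1)        # (B, 2*ch, H, W): channel count doubles
    # Flatten the length and width dimensions so each of the H*W positions
    # becomes one token for the transformer branch.
    second = first.flatten(2).transpose(1, 2)  # (B, H*W, 2*ch)
    return first, second
```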
In an exemplary embodiment of the present disclosure, the residual prediction result includes a hidden state residual and a hidden state prediction result, and training the initial neural network model based on the residual prediction result to obtain the image style migration model includes: acquiring the hidden state sample feature corresponding to the original sample image and the target style hidden state feature corresponding to the original sample image; acquiring the preconfigured model iteration number and initial condition; training the initial neural network model based on the model iteration number and the initial condition to obtain the hidden state prediction result and the hidden state residual output by each model iteration, the hidden state residual being the difference between the hidden state prediction result and the target style hidden state; generating a second model loss function according to the target style hidden state feature and the hidden state residual and hidden state prediction result output each time; and training the initial neural network model based on the second model loss function to obtain the image style migration model.
The target style hidden state feature can be an image feature obtained after the downsampling process is performed on the target style sample image. The number of model iterations may be the number of residual iterations performed on the initial neural network model. The initial conditions may be initial condition state parameters employed to train the initial neural network model. The hidden state prediction result may be a hidden state prediction result of input sample data output by each iteration when the model is subjected to residual iteration processing. The hidden state residual may be a residual between the hidden state prediction result output by each residual iteration process and the target style hidden state. The second model loss function may be a loss function corresponding to the initial neural network model.
For the initial neural network model, model training can be performed by means of back propagation and residual iteration to obtain the image style migration model. The parameters to be trained in the model training process may include all parameters of the transformer branch, all parameters of the convolution branch, and the parameters of the linear output layer. In order to obtain a more accurate conversion result, unlike stylization algorithms with only one forward propagation, this embodiment performs forward propagation multiple times, with a fixed number of model iterations, to iteratively predict the residual, so that the obtained conversion result is more similar to the target style sample image, namely, refined hidden state conversion is realized.
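Expressed in the same illustrative PyTorch setting, the jointly trained parameter set can be handed to a single optimizer; the choice of Adam and the learning rate are assumptions of this sketch, not requirements of the disclosure:

    import torch

    def build_optimizer(model, lr=1e-4):
        # model.parameters() here covers all parameters of the transformer
        # branch, all parameters of the convolution branch, and the linear
        # output layer parameters, which are trained jointly.
        return torch.optim.Adam(model.parameters(), lr=lr)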
Before model training is carried out, the hidden state sample feature corresponding to the original sample image and the target style hidden state feature corresponding to the original sample image are obtained, and the target style hidden state feature is used as the final training target. Referring to fig. 6, fig. 6 is a diagram illustrating a training process for obtaining an image style migration model through model training, according to an example embodiment. In fig. 6, the model iteration number and the initial condition are obtained; since no prediction result from a previous residual iteration exists when the residual iteration process is performed for the first time, the preconfigured initial condition is used as the data spliced with the hidden state sample feature, and the initial condition ẑ_0 can be taken as an all-zero matrix with the same shape as the hidden state z. The hidden state sample feature is spliced with the acquired initial condition, and the two together serve as the model input data of the initial neural network model. The data input z of the model may be the state of the original sample image mapped to the hidden space by the encoder, i.e., the hidden state sample feature; the training target z_style of the model may be the state of the target style sample image mapped to the hidden space, i.e., the target style hidden state feature.
The model input data is input into the initial neural network model, and the hidden state prediction result and the hidden state residual output by each model iteration are obtained when training the initial neural network model; the hidden state residual is the difference between the hidden state prediction result and the target style hidden state. A second model loss function is generated according to the target style hidden state feature and the hidden state residual and hidden state prediction result of each output. The loss between the target style hidden state feature used for back propagation and the hidden state prediction result of the current residual iteration can be calculated by equation 3, i.e., the mean square error is used to measure the gap between the hidden state prediction result ẑ_t output in each round and the target style hidden state feature z_style:

    L = MSE(ẑ_t, z_style)    (equation 3)

where MSE() represents the mean square error function, ẑ_t represents the hidden state prediction result, and z_style represents the target style hidden state feature.
After the model loss function is obtained, the initial neural network model may be trained based on the model loss function, for example, the initial neural network model may be iteratively trained based on a preconfigured number of model iterations, to obtain a final image style migration model.
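The training procedure above can be sketched as follows, reusing prepare_inputs from the earlier example; the model is assumed to map the spliced input to a hidden state residual of the same shape as z, and summing the per-round MSE losses before a single backward pass is an assumption of this sketch rather than a requirement of the disclosure:

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, z, z_style, n_iters):
        # z: hidden state sample feature; z_style: target style hidden state feature.
        optimizer.zero_grad()
        z_hat = torch.zeros_like(z)      # initial condition: all-zero, same shape as z
        loss = 0.0
        for t in range(n_iters):         # fixed number of forward propagations
            # Splice the hidden state with the previous round's prediction
            # (the first round uses the initial condition).
            first, second = prepare_inputs(z, z_hat)
            z_hat = z_hat + model(first, second)      # add the predicted residual
            loss = loss + F.mse_loss(z_hat, z_style)  # equation 3, each round
        loss.backward()                  # back-propagate through all rounds
        optimizer.step()
        return float(loss)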
In an exemplary embodiment of the present disclosure, generating a second model loss function according to the target style hidden state feature and the hidden state residual and hidden state prediction result of each output includes: determining a current iteration number of model training, and determining a current hidden state residual error corresponding to the current iteration number; determining a last hidden state prediction result corresponding to a last iteration number corresponding to a current iteration number; determining a current hidden state prediction result according to the current hidden state residual error and the previous hidden state prediction result; and generating a second model loss function according to the current hidden state prediction result and the target style hidden state characteristics.
The current iteration number may be the number of the current residual iteration. The current hidden state residual may be a prediction residual output by the current residual iteration training. The last iteration number may be a number adjacent to and in order before the current iteration number. The last hidden state prediction result may be a prediction result of the input sample data output by the last residual iterative process in the target style. The current hidden state prediction result may be a prediction result of the input sample data output by the current residual iterative process in the target style.
Since model training needs to go through multiple residual iteration processes, each residual iteration process can be configured with a corresponding iteration number. For example, if the model iteration number is configured as N, the iteration numbers are t = 1, 2, 3, …, N. The model predicts the hidden state residual N times, and the stylized hidden state prediction result is the sum of the N residual prediction results, i.e.

    ẑ_N = ẑ_0 + Δẑ_1 + Δẑ_2 + … + Δẑ_N

where, in each round, the hidden state residual Δẑ_t predicted in that round is added to the hidden state prediction result ẑ_{t-1} obtained in the previous round to give the hidden state prediction result ẑ_t of the current round:

    ẑ_t = ẑ_{t-1} + Δẑ_t

In this way, the prediction result obtained in the previous round is utilized, so that the error between the prediction result and the target hidden state is reduced in each round of iteration, the style conversion result is continuously optimized, and, compared with other algorithms, the accuracy is higher and the stylized result is more similar to the target style.
In an exemplary embodiment of the present disclosure, for step S150, decoding the hidden state fusion feature to obtain a target style image corresponding to the original image, including: acquiring a pre-trained decoder; and decoding the hidden state fusion characteristic through a decoder to obtain a target style image of the original image under the target style.
After the hidden state fusion feature corresponding to the original image is obtained, a pre-trained decoder can be obtained, the decoder is adopted to decode the hidden state fusion feature, for example, the decoder is used to perform up-sampling processing on the hidden state fusion feature, and a target style image of the original image under a target style is obtained.
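Putting the pieces together at inference time, an end-to-end sketch might look as follows; it reuses prepare_inputs from the earlier example and assumes encoder and decoder are the pre-trained downsampling and upsampling networks described previously:

    import torch

    @torch.no_grad()
    def stylize(encoder, model, decoder, image, n_iters):
        z = encoder(image)               # hidden state feature in the hidden space
        z_hat = torch.zeros_like(z)      # initial condition
        for _ in range(n_iters):         # refined hidden state conversion
            first, second = prepare_inputs(z, z_hat)
            z_hat = z_hat + model(first, second)
        return decoder(z_hat)            # upsample/decode to the target style image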
In summary, according to the image style migration method of the present disclosure, an original image is acquired and the image content features corresponding to the original image are determined; the image content features are generated based on the hidden state features and the conditional state parameters of the original image; the hidden state features are image features of the original image in a hidden space, and the conditional state parameters are hidden state parameters corresponding to the target style; local feature extraction is performed on the image content features to obtain local image features corresponding to the image content features, where the local image features include edge region features of the original image; global feature extraction is performed on the image content features to obtain global image features corresponding to the image content features; feature fusion processing is performed on the local image features and the global image features to obtain hidden state fusion features; and the hidden state fusion features are decoded to obtain the target style image corresponding to the original image. In one aspect, when capturing the content features of the image, local feature information and global feature information can be acquired at the same time, so the content quality of the stylized image can be ensured to a greater extent. In another aspect, style conversion is performed based on the hidden state of the image, so style conversion processing can be performed with a smaller data volume, which reduces the data processing amount in style conversion and improves conversion efficiency. In yet another aspect, the error between the input image and the target style is continuously reduced through residual iteration during hidden state conversion, so the content quality and stylization degree of the conversion result can be greatly improved.
In some embodiments of the present disclosure, an image style migration model is provided. Referring to fig. 3, the image style migration model comprises: a convolution branch 310, a transformer branch 320, and a linear layer 330.
Specifically, the convolution branch 310 is configured to perform local feature extraction on the model input data to obtain local image features; the model input data is generated based on the hidden state features and conditional state parameters corresponding to the original image; the hidden state features are obtained by performing feature extraction on the original image, and the conditional state parameters are hidden state parameters corresponding to the target style. The transformer branch 320 is configured to perform global feature extraction on the model input data to obtain global image features corresponding to the image content features. The linear layer 330 is configured to output the hidden state fusion features of the original image in the target style based on the fused image features; the fused image features are obtained by fusion processing of the local image features and the global image features.
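A sketch of the three-part structure of fig. 3, reusing the ConvBlock and TransformerBlock classes from the earlier example; treating feature fusion as channel-wise concatenation followed by a per-position linear layer is one plausible reading of the disclosure, not the only one, and the depth and head count are assumptions:

    import torch
    import torch.nn as nn

    class StyleMigrationModel(nn.Module):
        def __init__(self, in_ch, out_ch, depth=4, heads=8):
            super().__init__()
            self.conv_branch = nn.Sequential(
                *[ConvBlock(in_ch) for _ in range(depth)])
            self.transformer_branch = nn.Sequential(
                *[TransformerBlock(in_ch, heads) for _ in range(depth)])
            self.linear = nn.Linear(2 * in_ch, out_ch)        # linear output layer

        def forward(self, first, second):
            # first: (B, in_ch, H, W); second: (B, H*W, in_ch)
            local = self.conv_branch(first)                   # local image features
            glob = self.transformer_branch(second)            # global image features
            b, c, h, w = local.shape
            glob = glob.transpose(1, 2).reshape(b, c, h, w)   # tokens -> spatial
            fused = torch.cat([local, glob], dim=1)           # feature fusion
            out = self.linear(fused.permute(0, 2, 3, 1))      # per-position linear
            return out.permute(0, 3, 1, 2)                    # (B, out_ch, H, W)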
Fig. 7 is a block diagram illustrating an image style migration apparatus according to an exemplary embodiment. Referring to fig. 7, the image style migration apparatus 700 includes: an image feature determination module 710, a local feature extraction module 720, a global feature extraction module 730, a feature fusion module 740, and an image generation module 750.
Specifically, the image feature determination module 710 is configured to acquire an original image and determine the image content features corresponding to the original image; the image content features are generated based on the hidden state features and the conditional state parameters of the original image; the hidden state features are image features of the original image in a hidden space, and the conditional state parameters are hidden state parameters corresponding to the target style. The local feature extraction module 720 is configured to perform local feature extraction on the image content features to obtain local image features corresponding to the image content features, where the local image features include edge region features of the original image. The global feature extraction module 730 is configured to perform global feature extraction on the image content features to obtain global image features corresponding to the image content features. The feature fusion module 740 is configured to perform feature fusion processing on the local image features and the global image features to obtain hidden state fusion features. The image generation module 750 is configured to perform decoding processing on the hidden state fusion features to obtain the target style image corresponding to the original image.
In one exemplary embodiment of the present disclosure, the image feature determination module 710 includes an image feature determination unit for acquiring preconfigured image sampling parameters, the image sampling parameters comprising a downsampling ratio and a hidden state channel number; performing downsampling processing on the original image based on the downsampling ratio and the hidden state channel number to obtain the hidden state features corresponding to the original image; acquiring a target style for performing style migration on the original image and determining the conditional state parameters corresponding to the target style, the conditional state parameters being determined based on the hidden state features; and generating the image content features based on the hidden state features and the conditional state parameters.
In one exemplary embodiment of the present disclosure, the image feature determination module 710 includes an encoder training unit for acquiring a pre-constructed initial network model, the initial network model comprising an initial encoder and an initial decoder; determining a training sample image, and performing downsampling processing on the training sample image by the initial encoder to obtain hidden state sample features corresponding to the training sample image; reconstructing the hidden state sample features by the initial decoder to obtain a reconstructed sample image corresponding to the training sample image; determining a first model loss function based on a comparison result between the training sample image and the reconstructed sample image; and performing parameter adjustment processing on the initial encoder and the initial decoder respectively based on the first model loss function to obtain a trained encoder and a trained decoder.
In one exemplary embodiment of the present disclosure, the encoder training unit includes a sample image generation subunit for acquiring an original sample image and a target style sample image corresponding to the original sample image; performing image scaling processing on the original sample image and the target style sample image to obtain a corresponding scaled image pair; performing center cropping processing on the scaled image pair to obtain a cropped image of a preset size; and performing translation processing on the cropped image to obtain a normalized training sample image.
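One way to realize this preprocessing with torchvision; the resize and crop sizes and the [-1, 1] value range are assumptions of the sketch, not values fixed by the disclosure:

    from torchvision import transforms

    # Applied identically to the original sample image and the target style
    # sample image so that the scaled/cropped pair stays aligned.
    preprocess = transforms.Compose([
        transforms.Resize(288),          # image scaling
        transforms.CenterCrop(256),      # center cropping to a preset size
        transforms.ToTensor(),           # [0, 255] -> [0.0, 1.0]
        transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # translate/scale to [-1, 1]
    ])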
In an exemplary embodiment of the present disclosure, the image style migration apparatus 700 further includes an image feature extraction module for acquiring a pre-trained image style migration model, and performing feature extraction and feature fusion on the image content features through the image style migration model to obtain the hidden state fusion features.
In one exemplary embodiment of the present disclosure, the image style migration apparatus 700 further includes a stylized model training module for determining a training sample image, the training sample image including an original sample image and a target style sample image; performing feature extraction on the original sample image to generate sample image features corresponding to the original sample image, the sample image features including a first sample feature and a second sample feature, where the first sample feature is used for local feature extraction and the second sample feature is used for global feature extraction; acquiring a pre-constructed initial neural network model, the initial neural network model including an initial linear layer; performing feature extraction and feature fusion processing on the first sample feature and the second sample feature respectively based on the initial neural network model to obtain fused image features corresponding to the sample image features; determining the target style hidden state feature corresponding to the target style image, and performing hidden state residual prediction by the initial linear layer based on the fused image features and the target style hidden state feature to obtain a residual prediction result; and training the initial neural network model based on the residual prediction result to obtain the image style migration model.
In one exemplary embodiment of the present disclosure, the initial neural network model includes an initial convolution branch and an initial transformer branch; the stylized model training module comprises a fusion feature determination unit, a first sampling unit, and a second sampling unit, wherein the fusion feature determination unit is used for performing local feature extraction on the first sample feature through the initial convolution branch to obtain corresponding sample local features; performing global feature extraction on the second sample feature through the initial transformer branch to obtain corresponding sample global features; and performing feature fusion processing on the sample local features and the sample global features to obtain the fused image features.
In an exemplary embodiment of the present disclosure, the stylized model training module includes an input sample data generation unit, configured to perform downsampling processing on the original sample image to obtain hidden state sample features corresponding to the original sample image; determine a conditional state sample parameter corresponding to the hidden state sample feature, the conditional state sample parameter being determined based on the hidden state sample feature; generate a first sample feature based on the hidden state sample feature and the conditional state sample parameter; and perform length-and-width flattening processing on the first sample feature to obtain a second sample feature.
In an exemplary embodiment of the present disclosure, the residual prediction result includes a hidden state residual and a hidden state prediction result, and the stylized model training module includes a stylized model training unit, configured to obtain a hidden state sample feature corresponding to an original sample image and a target style hidden state feature corresponding to the original sample image; obtaining the pre-configured model iteration times and initial conditions; training an initial neural network model based on the model iteration times and initial conditions to obtain a hidden state prediction result and a hidden state residual error which are output by each model iteration; the hidden state residual error is the difference between the hidden state prediction result and the hidden state of the target style; generating a second model loss function according to the target style hidden state characteristics, the hidden state residual error and the hidden state prediction result which are output each time; training the initial neural network model based on the second model loss function to obtain an image style migration model.
In an exemplary embodiment of the present disclosure, the stylized model training unit includes a loss function generating subunit, configured to determine a current iteration number of the model training, and determine a current hidden state residual corresponding to the current iteration number; determining a last hidden state prediction result corresponding to a last iteration number corresponding to a current iteration number; determining a current hidden state prediction result according to the current hidden state residual error and the previous hidden state prediction result; and generating a second model loss function according to the current hidden state prediction result and the target style hidden state characteristics.
In one exemplary embodiment of the present disclosure, the image generation module 750 includes an image generation unit for acquiring a pre-trained decoder, and decoding the hidden state fusion features through the decoder to obtain the target style image of the original image in the target style.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be elaborated here.
An electronic device 800 according to such an embodiment of the present disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 8, the electronic device 800 is embodied in the form of a general purpose computing device. Components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, a bus 830 connecting the different system components (including the storage unit 820 and the processing unit 810), and a display unit 840.
Wherein the storage unit stores program code executable by the processing unit 810, such that the processing unit 810 performs the steps according to various exemplary embodiments of the present disclosure described in the above section of the present specification.
The storage unit 820 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 821 and/or a cache memory unit 822, and may further include a read only memory (ROM) 823.
The storage unit 820 may include a program/utility 824 having a set (at least one) of program modules 825, such program modules 825 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
Bus 830 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 870 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 850. Also, the electronic device 800 may communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 over the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 800, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory, comprising instructions executable by a processor of an apparatus to perform the image style migration method described above. Alternatively, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program/instruction, characterized in that the computer program/instruction, when executed by a processor, implements the image style migration method according to any one of the preceding claims.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.