Disclosure of Invention
The disclosure provides an image style migration method, a model, an apparatus, an electronic device, a computer-readable storage medium and a computer program product, so as to at least solve the problem in the related art that, when converting an image style, the local information and the global information of the image cannot be captured at the same time to ensure the quality of the image content. The technical scheme of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided an image style migration method, including: acquiring an original image and determining image content characteristics corresponding to the original image; the image content features are generated based on hidden state features and conditional state parameters of the original image; the hidden state characteristics are image characteristics of the original image in a hidden space, and the condition state parameters are hidden state parameters corresponding to the target style; extracting local features from the image content features to obtain local image features corresponding to the image content features, wherein the local image features comprise edge region features of the original image; extracting global features of the image content features to obtain global image features corresponding to the image content features; performing feature fusion processing on the local image features and the global image features to obtain hidden state fusion features; and decoding the hidden state fusion characteristic to obtain a target style image corresponding to the original image.
In an exemplary embodiment of the disclosure, the determining the image content feature corresponding to the original image includes: acquiring pre-configured image sampling parameters, the image sampling parameters comprising a downsampling ratio and a number of hidden state channels; performing downsampling processing on the original image based on the downsampling ratio and the number of hidden state channels to obtain hidden state features corresponding to the original image; acquiring a target style for performing style migration on the original image, and determining a condition state parameter corresponding to the target style; the conditional state parameters are determined based on the hidden state features; the image content feature is generated based on the hidden state feature and the conditional state parameter.
In an exemplary embodiment of the present disclosure, the hidden state feature corresponding to the original image is obtained by performing downsampling processing on the original image based on a preconfigured encoder; the training method of the encoder comprises the following steps: acquiring a pre-constructed initial network model; the initial network model comprises an initial encoder and an initial decoder; determining a training sample image, and performing downsampling processing on the training sample image by the initial encoder to obtain hidden state sample characteristics corresponding to the training sample image; reconstructing the hidden state sample features by the initial decoder to obtain a reconstructed sample image corresponding to the training sample image; determining a first model loss function based on a comparison between the training sample image and the reconstructed sample image; and respectively carrying out parameter adjustment processing on the initial encoder and the initial decoder based on the first model loss function to obtain a trained encoder and a trained decoder.
In an exemplary embodiment of the present disclosure, the determining a training sample image includes: acquiring an original sample image and a target style sample image corresponding to the original sample image; performing image scaling processing on the original sample image and the target style sample image to obtain a corresponding scaled image pair; performing center clipping processing on the scaled image pair to obtain a clipping image with a preset size; and carrying out translation processing on the clipping image to obtain the training sample image subjected to normalization processing.
In an exemplary embodiment of the present disclosure, the above method further includes: acquiring a pre-trained image style migration model; and carrying out feature extraction and feature fusion on the image content features through the image style migration model to obtain the hidden state fusion features.
In an exemplary embodiment of the present disclosure, the image style migration model is trained by: determining a training sample image, wherein the training sample image comprises an original sample image and a target style sample image; extracting features of the original sample image to generate sample image features corresponding to the original sample image; the sample image features include a first sample feature and a second sample feature; the first sample feature is used for local feature extraction, and the second sample feature is used for global feature extraction; acquiring a pre-constructed initial neural network model; the initial neural network model includes an initial linear layer; based on the initial neural network model, respectively carrying out feature extraction and feature fusion processing on the first sample features and the second sample features to obtain fusion image features corresponding to the sample image features; determining a target style hidden state characteristic corresponding to the target style image, and carrying out hidden state residual prediction by the initial linear layer based on the fusion image characteristic and the target style hidden state characteristic to obtain a residual prediction result; and training the initial neural network model based on the residual prediction result to obtain the image style migration model.
In one exemplary embodiment of the present disclosure, the initial neural network model includes an initial convolution branch and an initial transformer branch; the step of respectively carrying out feature extraction and feature fusion processing on the first sample features and the second sample features based on the initial neural network model to obtain fused image features corresponding to the sample image features comprises the following steps: extracting local features of the first sample feature through the initial convolution branch to obtain a corresponding sample local feature; carrying out global feature extraction on the second sample feature through the initial transformer branch to obtain a corresponding sample global feature; and carrying out feature fusion processing on the sample local features and the sample global features to obtain fusion image features.
In an exemplary embodiment of the present disclosure, the feature extracting the original sample image to generate a sample image feature corresponding to the original sample image includes: downsampling the original sample image to obtain hidden state sample characteristics corresponding to the original sample image; determining a conditional state sample parameter corresponding to the hidden state sample feature; the conditional state sample parameters are determined based on the hidden state sample features; generating a first sample feature based on the hidden state sample feature and the conditional state sample parameter; and performing length and width splicing processing on the first sample characteristics to obtain the second sample characteristics.
In an exemplary embodiment of the present disclosure, the residual prediction result includes a hidden state residual and a hidden state prediction result, and the training the initial neural network model based on the residual prediction result to obtain the image style migration model includes: acquiring hidden state sample characteristics corresponding to the original sample image and target style hidden state characteristics corresponding to the original sample image; obtaining the pre-configured model iteration times and initial conditions; training the initial neural network model based on the model iteration times and the initial conditions to obtain a hidden state prediction result and a hidden state residual error which are output by each model iteration; the hidden state residual error is the difference value between the hidden state prediction result and the target style hidden state; generating a second model loss function according to the target style hidden state characteristics, the hidden state residual error output each time and the hidden state prediction result; and training the initial neural network model based on the second model loss function to obtain the image style migration model.
In an exemplary embodiment of the disclosure, the generating a second model loss function according to the target style hidden state feature and the hidden state residual and hidden state prediction result output each time includes: determining a current iteration number of model training, and determining a current hidden state residual corresponding to the current iteration number; determining the previous hidden state prediction result corresponding to the iteration immediately preceding the current iteration; determining the current hidden state prediction result according to the current hidden state residual and the previous hidden state prediction result; and generating the second model loss function according to the current hidden state prediction result and the target style hidden state feature.
In an exemplary embodiment of the present disclosure, the decoding the hidden state fusion feature to obtain a target style image corresponding to the original image includes: acquiring a pre-trained decoder; and decoding the hidden state fusion characteristic through the decoder to obtain a target style image of the original image under a target style.
According to a second aspect of embodiments of the present disclosure, there is provided an image style migration model, comprising: the convolution branch is used for extracting local features of the model input data to obtain local image features; the model input data is generated based on hidden state features and conditional state parameters corresponding to the original image; the hidden state features are obtained by extracting features of the original image, and the condition state parameters are hidden state parameters corresponding to the target style; the transformer branch is used for carrying out global feature extraction on the model input data to obtain global image features corresponding to the image content features; the linear layer is used for outputting hidden state fusion features of the original image under the target style based on the fusion image features; the fusion image features are obtained by fusion processing of the local image features and the global image features.
According to a third aspect of the embodiments of the present disclosure, there is provided an image style migration apparatus, including: the image characteristic determining module is used for acquiring an original image and determining image content characteristics corresponding to the original image; the image content features are generated based on hidden state features and conditional state parameters of the original image; the hidden state characteristics are obtained by extracting characteristics of the original image, and the condition state parameters are hidden state parameters corresponding to the target style; the local feature extraction module is used for carrying out local feature extraction on the image content features to obtain local image features corresponding to the image content features, wherein the local image features comprise edge region features of the original image; the global feature extraction module is used for carrying out global feature extraction on the image content features to obtain global image features corresponding to the image content features; the feature fusion module is used for carrying out feature fusion processing on the local image features and the global image features to obtain hidden state fusion features; and the image generation module is used for decoding the hidden state fusion characteristic to obtain a target style image corresponding to the original image.
In an exemplary embodiment of the present disclosure, the image feature determining module includes an image feature determining unit for acquiring preconfigured image sampling parameters, the image sampling parameters comprising a downsampling ratio and a number of hidden state channels; performing downsampling processing on the original image based on the downsampling ratio and the number of hidden state channels to obtain hidden state features corresponding to the original image; acquiring a target style for performing style migration on the original image, and determining a condition state parameter corresponding to the target style; the conditional state parameters are determined based on the hidden state features; the image content feature is generated based on the hidden state feature and the conditional state parameter.
In an exemplary embodiment of the present disclosure, the hidden state feature corresponding to the original image is obtained by performing downsampling processing on the original image based on a preconfigured encoder; the image feature determining module comprises an encoder training unit, a data processing unit and a data processing unit, wherein the encoder training unit is used for acquiring a pre-constructed initial network model; the initial network model comprises an initial encoder and an initial decoder; determining a training sample image, and performing downsampling processing on the training sample image by the initial encoder to obtain hidden state sample characteristics corresponding to the training sample image; reconstructing the hidden state sample features by the initial decoder to obtain a reconstructed sample image corresponding to the training sample image; determining a first model loss function based on a comparison between the training sample image and the reconstructed sample image; and respectively carrying out parameter adjustment processing on the initial encoder and the initial decoder based on the first model loss function to obtain a trained encoder and a trained decoder.
In one exemplary embodiment of the present disclosure, the encoder training unit includes a sample image generation subunit for acquiring an original sample image, and a target style sample image corresponding to the original sample image; performing image scaling processing on the original sample image and the target style sample image to obtain a corresponding scaled image pair; performing center clipping processing on the scaled image pair to obtain a clipping image with a preset size; and carrying out translation processing on the clipping image to obtain the training sample image subjected to normalization processing.
In an exemplary embodiment of the present disclosure, the image style migration apparatus further includes an image feature extraction module for acquiring a pre-trained image style migration model; and carrying out feature extraction and feature fusion on the image content features through the image style migration model to obtain the hidden state fusion features.
In one exemplary embodiment of the present disclosure, the image style migration apparatus further includes a stylized model training module for determining a training sample image, the training sample image including an original sample image and a target style sample image; extracting features of the original sample image to generate sample image features corresponding to the original sample image; the sample image features include a first sample feature and a second sample feature; the first sample feature is used for local feature extraction, and the second sample feature is used for global feature extraction; acquiring a pre-constructed initial neural network model; the initial neural network model includes an initial linear layer; based on the initial neural network model, respectively carrying out feature extraction and feature fusion processing on the first sample features and the second sample features to obtain fusion image features corresponding to the sample image features; determining a target style hidden state characteristic corresponding to the target style image, and carrying out hidden state residual prediction by the initial linear layer based on the fusion image characteristic and the target style hidden state characteristic to obtain a residual prediction result; and training the initial neural network model based on the residual prediction result to obtain the image style migration model.
In one exemplary embodiment of the present disclosure, the initial neural network model includes an initial convolution branch and an initial transformer branch; the stylized model training module comprises a fusion feature determining unit, a first sample feature extraction unit and a second sample feature extraction unit, wherein the fusion feature determining unit is used for extracting local features of the first sample feature through the initial convolution branch to obtain corresponding sample local features; carrying out global feature extraction on the second sample feature through the initial transformer branch to obtain a corresponding sample global feature; and carrying out feature fusion processing on the sample local features and the sample global features to obtain fusion image features.
In an exemplary embodiment of the present disclosure, the stylized model training module includes an input sample data generating unit, configured to perform downsampling processing on the original sample image to obtain a hidden state sample feature corresponding to the original sample image; determining a conditional state sample parameter corresponding to the hidden state sample feature; the conditional state sample parameters are determined based on the hidden state sample features; generating a first sample feature based on the hidden state sample feature and the conditional state sample parameter; and performing length and width splicing processing on the first sample characteristics to obtain the second sample characteristics.
In an exemplary embodiment of the disclosure, the residual prediction result includes a hidden state residual and a hidden state prediction result, and the stylized model training module includes a stylized model training unit, configured to obtain a hidden state sample feature corresponding to the original sample image and a target style hidden state feature corresponding to the original sample image; obtaining the pre-configured model iteration times and initial conditions; training the initial neural network model based on the model iteration times and the initial conditions to obtain a hidden state prediction result and a hidden state residual error which are output by each model iteration; the hidden state residual error is the difference value between the hidden state prediction result and the target style hidden state; generating a second model loss function according to the target style hidden state characteristics, the hidden state residual error output each time and the hidden state prediction result; and training the initial neural network model based on the second model loss function to obtain the image style migration model.
In an exemplary embodiment of the present disclosure, the stylized model training unit includes a loss function generating subunit, configured to determine a current iteration number of model training and determine the current hidden state residual corresponding to the current iteration number; determine the previous hidden state prediction result corresponding to the iteration immediately preceding the current iteration; determine the current hidden state prediction result according to the current hidden state residual and the previous hidden state prediction result; and generate the second model loss function according to the current hidden state prediction result and the target style hidden state feature.
In an exemplary embodiment of the present disclosure, the image generation module includes an image generation unit for acquiring a pre-trained decoder; and decoding the hidden state fusion characteristic through the decoder to obtain a target style image of the original image under a target style.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute instructions to implement the image style migration method in the first aspect described above.
According to a fifth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the image style migration method in the first aspect described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the image style migration method of the first aspect described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects: on the one hand, when capturing the content characteristics of the image, the local characteristic information and the global characteristic information can be acquired at the same time, so that the content quality of the stylized image can be ensured to a greater extent. On the other hand, the style conversion is performed based on the hidden state of the image, so that the style conversion processing can be performed by using less data volume, the data processing volume in the style conversion is reduced, and the conversion processing efficiency is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
In some image style transformation schemes, the style is separated from the image content by different network layers of a deep convolutional neural network, the complete "style distribution" of the reference style image is captured through the neural network, and the reference style is transferred onto the input image. Such a method relies solely on a convolutional network, and the receptive field of the convolution operation is limited, so capturing long-range dependencies of the picture requires increasing the network depth; this in turn reduces the resolution of the picture features and loses details, thereby degrading the conversion effect.
In other image style transformation schemes, an input picture is encoded into a hidden state by an encoder, a Transformer model based on the self-attention mechanism learns, through a back propagation algorithm, the mapping from the hidden state of the content picture to the hidden state of the target style picture, and finally the style-mapped hidden state is restored into a stylized picture by a decoder. The Transformer can capture the global information of the input by learning long-range dependencies, but such a method deprives the model of the ability to capture local features and position information, so the conversion result performs poorly on colors and details.
In addition, in some image style transformation schemes, a convolution model transforms the hidden state of an encoded picture in a residual iteration mode: the hidden state produced by the previous iteration is input as a condition into the residual prediction of the next round, and the error between the current hidden state and the target style hidden state is continuously reduced through multiple residual predictions. Although the residual iteration mode can improve the conversion accuracy to a certain extent, a single convolution model is still used, so the receptive field is limited and much of the global information cannot be captured.
Based on this, according to an embodiment of the present disclosure, an image style migration method, an image style migration model, an image style migration apparatus, an electronic device, a computer-readable storage medium, and a computer program product are proposed.
Fig. 1 is a flowchart illustrating an image style migration method according to an exemplary embodiment, and as shown in fig. 1, the image style migration method may be used in a computer device, where the computer device described in the present disclosure may include a mobile terminal device such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer or a personal digital assistant (Personal Digital Assistant, PDA), and a fixed terminal device such as a desktop computer. The present exemplary embodiment is illustrated with the method applied to a computer device; it is understood that the method may also be applied to a server, or to a system including a computer device and a server, implemented through interaction of the computer device and the server. The method specifically comprises the following steps.
In step S110, an original image is acquired, and image content features corresponding to the original image are determined; the image content features are generated based on hidden state features and conditional state parameters of the original image; the hidden state features are the image features of the original image in the hidden space, and the condition state parameters are hidden state parameters corresponding to the target style.
In one exemplary embodiment of the present disclosure, the original image may be an image to be style-converted. The image content features may be the feature data used in converting the original image into the target style image, and may be generated based on the hidden state features and conditional state parameters corresponding to the original image. The hidden state feature may be the image feature of the original image in the hidden space; for example, it may be the image feature obtained by performing downsampling processing on the original image. The condition state parameter may be the feature parameter, in the hidden space, of the image corresponding to the target style, where the condition state parameter has the same row-column size as the hidden state feature. The target style may be the style of the original image after the style migration processing; for example, the target style may include a cartoon style, a comic style, a chibi (Q-version) style, and the like.
In the image style conversion scene, an original image may be acquired first; for example, the original image may be acquired in real time by an image acquisition apparatus, or an image stored in advance at a specific location may be used as the original image. After the original image is acquired, in order to keep the data processing amount small during style conversion, the image features of the original image in the hidden space can be acquired. The hidden space (Latent Space) is a compressed representation of the data; its function is to learn the data features so as to find patterns and simplify the data representation. For example, downsampling is performed on the original image to obtain the hidden state features corresponding to the original image in the hidden space, and the feature size corresponding to the hidden state features is determined.
After determining the feature size, determining a condition state parameter with the same feature size as the hidden state feature, wherein the condition state parameter can be a feature parameter corresponding to the target style. And taking the acquired condition state parameters and hidden state features together as image content features corresponding to the original image.
In step S120, local feature extraction is performed on the image content features, so as to obtain local image features corresponding to the image content features, where the local image features include edge region features of the original image.
In one exemplary embodiment of the present disclosure, local feature extraction may be a process of extracting local features from image content features. The local image features may be local features contained in the image content features, for example, the detail features and feature location information of the image edges in the original image may be preserved by local feature extraction. The edge region features may be features of some edge regions in the original image where the image subject interfaces with the image background.
After the image content features are obtained, local feature extraction can be performed on the image content features, and contents such as detail features, feature position information and the like corresponding to the original image and the target style are extracted to serve as local image features corresponding to the image content features.
In step S130, global feature extraction is performed on the image content features, so as to obtain global image features corresponding to the image content features.
In one exemplary embodiment of the present disclosure, global feature extraction may be a process of extracting global features from image content features. The global image feature may be feature information contained in the entirety of the image content feature.
When capturing the image content information, in order to simultaneously retain the local features and the global features of the image, global feature extraction can be performed on the image content features to obtain global image features corresponding to the image content features.
In step S140, feature fusion processing is performed on the local image features and the global image features, so as to obtain hidden state fusion features.
In one exemplary embodiment of the present disclosure, the feature fusion process may be a process of fusing local image features with global image features. The hidden state fusion feature can fuse the local image feature and the global image feature to obtain the hidden state feature corresponding to the target style image.
After the local image features and the global image features are obtained, feature fusion processing can be carried out on the local image features and the global image features, and hidden state features, namely hidden state fusion features, of the original image in the target style can be obtained after the feature fusion processing.
In step S150, decoding is performed on the hidden state fusion feature to obtain a target style image corresponding to the original image.
In one exemplary embodiment of the present disclosure, the decoding process may be a process of upsampling the hidden state fusion feature. The target style image may be a style-shifted image of the original image in a target style, the target style image having the same size as the original image.
After the hidden state fusion feature is obtained, because the hidden state fusion feature is a feature fusion result obtained based on the image content feature, reconstruction processing can be performed on the hidden state fusion feature, for example, upsampling processing is performed on the hidden state fusion feature, and the hidden state fusion feature is reconstructed into a target style image of the original image in the target style.
According to the image style migration method in the present exemplary embodiment, on one hand, when capturing the content features of the image, the local feature information and the global feature information can be obtained at the same time, so that the content quality of the stylized image can be ensured to a greater extent. On the other hand, the style conversion is performed based on the hidden state of the image, so that the style conversion processing can be performed by using less data volume, the data processing volume in the style conversion is reduced, and the conversion processing efficiency is improved.
Next, an image style migration method in the present exemplary embodiment will be further described.
In an exemplary embodiment of the present disclosure, for step S110, determining the image content features corresponding to the original image includes: acquiring pre-configured image sampling parameters, the image sampling parameters comprising a downsampling ratio and a number of hidden state channels; performing downsampling processing on the original image based on the downsampling ratio and the number of hidden state channels to obtain the hidden state features corresponding to the original image; acquiring a target style for performing style migration on the original image, and determining the condition state parameter corresponding to the target style, the conditional state parameter being determined based on the hidden state features; and generating the image content features based on the hidden state features and the conditional state parameter.
The image sampling parameters may be the parameters used for performing downsampling processing on the original image. The downsampling ratio may be the ratio by which the original image is downsampled; for example, the downsampling ratio may be 4, 8, or the like. The number of hidden state channels may be the number of channels employed when downsampling the original image; for example, the number of hidden state channels may be configured to be 2. The encoder may be a network structure that downsamples the original image.
In order to reduce the amount of inference calculation in image style conversion, the original image can be converted into a lower-dimensional hidden space in which the style conversion processing is performed. The image sampling parameters employed for downsampling the original image are obtained, and may include, for example, the downsampling ratio and the number of hidden state channels. The downsampling of the original image may be performed by a pre-trained encoder (Encoder), which may be trained based on a convolutional neural network and implements the mapping between the original image and the corresponding hidden state features.
Denote the obtained downsampling ratio as sf and the number of hidden state channels as ch. The encoder performs downsampling processing on the normalized original image based on the downsampling ratio sf and the number of hidden state channels ch to obtain the hidden state features corresponding to the original image, which may be expressed as a tensor of shape (ch, res/sf, res/sf), where res is the image size of the normalized original image. The larger the downsampling ratio sf, the smaller the size of the corresponding hidden state feature and the less image content information is retained; the downsampling ratio can therefore be configured according to the specific sampling requirements of the actual scene and should not be excessively large.
After the hidden state feature corresponding to the original image is obtained, the condition state parameter corresponding to the target style of the style migration can be further determined; the condition state parameter has the same size as the hidden state feature, namely (ch, res/sf, res/sf). After the condition state parameter is obtained, the hidden state feature and the condition state parameter can be spliced to generate the image content features corresponding to the original image, which serve as the data basis for feature extraction, so as to reduce the amount of calculation in the style conversion process.
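By way of illustration only, the construction of the image content features described above may be sketched as follows. This is a minimal sketch assuming a PyTorch setting; the encoder module, the style-to-parameter lookup style_cond and the target_style key are hypothetical names, not part of the disclosed implementation. With the example values res = 256, sf = 8 and ch = 2, the hidden state feature has shape (ch, res/sf, res/sf) = (2, 32, 32).

```python
import torch

def build_image_content_features(x, encoder, style_cond, target_style):
    """Sketch: x is a normalized original image of shape (3, res, res);
    encoder downsamples it into the hidden space (names are assumptions)."""
    z = encoder(x)                      # hidden state feature, shape (ch, res//sf, res//sf)
    cond = style_cond[target_style]     # condition state parameter, same shape as z
    assert cond.shape == z.shape        # equal row-column sizes, as described above
    # Splice the hidden state feature and the condition state parameter along
    # the channel axis to obtain the image content features.
    return torch.cat([z, cond], dim=0)  # shape (2*ch, res//sf, res//sf)
```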
In one exemplary embodiment of the present disclosure, the encoder is trained by: acquiring a pre-constructed initial network model; the initial network model comprises an initial encoder and an initial decoder; determining a training sample image, and performing downsampling processing on the training sample image by an initial encoder to obtain hidden state sample characteristics corresponding to the training sample image; reconstructing the hidden state sample characteristics by an initial decoder to obtain a reconstructed sample image corresponding to the training sample image; determining a first model loss function based on a comparison result between the training sample image and the reconstructed sample image; and respectively performing parameter adjustment processing on the initial encoder and the initial decoder based on the first model loss function to obtain a trained encoder and trained decoder.
The initial network model may be a pre-built network model, for example, a convolutional network model. The initial encoder may be a network branch included in the initial network model for training the encoder. The initial decoder may be a network branch included in the initial network model for training the decoder. The training sample image may be a training sample image used for model training of the initial network model, and the training sample image may be a sample image after normalization processing. The hidden state sample feature may be an image feature obtained by performing downsampling processing on the training sample image. The reconstructed sample image may be a sample image after reconstruction of the latent state sample feature. The first model loss function may be a loss function for training an initial network model. The decoder may be a network structure that reconstructs the latent state image features.
The encoder used for downsampling the original image is obtained by training a pre-constructed initial network model. Specifically, a pre-constructed initial network model can be obtained; the initial network model may include two network branches, an initial encoder and an initial decoder. A training sample image for training the initial network model is obtained, and the training sample image may be a sample image after normalization processing.
The initial network model can be trained through a back propagation algorithm, which trains the convolution kernel parameters of the initial encoder and the initial decoder. Taking the training sample image input as the model input data, the initial encoder performs downsampling processing on the training sample image to obtain the hidden state sample feature z corresponding to the training sample image; the initial decoder then performs reconstruction processing on the hidden state sample feature z to obtain the reconstructed sample image recon corresponding to the training sample image. A comparison result between the reconstructed sample image recon and the training sample image input is determined, the first model loss function is generated from this comparison result, and the obtained first model loss function is used to perform back propagation training on the initial network model to obtain the trained model. Specifically, the first model loss function can be obtained by formula 1.
loss = abs(input - recon) + lambda_p * LPIPS(input, recon) (formula 1)
Wherein input may be the training sample image and recon may be the reconstructed sample image; abs(input - recon) is the absolute pixel error between the input picture and the reconstructed picture, representing the difference between the reconstructed picture and the original picture at the pixel level. lambda_p * LPIPS(input, recon) is the perceptual error: the deep features of the images may be extracted using a Visual Geometry Group (VGG) network, representing the difference between the reconstructed sample image and the training sample image at the content perception level; lambda_p is the weight of the perceptual error; LPIPS() is the learned perceptual image patch similarity measure function.
After determining the first model loss function, parameter adjustment processing can be performed on the convolution kernel parameters of the initial encoder and the initial decoder based on the first model loss function until the first model loss function converges or a preset number of iterations is reached, so as to obtain the trained encoder and decoder.
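A minimal training-loop sketch of this encoder/decoder training with the first model loss function (formula 1) is given below. The use of the lpips package (which computes the LPIPS measure with a VGG backbone) and the values lambda_p = 0.1 and lr = 1e-4 are assumptions of this sketch, not the disclosed implementation.

```python
import torch
import lpips  # pip install lpips; perceptual metric built on VGG features

def train_autoencoder(encoder, decoder, loader, lambda_p=0.1, epochs=10):
    """Sketch of the first-model-loss training loop (formula 1); images in
    `loader` are assumed normalized to [-1, 1] as in formula 2."""
    perceptual = lpips.LPIPS(net='vgg')
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)
    for _ in range(epochs):
        for inp in loader:
            z = encoder(inp)      # downsample to hidden state sample features
            recon = decoder(z)    # reconstructed sample image
            # Formula 1: absolute pixel error + weighted perceptual (LPIPS) error.
            loss = (inp - recon).abs().mean() + lambda_p * perceptual(inp, recon).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder, decoder
```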
In one exemplary embodiment of the present disclosure, a training sample image is obtained by: acquiring an original sample image and a target style sample image corresponding to the original sample image; performing image scaling processing on the original sample image and the target style sample image to obtain a corresponding scaled image pair; performing center clipping processing on the scaled image pair to obtain a clipping image with a preset size; and carrying out translation processing on the clipping image to obtain a training sample image subjected to normalization processing.
Wherein the original sample image may be a sample image that has not been normalized. The target style sample image may be a style migration image to which the original sample image corresponds under the target style. The image scaling process may be a process of scaling an image by a preset scaling ratio. The scaled image pair may be an image pair formed by scaling an original sample image and a target style sample image. The center clipping process may be a process of clipping the scaled image pair based on the center of the corresponding long side of the scaled image pair.
The training sample image may be derived from a set of original sample images ctx and a corresponding set of stylized target style sample images sty. In this embodiment, in order to facilitate model parameter learning and accelerate model convergence, some preprocessing may be performed on the original sample image and the target style sample image used for model training. Image scaling processing is performed on the original sample image and the target style sample image according to a predefined scaling ratio, so that the short sides of the two images reach a set size res, giving a scaled image pair; after the scaled image pair is obtained, center clipping is performed on it to obtain a group of clipping images of size res x res. After the clipping images are obtained, scaling and translation processing can be carried out on their pixel values to obtain the normalized training sample images. For example, the clipping image is scaled and translated according to formula 2, where pixel denotes a pixel value of the clipping image.
pixel = pixel / 127.5 - 1 (formula 2)
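For illustration, the preprocessing described above (short-side scaling, center clipping and the formula 2 normalization) may be expressed as a standard torchvision pipeline; res = 256 and the interpolation defaults are assumptions of this sketch. For a ToTensor output in [0, 1], (x - 0.5) / 0.5 equals pixel / 127.5 - 1 for 8-bit pixel values, so the last step reproduces formula 2.

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),        # scale so the short side reaches res
    transforms.CenterCrop(256),    # center clipping to a res x res patch
    transforms.ToTensor(),         # uint8 [0, 255] -> float [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # scale + translate to [-1, 1]
])
```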
In one exemplary embodiment of the present disclosure, a pre-trained image style migration model is obtained; and carrying out feature extraction and feature fusion on the image content features through an image style migration model to obtain hidden state fusion features.
The image style migration model may be a network model for performing style conversion on an original image to obtain a target style image.
For the original image, style conversion is performed through a pre-trained image style migration model to obtain a target style image of the original image in the target style. Referring to fig. 2, fig. 2 is an overall flowchart illustrating generating a target style image based on an original image according to an exemplary embodiment. For the original image 210, the encoder 220 may perform downsampling processing on the original image to obtain the hidden state feature z = Encoder(x) corresponding to the original image, and a conditional state parameter having the same size as z is obtained. Together they form the image content features 230 corresponding to the original image, which are input into the image style migration model 240; feature extraction and feature fusion are performed on the image content features 230 to obtain the hidden state fusion features 250 of the original image in the target style.
In an exemplary embodiment of the present disclosure, for step S150, decoding the hidden state fusion feature to obtain a target style image corresponding to the original image includes: acquiring a pre-trained decoder; and decoding the hidden state fusion feature through the decoder to obtain a target style image of the original image under the target style.
With continued reference to fig. 2, after obtaining the hidden state fusion feature 250 of the original image, a pre-trained decoder 260 is obtained; the resulting hidden state fusion feature 250 is reconstructed by the pre-trained decoder 260 to obtain a target style image 270 of the original image 210 in the target style. Through the above processing steps, style conversion of the original image through the image style migration model can be realized.
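The overall flow of fig. 2 may be summarized as the following sketch; all module and variable names are hypothetical, and the final de-normalization step assumes the images were normalized according to formula 2.

```python
import torch

@torch.no_grad()
def stylize(x, encoder, migration_model, decoder, style_cond, target_style):
    """Sketch of fig. 2: original image -> hidden state -> fused hidden state -> image."""
    z = encoder(x)                            # hidden state feature z = Encoder(x)
    cond = style_cond[target_style]           # condition state parameter, same size as z
    content = torch.cat([z, cond], dim=0)     # image content features (230)
    z_fused = migration_model(content)        # hidden state fusion features (250)
    y = decoder(z_fused)                      # target style image (270), same size as x
    return (y.clamp(-1, 1) + 1) * 127.5       # invert the formula 2 normalization
```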
In one exemplary embodiment of the present disclosure, an image style migration model is trained by: determining a training sample image, wherein the training sample image comprises an original sample image and a target style sample image; extracting features of the original sample image to generate sample image features corresponding to the original sample image; the sample image features include a first sample feature and a second sample feature; the first sample feature is used for local feature extraction, and the second sample feature is used for global feature extraction; acquiring a pre-constructed initial neural network model; the initial neural network model includes an initial linear layer; based on an initial neural network model, respectively carrying out feature extraction and feature fusion processing on the first sample features and the second sample features to obtain fusion image features corresponding to the sample image features; determining a target style hidden state characteristic corresponding to the target style image, and carrying out hidden state residual prediction by an initial linear layer based on the fusion image characteristic and the target style hidden state characteristic to obtain a residual prediction result; and training the initial neural network model based on the residual prediction result to obtain an image style migration model.
The input sample data may be sample data input to the initial neural network model for model training. The first sample feature may be a sample data feature input to the initial convolution branch. The second sample feature may be a sample data feature input to the initial transformer branch. The initial convolution branches may be model branches composed of a convolution network structure. The initial transformer branch may be a model branch consisting of a transformer network structure. The initial linear layer may be a linear layer for feature fusion. The sample local feature may be a local feature corresponding to the input sample data. The sample global feature may be a global feature corresponding to the input sample data. The fused image features can be features obtained by performing feature fusion processing on the sample local features and the sample global features. The target style hidden state feature may be a hidden state feature corresponding to the target style sample image. The residual prediction result may be a result of predicting a residual between the fused image feature and the target style hidden state feature.
According to the training sample images and the encoder obtained in the previous steps, a group of original sample images and target style sample images can be encoded into corresponding hidden state features for training the image style conversion model; the hidden state features of a group of original sample images and of the corresponding target style sample images may be denoted z_ctx and z_sty, respectively.
Input sample data may be generated based on the original sample image; the input sample data includes a first sample feature and a second sample feature, wherein the first sample feature can be used as the input data of the initial convolution branch for local feature extraction, and the second sample feature can be used as the input data of the initial transformer branch for global feature extraction.
In order to take into account both the local feature information and the global feature information of the image and obtain higher image conversion quality, this embodiment combines a deep convolution model and a Transformer model to design a novel hybrid model for style-conversion inference. Referring to fig. 3, fig. 3 is a model structure diagram of an image style migration model according to an exemplary embodiment. The constructed initial neural network model can comprise network structures such as an initial convolution branch, an initial transformer branch and an initial linear layer.
And taking the first sample characteristic as input data of the initial convolution branch, carrying out local characteristic extraction processing to obtain a sample local characteristic corresponding to the original sample image, and taking the second sample characteristic as input data of the initial converter branch, carrying out global characteristic extraction to obtain a sample global characteristic corresponding to the original sample image. And finally, fusing the local information and the global information extracted from the two parallel branches, and accurately predicting an output residual error, namely an error between the current hidden state and the stylized target hidden state, through a linear layer.
In one exemplary embodiment of the present disclosure, the initial neural network model includes an initial convolution branch and an initial transformer branch; based on the initial neural network model, respectively carrying out feature extraction and feature fusion processing on the first sample feature and the second sample feature to obtain the fusion image feature corresponding to the sample image features includes the following steps: extracting local features of the first sample feature through the initial convolution branch to obtain the corresponding sample local feature; carrying out global feature extraction on the second sample feature through the initial transformer branch to obtain the corresponding sample global feature; and carrying out feature fusion processing on the sample local features and the sample global features to obtain the fusion image features.
With continued reference to fig. 3, the initial convolution branch may be composed of a plurality of convolution blocks (Convolution Block); the number of convolution blocks may be set according to hardware conditions, and the larger the video memory, the more convolution blocks may be used. Referring to fig. 5, fig. 5 is a network structure diagram illustrating a convolution block in an image style migration model according to an exemplary embodiment. Each convolution block may consist of a convolution layer, a batch regularization layer, an activation function and a skip connection; convolution blocks built on this structure can effectively extract the local features contained in the first sample feature, namely the sample local features.
The initial transformer branch may be composed of a plurality of transformer blocks (Transformer Block), the number of which is likewise set according to hardware conditions. Referring to fig. 4, fig. 4 is a network structure diagram illustrating a transformer block in an image style migration model according to an exemplary embodiment. Each transformer block adopts the same residual block (ResBlock) structure and may comprise a regularization layer, a multi-head self-attention block, a multi-layer perceptron and a skip connection; transformer blocks built on this structure can effectively extract the global features contained in the second sample feature, namely the sample global features.
By performing the corresponding convolution processing and transformer processing on the two different sample features through the initial convolution branch and the initial transformer branch respectively, the local features and global features corresponding to the original sample image can be obtained and used as the data basis of the fusion features.
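A minimal sketch of the two-branch hybrid structure of fig. 3 follows, assuming a PyTorch setting; the block counts, widths and the use of torch.nn.TransformerEncoder are illustrative assumptions rather than the disclosed architecture.

```python
import torch
import torch.nn as nn

class HybridStyleModel(nn.Module):
    """Sketch of the two-branch model of fig. 3; sizes are assumptions."""
    def __init__(self, ch=2, dim=64, n_blocks=4, n_heads=4):
        super().__init__()
        c_in = 2 * ch                                # hidden state + condition channels
        # Convolution branch: stacked convolution blocks for local features.
        self.conv_branch = nn.Sequential(
            nn.Conv2d(c_in, dim, 3, padding=1),
            *[nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1),
                            nn.BatchNorm2d(dim), nn.ReLU()) for _ in range(n_blocks)],
        )
        # Transformer branch: stacked self-attention blocks for global features.
        self.proj = nn.Linear(c_in, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.transformer_branch = nn.TransformerEncoder(layer, num_layers=n_blocks)
        # Linear layer fusing both branches and predicting the hidden state residual.
        self.fuse = nn.Linear(2 * dim, ch)

    def forward(self, content):                      # content: (B, 2*ch, H, W)
        b, _, h, w = content.shape
        local = self.conv_branch(content)            # (B, dim, H, W)
        tokens = content.flatten(2).transpose(1, 2)  # (B, H*W, 2*ch), flattened H and W
        glob = self.transformer_branch(self.proj(tokens))          # (B, H*W, dim)
        fused = torch.cat([local.flatten(2).transpose(1, 2), glob], dim=-1)
        out = self.fuse(fused)                       # per-position residual prediction
        return out.transpose(1, 2).reshape(b, -1, h, w)            # (B, ch, H, W)
```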
In an exemplary embodiment of the present disclosure, feature extraction is performed on an original sample image to generate sample image features corresponding to the original sample image, including: downsampling the original sample image to obtain hidden state sample characteristics corresponding to the original sample image; determining a conditional state sample parameter corresponding to the hidden state sample feature; the conditional state sample parameters are determined based on the hidden state sample features; generating a first sample feature based on the hidden state sample feature and the conditional state sample parameter; and performing length and width splicing processing on the first sample characteristics to obtain second sample characteristics.
The hidden state sample feature may be an image feature obtained by performing downsampling processing on an original sample image. The conditional state sample parameter may be a conditional state parameter having the same rank size as the implicit state sample feature.
After the original sample image is obtained, the original sample image can be subjected to downsampling processing to obtain hidden state sample characteristics corresponding to the original sample image, then condition state sample parameters corresponding to the hidden state sample characteristics are determined, and the condition state sample parameters and the hidden state sample characteristics have the same size. And splicing the hidden state sample characteristics with the conditional state sample parameters of the same shape to obtain first sample characteristics, so that the number of channels corresponding to the first sample characteristics is twice as large as that of the hidden state sample characteristics. After the first sample feature is obtained, the first sample feature can be directly input into the initial convolution branch for reasoning.
For the transformer branch parallel to the initial convolution branch, this embodiment rearranges the obtained first sample feature; for example, the length and width dimensions of the first sample feature are flattened to obtain the second sample feature, and the obtained second sample feature is input into the initial transformer branch for inference. Through this data processing mode, the input data corresponding to the initial convolution branch and the initial transformer branch can be respectively determined for model training, as sketched below.
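The sketch below illustrates this construction of both branch inputs for a batch of samples; the function and tensor names, and the batch-first shapes, are assumptions.

```python
import torch

def build_branch_inputs(z, cond):
    """Sketch: derive the two branch inputs from a batch of hidden state
    sample features z and condition state sample parameters cond, both of
    shape (B, ch, H, W)."""
    first = torch.cat([z, cond], dim=1)        # (B, 2*ch, H, W): channel count doubles
    # Flatten the length and width dimensions so each of the H*W positions
    # becomes one token for the transformer branch.
    second = first.flatten(2).transpose(1, 2)  # (B, H*W, 2*ch)
    return first, second
```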
In an exemplary embodiment of the present disclosure, the residual prediction result includes a hidden state residual and a hidden state prediction result, and training the initial neural network model based on the residual prediction result to obtain the image style migration model includes: acquiring the hidden state sample feature corresponding to the original sample image and the target style hidden state feature corresponding to the original sample image; acquiring the preconfigured model iteration number and initial condition; training the initial neural network model based on the model iteration number and the initial condition to obtain the hidden state prediction result and the hidden state residual output by each model iteration, the hidden state residual being the difference between the hidden state prediction result and the target style hidden state; generating a second model loss function according to the target style hidden state feature and the hidden state residual and hidden state prediction result output each time; and training the initial neural network model based on the second model loss function to obtain the image style migration model.
The target style hidden state feature can be an image feature obtained after the downsampling process is performed on the target style sample image. The number of model iterations may be the number of residual iterations performed on the initial neural network model. The initial conditions may be initial condition state parameters employed to train the initial neural network model. The hidden state prediction result may be a hidden state prediction result of input sample data output by each iteration when the model is subjected to residual iteration processing. The hidden state residual may be a residual between the hidden state prediction result output by each residual iteration process and the target style hidden state. The second model loss function may be a loss function corresponding to the initial neural network model.
For the initial neural network model, model training can be performed by means of back propagation and residual iteration to obtain the image style migration model. The parameters to be trained in the model training process may include all parameters of the transformer branch, all parameters of the convolution branch, and the parameters of the linear output layer. In order to obtain a more accurate conversion result, unlike stylization algorithms with only one forward propagation, this embodiment performs forward propagation multiple times, with a fixed number of model iterations, to iteratively predict the residual, so that the obtained conversion result is more similar to the target style sample image, namely, refined hidden state conversion is realized.
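Expressed in the same illustrative PyTorch setting, the jointly trained parameter set can be handed to a single optimizer; the choice of Adam and the learning rate are assumptions of this sketch, not requirements of the disclosure:

    import torch

    def build_optimizer(model, lr=1e-4):
        # model.parameters() here covers all parameters of the transformer
        # branch, all parameters of the convolution branch, and the linear
        # output layer parameters, which are trained jointly.
        return torch.optim.Adam(model.parameters(), lr=lr)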
Before model training is carried out, the hidden state sample feature corresponding to the original sample image and the target style hidden state feature corresponding to the original sample image are obtained, and the target style hidden state feature is used as the final training target. Referring to fig. 6, fig. 6 is a diagram illustrating a training process for obtaining an image style migration model through model training, according to an example embodiment. In fig. 6, the model iteration number and the initial condition are obtained; since no prediction result from a previous residual iteration exists when the residual iteration process is performed for the first time, the preconfigured initial condition is used as the data spliced with the hidden state sample feature, and the initial condition ẑ_0 can be taken as an all-zero matrix with the same shape as the hidden state z. The hidden state sample feature is spliced with the acquired initial condition, and the two together serve as the model input data of the initial neural network model. The data input z of the model may be the state of the original sample image mapped to the hidden space by the encoder, i.e., the hidden state sample feature; the training target z_style of the model may be the state of the target style sample image mapped to the hidden space, i.e., the target style hidden state feature.
The model input data is input into the initial neural network model, and the hidden state prediction result and the hidden state residual output by each model iteration are obtained when training the initial neural network model; the hidden state residual is the difference between the hidden state prediction result and the target style hidden state. A second model loss function is generated according to the target style hidden state feature and the hidden state residual and hidden state prediction result of each output. The loss between the target style hidden state feature used for back propagation and the hidden state prediction result of the current residual iteration can be calculated by equation 3, i.e., the mean square error is used to measure the gap between the hidden state prediction result ẑ_t output in each round and the target style hidden state feature z_style:

    L = MSE(ẑ_t, z_style)    (equation 3)

where MSE() represents the mean square error function, ẑ_t represents the hidden state prediction result, and z_style represents the target style hidden state feature.
After the model loss function is obtained, the initial neural network model may be trained based on the model loss function, for example, the initial neural network model may be iteratively trained based on a preconfigured number of model iterations, to obtain a final image style migration model.
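The training procedure above can be sketched as follows, reusing prepare_inputs from the earlier example; the model is assumed to map the spliced input to a hidden state residual of the same shape as z, and summing the per-round MSE losses before a single backward pass is an assumption of this sketch rather than a requirement of the disclosure:

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, z, z_style, n_iters):
        # z: hidden state sample feature; z_style: target style hidden state feature.
        optimizer.zero_grad()
        z_hat = torch.zeros_like(z)      # initial condition: all-zero, same shape as z
        loss = 0.0
        for t in range(n_iters):         # fixed number of forward propagations
            # Splice the hidden state with the previous round's prediction
            # (the first round uses the initial condition).
            first, second = prepare_inputs(z, z_hat)
            z_hat = z_hat + model(first, second)      # add the predicted residual
            loss = loss + F.mse_loss(z_hat, z_style)  # equation 3, each round
        loss.backward()                  # back-propagate through all rounds
        optimizer.step()
        return float(loss)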
In an exemplary embodiment of the present disclosure, generating a second model loss function according to the target style hidden state feature and the hidden state residual and hidden state prediction result of each output includes: determining a current iteration number of model training, and determining a current hidden state residual error corresponding to the current iteration number; determining a last hidden state prediction result corresponding to a last iteration number corresponding to a current iteration number; determining a current hidden state prediction result according to the current hidden state residual error and the previous hidden state prediction result; and generating a second model loss function according to the current hidden state prediction result and the target style hidden state characteristics.
The current iteration number may be the number of the current residual iteration. The current hidden state residual may be a prediction residual output by the current residual iteration training. The last iteration number may be a number adjacent to and in order before the current iteration number. The last hidden state prediction result may be a prediction result of the input sample data output by the last residual iterative process in the target style. The current hidden state prediction result may be a prediction result of the input sample data output by the current residual iterative process in the target style.
Since model training needs to go through multiple residual iteration processes, each residual iteration process can be configured with a corresponding iteration number. For example, if the model iteration number is configured as N, the iteration numbers are t = 1, 2, 3, …, N. The model predicts the hidden state residual N times, and the stylized hidden state prediction result is the sum of the N residual prediction results, i.e.

    ẑ_N = ẑ_0 + Δẑ_1 + Δẑ_2 + … + Δẑ_N

where, in each round, the hidden state residual Δẑ_t predicted in that round is added to the hidden state prediction result ẑ_{t-1} obtained in the previous round to give the hidden state prediction result ẑ_t of the current round:

    ẑ_t = ẑ_{t-1} + Δẑ_t

In this way, the prediction result obtained in the previous round is utilized, so that the error between the prediction result and the target hidden state is reduced in each round of iteration, the style conversion result is continuously optimized, and, compared with other algorithms, the accuracy is higher and the stylized result is more similar to the target style.
In an exemplary embodiment of the present disclosure, for step S150, decoding the hidden state fusion feature to obtain a target style image corresponding to the original image, including: acquiring a pre-trained decoder; and decoding the hidden state fusion characteristic through a decoder to obtain a target style image of the original image under the target style.
After the hidden state fusion feature corresponding to the original image is obtained, a pre-trained decoder can be obtained, the decoder is adopted to decode the hidden state fusion feature, for example, the decoder is used to perform up-sampling processing on the hidden state fusion feature, and a target style image of the original image under a target style is obtained.
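Putting the pieces together at inference time, an end-to-end sketch might look as follows; it reuses prepare_inputs from the earlier example and assumes encoder and decoder are the pre-trained downsampling and upsampling networks described previously:

    import torch

    @torch.no_grad()
    def stylize(encoder, model, decoder, image, n_iters):
        z = encoder(image)               # hidden state feature in the hidden space
        z_hat = torch.zeros_like(z)      # initial condition
        for _ in range(n_iters):         # refined hidden state conversion
            first, second = prepare_inputs(z, z_hat)
            z_hat = z_hat + model(first, second)
        return decoder(z_hat)            # upsample/decode to the target style image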
In summary, according to the image style migration method of the present disclosure, an original image is acquired and the image content features corresponding to the original image are determined; the image content features are generated based on the hidden state features and the conditional state parameters of the original image; the hidden state features are image features of the original image in a hidden space, and the conditional state parameters are hidden state parameters corresponding to the target style; local feature extraction is performed on the image content features to obtain local image features corresponding to the image content features, where the local image features include edge region features of the original image; global feature extraction is performed on the image content features to obtain global image features corresponding to the image content features; feature fusion processing is performed on the local image features and the global image features to obtain hidden state fusion features; and the hidden state fusion features are decoded to obtain the target style image corresponding to the original image. In one aspect, when capturing the content features of the image, local feature information and global feature information can be acquired at the same time, so the content quality of the stylized image can be ensured to a greater extent. In another aspect, style conversion is performed based on the hidden state of the image, so style conversion processing can be performed with a smaller data volume, which reduces the data processing amount in style conversion and improves conversion efficiency. In yet another aspect, the error between the input image and the target style is continuously reduced through residual iteration during hidden state conversion, so the content quality and stylization degree of the conversion result can be greatly improved.
In some embodiments of the present disclosure, an image style migration model is provided. Referring to fig. 3, the image style migration model comprises: a convolution branch 310, a transformer branch 320, and a linear layer 330.
Specifically, the convolution branch 310 is configured to perform local feature extraction on the model input data to obtain local image features; the model input data is generated based on the hidden state features and conditional state parameters corresponding to the original image; the hidden state features are obtained by performing feature extraction on the original image, and the conditional state parameters are hidden state parameters corresponding to the target style. The transformer branch 320 is configured to perform global feature extraction on the model input data to obtain global image features corresponding to the image content features. The linear layer 330 is configured to output the hidden state fusion features of the original image in the target style based on the fused image features; the fused image features are obtained by fusion processing of the local image features and the global image features.
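A sketch of the three-part structure of fig. 3, reusing the ConvBlock and TransformerBlock classes from the earlier example; treating feature fusion as channel-wise concatenation followed by a per-position linear layer is one plausible reading of the disclosure, not the only one, and the depth and head count are assumptions:

    import torch
    import torch.nn as nn

    class StyleMigrationModel(nn.Module):
        def __init__(self, in_ch, out_ch, depth=4, heads=8):
            super().__init__()
            self.conv_branch = nn.Sequential(
                *[ConvBlock(in_ch) for _ in range(depth)])
            self.transformer_branch = nn.Sequential(
                *[TransformerBlock(in_ch, heads) for _ in range(depth)])
            self.linear = nn.Linear(2 * in_ch, out_ch)        # linear output layer

        def forward(self, first, second):
            # first: (B, in_ch, H, W); second: (B, H*W, in_ch)
            local = self.conv_branch(first)                   # local image features
            glob = self.transformer_branch(second)            # global image features
            b, c, h, w = local.shape
            glob = glob.transpose(1, 2).reshape(b, c, h, w)   # tokens -> spatial
            fused = torch.cat([local, glob], dim=1)           # feature fusion
            out = self.linear(fused.permute(0, 2, 3, 1))      # per-position linear
            return out.permute(0, 3, 1, 2)                    # (B, out_ch, H, W)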
Fig. 7 is a block diagram illustrating an image style migration apparatus according to an exemplary embodiment. Referring to fig. 7, the image style migration apparatus 700 includes: an image feature determination module 710, a local feature extraction module 720, a global feature extraction module 730, a feature fusion module 740, and an image generation module 750.
Specifically, the image feature determination module 710 is configured to acquire an original image and determine the image content features corresponding to the original image; the image content features are generated based on the hidden state features and the conditional state parameters of the original image; the hidden state features are image features of the original image in a hidden space, and the conditional state parameters are hidden state parameters corresponding to the target style. The local feature extraction module 720 is configured to perform local feature extraction on the image content features to obtain local image features corresponding to the image content features, where the local image features include edge region features of the original image. The global feature extraction module 730 is configured to perform global feature extraction on the image content features to obtain global image features corresponding to the image content features. The feature fusion module 740 is configured to perform feature fusion processing on the local image features and the global image features to obtain hidden state fusion features. The image generation module 750 is configured to perform decoding processing on the hidden state fusion features to obtain the target style image corresponding to the original image.
In one exemplary embodiment of the present disclosure, the image feature determination module 710 includes an image feature determination unit for acquiring preconfigured image sampling parameters, the image sampling parameters comprising a downsampling ratio and a hidden state channel number; performing downsampling processing on the original image based on the downsampling ratio and the hidden state channel number to obtain the hidden state features corresponding to the original image; acquiring a target style for performing style migration on the original image and determining the conditional state parameters corresponding to the target style, the conditional state parameters being determined based on the hidden state features; and generating the image content features based on the hidden state features and the conditional state parameters.
In one exemplary embodiment of the present disclosure, the image feature determination module 710 includes an encoder training unit for acquiring a pre-constructed initial network model, the initial network model comprising an initial encoder and an initial decoder; determining a training sample image, and performing downsampling processing on the training sample image by the initial encoder to obtain hidden state sample features corresponding to the training sample image; reconstructing the hidden state sample features by the initial decoder to obtain a reconstructed sample image corresponding to the training sample image; determining a first model loss function based on a comparison result between the training sample image and the reconstructed sample image; and performing parameter adjustment processing on the initial encoder and the initial decoder respectively based on the first model loss function to obtain a trained encoder and a trained decoder.
In one exemplary embodiment of the present disclosure, the encoder training unit includes a sample image generation subunit for acquiring an original sample image and a target style sample image corresponding to the original sample image; performing image scaling processing on the original sample image and the target style sample image to obtain a corresponding scaled image pair; performing center cropping processing on the scaled image pair to obtain a cropped image of a preset size; and performing translation processing on the cropped image to obtain a normalized training sample image.
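One way to realize this preprocessing with torchvision; the resize and crop sizes and the [-1, 1] value range are assumptions of the sketch, not values fixed by the disclosure:

    from torchvision import transforms

    # Applied identically to the original sample image and the target style
    # sample image so that the scaled/cropped pair stays aligned.
    preprocess = transforms.Compose([
        transforms.Resize(288),          # image scaling
        transforms.CenterCrop(256),      # center cropping to a preset size
        transforms.ToTensor(),           # [0, 255] -> [0.0, 1.0]
        transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # translate/scale to [-1, 1]
    ])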
In an exemplary embodiment of the present disclosure, the image style migration apparatus 700 further includes an image feature extraction module for acquiring a pre-trained image style migration model, and performing feature extraction and feature fusion on the image content features through the image style migration model to obtain the hidden state fusion features.
In one exemplary embodiment of the present disclosure, the image style migration apparatus 700 further includes a stylized model training module for determining a training sample image, the training sample image including an original sample image and a target style sample image; performing feature extraction on the original sample image to generate sample image features corresponding to the original sample image, the sample image features including a first sample feature and a second sample feature, where the first sample feature is used for local feature extraction and the second sample feature is used for global feature extraction; acquiring a pre-constructed initial neural network model, the initial neural network model including an initial linear layer; performing feature extraction and feature fusion processing on the first sample feature and the second sample feature respectively based on the initial neural network model to obtain fused image features corresponding to the sample image features; determining the target style hidden state feature corresponding to the target style image, and performing hidden state residual prediction by the initial linear layer based on the fused image features and the target style hidden state feature to obtain a residual prediction result; and training the initial neural network model based on the residual prediction result to obtain the image style migration model.
In one exemplary embodiment of the present disclosure, the initial neural network model includes an initial convolution branch and an initial transformer branch; the stylized model training module comprises a fusion feature determination unit, a first sampling unit, and a second sampling unit, wherein the fusion feature determination unit is used for performing local feature extraction on the first sample feature through the initial convolution branch to obtain corresponding sample local features; performing global feature extraction on the second sample feature through the initial transformer branch to obtain corresponding sample global features; and performing feature fusion processing on the sample local features and the sample global features to obtain the fused image features.
In an exemplary embodiment of the present disclosure, the stylized model training module includes an input sample data generation unit, configured to perform downsampling processing on the original sample image to obtain hidden state sample features corresponding to the original sample image; determine a conditional state sample parameter corresponding to the hidden state sample feature, the conditional state sample parameter being determined based on the hidden state sample feature; generate a first sample feature based on the hidden state sample feature and the conditional state sample parameter; and perform length-and-width flattening processing on the first sample feature to obtain a second sample feature.
In an exemplary embodiment of the present disclosure, the residual prediction result includes a hidden state residual and a hidden state prediction result, and the stylized model training module includes a stylized model training unit, configured to obtain a hidden state sample feature corresponding to an original sample image and a target style hidden state feature corresponding to the original sample image; obtaining the pre-configured model iteration times and initial conditions; training an initial neural network model based on the model iteration times and initial conditions to obtain a hidden state prediction result and a hidden state residual error which are output by each model iteration; the hidden state residual error is the difference between the hidden state prediction result and the hidden state of the target style; generating a second model loss function according to the target style hidden state characteristics, the hidden state residual error and the hidden state prediction result which are output each time; training the initial neural network model based on the second model loss function to obtain an image style migration model.
In an exemplary embodiment of the present disclosure, the stylized model training unit includes a loss function generating subunit, configured to determine a current iteration number of the model training, and determine a current hidden state residual corresponding to the current iteration number; determining a last hidden state prediction result corresponding to a last iteration number corresponding to a current iteration number; determining a current hidden state prediction result according to the current hidden state residual error and the previous hidden state prediction result; and generating a second model loss function according to the current hidden state prediction result and the target style hidden state characteristics.
In one exemplary embodiment of the present disclosure, the image generation module 750 includes an image generation unit for acquiring a pre-trained decoder, and decoding the hidden state fusion features through the decoder to obtain the target style image of the original image in the target style.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be elaborated here.
An electronic device 800 according to such an embodiment of the present disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 8, the electronic device 800 is embodied in the form of a general purpose computing device. Components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, a bus 830 connecting the different system components (including the storage unit 820 and the processing unit 810), and a display unit 840.
Wherein the storage unit stores program code executable by the processing unit 810, such that the processing unit 810 performs the steps according to various exemplary embodiments of the present disclosure described in the above section of the present specification.
The storage unit 820 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 821 and/or a cache memory unit 822, and may further include a read only memory (ROM) 823.
The storage unit 820 may include a program/utility 824 having a set (at least one) of program modules 825, such program modules 825 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
Bus 830 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 870 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 850. Also, the electronic device 800 may communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 over the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 800, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory, comprising instructions executable by a processor of an apparatus to perform the image style migration method described above. Alternatively, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program/instruction, characterized in that the computer program/instruction, when executed by a processor, implements the image style migration method according to any one of the preceding claims.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.