CN109447897B - Real scene image synthesis method and system - Google Patents

Real scene image synthesis method and system

Info

Publication number
CN109447897B
Authority
CN
China
Prior art keywords
network model
real scene
scene image
image synthesis
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811241932.XA
Other languages
Chinese (zh)
Other versions
CN109447897A (en)
Inventor
饶鉴
陈欣
刘罡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenchuang Smart Technology Wuhan Co ltd
Original Assignee
Wenchuang Smart Technology Wuhan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenchuang Smart Technology Wuhan Co ltd
Priority to CN201811241932.XA
Publication of CN109447897A
Application granted
Publication of CN109447897B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention discloses a real scene image synthesis method and system. The method comprises the following steps: acquiring an image training set composed of semantic graphs and the real scene reference graphs corresponding to them; establishing a real scene image synthesis network model from a U-net convolutional neural network model and excitation residual blocks; establishing a loss function for the real scene image synthesis network model using a pre-trained VGG-19 convolutional neural network model; taking the image training set as the input of the real scene image synthesis network model and training it according to the loss function to obtain a trained real scene image synthesis network model; acquiring a plurality of semantic graphs to be synthesized; and inputting each semantic graph to be synthesized into the trained real scene image synthesis network model to obtain the corresponding real scene synthesis graph. The invention can quickly and effectively synthesize photographic-level scene images with a strong sense of realism, improving the realism and the visual quality of the synthesized images.

Description

Real scene image synthesis method and system
Technical Field
The invention relates to the technical field of image synthesis, in particular to a method and a system for synthesizing a real scene image.
Background
In the field of image synthesis, real scene image synthesis technology based on deep learning is increasingly applied. Real scene image synthesis is a visual synthesis technique that produces an approximation of a real scene image from the object-segmentation information in a semantic layout. It integrates professional technologies such as deep learning, pattern recognition and digital image processing. Real scene image synthesis hinges on three points: (1) global coordination; (2) the storage capacity of the network model; and (3) high resolution. Deep learning can extract global features of an image, increase the parameter count of the network model (i.e., its storage capacity) and generate high-resolution images, thereby greatly improving the realism of real scene image synthesis. The design of the deep learning network structure used by a real scene image synthesis method often directly determines the quality of the synthesized result. Therefore, designing a suitable deep learning network structure is one of the important tasks in improving the fidelity of synthesized scene images.
Currently, methods for synthesizing an image of a real scene include: (1) using U-net as the generator of a conditional generative adversarial network (GAN); this method achieves the desired performance when converting grayscale and binary edge images into color images, but when it converts a semantic graph into a photographic-level realistic image (i.e., a real scene image), its synthesis speed and visual quality leave room for improvement; (2) using cascaded refinement networks (CRNs) to convert a semantic layout into a photographic-level photorealistic image; although CRNs have a huge storage capacity and can generate more realistic images than method (1), they take a long time in both the training and prediction stages and cannot synthesize real scene images quickly and efficiently. In short, the existing synthesis methods are inefficient, and both the realism and the visual quality of the synthesized photographic-level images need improvement.
Disclosure of Invention
Therefore, it is necessary to provide a method and a system for synthesizing a real scene image, so as to quickly and effectively synthesize photographic-level scene images with a stronger sense of realism, improve the realism and visual quality of the synthesized image, and expand the range of applications and application scenarios.
In order to achieve the purpose, the invention provides the following scheme:
a method of synthesizing an image of a real scene, comprising:
acquiring an image training set; the image training set is composed of a plurality of image pairs; each image pair is composed of a semantic graph and a real scene reference graph corresponding to the semantic graph;
establishing a real scene image synthesis network model according to the U-net convolution neural network model and the excitation residual block; the excitation residual block is composed of a convolution layer and an activation layer;
establishing a loss function of the real scene image synthetic network model by utilizing a pre-trained VGG-19 convolutional neural network model;
taking the image training set as the input of the real scene image synthesis network model, and training the real scene image synthesis network model according to the loss function to obtain a trained real scene image synthesis network model;
acquiring a plurality of semantic graphs to be synthesized;
and inputting the semantic graph to be synthesized into the trained real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the semantic graph to be synthesized.
Optionally, the establishing a real scene image synthesis network model according to the U-net convolutional neural network model and the excitation residual block specifically includes:
establishing a U-net convolution neural network model; the U-net convolutional neural network model comprises a plurality of levels;
establishing an excitation residual block;
and embedding the excitation residual block between every two adjacent layers of the U-net convolutional neural network model to form a real scene image synthesis network model.
Optionally, the training the real scene image synthesis network model with the image training set as an input of the real scene image synthesis network model according to the loss function to obtain a trained real scene image synthesis network model specifically includes:
inputting the ith semantic graph in the training set into the current real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the ith semantic graph; wherein i is an integer greater than or equal to 1; the current real scene image synthesis network model is the real scene image synthesis network model updated after the jth training iteration; wherein j is an integer greater than or equal to 0;
judging whether j is smaller than a preset maximum number of training iterations;
if yes, inputting the real scene synthesis graph and a real scene reference graph corresponding to the ith semantic graph into the loss function to obtain a loss value;
inputting the loss value into an Adam optimizer, and updating the real scene image synthesis network model by the Adam optimization algorithm; then letting i = i + 1 and j = j + 1, and returning to the step of inputting the ith semantic graph in the training set into the current real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the ith semantic graph;
if not, the current real scene image synthesis network model is used as the trained real scene image synthesis network model.
Optionally, the excitation residual block specifically includes:
f(x)=x·sigmoid(β(x))
wherein x represents the input semantic graph; sigmoid is an activation function, and the functional expression is sigmoid (x) = 1/(1 + exp (-x)); β represents a convolution layer in the excitation residual block; β (x) represents an image obtained by performing convolution operation on the input semantic graph.
Optionally, the loss function specifically includes:
L_f = Σ_{l=0}^{5} λ_l · ‖φ_l(F) − φ_l(G)‖²
wherein L_f represents the loss value; F represents the real scene synthesis graph output by the real scene image synthesis network model, and G represents the real scene reference graph; φ denotes the pre-trained VGG-19 convolutional neural network model, φ_l denotes the l-th layer of the pre-trained VGG-19 model, φ_l(F) denotes the feature map output by the l-th convolutional layer when F is input into the pre-trained VGG-19 network, and φ_l(G) denotes the feature map output by the l-th convolutional layer when G is input into the pre-trained VGG-19 network; l takes values in {0, 1, 2, 3, 4, 5}; φ_0 denotes the input image of the pre-trained VGG-19 network, and φ_1 to φ_5 denote the feature maps output by the five corresponding convolutional layers of the pre-trained VGG-19; λ_l is the weight coefficient of the l-th layer's loss term, with values (1/1.6, 1/2.3, 1/1.8, 1/2.8, 10/0.8).
Optionally, before the training the real scene image synthesis network model according to the loss function by using the image training set as the input of the real scene image synthesis network model to obtain the trained real scene image synthesis network model, the method further includes:
determining initialization parameters of the real scene image synthesis network model; the initialization parameters comprise a learning rate, a maximum number of training iterations, the number of semantic graphs, the width of the semantic graphs and the height of the semantic graphs.
The invention also provides a real scene image synthesis system, which comprises:
the first acquisition module is used for acquiring an image training set; the image training set is composed of a plurality of image pairs; each image pair consists of a semantic graph and a real scene reference graph corresponding to the semantic graph;
the synthetic model establishing module is used for establishing a real scene image synthesis network model according to the U-net convolutional neural network model and the excitation residual block; the excitation residual block is composed of a convolution layer and an activation layer;
the loss function establishing module is used for establishing a loss function of the real scene image synthetic network model by utilizing a pre-trained VGG-19 convolutional neural network model;
the training module is used for taking the image training set as the input of the real scene image synthesis network model, training the real scene image synthesis network model according to the loss function and obtaining the trained real scene image synthesis network model;
the second acquisition module is used for acquiring a plurality of semantic graphs to be synthesized;
and the synthesis module is used for inputting the semantic graph to be synthesized into the trained real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the semantic graph to be synthesized.
Optionally, the synthetic model establishing module specifically includes:
the first establishing unit is used for establishing a U-net convolutional neural network model; the U-net convolutional neural network model comprises a plurality of levels;
a second establishing unit for establishing an excitation residual block;
and the synthetic model establishing unit is used for embedding the excitation residual block between every two adjacent layers of the U-net convolutional neural network model to form a real scene image synthetic network model.
Optionally, the training module specifically includes:
a synthetic image obtaining unit, configured to input the ith semantic graph in the training set into the current real scene image synthesis network model, so as to obtain a real scene synthesis graph corresponding to the ith semantic graph; wherein i is an integer greater than or equal to 1; the current real scene image synthesis network model is the real scene image synthesis network model updated after the jth training iteration; wherein j is an integer greater than or equal to 0;
the judging unit is used for judging whether j is smaller than the preset maximum number of training iterations;
an updating unit, configured to, if j is smaller than the preset maximum number of training iterations, input the real scene synthesis graph and the real scene reference graph corresponding to the ith semantic graph into the loss function to obtain a loss value; input the loss value into an Adam optimizer and update the real scene image synthesis network model by the Adam optimization algorithm; then let i = i + 1 and j = j + 1, and return to inputting the ith semantic graph in the training set into the current real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the ith semantic graph;
and a synthetic model determining unit, configured to take the current real scene image synthesis network model as the trained real scene image synthesis network model if j is greater than or equal to the preset maximum number of training iterations.
Optionally, the system further includes:
the parameter determining module is used for determining the initialization parameters of the real scene image synthesis network model; the initialization parameters comprise a learning rate, a maximum number of training iterations, the number of semantic graphs, the width of the semantic graphs and the height of the semantic graphs.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a real scene image synthesis method and a system, wherein the method comprises the steps of establishing a real scene image synthesis network model according to a U-net convolutional neural network model and an excitation residual block, establishing a loss function of the real scene image synthesis network model by using a pre-trained VGG-19 convolutional neural network model, and training the real scene image synthesis network model by using the loss function to obtain a final real scene image synthesis network model. The method or the system can quickly, effectively and reliably synthesize the photographic-level scene image with stronger reality sense, improve the reality sense and the visual quality of the synthesized image, and enlarge the application range and the application scene.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
Fig. 1 is a flowchart of a method for synthesizing a real scene image according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an excitation residual block according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a real scene image synthesis network model according to an embodiment of the present invention;
FIG. 4 is a diagram of the synthesis result using an image from the street view Cityscapes dataset as the semantic graph to be synthesized;
FIG. 5 is a diagram of the synthesis result using an image from the game scene GTA5 dataset as the semantic graph to be synthesized;
fig. 6 is a schematic structural diagram of a real scene image synthesis system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of a method for synthesizing a real scene image according to an embodiment of the present invention.
Referring to fig. 1, the method for synthesizing an image of a real scene according to the embodiment includes:
step S1: acquiring an image training set; the image training set is composed of a plurality of image pairs; each image pair is composed of a semantic graph and a real scene reference graph corresponding to the semantic graph.
In this embodiment, the images in the image training set may be obtained from the Cityscapes dataset, which represents real-world scenes, or from the GTA5 dataset, which represents game scenes.
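Purely for illustration (this is not part of the patent), such a training set of (semantic graph, real scene reference graph) pairs could be loaded as follows in PyTorch; the directory layout, the file naming and the PairedSceneDataset name are assumptions of this sketch.

```python
import os
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as T

class PairedSceneDataset(Dataset):
    """Loads (semantic graph, real scene reference graph) image pairs.

    Assumes two parallel folders with identically named files, e.g.
    semantic/0001.png <-> reference/0001.png (hypothetical layout).
    """
    def __init__(self, semantic_dir, reference_dir, size=(384, 192)):
        self.semantic_dir = semantic_dir
        self.reference_dir = reference_dir
        self.names = sorted(os.listdir(semantic_dir))
        # Width 384 x height 192, matching the initialization parameters
        # given later in this embodiment; Resize takes (height, width).
        self.transform = T.Compose([T.Resize((size[1], size[0])), T.ToTensor()])

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        semantic = Image.open(
            os.path.join(self.semantic_dir, self.names[idx])).convert("RGB")
        reference = Image.open(
            os.path.join(self.reference_dir, self.names[idx])).convert("RGB")
        return self.transform(semantic), self.transform(reference)
```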
Step S2: establishing a real scene image synthesis network model according to the U-net convolution neural network model and the excitation residual block; the excitation residual block is composed of a convolution layer and an activation layer.
Fig. 2 is a schematic structural diagram of the excitation residual block according to an embodiment of the present invention, where fig. 2(a) shows the structure of the activation layer (Swish Layer) and fig. 2(b) shows the structure of a single excitation residual block (SRB).
Referring to fig. 2, each rectangular box represents a data operation in the network and arrows represent data flows. The circled symbol (rendered as an image in the original) represents the splicing (concatenation) of feature maps; "·" represents element-by-element multiplication of matrices; "x" represents the feature map input to the network, i.e., the semantic graph; H represents a convolution layer with 3 × 3 convolution kernels whose activation function is sigmoid; R(x) represents the feature map finally output by 2 convolution layers, the activation function of the convolution layers in the R module being LReLU; G(x) represents the output of the activation layer; and "64-d" indicates that the number of channels of the feature map x is 64.
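For concreteness, the excitation residual block can be sketched in PyTorch from the figure description above. This is a reading of fig. 2, not the patent's reference implementation: the LReLU slope and the exact wiring between the gate, the two-layer branch and the concatenation are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class SwishLayer(nn.Module):
    """Activation layer of fig. 2(a): f(x) = x · sigmoid(β(x)),
    where β (H in the figure) is a 3x3 convolution followed by sigmoid."""
    def __init__(self, channels=64):  # "64-d": 64-channel feature maps
        super().__init__()
        self.beta = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return x * torch.sigmoid(self.beta(x))  # element-by-element multiplication


class ExcitationResidualBlock(nn.Module):
    """One SRB as read from fig. 2(b): the gate output G(x) feeds a branch
    of 2 LReLU convolution layers producing R(x), and the block input is
    spliced (concatenated) with R(x). Wiring and slope are assumptions."""
    def __init__(self, channels=64):
        super().__init__()
        self.gate = SwishLayer(channels)
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        g = self.gate(x)                 # G(x): output of the activation layer
        r = self.residual(g)             # R(x): output of the 2 convolution layers
        return torch.cat([x, r], dim=1)  # feature-map splicing
```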
The step S2 specifically includes:
establishing a U-net convolution neural network model; the U-net convolutional neural network model comprises a plurality of levels;
establishing an excitation residual block, wherein the excitation residual block specifically comprises the following steps:
f(x)=x·sigmoid(β(x))
wherein, x represents the input semantic graph, sigmoid is an activation function, the function expression is sigmoid (x) = 1/(1 + exp (-x)), β represents the convolution layer in the excitation residual block, and β (x) represents the image after convolution operation is performed on the input semantic graph;
and embedding the excitation residual block between every two adjacent layers of the U-net convolutional neural network model to form a real scene image synthesis network model.
The U-net convolutional neural network model in this embodiment is formed by 2 symmetrical branches, left and right. Within each branch, the convolution layers that operate on feature maps of one resolution form one U-net level, and layers at different resolutions form different levels. The U-net convolutional neural network model in this embodiment includes 6 levels, each at a different resolution.
Fig. 3 is a schematic structural diagram of a real scene image synthesis network model according to an embodiment of the present invention. Referring to fig. 3, the real scene image synthesis network model includes 6 levels, numbered 1 to 6 from top to bottom, with the resolution of the feature map halved at each successive level. Each rectangular box represents a multi-channel feature map, and the number at the top of a box gives its number of channels, such as "20, 96, 192, 384, 512, 1536"; "s" denotes an excitation residual block. Arrows denote different operations: a downward arrow "↓" represents a downsampling operation, i.e., max pooling; an upward arrow "↑" represents an upsampling operation, which in this embodiment uses the resize-convolution method; a horizontal arrow "→" represents the convolution operation of a convolution layer; and a dotted arrow represents a copy-and-paste operation on the feature map. Upsampling enlarges a low-resolution image to a higher resolution. The resize-convolution upsampling in this embodiment proceeds as follows: the input low-resolution image is first enlarged to double its resolution by bicubic interpolation; the enlarged image is then passed through one convolution layer, and the feature map output by that convolution layer is the output of the upsampling step.
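As a minimal sketch (assuming PyTorch; the 3 × 3 kernel size is an assumption, since the text does not specify it), the resize-convolution upsampling step just described can be written as:

```python
import torch.nn as nn
import torch.nn.functional as F

class ResizeConvUpsample(nn.Module):
    """Resize-convolution upsampling as described in the text:
    bicubic interpolation doubles the resolution, then a single
    convolution layer produces the upsampled feature map."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Step 1: bicubic interpolation doubles the resolution.
        x = F.interpolate(x, scale_factor=2, mode="bicubic", align_corners=False)
        # Step 2: one convolution layer; its output is the upsampled map.
        return self.conv(x)
```

Interpolating first and then convolving, rather than using a transposed convolution, is what reduces the checkerboard artifacts mentioned later in this description.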
And step S3: and establishing a loss function of the real scene image synthesis network model by using a pre-trained VGG-19 convolutional neural network model. The loss function is specifically:
L_f = Σ_{l=0}^{5} λ_l · ‖φ_l(F) − φ_l(G)‖²
wherein L_f represents the loss value; F represents the real scene synthesis graph output by the real scene image synthesis network model, and G represents the real scene reference graph; φ denotes the pre-trained VGG-19 convolutional neural network model, φ_l denotes the l-th layer of the pre-trained VGG-19 model, φ_l(F) denotes the feature map output by the l-th convolutional layer when F is input into the pre-trained VGG-19 network, and φ_l(G) denotes the feature map output by the l-th convolutional layer when G is input into the pre-trained VGG-19 network; l takes values in {0, 1, 2, 3, 4, 5}; φ_0 denotes the input image of the pre-trained VGG-19 network, and φ_1 to φ_5 denote the feature maps output by the five corresponding convolutional layers of the pre-trained VGG-19; λ_l is the weight coefficient of the l-th layer's loss term, with values (1/1.6, 1/2.3, 1/1.8, 1/2.8, 10/0.8).
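To make the loss concrete, here is a minimal PyTorch sketch under the following assumptions: torchvision's pre-trained VGG-19 stands in for the pre-trained model; the five feature layers are conv1_2, conv2_2, conv3_2, conv4_2 and conv5_2, as listed later in this description; λ_0 = 1 for the φ_0 (image) term, since the text lists only five weights; and the per-layer "squared absolute error" is taken as a per-element mean squared error.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class VGG19PerceptualLoss(nn.Module):
    """Sketch of L_f: squared absolute error between VGG-19 feature maps of
    the synthesis graph F and the reference graph G, weighted by lambda_l,
    plus the phi_0 term computed on the images themselves."""
    # Indices of conv1_2, conv2_2, conv3_2, conv4_2, conv5_2 within
    # torchvision's vgg19().features.
    LAYER_IDX = (2, 7, 12, 21, 30)
    # lambda_1..lambda_5 from the text; lambda_0 = 1 is an assumption.
    LAMBDAS = (1 / 1.6, 1 / 2.3, 1 / 1.8, 1 / 2.8, 10 / 0.8)

    def __init__(self):
        super().__init__()
        features = vgg19(weights="IMAGENET1K_V1").features.eval()
        self.slices = nn.ModuleList()
        prev = 0
        for idx in self.LAYER_IDX:
            self.slices.append(nn.Sequential(*features[prev:idx + 1]))
            prev = idx + 1
        for p in self.parameters():
            p.requires_grad_(False)  # the pre-trained VGG-19 stays frozen

    def forward(self, f, g):
        # phi_0 term: error between the input images themselves.
        loss = torch.mean((f - g) ** 2)
        for lam, block in zip(self.LAMBDAS, self.slices):
            f, g = block(f), block(g)  # l-th convolution layer outputs
            loss = loss + lam * torch.mean((f - g) ** 2)
        return loss
```

ImageNet input normalization is omitted for brevity; a practical implementation would typically normalize both inputs before feeding them to VGG-19.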
And step S4: and determining initialization parameters of the real scene image synthesis network model.
Specifically, the initialization parameters include a learning rate, a maximum training frequency, the number of semantic graphs, the width of the semantic graphs, and the height of the semantic graphs.
In this embodiment, the learning rate learning_rate = 0.0001, the maximum number of training epochs epoch = 100, the width of the semantic graph = 384, and the height of the semantic graph = 192.
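Collected as a plain configuration, these initialization parameters might look like this; the "number of semantic graphs" value is not stated in the text, so it is left unset.

```python
# Initialization parameters of this embodiment (step S4).
config = {
    "learning_rate": 0.0001,      # learning_rate = 0.0001
    "max_epochs": 100,            # maximum number of training epochs
    "num_semantic_graphs": None,  # value not given in the text
    "width": 384,                 # semantic graph width
    "height": 192,                # semantic graph height
}
```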
Step S5: and taking the image training set as the input of the real scene image synthesis network model, and training the real scene image synthesis network model according to the loss function to obtain the trained real scene image synthesis network model. The step S5 specifically includes:
inputting the ith semantic graph in the training set into the current real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the ith semantic graph; wherein i is an integer greater than or equal to 1; the current real scene image synthesis network model is the real scene image synthesis network model updated after the jth training iteration; wherein j is an integer greater than or equal to 0;
judging whether j is smaller than the preset maximum number of training iterations;
if yes, inputting the real scene synthesis graph and a real scene reference graph corresponding to the ith semantic graph into the loss function to obtain a loss value;
inputting the loss value into an Adam optimizer, and updating the real scene image synthesis network model by the Adam optimization algorithm; then letting i = i + 1 and j = j + 1, and returning to the step of inputting the ith semantic graph in the training set into the current real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the ith semantic graph;
if not, the current real scene image synthesis network model is used as the trained real scene image synthesis network model.
In this embodiment, the calculation of the loss value can be described as follows: the real scene synthesis graph and the corresponding real scene reference graph are each input into the pre-trained VGG-19 convolutional neural network model, and the feature sub-graphs output by 5 of its convolutional layers (conv1_2, conv2_2, conv3_2, conv4_2 and conv5_2, respectively) are obtained for each; the squared absolute error of each of the 5 pairs of feature sub-graphs is calculated, yielding 5 groups of squared absolute error values; the squared absolute error between the real scene synthesis graph and the real scene reference graph themselves is calculated as well, giving 6 groups in total; and these 6 groups of squared absolute error values, weighted by the coefficients λ_l, are summed to obtain the loss value.
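Assembling the pieces, steps S4 and S5 could look like the following sketch. All class and variable names refer to the earlier hypothetical sketches; the text counts training iterations per image pair, which the epoch-based loop below realizes implicitly under the stated maximum of 100 epochs.

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, loss_fn, config, device="cuda"):
    """Minimal training loop for step S5: feed each semantic graph through
    the current model, compute the VGG-19 perceptual loss against the
    reference graph, and update the model with Adam until the preset
    maximum number of training epochs is reached."""
    model = model.to(device)
    loss_fn = loss_fn.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
    loader = DataLoader(dataset, batch_size=1, shuffle=True)

    for epoch in range(config["max_epochs"]):
        for semantic, reference in loader:
            semantic, reference = semantic.to(device), reference.to(device)
            synthesized = model(semantic)           # real scene synthesis graph
            loss = loss_fn(synthesized, reference)  # loss value L_f
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                        # Adam update
    return model
```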
Step S6: and acquiring a plurality of semantic graphs to be synthesized.
Step S7: and inputting the semantic graph to be synthesized into the trained real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the semantic graph to be synthesized.
In this embodiment, images from the street view Cityscapes dataset, which represents real-world scenes, are used as the training set and the test set to realize the real scene image synthesis method. Fig. 4 is a diagram of the synthesis result using an image from the Cityscapes dataset as the semantic graph to be synthesized, where fig. 4(a) is the semantic graph selected from the Cityscapes dataset and fig. 4(b) is the real scene synthesis graph corresponding to fig. 4(a). Fig. 5 is a diagram of the synthesis result using an image from the game scene GTA5 dataset as the semantic graph to be synthesized, where fig. 5(a) is the semantic graph selected from the GTA5 dataset and fig. 5(b) is the real scene synthesis graph corresponding to fig. 5(a).
The method for synthesizing real scene images described above can quickly, effectively and reliably synthesize photographic-level scene images with a stronger sense of realism, improving the realism and visual quality of the synthesized images and expanding the range of applications and application scenarios. Moreover, since the upsampling method in the U-net convolutional neural network model is the resize-convolution method, checkerboard artifacts in the synthesized image are reduced, further improving its realism.
The invention also provides a real scene image synthesis system, and fig. 6 is a schematic structural diagram of a real scene image synthesis system according to an embodiment of the invention.
Referring to fig. 6, the real scene image synthesis system of the embodiment includes:
a first obtainingmodule 601, configured to obtain an image training set; the image training set is composed of a plurality of image pairs; each image pair is composed of a semantic graph and a real scene reference graph corresponding to the semantic graph.
A synthetic model establishing module 602, configured to establish a real scene image synthesis network model according to the U-net convolutional neural network model and the excitation residual block; the excitation residual block is composed of a convolution layer and an activation layer.
The synthetic model establishing module 602 specifically includes:
the first establishing unit is used for establishing a U-net convolutional neural network model; the U-net convolutional neural network model comprises a plurality of levels;
a second establishing unit for establishing an excitation residual block;
and the synthetic model establishing unit is used for embedding the excitation residual block between every two adjacent layers of the U-net convolutional neural network model to form a real scene image synthetic network model.
A loss function establishing module 603, configured to establish a loss function of the real scene image synthesis network model by using a pre-trained VGG-19 convolutional neural network model.
A parameter determining module 604, configured to determine the initialization parameters of the real scene image synthesis network model; the initialization parameters comprise a learning rate, a maximum number of training iterations, the number of semantic graphs, the width of the semantic graphs and the height of the semantic graphs.
A training module 605, configured to use the image training set as the input of the real scene image synthesis network model and train the real scene image synthesis network model according to the loss function, so as to obtain a trained real scene image synthesis network model.
The training module 605 specifically includes:
a synthetic image obtaining unit, configured to input the ith semantic graph in the training set into the current real scene image synthesis network model, so as to obtain a real scene synthesis graph corresponding to the ith semantic graph; wherein i is an integer greater than or equal to 1; the current real scene image synthesis network model is the real scene image synthesis network model updated after the jth training iteration; wherein j is an integer greater than or equal to 0;
the judging unit is used for judging whether j is smaller than the preset maximum number of training iterations;
an updating unit, configured to, if j is smaller than the preset maximum number of training iterations, input the real scene synthesis graph and the real scene reference graph corresponding to the ith semantic graph into the loss function to obtain a loss value; input the loss value into an Adam optimizer and update the real scene image synthesis network model by the Adam optimization algorithm; then let i = i + 1 and j = j + 1, and return to inputting the ith semantic graph in the training set into the current real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the ith semantic graph;
and a synthetic model determining unit, configured to take the current real scene image synthesis network model as the trained real scene image synthesis network model if j is greater than or equal to the preset maximum number of training iterations.
A second obtaining module 606, configured to obtain a plurality of semantic graphs to be synthesized.
A synthesizing module 607, configured to input the semantic graph to be synthesized into the trained real scene image synthesis network model, so as to obtain a real scene synthesis graph corresponding to the semantic graph to be synthesized.
The real scene image synthesis system in the embodiment can quickly, effectively and reliably synthesize the photographic-level scene image with stronger reality sense, improve the reality sense and the visual quality of the synthesized image, and expand the application range and the application scene.
Since the system disclosed in this embodiment corresponds to the method disclosed above, its description is relatively brief; for relevant details, refer to the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A method for synthesizing an image of a real scene, comprising:
acquiring an image training set; the image training set is composed of a plurality of image pairs; each image pair consists of a semantic graph and a real scene reference graph corresponding to the semantic graph;
establishing a real scene image synthesis network model according to the U-net convolution neural network model and the excitation residual block; the excitation residual block is composed of a convolution layer and an activation layer;
establishing a loss function of the real scene image synthesis network model by using a pre-trained VGG-19 convolutional neural network model;
taking the image training set as the input of the real scene image synthesis network model, and training the real scene image synthesis network model according to the loss function to obtain a trained real scene image synthesis network model;
acquiring a plurality of semantic graphs to be synthesized;
inputting the semantic graph to be synthesized into the trained real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the semantic graph to be synthesized;
the method for establishing the real scene image synthesis network model according to the U-net convolution neural network model and the excitation residual block specifically comprises the following steps:
establishing a U-net convolution neural network model; the U-net convolutional neural network model comprises a plurality of levels;
establishing an excitation residual block;
and embedding the excitation residual block between every two adjacent layers of the U-net convolutional neural network model to form a real scene image synthesis network model.
2. The method of claim 1, wherein the training the real-scene image synthesis network model according to the loss function by using the image training set as an input of the real-scene image synthesis network model to obtain a trained real-scene image synthesis network model specifically includes:
inputting the ith semantic graph in the training set into the current real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the ith semantic graph; wherein i is an integer greater than or equal to 1; the current real scene image synthesis network model is the real scene image synthesis network model updated after the jth training iteration; wherein j is an integer greater than or equal to 0;
judging whether j is smaller than a preset maximum number of training iterations;
if yes, inputting the real scene synthesis graph and a real scene reference graph corresponding to the ith semantic graph into the loss function to obtain a loss value;
inputting the loss value into an Adam optimizer, and updating the real scene image synthesis network model by the Adam optimization algorithm; then letting i = i + 1 and j = j + 1, and returning to the step of inputting the ith semantic graph in the training set into the current real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the ith semantic graph;
and if not, taking the current real scene image synthesis network model as the trained real scene image synthesis network model.
3. The method for synthesizing an image of a real scene according to claim 1, wherein the excitation residual block is specifically:
f(x)=x·sigmoid(β(x))
wherein x represents the input semantic graph; sigmoid is an activation function, and the functional expression is sigmoid (x) = 1/(1 + exp (-x)); β represents a convolutional layer in the excitation residual block; β (x) represents an image obtained by performing convolution operation on the input semantic graph.
4. The method for synthesizing an image of a real scene according to claim 1, wherein the loss function is specifically:
L_f = Σ_{l=0}^{5} λ_l · ‖φ_l(F) − φ_l(G)‖²
wherein L_f represents the loss value; F represents the real scene synthesis graph output by the real scene image synthesis network model, and G represents the real scene reference graph; φ denotes the pre-trained VGG-19 convolutional neural network model, φ_l denotes the l-th layer of the pre-trained VGG-19 model, φ_l(F) denotes the feature map output by the l-th convolutional layer when F is input into the pre-trained VGG-19 network, and φ_l(G) denotes the feature map output by the l-th convolutional layer when G is input into the pre-trained VGG-19 network; l takes values in {0, 1, 2, 3, 4, 5}; φ_0 denotes the input image of the pre-trained VGG-19 network, and φ_1 to φ_5 denote the feature maps output by the five corresponding convolutional layers of the pre-trained VGG-19; λ_l is the weight coefficient of the l-th layer's loss term, with values (1/1.6, 1/2.3, 1/1.8, 1/2.8, 10/0.8).
5. The method as claimed in claim 1, wherein before the step of training the real-scene image synthesis network model according to the loss function by using the image training set as the input of the real-scene image synthesis network model to obtain the trained real-scene image synthesis network model, the method further comprises:
determining initialization parameters of the real scene image synthesis network model; the initialization parameters comprise a learning rate, a preset maximum number of training iterations, the number of semantic graphs, the width of the semantic graphs and the height of the semantic graphs.
6. A real scene image composition system, comprising:
the first acquisition module is used for acquiring an image training set; the image training set is composed of a plurality of image pairs; each image pair is composed of a semantic graph and a real scene reference graph corresponding to the semantic graph;
the synthetic model establishing module is used for establishing a real scene image synthesis network model according to the U-net convolutional neural network model and the excitation residual block; the excitation residual block is composed of a convolution layer and an activation layer;
the loss function establishing module is used for establishing a loss function of the real scene image synthetic network model by utilizing a pre-trained VGG-19 convolutional neural network model;
the training module is used for taking the image training set as the input of the real scene image synthesis network model, training the real scene image synthesis network model according to the loss function and obtaining the trained real scene image synthesis network model;
the second acquisition module is used for acquiring a plurality of semantic graphs to be synthesized;
the synthesis module is used for inputting the semantic graph to be synthesized into the trained real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the semantic graph to be synthesized;
the synthetic model establishing module specifically comprises:
the first establishing unit is used for establishing a U-net convolutional neural network model; the U-net convolutional neural network model comprises a plurality of levels;
a second establishing unit for establishing an excitation residual block;
and the synthetic model establishing unit is used for embedding the excitation residual block between every two adjacent layers of the U-net convolutional neural network model to form a real scene image synthetic network model.
7. The real scene image synthesis system according to claim 6, wherein the training module specifically includes:
a synthetic image obtaining unit, configured to input the ith semantic graph in the training set into the current real scene image synthesis network model, so as to obtain a real scene synthesis graph corresponding to the ith semantic graph; wherein i is an integer greater than or equal to 1; the current real scene image synthesis network model is the real scene image synthesis network model updated after the jth training iteration; wherein j is an integer greater than or equal to 0;
the judging unit is used for judging whether j is smaller than a preset maximum number of training iterations;
an updating unit, configured to, if j is smaller than the preset maximum number of training iterations, input the real scene synthesis graph and the real scene reference graph corresponding to the ith semantic graph into the loss function to obtain a loss value; input the loss value into an Adam optimizer and update the real scene image synthesis network model by the Adam optimization algorithm; then let i = i + 1 and j = j + 1, and return to inputting the ith semantic graph in the training set into the current real scene image synthesis network model to obtain a real scene synthesis graph corresponding to the ith semantic graph; and a synthetic model determining unit, configured to take the current real scene image synthesis network model as the trained real scene image synthesis network model if j is greater than or equal to the preset maximum number of training iterations.
8. A real scene image synthesis system according to claim 6, further comprising:
the parameter determining module is used for determining initialization parameters of the real scene image synthesis network model; the initialization parameters comprise a learning rate, a preset maximum number of training iterations, the number of semantic graphs, the width of the semantic graphs and the height of the semantic graphs.
CN201811241932.XA, filed 2018-10-24 (priority date 2018-10-24): Real scene image synthesis method and system. Active. Granted as CN109447897B (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811241932.XA | 2018-10-24 | 2018-10-24 | Real scene image synthesis method and system (CN109447897B, en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811241932.XA | 2018-10-24 | 2018-10-24 | Real scene image synthesis method and system (CN109447897B, en)

Publications (2)

Publication Number | Publication Date
CN109447897A (en) | 2019-03-08
CN109447897B (en) | 2023-04-07

Family

ID=65548080

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811241932.XA (Active) | Real scene image synthesis method and system (CN109447897B, en) | 2018-10-24 | 2018-10-24

Country Status (1)

Country | Link
CN | CN109447897B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110322528B (en)* | 2019-06-26 | 2021-05-14 | 浙江大学 | Vascular reconstruction method of MRI brain image based on 3T and 7T
CN110998663B (en)* | 2019-11-22 | 2023-12-01 | 驭势(上海)汽车科技有限公司 | Image generation method for simulation scenes, electronic device and storage medium
CN112488967B (en)* | 2020-11-20 | 2024-07-09 | 中国传媒大学 | Object and scene synthesis method and system based on indoor scenes
CN112907750A (en)* | 2021-03-05 | 2021-06-04 | 齐鲁工业大学 | Indoor scene layout estimation method and system based on convolutional neural network
CN114372940B (en)* | 2021-12-15 | 2025-05-23 | 南京邮电大学 | Real scene image synthesis method and system


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106204467A (en)* | 2016-06-27 | 2016-12-07 | 深圳市未来媒体技术研究院 | An image denoising method based on a cascade residual neural network
CN106372581A (en)* | 2016-08-25 | 2017-02-01 | 中国传媒大学 | Method for constructing and training a face recognition feature extraction network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chuangqi Wang et al., "vU-net: accurate cell edge segmentation in time-lapse fluorescence live cell images based on convolutional neural network," bioRxiv, 2017-09-21, pp. 1-17.*
徐晨阳 et al., "Research on nerve segmentation methods for ultrasound images," Journal of Beijing University of Posts and Telecommunications, 2018-02-28, vol. 41, no. 1, pp. 115-120.*

Also Published As

Publication number | Publication date
CN109447897A (en) | 2019-03-08

Similar Documents

Publication | Title
CN109447897B (en) | Real scene image synthesis method and system
CN113240580B (en) | Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN111833246B (en) | Single-frame image super-resolution method based on attention cascade network
CN111369440B (en) | Model training and image super-resolution processing method, device, terminal and storage medium
CN112396645B (en) | Monocular image depth estimation method and system based on convolution residual learning
CN109215123B (en) | Method, system, storage medium and terminal for generating infinite terrain based on cGAN
CN110210485A (en) | Image semantic segmentation method based on attention-mechanism-guided feature fusion
Zeng et al. | Single image super-resolution using a polymorphic parallel CNN
CN111899203B (en) | Real image generation method based on label graph under unsupervised training and storage medium
CN113298097B (en) | Feature point extraction method and device based on convolutional neural network and storage medium
CN114049420B (en) | Model training method, image rendering method, device and electronic equipment
CN113536971B (en) | Target detection method based on incremental learning
KR20230073751A (en) | System and method for generating images of the same style based on layout
CN117576402B (en) | Multi-scale aggregation Transformer remote sensing image semantic segmentation method based on deep learning
CN115496654A (en) | Image super-resolution reconstruction method, device and medium based on self-attention mechanism
CN110097505A (en) | DEM data processing method and device
CN114972062A (en) | Image restoration model based on parallel adaptive guide network and method thereof
CN114373110A (en) | Method and device for detecting a target in an input image, and related products
CN115713462A (en) | Super-resolution model training method, image recognition method, device and equipment
CN112785498B (en) | Pathological image super-resolution modeling method based on deep learning
CN118096978B (en) | Method for rapid generation of 3D art content based on arbitrary stylization
CN116363329B (en) | Three-dimensional image generation method and system based on CGAN and LeNet-5
CN117456185A (en) | Remote sensing image segmentation method based on adaptive pattern matching and nested modeling
CN115330601A (en) | Multi-scale cultural relic point cloud super-resolution method and system
CN120047450B (en) | Remote sensing image change detection method based on dual-stream temporal feature adapter

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
