Disclosure of Invention
In view of the foregoing, it is an object of the present application to provide an object and a scene synthesizing method based on an indoor scene, synthesizing an object into a scene, including:
obtaining an original image of an object and a scene;
An object position relation constraint model, an object scale relation constraint model and an object shielding relation constraint model are constructed, wherein the object position relation constraint model is used for constraining the synthesizable position of an object in a scene, the object scale relation constraint model is used for constraining the scale proportion of the object relative to the scene, and the object shielding relation constraint model is used for constraining the shielding relation between the synthesized object and the objects already in the scene;
Obtaining a synthesizable position of the object in the scene according to the object position relation constraint model;
Obtaining the scale proportion of the object relative to the scene according to the object scale relation constraint model;
Obtaining different shielding relations between objects with different synthesizable positions and objects in a scene according to the object shielding relation constraint model;
Rendering the object with the scale proportion at a corresponding synthesizable position according to a corresponding shielding relation to obtain a synthesized image of the corresponding object and the scene;
The step of constructing the object position relation constraint model comprises the following steps:
Constructing a semantic segmentation network, wherein the input of the semantic segmentation network is an RGB image of an original image of a scene, the output of the semantic segmentation network is a semantic label corresponding to each pixel point, and the semantic label is a label corresponding to an object in the scene;
Constructing a semantic feature extraction network, setting layout labels of the scene, dividing the scene into corresponding layout areas, wherein the layout labels comprise the image scene, the junction of a wall and the ground, the junction of a wall and a wall, and the junction of a wall and the ceiling, the layout areas comprise the ground, the wall and the ceiling, the input of the semantic feature extraction network is the semantic label, and the output is the layout label;
constructing a scene layout estimation network, sharing weights of a semantic segmentation network and a semantic feature extraction network, and obtaining layout labels corresponding to all pixel points;
the synthesizable positions of the objects in scene synthesis are defined according to the rationality of the object placements, which means that the layout area is able to provide reasonable support for the objects.
Preferably, it comprises:
constructing an object style relation constraint model, wherein the object style relation constraint model is used for constraining the style of an object relative to a scene;
According to the object style relation constraint model, the color style of the original image of the object is adjusted, so that the object and the scene are naturally fused;
And rendering the object with the color style adjusted at the selected synthesizable position according to the scale proportion and the shielding relation to obtain a synthesized image of the object and the scene.
Further, preferably, the step of constructing the object style relation constraint model includes:
Adjusting the brightness and saturation of the object so that the object is compatible with the scene style includes: establishing image histograms of the object and the scene in a color space, and constraining the brightness and saturation of the object to be maximally consistent with those of the scene by shifting the object's histogram of the corresponding statistic;
adjusting the local contrast and correlated color temperature of the object to make the object compatible with the scene style, comprising: and the appearance of the object is adjusted by combining the local contrast and the correlated color temperature with the HSL channel, so that the object is compatible with the scene image style.
Further, preferably, the step of adjusting the brightness and saturation of the object so that the object is compatible with the scene style includes:
obtaining an RGB image of an original image of a scene;
converting an RGB image of an original image of a scene into an HSL color space image to obtain a brightness histogram and a hue-saturation histogram corresponding to the scene;
rendering the object into a two-dimensional RGB image with regular edge contours;
converting the RGB image of the object into an HSL color space image, extracting effective pixels corresponding to the object by using a mask, and obtaining a brightness histogram and a hue-saturation histogram of the object corresponding to the effective pixels;
Adjusting the brightness histogram of the object corresponding to the effective pixel to make the brightness of the object consistent with that of the scene;
And adjusting the hue-saturation histogram of the object corresponding to the effective pixel so that the saturation of the object and the saturation of the scene are consistent.
Furthermore, preferably, the step of adjusting the local contrast and correlated color temperature of the subject such that the subject is compatible with the scene style comprises:
obtaining an RGB image of an original image of a scene;
rendering the object into a two-dimensional RGB image with regular edge contours;
converting the RGB images of scenes and objects into HSL color space images;
for each pixel point of the HSL color space images of the scene and of the effective pixels of the object extracted by the mask, the local contrast of each pixel point is obtained by the following formula
Cx = (Lx − L̄x) / L̄x
wherein Lx is the pixel brightness and L̄x is the local average luminance at pixel x;
The S curve is utilized to realize the pixel-level brightness conversion of the HSL color space image of the effective pixel corresponding to the object, and the forward S curve and the reverse S curve are respectively used for enhancing and reducing the local contrast, so that the contrast of the object and the scene is matched;
And adjusting the correlated color temperature of the HSL color space image of the effective pixel corresponding to the object, so that the correlated color temperatures of the object and the scene are consistent.
Preferably, the step of constructing the object position relation constraint model further includes:
obtaining edge nodes corresponding to layout labels of a connection part of a wall and a floor and a connection part of the wall and a ceiling comprises the following steps: expanding edge pixels of a layout output image of a scene layout estimation network into one-dimensional vectors, and taking middle point pixels of each edge section as edge nodes;
Obtaining an intermediate node corresponding to a layout label at a wall and a wall connection, comprising: detecting a vertical line segment between walls by using Hough transformation, wherein the end point of the line segment is a middle node;
and connecting the edge node and the intermediate node to generate a layout structure diagram of the scene.
Preferably, the step of constructing the object scale relation constraint model includes: the scale of the object takes the wall height of the indoor scene as a reference, the distance between the object and the wall is an independent variable, and the scale of the object is a dependent variable and changes linearly.
Further, preferably, the step of constructing an object scale relation constraint model includes:
the dimensional relationship between the object and the scene is constructed by the following formula with reference to the wall height or/and the wall width,
Sobj=λSwall+δ(S-S2),
Where Swall is the height or width of the wall, Sobj is the size of the object corresponding to the height or width of the wall, λ is the scale factor of Sobj of the object and Swall of the wall, S2 is the value of one end point of the height or width of the wall closest to the object in the dimension of the height or width, S is the value of the synthesizable position of the object in the dimension, and δ is the scale factor.
Preferably, the step of constructing an occlusion relationship constraint model includes:
performing mask extraction on an object in a layout area where a synthesizable position of the object is located;
Judging whether the synthesizable position of the object lies above the pixels of that scene object;
if the synthesizable position is above the scene object's pixels, the scene object occludes the synthesized object;
if the synthesizable position is not above the scene object's pixels, the synthesized object is not occluded by the scene object.
Further, preferably, the step of extracting the mask of the object in the layout area where the synthesizable position of the object is located includes:
Extracting an object area of a layout area where the synthesizable position of the object is located by a mask;
Calculating a gradient of the object region;
and narrowing the object mask pixel by pixel until stopping at the local gradient maximum value, obtaining the mask edge of the object, and outputting the mask of the object.
Preferably, the step of obtaining a composite image of the object and the scene further comprises:
Taking the synthesized image as an initial synthesized image, extracting objects in the initial synthesized image by using a mask, wherein the initial synthesized image except the objects is a scene image;
Obtaining an HSL color space image of the scene image and the object;
Converting the brightness and saturation into the logarithmic domain according to Weber's law, and using the mired (micro reciprocal degree) as the unit of correlated color temperature, to obtain the brightness, saturation, local contrast and correlated color temperature statistics of the HSL color space images of the object and the scene image;
adjusting the local contrast of the object through an S curve;
Adjusting the brightness of the object so that the brightness of the object is consistent with the brightness of the scene image;
Adjusting the correlated color temperature of the object so that the correlated color temperatures of the object and the scene image are consistent;
adjusting the saturation of the object so that the saturation of the object and the saturation of the scene image are consistent;
And re-rendering the adjusted synthesizable position of the object in the scene image to obtain an adjusted synthesized image.
Preferably, the method further comprises:
providing a user interaction interface for the client, wherein the user interaction interface provides multiple scenes, multiple objects and multiple synthesizable positions for the client to select, and feeds back rationality of synthesizable positions of the objects selected by the client in the scenes, and the rationality refers to conforming to an object position relation constraint model, an object scale relation constraint model, an object shielding relation constraint model and an object style relation constraint model.
Preferably, the step of constructing the object position relation constraint model further includes:
based on the pixel region of the synthesizable positions, the probability of each synthesizable position is obtained by
P(x, y, 0) = 1 / |S|
wherein (x, y, 0) represents one synthesizable position, S is the pixel region composed of all synthesizable positions, and |S| is the number of pixels in S.
According to another aspect of the present invention, there is provided an object and scene synthesizing system based on an indoor scene, synthesizing an object into a scene, comprising:
the image acquisition module is used for acquiring original images of the object and the scene;
The model construction module is used for constructing an object position relation constraint model, an object scale relation constraint model and an object shielding relation constraint model, wherein the object position relation constraint model is used for constraining the synthesizable position of an object in a scene, the object scale relation constraint model is used for constraining the scale proportion of the object relative to the scene, and the object shielding relation constraint model is used for constraining the shielding relation between the synthesized object and the objects already in the scene;
The synthesizable position obtaining module obtains the synthesizable position of the object in the scene according to the object position relation constraint model;
the scale proportion obtaining module is used for obtaining the scale proportion of the object relative to the scene according to the object scale relation constraint model;
the shielding relation obtaining module is used for obtaining different shielding relations between the objects with different synthesizable positions and the objects in the scene according to the object shielding relation constraint model;
The synthesizing module carries out style rendering on the object with the scale proportion at the corresponding synthesizable position according to the corresponding shielding relation to obtain a synthesized image of the corresponding object and the scene;
wherein, the model construction module includes:
The semantic segmentation network inputs an RGB image of an original image of a scene, outputs semantic tags corresponding to all pixel points, and the semantic tags are tags corresponding to objects in the scene;
The semantic feature extraction network, which sets the layout labels of the scene and divides the scene into corresponding layout areas, wherein the layout labels comprise the image scene, the junction of a wall and the ground, the junction of a wall and a wall, and the junction of a wall and the ceiling, the layout areas comprise the ground, the wall and the ceiling, the input of the semantic feature extraction network is the semantic label, and the output is the layout label;
The scene layout estimation network shares the weights of the semantic segmentation network and the semantic feature extraction network to obtain layout labels corresponding to all pixel points;
And the synthesizable position limiting unit is used for limiting synthesizable positions of the objects in scene synthesis according to rationality of the object occupation, wherein the rationality of the object occupation means that a layout area can provide reasonable support for the objects.
According to the above method and system for synthesizing an object and a scene based on an indoor scene, an object can be synthesized at different positions in an indoor scene image, and a plurality of new images with different shielding conditions can be generated. Scene understanding (analyzing the structure and object information of the original scene image to understand the layout of the scene and the positional relations of its objects) is introduced: the probability distribution of synthesizable positions in the indoor scene image is analyzed through scene layout estimation and semantic segmentation, and a parameterized model relating the synthesizable positions, the character scale and the shielding relation is established, so that the synthesized images are more diverse. In the image blending part, the color style information of the scene image and the object is analyzed, and the color style of the foreground object is adjusted to achieve a natural fusion with the scene image; by adjusting the character style to be compatible with the scene image, the synthesized image becomes more realistic.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Specific embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an object and scene synthesis method based on an indoor scene according to the present invention, which synthesizes an object into a scene as shown in fig. 1, including:
step S1, obtaining an original image of an object and a scene;
S2, an object position relation constraint model, an object scale relation constraint model and an object shielding relation constraint model are constructed, wherein the object position relation constraint model is used for constraining the synthesizable position of an object in a scene, the object scale relation constraint model is used for constraining the scale proportion of the object relative to the scene, and the object shielding relation constraint model is used for constraining the shielding relation between the synthesized object and the objects already in the scene;
S3, obtaining a synthesizable position of the object in the scene according to the object position relation constraint model;
S4, obtaining the scale proportion of the object relative to the scene according to the object scale relation constraint model;
S5, obtaining different shielding relations between objects with different synthesizable positions and objects in a scene according to the object shielding relation constraint model;
Step S7, performing style rendering on the object with the scale proportion at the corresponding synthesizable position according to the corresponding shielding relation to obtain a synthesized image of the corresponding object and the scene;
The step of constructing the object position relation constraint model comprises the following steps:
Constructing a semantic segmentation network, wherein the input of the semantic segmentation network is an RGB image of an original image of a scene, the output is a semantic label corresponding to each pixel point, and the semantic label is a label corresponding to an object in the scene. As shown in fig. 2, a fully convolutional network is trained with the SUNRGBD data carrying semantic labels of indoor objects, realizing 37-class semantic segmentation on the SUNRGBD dataset; these 37 categories cover the objects and furniture usually present in an indoor scene, such as walls, ceilings, chairs or windows, so as to describe a cluttered scene as completely as possible, and convolution layers with a dilation (hole) mechanism are added to the semantic segmentation network on the basis of the original ResNet-101. Since the input to this network is an RGB image, it is in fact a random variable X with a value range of [0, 255], which is determined by an implicit random variable Y taking values in the semantic label set {1, 2, 3, …, 37} corresponding to the 37 categories. This network therefore describes a posterior probability distribution P(Y|X);
Constructing a semantic feature extraction network, setting layout labels of the scene and dividing the scene into corresponding layout areas, wherein the layout labels comprise the junction of a wall and the ground, the junction of a wall and a wall, the junction of a wall and the ceiling, and the image scene (the non-junction area), the layout areas comprise the ground, the wall and the ceiling, the input of the semantic feature extraction network is the semantic label, and the output is the layout label, for example, the layout labels are:
S={bg,wf,ww,wc}
Wherein bg, wf, ww and wc represent the image scene (background), the wall-floor edge, the wall-wall edge and the wall-ceiling edge, respectively. As shown in fig. 3, the semantic feature extraction network takes each pixel of the semantic features as a sample, uses the layout labels of the LSUN dataset as supervision, and trains a 37×4 fully connected layer to learn the mapping between the 37-channel semantic features and the 4-class edge labels. At this stage, Y is determined by a hidden random variable Z taking values in the edge label set {1, 2, 3, 4}, and P(Z|Y) is a parameterized representation of the room layout and scene clutter;
Constructing a scene layout estimation network, which shares the weights of the semantic segmentation network and the semantic feature extraction network to obtain the layout label corresponding to each pixel point. As shown in fig. 3, the learned 37×4 fully connected layer is reshaped into a 1×1×37×4 convolution layer and appended to the semantic segmentation network pre-trained in the first stage, so that the topological structures of the semantic segmentation network and the semantic feature extraction network are combined into a pixel-level layout estimation network, with the semantic segmentation weights acting as the feature extractor and the semantic feature extraction weights acting as the classifier; the layout estimation network is constructed by the following formula
P(Z|Y)P(Y|X)=P(Z|X)
On the one hand, the scene layout estimation network can be fine-tuned end to end on the LSUN dataset for layout edge prediction; on the other hand, it implicitly encodes the relationship between room clutter and scene layout. The layout edge prediction network can therefore convert the input image into a two-dimensional representation of the final scene layout. The performance of the proposed scene layout estimation network on the LSUN dataset is shown in fig. 4;
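As an illustration only (not part of the claimed network), the reshaping of the learned 37×4 fully connected layer into a 1×1 convolution can be sketched as follows, assuming a PyTorch implementation; the seg_backbone used here is a hypothetical placeholder for the pre-trained semantic segmentation network.

```python
import torch
import torch.nn as nn

# Stage two learned a 37 -> 4 fully connected layer mapping per-pixel semantic
# features (37 channels) to layout-edge labels (4 classes).
fc = nn.Linear(37, 4)

# Reshape the same weights into a 1x1 convolution so the layer can be appended
# to the fully convolutional semantic segmentation backbone.
conv1x1 = nn.Conv2d(in_channels=37, out_channels=4, kernel_size=1)
with torch.no_grad():
    conv1x1.weight.copy_(fc.weight.view(4, 37, 1, 1))
    conv1x1.bias.copy_(fc.bias)

def layout_logits(seg_backbone, rgb_image):
    # seg_backbone: hypothetical P(Y|X) network returning N x 37 x H x W scores.
    semantic_scores = seg_backbone(rgb_image)
    return conv1x1(semantic_scores)  # N x 4 x H x W layout-label scores, i.e. P(Z|X)
```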
The synthesizable positions of the object in scene synthesis are defined according to the rationality of the object placement, where rationality means that the layout area can provide reasonable support for the object. For example, when a person is taken as the object, the layout area of the synthesizable positions is the ground, and the ground in the scene layout necessarily lies in the lower part of the image; therefore the reasonable positions for object synthesis can be delimited according to the two-dimensional layout map, the ground area is randomly sampled, and the image synthesis points (synthesizable positions) are obtained, as shown schematically in fig. 5.
The foreground is a person or object close to the lens (in front, occluding others), and existing methods usually synthesize the person as the foreground into the scene image. In the present application, by contrast, the synthesized characters are not limited to serving only as the foreground.
In step S2, as shown in fig. 6, the step of constructing the object position relationship constraint model further includes: optimizing the output of the layout estimation network to obtain more refined layout structure lines so as to perform more accurate scale constraint on the object, and specifically comprises the following steps:
Obtaining edge nodes corresponding to layout labels of a connection part of a wall and a floor and a connection part of the wall and a ceiling comprises the following steps: expanding edge pixels of a layout output image (such as a diagram on the left side in fig. 6) of the scene layout estimation network into one-dimensional vectors, and taking middle point pixels of each edge section as edge nodes;
Obtaining an intermediate node corresponding to a layout label at a wall and a wall connection, comprising: detecting a vertical line segment between walls by using Hough transformation, wherein the end point of the line segment is a middle node;
the edge nodes and the intermediate nodes are connected to generate a layout structure diagram of the scene (e.g., the right-hand diagram in fig. 6).
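For illustration, a possible implementation of the two node-extraction steps is sketched below, assuming NumPy and OpenCV; the label values and Hough thresholds are assumptions chosen for the sketch, not values given in the text.

```python
import cv2
import numpy as np

def layout_graph_nodes(layout_map, wf_label=1, wc_label=3, ww_label=2):
    """layout_map: H x W array of layout labels (the label values are assumptions)."""
    nodes = []

    # Edge nodes: midpoint pixel of each wall-floor / wall-ceiling edge segment.
    for label in (wf_label, wc_label):
        mask = (layout_map == label).astype(np.uint8)
        n, comp = cv2.connectedComponents(mask)
        for i in range(1, n):
            ys, xs = np.nonzero(comp == i)
            order = np.argsort(xs)                      # unroll the edge left to right
            mid = order[len(order) // 2]
            nodes.append(("edge", int(xs[mid]), int(ys[mid])))

    # Intermediate nodes: endpoints of near-vertical wall-wall segments (Hough transform).
    ww_mask = (layout_map == ww_label).astype(np.uint8) * 255
    lines = cv2.HoughLinesP(ww_mask, 1, np.pi / 180, 30,
                            minLineLength=20, maxLineGap=5)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            if abs(int(x1) - int(x2)) < 5:              # keep (nearly) vertical segments
                nodes.append(("intermediate", int(x1), int(y1)))
                nodes.append(("intermediate", int(x2), int(y2)))
    return nodes
```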
Preferably, the step of constructing the object position relation constraint model further includes:
based on the pixel region of the synthesizable positions, the probability of each synthesizable position is obtained by
P(x, y, 0) = 1 / |S|
wherein (x, y, 0) represents one synthesizable position, S is the pixel region composed of all synthesizable positions, and |S| is the number of pixels in S.
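A minimal sketch of sampling one synthesizable position uniformly from the pixel region S (assuming NumPy; the floor label value is an assumption):

```python
import numpy as np

def sample_synthesizable_position(layout_map, floor_label=0, rng=None):
    """Sample one position uniformly from the floor region S:
    each pixel (x, y, 0) in S has probability 1 / |S|."""
    rng = rng or np.random.default_rng()
    ys, xs = np.nonzero(layout_map == floor_label)   # pixel region S (the ground)
    if len(xs) == 0:
        raise ValueError("no synthesizable position: floor region is empty")
    i = rng.integers(len(xs))
    return int(xs[i]), int(ys[i])                    # (x, y), depth offset 0
```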
In step S2, the step of constructing an object scale relation constraint model includes:
the dimensional relationship between the object and the scene is constructed by the following formula with reference to the wall height or/and the wall width,
Sobj=λSwall+δ(S-S2)
Where Swall is the height or width of the wall, Sobj is the size of the object corresponding to the height or width of the wall, λ is the scale factor of Sobj of the object and Swall of the wall, S2 is the value of one endpoint of the height or width of the wall closest to the object in the dimension of the height or width, S is the value of the object in the dimension of the synthesizable position, and δ is the scale factor.
When a person is taken as the object, the layout area of the synthesizable positions is the ground, and the step of constructing the object scale relation constraint model includes: the scale of the object varies linearly, taking the wall height of the indoor scene as the reference and the distance between the object and the wall as the independent variable. That is, as shown in fig. 6, the vertical layout line ww in the scene is detected through the Hough transform, the pixel coordinates of its two endpoints are denoted (x1, y1) and (x2, y2), the image area below the layout line wf is denoted F, the synthesizable position of the foreground object is constrained to (x, y) ∈ F, the object is scaled with its height as the reference, and a linear relationship between the character scale and the character distance is assumed, namely:
Hobj=λHwall+δ(y-y2)
Wherein Hobj is the height of the object, and Hwall is the height of the wall.
In a preferred embodiment, λ is estimated empirically to be approximately 0.6, and a comparison experiment is performed for different values of δ. As shown in fig. 7a, when δ = 0.6 the synthesized person is too small in scale, and when δ = 1.8 the synthesized person is too large; thus, when δ is too small or too large the scale is implausible. Preferably, when 1.0 ≤ δ ≤ 1.4, the synthesized foreground character has a fairly realistic scale within the scene, as shown in fig. 7b, which gives the synthesized images for δ = 1.0, δ = 1.2 and δ = 1.4.
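The scale constraint can be transcribed directly; the sketch below assumes the preferred values λ ≈ 0.6 and δ = 1.2 discussed above.

```python
def object_height(wall_height_px, y, y2, lam=0.6, delta=1.2):
    """H_obj = lambda * H_wall + delta * (y - y2).
    wall_height_px: pixel height of the detected wall-wall layout line;
    y: vertical coordinate of the synthesizable position;
    y2: vertical coordinate of the lower endpoint of the wall-wall line.
    lam ~ 0.6 and delta in [1.0, 1.4] follow the comparison experiments above."""
    return lam * wall_height_px + delta * (y - y2)
```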
In step S2, as shown in fig. 8, the step of constructing an occlusion relationship constraint model includes:
performing mask extraction on an object in a layout area where a synthesizable position of the object is located;
Judging whether the synthesizable position of the object lies above the pixels of that scene object;
if the synthesizable position is above the scene object's pixels, the scene object occludes the synthesized object;
if the synthesizable position is not above the scene object's pixels, the synthesized object is not occluded by the scene object.
The occlusion relation constraint model adopts a mode of covering an object mask, and the object mask is a binary image obtained by scene semantic segmentation.
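One reading of the above rule, sketched with NumPy; the interpretation that a position higher in the image lies farther from the camera on the ground plane, and is therefore occluded by the scene object, is an assumption consistent with the perspective argument above.

```python
import numpy as np

def is_occluded(position_y, object_mask):
    """Decide the occlusion relation from the vertical position.
    object_mask: boolean H x W mask of a scene object in the layout area.
    If the synthesizable position lies above (smaller y than) the scene object's
    lowest pixels, the synthesized object stands behind it and is occluded;
    otherwise it is rendered in front, unoccluded."""
    ys, _ = np.nonzero(object_mask)
    if len(ys) == 0:
        return False
    return position_y < ys.max()
```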
Because of errors in the semantic labels and the limited generalization of the network model, the semantic segmentation map of an object cannot be completely accurate; in the image synthesis process, in particular, pixel-level errors cause serious distortion, i.e., a relatively obvious gap appears where the two objects meet. To solve this problem, as shown in fig. 9, the step of extracting the mask of the object in the layout area where the synthesizable position of the object is located includes:
Extracting an object area of a layout area where the synthesizable position of the object is located by a mask;
Calculating the gradient of the object region, the gradient at (x, y) in the object region being
∇f(x, y) = [gx, gy] = [∂f/∂x, ∂f/∂y]
i.e. the gradient at (x, y) in the object region is a vector whose magnitude and direction are respectively
M(x, y) = sqrt(gx² + gy²) and α(x, y) = arctan(gy / gx)
wherein M(x, y) represents the gradient magnitude and α(x, y) represents the gradient direction;
The gradient direction reflects where the image gray value changes most, so the gradient values at the pixel edges of the object mask are locally maximal. Therefore, the pixel gradient values of the local object area of the scene image are calculated, the local gradient maxima marking the reasonable junction positions; the object mask is shrunk pixel by pixel until it stops at the local gradient maxima, the mask edge of the object is obtained, and the mask of the object is output. The new mask edge then conforms to the real edge of the scene object, giving a more realistic visual effect when the shielding relation is handled.
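A rough sketch of this mask refinement, assuming OpenCV and NumPy; the 3×3 neighbourhood used for the local-maximum test and the number of shrinking iterations are illustrative assumptions.

```python
import cv2
import numpy as np

def refine_mask_by_gradient(gray_scene, mask, iterations=5):
    """Shrink a (possibly too loose) object mask toward local gradient maxima.
    gray_scene: H x W float image; mask: H x W uint8 binary mask (0/1)."""
    gx = cv2.Sobel(gray_scene, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray_scene, cv2.CV_64F, 0, 1, ksize=3)
    grad = np.sqrt(gx ** 2 + gy ** 2)
    kernel = np.ones((3, 3), np.uint8)
    # A pixel is a local gradient maximum if it equals the max of its 3x3 neighbourhood.
    local_max = grad >= cv2.dilate(grad, kernel)

    refined = mask.copy()
    for _ in range(iterations):
        eroded = cv2.erode(refined, kernel)
        boundary = (refined == 1) & (eroded == 0)
        # Keep boundary pixels that already sit on a local gradient maximum,
        # peel off the rest one pixel layer at a time.
        keep = boundary & local_max
        refined = eroded
        refined[keep] = 1
    return refined
```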
A relation model between the scene image and the object is established through scene understanding, so that geometric factors such as the position and scale of the synthesized object are reasonably controlled, and image synthesis with various interaction relations is realized. However, a realistic composite image requires not only reasonable geometric relations but also mutual compatibility between the object and the scene image so as to achieve a natural fusion; therefore the above object and scene synthesizing method further includes:
In the step S2, an object style relation constraint model is also constructed, and the object style relation constraint model is used for constraining the style of an object relative to a scene;
S6, adjusting the color style of the original image of the object according to the object style relation constraint model so that the foreground object and the scene are naturally fused;
and S7, rendering the object with the adjusted color style at the selected synthesizable position according to the scale proportion and the shielding relation to obtain a synthesized image of the object and the scene.
Preferably, the step of constructing the object style relation constraint model includes:
Adjusting the brightness and saturation of the object so that the object is compatible with the scene style includes: establishing image histograms of the object and the scene in a color space, and constraining the brightness and saturation of the object to be maximally consistent with those of the scene by shifting the object's histogram of the corresponding statistic;
adjusting the local contrast and correlated color temperature of the object to make the object compatible with the scene style, comprising: the appearance of the object is adjusted by combining the local contrast and the correlated color temperature with the HSL channels, so that the object is compatible with the scene image style.
In one embodiment, as shown in fig. 10, the step of adjusting the brightness and saturation of the object to make the object compatible with the scene style includes:
obtaining an RGB image of an original image of a scene and an object;
Converting the RGB images into HSL color space images to obtain the luminance histograms and hue-saturation histograms corresponding to the object and the scene, where H, S and L in HSL denote Hue, Saturation and Lightness, and (r, g, b) are the red, green and blue channel coordinates of a pixel of the RGB image. Specifically: the RGB image of the original image of the scene is converted into an HSL color space image to obtain the luminance histogram and hue-saturation histogram corresponding to the scene; the object is rendered into a two-dimensional RGB image with regular edge contours, the RGB image of the object is converted into an HSL color space image, the effective pixels corresponding to the object are extracted with a mask, and the luminance histogram and hue-saturation histogram of the object over the effective pixels are obtained. For convenience of calculation the RGB values of the image are normalized to the range [0, 1], so r, g and b are real numbers between 0 and 1. Let max be the maximum and min the minimum of r, g and b. The (h, s, l) values in HSL space, where h ∈ [0, 360) is the hue angle and s, l ∈ [0, 1] are the saturation and lightness, are obtained from RGB by:
l = (max + min) / 2
s = 0 if max = min; s = (max − min) / (max + min) if l ≤ 1/2; s = (max − min) / (2 − max − min) if l > 1/2
h = 0 if max = min; h = 60° × (g − b) / (max − min) (plus 360° if the result is negative) if max = r; h = 60° × (b − r) / (max − min) + 120° if max = g; h = 60° × (r − g) / (max − min) + 240° if max = b;
brightness control and contrast adjustment are carried out by utilizing a brightness histogram corresponding to the object, so that the brightness and the contrast of the object are consistent with those of the scene;
And adjusting the saturation by using the hue-saturation histogram corresponding to the object, so that the saturation of the object is consistent with that of the scene.
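For illustration, the RGB→HSL conversion above and a simple realization of the histogram-moving step (here reduced to shifting the object's mean luminance toward the scene's, which is an assumed simplification) can be sketched as follows with NumPy:

```python
import numpy as np

def rgb_to_hsl(rgb):
    """rgb: H x W x 3 array in [0, 1]; returns h in [0, 360), s and l in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    l = (mx + mn) / 2.0
    d = mx - mn
    s = np.where(d == 0, 0.0,
                 np.where(l <= 0.5, d / (mx + mn + 1e-12), d / (2.0 - mx - mn + 1e-12)))
    safe_d = np.where(d == 0, 1.0, d)
    h = np.zeros_like(l)
    h = np.where(mx == r, 60.0 * ((g - b) / safe_d % 6.0), h)
    h = np.where(mx == g, 60.0 * ((b - r) / safe_d + 2.0), h)
    h = np.where(mx == b, 60.0 * ((r - g) / safe_d + 4.0), h)
    h = np.where(d == 0, 0.0, h)
    return h, s, l

def match_luminance(obj_l, obj_mask, scene_l):
    """Shift the object's luminance histogram so its mean matches the scene's."""
    shift = scene_l.mean() - obj_l[obj_mask].mean()
    out = obj_l.copy()
    out[obj_mask] = np.clip(obj_l[obj_mask] + shift, 0.0, 1.0)
    return out
```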
In one embodiment, the step of adjusting the local contrast and correlated color temperature of the subject to make the subject compatible with the scene style comprises:
obtaining an RGB image of an original image of a scene;
rendering the object into a two-dimensional RGB image with regular edge contours;
converting the RGB images of scenes and objects into HSL color space images;
for each pixel point of the HSL color space images of the scene and of the effective pixels of the object extracted by the mask, the local contrast of each pixel point is obtained by the following formula
Cx = (Lx − L̄x) / L̄x
wherein Lx is the pixel brightness and L̄x is the local average luminance at pixel x;
The S curve is utilized to realize the pixel-level brightness conversion of the HSL color space image of the effective pixel corresponding to the object, and the forward S curve and the reverse S curve are respectively used for enhancing and reducing the local contrast, so that the contrast of the object and the scene is matched;
And adjusting the correlated color temperature of the HSL color space image of the effective pixel corresponding to the object, so that the correlated color temperatures of the object and the scene are consistent.
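A minimal sketch of the local contrast computation, assuming NumPy and OpenCV; the formula form Cx = (Lx − L̄x) / L̄x and the box-filter window used for the local average are assumptions made for illustration.

```python
import cv2
import numpy as np

def local_contrast(luminance, window=15):
    """C_x = (L_x - Lbar_x) / Lbar_x, with Lbar_x the local average luminance
    around pixel x (here a `window` x `window` box filter; the window size is
    an assumption, not a value given in the text)."""
    lbar = cv2.blur(luminance.astype(np.float64), (window, window))
    return (luminance - lbar) / (lbar + 1e-6)
```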
Preferably, the step of obtaining a composite image of the object and the scene further comprises:
Taking the synthesized image as an initial synthesized image, extracting objects in the initial synthesized image by using a mask, wherein the initial synthesized image except the objects is a scene image;
Obtaining an HSL color space image of the scene image and the object;
Converting the brightness and saturation into the logarithmic domain according to Weber's law, using the mired as the unit of correlated color temperature, and obtaining the brightness, saturation, local contrast and correlated color temperature statistics of the HSL color space images of the object and the scene image, comprising:
luminance and saturation are converted into the logarithmic domain according to Weber's law, i.e. the luminance space is represented by log2 Y (where Y ∈ [ε, 1.0], and ε = 3.03 × 10⁻⁴ is used to prevent undefined logarithm values); the saturation channel of the HSL color space is represented by log2 S (S ∈ [ε, 1.0]); H represents the cyclic hue value in the range [0.0, 1.0];
taking the mired (micro reciprocal degree) as the unit of correlated color temperature (Correlated Color Temperature, CCT):
mired = 10^6 / K
wherein K is the temperature in Kelvin; the natural lighting range lies between [1500, 20000] K, and the correlated color temperature of the image is calculated using the OptProp toolbox;
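The mired conversion mentioned above, as a one-line helper (the conversion formula is the standard definition of the mired unit):

```python
def kelvin_to_mired(kelvin):
    """Mired (micro reciprocal degree) = 10^6 / T, with T in Kelvin.
    Natural lighting roughly spans 1500 K - 20000 K, i.e. about 667 - 50 mired."""
    return 1.0e6 / kelvin
```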
The local contrast of the object is adjusted by an S-curve. As shown in fig. 11, luminance conversion at the pixel level is realized with an S-curve; a forward S-curve and an inverse S-curve are used to enhance and reduce the local contrast respectively, the pixel luminance Lin is mapped to Lout by the S-curve, and the same S-curve parameters are used within one image. The starting point of the S-curve is P0 = (0, 0), the end point is P1 = (1, 1), and the turning point Pm is the average luminance value; this point divides the S-curve into an upper and a lower sub-curve, each of which is a Bezier curve with 3 anchor points. Pm, PU, P1 are the 3 anchor points of the upper sub-curve and Pm, PD, P0 are the 3 anchor points of the lower sub-curve; the upper and lower sub-curves are each controlled by their anchor points, and the degree of conversion of the S-curve is determined by PU and PD:
PU=P11+α(P12-P11),
PD=P01+α(P02-P01).
Wherein α is the parameter controlling the bending direction of the S-curve: when α < 0.5 a forward S-curve that increases the local contrast is obtained; when α > 0.5 an inverse S-curve that reduces the local contrast is obtained; α = 0.5 degenerates into a straight line. In the experiments, α is set within [0.4, 0.6] and its value is searched continuously to find a suitable curve, so that the contrast of the foreground person matches that of the scene image;
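A sketch of such an S-curve mapping with two quadratic Bezier sub-curves, assuming NumPy; the exact placement of the control points PU and PD is not fully recoverable from the text, so the placement below (interpolating between the off-diagonal corners of each sub-square) is an assumption chosen only to reproduce the stated behaviour of α.

```python
import numpy as np

def s_curve(l_in, l_mean, alpha=0.45, samples=256):
    """Pixel-level luminance remapping with two quadratic Bezier sub-curves.
    P0=(0,0), P1=(1,1), turning point Pm=(l_mean, l_mean). The control points
    P_U, P_D are interpolated between the off-diagonal corners of each
    sub-square (assumed placement): alpha < 0.5 gives a forward S-curve,
    alpha > 0.5 an inverse S-curve, alpha = 0.5 a straight line."""
    m = float(l_mean)
    pu = np.array([m, 1.0]) + alpha * (np.array([1.0, m]) - np.array([m, 1.0]))
    pd = np.array([m, 0.0]) + alpha * (np.array([0.0, m]) - np.array([m, 0.0]))
    p0, p1, pm = np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([m, m])

    def bezier(a, b, c, t):
        t = t[:, None]
        return (1 - t) ** 2 * a + 2 * (1 - t) * t * b + t ** 2 * c

    t = np.linspace(0.0, 1.0, samples)
    curve = np.vstack([bezier(p0, pd, pm, t), bezier(pm, pu, p1, t)])
    order = np.argsort(curve[:, 0])
    # Interpolate L_out for each input luminance from the sampled (L_in, L_out) pairs.
    return np.interp(l_in, curve[order, 0], curve[order, 1])
```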
The brightness, saturation, correlated color temperature and local contrast of the image are taken as the adjustment targets, and the statistic of each index is computed for both the scene image and the object (for example, the correlated color temperature value of the scene image and that of the object); the object's statistics are then brought into agreement with those of the scene image by:
adjusting the local contrast of the object through the S-curve;
adjusting the brightness of the object so that the brightness of the object is consistent with that of the scene image;
adjusting the correlated color temperature of the object so that the correlated color temperatures of the object and the scene image are consistent;
adjusting the saturation of the object so that the saturations of the object and the scene image are consistent;
And re-rendering the adjusted synthesizable position of the object in the scene image to obtain an adjusted synthesized image.
In the above object and scene synthesis method, the contrast adjustment step is placed before the other adjustment steps in order to prevent the other indexes from being affected by the contrast adjustment.
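For illustration, the adjustment order can be sketched as follows, reusing the s_curve helper above and assuming NumPy; the correlated color temperature step is only indicated by a comment because it requires a CCT estimator that is not specified here.

```python
import numpy as np

def adjust_object_statistics(obj_l, obj_s, scene_l, scene_s, obj_mask):
    """Order of operations used above: local contrast first (S-curve), then
    brightness, correlated color temperature and saturation. This sketch shows
    only mean-shift steps for brightness and saturation."""
    l = s_curve(obj_l, obj_l[obj_mask].mean())             # 1. local contrast
    l = l + (scene_l.mean() - l[obj_mask].mean())          # 2. brightness
    # 3. correlated color temperature: shift analogously once a CCT estimate is available
    s = obj_s + (scene_s.mean() - obj_s[obj_mask].mean())  # 4. saturation
    return np.clip(l, 0.0, 1.0), np.clip(s, 0.0, 1.0)
```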
In a preferred embodiment, as shown in fig. 1 and 12, further comprising:
providing a user interaction interface for the client, wherein the user interaction interface provides multiple scenes, multiple objects and multiple synthesizable positions for the client to select, and feeds back rationality of synthesizable positions of the objects selected by the client in the scenes, and the rationality refers to conforming to an object position relation constraint model, an object scale relation constraint model, an object shielding relation constraint model and an object style relation constraint model.
The invention also provides an object and scene synthesis system based on the indoor scene, which synthesizes the object into the scene, comprising the following steps:
the image acquisition module is used for acquiring original images of the object and the scene;
The model construction module is used for constructing an object position relation constraint model, an object scale relation constraint model and an object shielding relation constraint model, wherein the object position relation constraint model is used for constraining the synthesizable position of an object in a scene, the object scale relation constraint model is used for constraining the scale proportion of the object relative to the scene, and the object shielding relation constraint model is used for constraining the shielding relation between the synthesized object and the objects already in the scene;
The synthesizable position obtaining module obtains the synthesizable position of the object in the scene according to the object position relation constraint model;
the scale proportion obtaining module is used for obtaining the scale proportion of the object relative to the scene according to the object scale relation constraint model;
the shielding relation obtaining module is used for obtaining different shielding relations between the objects with different synthesizable positions and the objects in the scene according to the object shielding relation constraint model;
The synthesizing module carries out style rendering on the object with the scale proportion at the corresponding synthesizable position according to the corresponding shielding relation to obtain a synthesized image of the corresponding object and the scene;
wherein, the model construction module includes:
The semantic segmentation network inputs an RGB image of an original image of a scene, outputs semantic tags corresponding to all pixel points, and the semantic tags are tags corresponding to objects in the scene;
The semantic feature extraction network, which sets the layout labels of the scene and divides the scene into corresponding layout areas, wherein the layout labels comprise the image scene, the junction of a wall and the ground, the junction of a wall and a wall, and the junction of a wall and the ceiling, the layout areas comprise the ground, the wall and the ceiling, the input of the semantic feature extraction network is the semantic label, and the output is the layout label;
The scene layout estimation network shares the weights of the semantic segmentation network and the semantic feature extraction network to obtain layout labels corresponding to all pixel points;
And the synthesizable position limiting unit is used for limiting synthesizable positions of the objects in scene synthesis according to rationality of the object occupation, wherein the rationality of the object occupation means that a layout area can provide reasonable support for the objects.
In one embodiment, the model building module further builds an object style relationship constraint model for constraining the style of the object relative to the scene; the object and scene synthesis system based on the indoor scene further comprises a style fusion module, wherein the color style of the original image of the object is adjusted according to the object style relation constraint model, so that the object and the scene are naturally fused; and rendering the object with the color style adjusted at the selected synthesizable position according to the scale proportion and the shielding relation by the synthesis module to obtain a synthesized image of the object and the scene.
The embodiment of the object and scene synthesizing system based on the indoor scene is substantially the same as the embodiment of the object and scene synthesizing method based on the indoor scene, and will not be described herein.
The above object and scene synthesis method and system based on an indoor scene combine scene understanding with indoor image synthesis. A three-dimensional structural representation of the indoor scene image is established through indoor scene layout estimation and semantic segmentation, and the probability distribution of synthesizable positions over the ground area of the scene image is calculated as the sampling basis for the synthesizable positions. On this basis, the scale and the shielding relation of the synthesized character are constrained according to the different synthesis positions. A style relation between the synthesized person and the scene image is established, and the style of the synthesized person is adjusted to be consistent with the scene image by matching statistics such as brightness, saturation, contrast and color temperature of the synthesized person and the scene image, so that the newly synthesized image is more realistic. The character model is integrated with the indoor scene image synthesis task: on the one hand, a position to be synthesized can be designated, the rationality of the designated position is evaluated according to the probability of the synthesizable positions, and when the designated position is reasonable, the corresponding new image is synthesized; on the other hand, rendering viewpoints around the 360° character model are provided, and real-time image synthesis with the scene image is completed.
Common character-scene synthesis methods in the prior art can synthesize characters that conform to the context information of a scene and look fairly realistic in terms of geometry and appearance. However, the existing methods usually synthesize the character as the foreground of the scene image; that is, the occlusion relationship between the character and the scene objects is single, and the synthesis positions of the character are not diversified enough. Therefore, in the present application, through global and local understanding of the indoor scene image, a probability distribution model of the synthesizable positions of the person is established, and the occlusion phenomenon of perspective projection is realized using the object pixel mask at the corresponding position in the scene image, so that the synthesized person and the scene objects have different shielding relations at different positions, and the synthesized image is more realistic and reasonable.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the modules is merely a logical function division, and other manners of division may be implemented in practice.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.