CN112488967B - Object and scene synthesis method and system based on indoor scene


Info

Publication number
CN112488967B
CN112488967B (application CN202011313114.3A)
Authority
CN
China
Prior art keywords
scene
image
synthesizable
layout
constraint model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011313114.3A
Other languages
Chinese (zh)
Other versions
CN112488967A (en)
Inventor
钟微
操奎
叶龙
方力
张勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China
Priority to CN202011313114.3A
Publication of CN112488967A
Application granted
Publication of CN112488967B
Status: Active
Anticipated expiration

Abstract

The invention provides an object and scene synthesis method and system based on an indoor scene, comprising the following steps: obtaining an object and an original image of a scene; constructing an object position relation constraint model, an object scale relation constraint model and an object occlusion relation constraint model; obtaining the synthesizable positions of the object in the scene according to the object position relation constraint model; obtaining the scale proportion of the object relative to the scene according to the object scale relation constraint model; obtaining, according to the object occlusion relation constraint model, the different occlusion relations between the object at different synthesizable positions and the objects in the scene; and rendering the object at the corresponding synthesizable position, at the obtained scale proportion and with the corresponding occlusion relation, to obtain the synthesized image of the object and the scene. By introducing scene understanding, the method and system make the synthesized image more realistic.

Description

Object and scene synthesis method and system based on indoor scene
Technical Field
The application relates to the technical field of image synthesis, in particular to an object and scene synthesis method and system based on an indoor scene.
Background
In related research and technology on image composition, the mainstream methods can be classified into cut-and-paste image composition, rendering-based image composition, and image composition based on adversarial learning. Cut-and-paste methods are direct: the object to be synthesized is selected and enhanced to some extent through feature learning on the scene image and then pasted into the target background, so the synthesis process has a certain randomness. Rendering-based methods generally achieve better fusion, but they require explicit reconstruction of the geometry and illumination of the background image. Methods based on adversarial learning are hard to control, because the input variables of the generator are random. Therefore, although current image synthesis methods can fuse the synthesized object and the scene image naturally in terms of geometry and appearance, two problems remain in most cases: (1) the lack of layout estimation for indoor scenes ignores the probability distribution of synthesizable positions in the scene, making it difficult to handle the case of multiple synthesizable positions; (2) the constraint relations between the position of the synthesized object and its scale and occlusion are ignored (different synthesis positions constrain the synthesized object to different scales and different occlusion relations), so the occlusion relation between the synthesized person and the scene objects is always the same.
Disclosure of Invention
In view of the foregoing, an object of the present application is to provide an object and scene synthesis method based on an indoor scene, which synthesizes an object into a scene and comprises:
obtaining an original image of an object and a scene;
An object position relation constraint model, an object scale relation constraint model and an object occlusion relation constraint model are constructed, wherein the object position relation constraint model is used for constraining the synthesizable positions of the object in the scene, the object scale relation constraint model is used for constraining the scale proportion of the object relative to the scene, and the object occlusion relation constraint model is used for constraining the occlusion relation between the object and the objects in the scene;
Obtaining a synthesizable position of the object in the scene according to the object position relation constraint model;
Obtaining the scale proportion of the object relative to the scene according to the object scale relation constraint model;
Obtaining, according to the object occlusion relation constraint model, the different occlusion relations between the object at its different synthesizable positions and the objects in the scene;
Rendering the object, at the obtained scale proportion and according to the corresponding occlusion relation, at the corresponding synthesizable position to obtain the synthesized image of the object and the scene;
The step of constructing the object position relation constraint model comprises the following steps:
Constructing a semantic segmentation network, wherein the input of the semantic segmentation network is an RGB image of an original image of a scene, the output of the semantic segmentation network is a semantic label corresponding to each pixel point, and the semantic label is a label corresponding to an object in the scene;
Constructing a semantic feature extraction network and setting layout labels for the scene, which divide the scene into corresponding layout areas, wherein the layout labels comprise the image scene (background), the wall-floor junction, the wall-wall junction and the wall-ceiling junction, the layout areas comprise the floor, the walls and the ceiling, the input of the semantic feature extraction network is the semantic labels, and the output is the layout labels;
constructing a scene layout estimation network, sharing weights of a semantic segmentation network and a semantic feature extraction network, and obtaining layout labels corresponding to all pixel points;
the synthesizable positions of the object in scene synthesis are defined according to the rationality of object placement, which means that the layout area is able to provide reasonable support for the object.
Preferably, the method further comprises:
constructing an object style relation constraint model, wherein the object style relation constraint model is used for constraining the style of an object relative to a scene;
According to the object style relation constraint model, the color style of the original image of the object is adjusted, so that the object and the scene are naturally fused;
And rendering the object with the adjusted color style at the selected synthesizable position, according to the scale proportion and the occlusion relation, to obtain the synthesized image of the object and the scene.
Further, preferably, the step of constructing the object style relation constraint model includes:
Adjusting the brightness and saturation of the object so that the object is compatible with the scene style, comprising: establishing image histograms of the object and of the scene in a color space, and shifting the object's histograms of the corresponding statistics so that the brightness and saturation of the object and of the scene are as consistent as possible;
adjusting the local contrast and correlated color temperature of the object to make the object compatible with the scene style, comprising: and the appearance of the object is adjusted by combining the local contrast and the correlated color temperature with the HSL channel, so that the object is compatible with the scene image style.
Further, preferably, the step of adjusting the brightness and saturation of the object so that the object is compatible with the scene style includes:
obtaining an RGB image of an original image of a scene;
converting an RGB image of an original image of a scene into an HSL color space image to obtain a brightness histogram and a hue-saturation histogram corresponding to the scene;
rendering the object into a two-dimensional RGB image with regular edge contours;
converting the RGB image of the object into an HSL color space image, extracting effective pixels corresponding to the object by using a mask, and obtaining a brightness histogram and a hue-saturation histogram of the object corresponding to the effective pixels;
Adjusting the brightness histogram of the object corresponding to the effective pixel to make the brightness of the object consistent with that of the scene;
And adjusting the hue-saturation histogram of the object corresponding to the effective pixel so that the saturation of the object and the saturation of the scene are consistent.
Furthermore, preferably, the step of adjusting the local contrast and correlated color temperature of the object so that the object is compatible with the scene style comprises:
obtaining an RGB image of an original image of a scene;
rendering the object into a two-dimensional RGB image with regular edge contours;
converting the RGB images of scenes and objects into HSL color space images;
for each pixel point of the HSL color space image of the scene and of the effective pixels of the object extracted by the mask, obtaining the local contrast of the pixel point, where Lx is the pixel luminance and L̄x is the local average luminance at pixel x;
The S curve is utilized to realize the pixel-level brightness conversion of the HSL color space image of the effective pixel corresponding to the object, and the forward S curve and the reverse S curve are respectively used for enhancing and reducing the local contrast, so that the contrast of the object and the scene is matched;
And adjusting the correlated color temperature of the HSL color space image of the effective pixel corresponding to the object, so that the correlated color temperatures of the object and the scene are consistent.
Preferably, the step of constructing the object position relation constraint model further includes:
obtaining edge nodes corresponding to layout labels of a connection part of a wall and a floor and a connection part of the wall and a ceiling comprises the following steps: expanding edge pixels of a layout output image of a scene layout estimation network into one-dimensional vectors, and taking middle point pixels of each edge section as edge nodes;
Obtaining the intermediate nodes corresponding to the layout labels at wall-wall junctions, comprising: detecting the vertical line segments between walls using the Hough transform, the endpoints of these segments being the intermediate nodes;
and connecting the edge node and the intermediate node to generate a layout structure diagram of the scene.
Preferably, the step of constructing the object scale relation constraint model includes: the scale of the object takes the wall height of the indoor scene as a reference, the distance between the object and the wall is an independent variable, and the scale of the object is a dependent variable and changes linearly.
Further, preferably, the step of constructing an object scale relation constraint model includes:
the dimensional relationship between the object and the scene is constructed by the following formula with reference to the wall height or/and the wall width,
Sobj = λ·Swall + δ·(S − S2),
where Swall is the height or width of the wall, Sobj is the size of the object along the corresponding dimension, λ is the scale factor between Sobj and Swall, S2 is the coordinate, along that dimension, of the wall endpoint closest to the object, S is the coordinate of the object's synthesizable position along that dimension, and δ is a scale factor.
Preferably, the step of constructing an occlusion relationship constraint model includes:
performing mask extraction on the scene object in the layout area where the synthesizable position of the object is located;
judging whether the synthesizable position of the object is above the pixels of the scene object;
if the synthesizable position is above the pixels of the scene object, the scene object occludes the synthesized object;
if the synthesizable position is not above the pixels of the scene object, the synthesized object is not occluded by the scene object.
Further, preferably, the step of extracting the mask of the object in the layout area where the synthesizable position of the object is located includes:
Extracting an object area of a layout area where the synthesizable position of the object is located by a mask;
Calculating a gradient of the object region;
and narrowing the object mask pixel by pixel until stopping at the local gradient maximum value, obtaining the mask edge of the object, and outputting the mask of the object.
Preferably, the step of obtaining a composite image of the object and the scene further comprises:
Taking the synthesized image as an initial synthesized image, extracting objects in the initial synthesized image by using a mask, wherein the initial synthesized image except the objects is a scene image;
Obtaining an HSL color space image of the scene image and the object;
Converting the brightness and saturation into the logarithmic domain according to Weber's law, and using the mired (micro reciprocal degree) as the unit of correlated color temperature, to obtain the brightness, saturation, local contrast and correlated color temperature statistics of the HSL color space images of the object and of the scene image;
adjusting the local contrast of the object through an S curve;
Adjusting the brightness of the object so that the brightness of the object is consistent with the brightness of the scene image;
Adjusting the correlated color temperature of the object so that the correlated color temperatures of the object and the scene image are consistent;
adjusting the saturation of the object so that the saturation of the object and the saturation of the scene image are consistent;
And re-rendering the adjusted synthesizable position of the object in the scene image to obtain an adjusted synthesized image.
Preferably, the method further comprises:
providing a user interaction interface for the client, wherein the user interaction interface provides multiple scenes, multiple objects and multiple synthesizable positions for the client to select, and feeds back the rationality of the synthesizable position of the object selected by the client in the scene, where rationality means conformity with the object position relation constraint model, the object scale relation constraint model, the object occlusion relation constraint model and the object style relation constraint model.
Preferably, the step of constructing the object position relation constraint model further includes:
based on the pixel region formed by the synthesizable positions, the probability of each synthesizable position is obtained, where (x, y, 0) represents one synthesizable position and S is the pixel region composed of all synthesizable positions.
According to another aspect of the present invention, there is provided an object and scene synthesizing system based on an indoor scene, synthesizing an object into a scene, comprising:
the image acquisition module is used for acquiring original images of the object and the scene;
The model construction module is used for constructing an object position relation constraint model, an object scale relation constraint model and an object occlusion relation constraint model, wherein the object position relation constraint model is used for constraining the synthesizable positions of the object in the scene, the object scale relation constraint model is used for constraining the scale proportion of the object relative to the scene, and the object occlusion relation constraint model is used for constraining the occlusion relation between the object and the objects in the scene;
The synthesizable position obtaining module obtains the synthesizable position of the object in the scene according to the object position relation constraint model;
the scale proportion obtaining module is used for obtaining the scale proportion of the object relative to the scene according to the object scale relation constraint model;
the occlusion relation obtaining module is used for obtaining, according to the object occlusion relation constraint model, the different occlusion relations between the object at its different synthesizable positions and the objects in the scene;
The synthesis module performs style rendering of the object, at the obtained scale proportion and according to the corresponding occlusion relation, at the corresponding synthesizable position to obtain the synthesized image of the object and the scene;
wherein, the model construction module includes:
The semantic segmentation network inputs an RGB image of an original image of a scene, outputs semantic tags corresponding to all pixel points, and the semantic tags are tags corresponding to objects in the scene;
The semantic feature extraction network, for which layout labels of the scene are set that divide the scene into corresponding layout areas, wherein the layout labels comprise the image scene (background), the wall-floor junction, the wall-wall junction and the wall-ceiling junction, the layout areas comprise the floor, the walls and the ceiling, the input of the semantic feature extraction network is the semantic labels, and the output is the layout labels;
The scene layout estimation network shares the weights of the semantic segmentation network and the semantic feature extraction network to obtain layout labels corresponding to all pixel points;
And the synthesizable position limiting unit, which limits the synthesizable positions of the object in scene synthesis according to the rationality of object placement, where rationality means that the layout area is able to provide reasonable support for the object.
According to the method and system for synthesizing an object and a scene based on an indoor scene, the object can be synthesized at different positions of an indoor scene image and several new images with occlusion can be generated. Scene understanding is introduced (the structure of the original scene image and the object information are analyzed to understand the scene layout and the positional relations of the objects): the probability distribution of the synthesizable positions in the indoor scene image is analyzed through scene layout estimation and semantic segmentation, and a parameterized model relating the synthesizable positions, the person's scale and the occlusion relation is established, so the synthesized images are more diverse. In the image blending part, the color style information of the scene image and of the object is analyzed and the color style of the foreground object is adjusted to fuse naturally with the scene image; by adjusting the person's style to be compatible with the scene image, the synthesized image becomes more realistic.
Drawings
FIG. 1 is a schematic diagram of an object and scene composition method based on indoor scenes of the present invention;
FIG. 2 is a schematic diagram of a semantic segmentation network according to the present invention;
FIG. 3 is a schematic diagram of a semantic feature extraction network and a scene layout estimation network according to the present invention;
FIG. 4 is a schematic diagram of a layout tag obtained from different scenes through a scene layout estimation network;
FIG. 5 is a schematic illustration of a synthesizable location;
FIG. 6 is a schematic diagram of optimizing the output of a layout estimation network in accordance with the present invention;
FIG. 7a is a schematic diagram of scale distortion caused by inappropriate scale factors for objects and scenes;
FIG. 7b is a schematic illustration of a composite image of different scale factors;
FIG. 8 is a schematic diagram of constructing an occlusion relationship constraint model in accordance with the present invention;
FIG. 9 is a schematic diagram of mask extraction of an object in a layout area where a synthesizable location of the object is located according to the present invention;
FIG. 10 is a schematic diagram illustrating one embodiment of adjusting hue and saturation of an object to make the object compatible with a scene style in accordance with the present invention;
FIG. 11 is a schematic illustration of a forward S-curve and a reverse S-curve;
fig. 12 is a schematic diagram of an object and scene composition method of a UI-based indoor scene.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Specific embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the object and scene synthesis method based on an indoor scene according to the present invention. As shown in fig. 1, the method synthesizes an object into a scene and comprises:
step S1, obtaining an original image of an object and a scene;
Step S2, constructing an object position relation constraint model, an object scale relation constraint model and an object occlusion relation constraint model, wherein the object position relation constraint model is used for constraining the synthesizable positions of the object in the scene, the object scale relation constraint model is used for constraining the scale proportion of the object relative to the scene, and the object occlusion relation constraint model is used for constraining the occlusion relation between the object and the objects in the scene;
S3, obtaining a synthesizable position of the object in the scene according to the object position relation constraint model;
S4, obtaining the scale proportion of the object relative to the scene according to the object scale relation constraint model;
Step S5, obtaining, according to the object occlusion relation constraint model, the different occlusion relations between the object at its different synthesizable positions and the objects in the scene;
Step S7, performing style rendering of the object, at the obtained scale proportion and according to the corresponding occlusion relation, at the corresponding synthesizable position to obtain the synthesized image of the object and the scene;
The step of constructing the object position relation constraint model comprises the following steps:
Constructing a semantic segmentation network, wherein the input of the semantic segmentation network is the RGB image of the original scene image and the output is the semantic label of each pixel point, the semantic label being the label of the corresponding object in the scene. As shown in fig. 2, a fully convolutional network is trained with the SUNRGBD data carrying semantic labels of indoor objects to realize 37-class semantic segmentation of the SUNRGBD dataset; the 37 categories cover the objects and furniture usually present in indoor scenes, such as walls, ceilings, chairs or windows, so as to describe a cluttered scene as completely as possible, and dilated (atrous) convolution layers are added to the semantic segmentation network on top of the original ResNet-101. Since the input of this network is an RGB image, it is in effect a random variable X with value range [0, 255], which is determined by an implicit random variable Y taking its value in the semantic labels {1, 2, 3, ..., 37} corresponding to the 37 categories. This network therefore describes a posterior probability distribution P(Y|X);
Constructing a semantic feature extraction network and setting layout labels for the scene, which divide the scene into corresponding layout areas, wherein the layout labels comprise the wall-floor junction, the wall-wall junction, the wall-ceiling junction and the image scene (the non-junction area), the layout areas comprise the floor, the walls and the ceiling, the input of the semantic feature extraction network is the semantic labels, and the output is the layout labels, for example:
S={bg,wf,ww,wc}
where bg, wf, ww and wc represent the image scene (background, bg), the wall-floor edge (wf), the wall-wall edge (ww) and the wall-ceiling edge (wc), respectively. As shown in fig. 3, the semantic feature extraction network treats each pixel in the semantic features as a sample, uses the layout labels of the LSUN dataset as supervision, and trains a 37×4 fully connected layer to learn the mapping from the 37-channel semantic features to the 4-class edge labels. At this stage Y is determined by a hidden random variable Z taking its value in the edge labels [1, 2, 3, 4], and P(Z|Y) is a parameterized representation of the room layout and scene clutter;
Constructing a scene layout estimation network that shares the weights of the semantic segmentation network and the semantic feature extraction network to obtain the layout label of each pixel point. As shown in fig. 3, the learned 37×4 fully connected layer is reshaped into a 1×1×37×4 convolution layer and appended to the semantic segmentation network pre-trained in the first stage, so that the topologies of the semantic segmentation network and the semantic feature extraction network are combined into a pixel-level layout estimation network, with the semantic segmentation weights acting as the feature extractor and the semantic feature extraction weights acting as the classifier; the layout estimation network is constructed according to the following formula
P(Z|Y)P(Y|X)=P(Z|X)
On the one hand, the layout estimation network can be fine-tuned end to end on the LSUN dataset for layout edge prediction; on the other hand, it neatly couples the relationship between room clutter and scene layout. The layout edge prediction network can thus convert the input image into a two-dimensional representation of the final scene layout. The performance of the proposed scene layout estimation network on the LSUN dataset is shown in fig. 4;
The synthesizable positions of the object in scene synthesis are defined according to the rationality of object placement, i.e. the layout area must be able to provide reasonable support for the object. For example, taking a person as the object, the layout area of the synthesizable positions is the floor, and the floor region of the scene layout necessarily lies in the lower part of the image; therefore the reasonable positions for object synthesis can be delimited from the two-dimensional layout map, the floor region is randomly sampled, and the image synthesis points (synthesizable positions) are obtained, as illustrated in fig. 5.
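As a concrete illustration of how the two stages compose, the following is a minimal sketch (not the patented implementation): a per-pixel layout label is obtained by multiplying a 37-channel semantic posterior, standing in for P(Y|X), by a learned 37×4 weight matrix standing in for P(Z|Y). The array shapes, label order and NumPy formulation are assumptions made for illustration.

```python
import numpy as np

def layout_labels(semantic_probs: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Combine the per-pixel semantic posterior P(Y|X) (shape H x W x 37)
    with a learned 37x4 classifier W (standing in for P(Z|Y)) to obtain
    per-pixel layout scores P(Z|X), then take the argmax layout label."""
    # Per-pixel matrix product: equivalent to a 1x1 convolution with a
    # 37x4 weight tensor appended to the segmentation network.
    layout_scores = semantic_probs @ W        # shape H x W x 4
    # assumed label order: 0=bg, 1=wf, 2=ww, 3=wc
    return layout_scores.argmax(axis=-1)
```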
The foreground is the person or object close to the camera (in front, occluding others), and existing methods usually synthesize the person into the scene image only as foreground. In this application, by contrast, the synthesized person is not restricted to the foreground.
In step S2, as shown in fig. 6, the step of constructing the object position relationship constraint model further includes: optimizing the output of the layout estimation network to obtain more refined layout structure lines so as to perform more accurate scale constraint on the object, and specifically comprises the following steps:
Obtaining edge nodes corresponding to layout labels of a connection part of a wall and a floor and a connection part of the wall and a ceiling comprises the following steps: expanding edge pixels of a layout output image (such as a diagram on the left side in fig. 6) of the scene layout estimation network into one-dimensional vectors, and taking middle point pixels of each edge section as edge nodes;
Obtaining the intermediate nodes corresponding to the layout labels at wall-wall junctions, comprising: detecting the vertical line segments between walls using the Hough transform, the endpoints of these segments being the intermediate nodes;
the edge nodes and the intermediate nodes are connected to generate a layout structure diagram of the scene (e.g., the right-hand diagram in fig. 6).
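As an illustration of the intermediate-node extraction described above, the sketch below detects near-vertical wall-wall segments with OpenCV's probabilistic Hough transform and returns their endpoints; the label value, thresholds and vertical tolerance are assumptions rather than values taken from the patent.

```python
import cv2
import numpy as np

def wall_wall_nodes(layout_map: np.ndarray, ww_label: int = 2):
    """Return endpoints of near-vertical wall-wall segments (the
    intermediate nodes of the layout structure diagram)."""
    # ww_label = 2 is an assumed label id for the wall-wall edge class
    ww_mask = (layout_map == ww_label).astype(np.uint8) * 255
    lines = cv2.HoughLinesP(ww_mask, rho=1, theta=np.pi / 180,
                            threshold=30, minLineLength=40, maxLineGap=5)
    nodes = []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            if abs(int(x1) - int(x2)) < 5:    # keep near-vertical segments
                nodes.extend([(int(x1), int(y1)), (int(x2), int(y2))])
    return nodes
```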
Preferably, the step of constructing the object position relation constraint model further includes:
based on the pixel region formed by the synthesizable positions, the probability of each synthesizable position is obtained, where (x, y, 0) represents one synthesizable position and S is the pixel region composed of all synthesizable positions.
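Since the probability formula itself is reproduced in the original only as an image, the sketch below simply assumes a uniform distribution over the floor region, so that each synthesizable position (x, y, 0) has probability 1/|S|; the function name and mask convention are illustrative only.

```python
import numpy as np

def sample_synthesizable_position(floor_mask: np.ndarray, rng=None):
    """Draw one synthesizable position (x, y, 0) uniformly from the floor
    region S; under this assumption each position has probability 1/|S|."""
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.nonzero(floor_mask)           # pixels belonging to S
    i = rng.integers(len(xs))
    return int(xs[i]), int(ys[i]), 0
```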
In step S2, the step of constructing an object scale relation constraint model includes:
the dimensional relationship between the object and the scene is constructed by the following formula with reference to the wall height or/and the wall width,
Sobj = λ·Swall + δ·(S − S2),
where Swall is the height or width of the wall, Sobj is the size of the object along the corresponding dimension, λ is the scale factor between Sobj and Swall, S2 is the coordinate, along that dimension, of the wall endpoint closest to the object, S is the coordinate of the object's synthesizable position along that dimension, and δ is a scale factor.
When a person is taken as the object, the layout area of the synthesizable positions is the floor, and the step of constructing the object scale relation constraint model comprises: the scale of the object varies linearly, taking the wall height of the indoor scene as reference and the distance between the object and the wall as the independent variable. That is, as shown in fig. 6, the vertical layout line ww in the scene is detected through the Hough transform and the pixel coordinates of its two endpoints are (x1, y1) and (x2, y2); the image area below the layout line wf is F, and the synthesis position of the foreground object is constrained to (x, y) ∈ F; the object is scaled with its height as reference, and a linear relationship between the person's scale and distance is assumed, namely:
Hobj = λ·Hwall + δ·(y − y2),
where Hobj is the height of the object and Hwall is the height of the wall.
In a preferred embodiment, λ is estimated empirically to be approximately 0.6, and a comparison experiment is performed for different δ. As shown in fig. 7a, when δ = 0.6 the synthesized person is too small, and when δ = 1.8 the synthesized person is too large; when δ is too small or too large the scale is implausible. Preferably, when 1.0 ≤ δ ≤ 1.4 the synthesized foreground person has a fairly realistic scale within the scene; fig. 7b shows the synthesized images for δ = 1.0, δ = 1.2 and δ = 1.4.
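The scale constraint can be evaluated directly. The helper below is a plain transcription of Hobj = λ·Hwall + δ·(y − y2) with the empirically quoted λ ≈ 0.6 and a δ inside the preferred range [1.0, 1.4]; the default values and the worked example are illustrative assumptions.

```python
def object_height(h_wall: float, y: float, y2: float,
                  lam: float = 0.6, delta: float = 1.2) -> float:
    """H_obj = lam * H_wall + delta * (y - y2): the lower the synthesis
    point lies in the image (larger y), the taller the rendered object."""
    return lam * h_wall + delta * (y - y2)

# Example: a 240-pixel wall, synthesis point 30 pixels below the endpoint y2
# object_height(240, y=430, y2=400)  ->  0.6 * 240 + 1.2 * 30 = 180 pixels
```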
In step S2, as shown in fig. 8, the step of constructing an occlusion relationship constraint model includes:
performing mask extraction on the scene object in the layout area where the synthesizable position of the object is located;
judging whether the synthesizable position of the object is above the pixels of the scene object;
if the synthesizable position is above the pixels of the scene object, the scene object occludes the synthesized object;
if the synthesizable position is not above the pixels of the scene object, the synthesized object is not occluded by the scene object.
The occlusion relation constraint model works by overlaying the mask of the scene object, the mask being a binary image obtained from scene semantic segmentation.
Because of label errors and the limited generalization of the network model, the semantic segmentation map of a scene object cannot be completely accurate, and in image synthesis such pixel-level errors cause serious distortion, i.e. a clearly visible gap appears where the synthesized object meets the scene object. To solve this problem, as shown in fig. 9, the step of extracting the mask of the scene object in the layout area where the synthesizable position of the object is located comprises:
Extracting an object area of a layout area where the synthesizable position of the object is located by a mask;
Calculating the gradient of the object region: the gradient at (x, y) in the object region is a vector whose components gx and gy are the horizontal and vertical derivatives of the image; its magnitude and direction are, respectively, M(x, y) = √(gx² + gy²) and α(x, y) = arctan(gy / gx), where M(x, y) denotes the gradient magnitude and α(x, y) the gradient direction;
the gradient direction reflects where the image gray value changes most, so the gradient value at the pixel edge of the object mask is a local maximum. Therefore, the pixel gradient values of the local region of the scene object are calculated, the positions of the local gradient maxima being the reasonable connection positions; the object mask is shrunk pixel by pixel until it stops at the local gradient maxima, the mask edge of the object is obtained, and the object mask is output. The new mask edge then agrees with the real edge of the scene object, giving a more realistic visual effect when the occlusion relation is handled.
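The refinement can be sketched roughly as follows: compute the image gradient, then shrink the semantic mask while letting boundary pixels that already sit on strong gradients stay in place. The percentile threshold used as a proxy for the local gradient maximum and the fixed iteration count are assumptions; the patent's exact stopping rule is not reproduced here.

```python
import cv2
import numpy as np

def refine_occluder_mask(gray_scene: np.ndarray, mask: np.ndarray,
                         iterations: int = 5) -> np.ndarray:
    """Shrink a slightly oversized semantic mask toward strong image
    gradients so that its edge follows the real object edge."""
    gx = cv2.Sobel(gray_scene, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray_scene, cv2.CV_64F, 0, 1, ksize=3)
    grad = np.sqrt(gx ** 2 + gy ** 2)
    # proxy for "local gradient maximum": top quartile of gradients inside the mask
    strong = grad > np.percentile(grad[mask > 0], 75)
    refined = mask.copy()
    kernel = np.ones((3, 3), np.uint8)
    for _ in range(iterations):
        eroded = cv2.erode(refined, kernel)
        boundary = (refined > 0) & (eroded == 0)
        # boundary pixels already sitting on a strong gradient stop shrinking
        refined = np.where(boundary & strong, refined, eroded).astype(mask.dtype)
    return refined
```

The occlusion decision itself then reduces to comparing the y-coordinate of the synthesizable position with the pixels of the refined scene-object mask, exactly as in the steps listed earlier.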
A relation model between the scene image and the object is established through scene understanding, so that geometric factors such as the position and scale of the synthesized object are controlled reasonably, which enables image synthesis with diverse interaction relations. However, a realistic composite image is not only geometrically reasonable; the object and the scene image must also be mutually compatible so that they fuse naturally. The above object and scene synthesis method therefore further comprises:
In the step S2, an object style relation constraint model is also constructed, and the object style relation constraint model is used for constraining the style of an object relative to a scene;
S6, adjusting the color style of the original image of the object according to the object style relation constraint model so that the foreground object and the scene are naturally fused;
and step S7, rendering the object with the adjusted color style at the selected synthesizable position, according to the scale proportion and the occlusion relation, to obtain the synthesized image of the object and the scene.
Preferably, the step of constructing the object style relation constraint model includes:
Adjusting the brightness and saturation of the object so that the object is compatible with the scene style, comprising: establishing image histograms of the object and of the scene in a color space, and shifting the object's histograms of the corresponding statistics so that the brightness and saturation of the object and of the scene are as consistent as possible;
adjusting the local contrast and correlated color temperature of the object to make the object compatible with the scene style, comprising: adjusting the appearance of the object by combining the local contrast and the correlated color temperature with the HSL channels, so that the object is compatible with the scene image style.
In one embodiment, as shown in fig. 10, the step of adjusting the brightness and saturation of the object to make the object compatible with the scene style includes:
obtaining an RGB image of an original image of a scene and an object;
Converting the RGB images into HSL color space images to obtain the luminance histogram and the hue-saturation histogram of the object and of the scene, where H, S and L denote Hue, Saturation and Lightness and (r, g, b) are the red, green and blue channel values of a pixel of the RGB image. Specifically: the RGB image of the original scene image is converted into an HSL color space image to obtain the luminance histogram and the hue-saturation histogram of the scene; the object is rendered into a two-dimensional RGB image with a regular edge contour, the RGB image of the object is converted into an HSL color space image, the effective pixels of the object are extracted with a mask, and the luminance histogram and the hue-saturation histogram of these effective pixels are obtained. For convenience of computation the RGB values of the image are normalized to [0, 1], so r, g and b are real numbers between 0 and 1. Let max be the maximum and min the minimum of r, g and b; the (h, s, l) value in HSL space is then computed, where h ∈ [0°, 360°) is the hue angle and s, l ∈ [0, 1] are the saturation and lightness, and the conversion from RGB to HSL follows the standard max/min relations (a sketch is given after the steps below);
brightness control and contrast adjustment are carried out by utilizing a brightness histogram corresponding to the object, so that the brightness and the contrast of the object are consistent with those of the scene;
And adjusting the saturation by using the hue-saturation histogram corresponding to the object, so that the saturation of the object is consistent with that of the scene.
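A sketch of the max/min based RGB-to-HSL conversion and of the two histograms referenced above follows; the bin counts and the vectorized formulation are assumptions made for illustration.

```python
import numpy as np

def rgb_to_hsl(rgb: np.ndarray):
    """Vectorized RGB -> HSL using the max/min relations; rgb is (N, 3)
    in [0, 1]. Returns hue in [0, 360), saturation and lightness in [0, 1]."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    mx, mn = rgb.max(axis=1), rgb.min(axis=1)
    l = (mx + mn) / 2.0
    d = mx - mn
    denom = np.where(l < 0.5, mx + mn, 2.0 - mx - mn)
    s = np.divide(d, denom, out=np.zeros_like(d), where=denom > 0)
    h = np.zeros_like(l)
    nz = d > 0
    r_, g_, b_, d_, mx_ = r[nz], g[nz], b[nz], d[nz], mx[nz]
    hh = np.where(mx_ == r_, ((g_ - b_) / d_) % 6.0,
         np.where(mx_ == g_, (b_ - r_) / d_ + 2.0,
                             (r_ - g_) / d_ + 4.0))
    h[nz] = 60.0 * hh
    return h, s, l

def object_histograms(h, s, l, mask):
    """Luminance histogram and hue-saturation histogram of the effective
    (masked) pixels, as used for matching the object to the scene."""
    h, s, l = h[mask], s[mask], l[mask]
    lum_hist, _ = np.histogram(l, bins=64, range=(0.0, 1.0), density=True)
    hs_hist, _, _ = np.histogram2d(h, s, bins=(36, 32),
                                   range=((0.0, 360.0), (0.0, 1.0)),
                                   density=True)
    return lum_hist, hs_hist
```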
In one embodiment, the step of adjusting the local contrast and correlated color temperature of the object to make the object compatible with the scene style comprises:
obtaining an RGB image of an original image of a scene;
rendering the object into a two-dimensional RGB image with regular edge contours;
converting the RGB images of scenes and objects into HSL color space images;
for each pixel point of the HSL color space image of the scene and of the effective pixels of the object extracted by the mask, obtaining the local contrast of the pixel point, where Lx is the pixel luminance and L̄x is the local average luminance at pixel x;
The S curve is utilized to realize the pixel-level brightness conversion of the HSL color space image of the effective pixel corresponding to the object, and the forward S curve and the reverse S curve are respectively used for enhancing and reducing the local contrast, so that the contrast of the object and the scene is matched;
And adjusting the correlated color temperature of the HSL color space image of the effective pixel corresponding to the object, so that the correlated color temperatures of the object and the scene are consistent.
Preferably, the step of obtaining a composite image of the object and the scene further comprises:
Taking the synthesized image as an initial synthesized image, extracting objects in the initial synthesized image by using a mask, wherein the initial synthesized image except the objects is a scene image;
Obtaining an HSL color space image of the scene image and the object;
Converting the brightness and saturation into the logarithmic domain according to Weber's law and using the mired (micro reciprocal degree) as the unit of correlated color temperature, obtaining the brightness, saturation, local contrast and correlated color temperature statistics of the HSL color space images of the object and of the scene image, comprising:
luminance and saturation are converted into the logarithmic domain according to Weber's law, i.e. the luminance space is represented by log2(Y), where Y ∈ [ε, 1.0] and ε = 3.03 × 10^-4 is used to prevent undefined logarithm values; the saturation channel of the HSL color space is represented by log2(S) with S ∈ [ε, 1.0]; h represents the cyclic hue value in the range [0.0, 1.0];
the mired (micro reciprocal degree) is taken as the unit of correlated color temperature (Correlated Color Temperature, CCT), i.e. CCT in mired = 10^6 / K, where K is the temperature in kelvin; natural lighting lies roughly in the range [1500, 20000] K, and the correlated color temperature of the image is computed with the OptProp tools;
The local contrast of the object is adjusted through an S-curve. As shown in fig. 11, pixel-level luminance conversion is realized with an S-curve, the forward S-curve and the inverse S-curve being used to increase and decrease local contrast respectively; the pixel luminance Lin is mapped to Lout by the S-curve, and the same S-curve parameters are used within one image. The start point of the S-curve is P0 = (0, 0), the end point is P1 = (1, 1), and the turning point Pm is the average luminance value; this point divides the S-curve into an upper and a lower sub-curve, each of which is a Bézier curve with 3 anchor points. Pm, PU, P1 are the 3 anchor points of the upper sub-curve and Pm, PD, P0 those of the lower sub-curve; the two sub-curves are controlled by their anchor points, and the degree of conversion of the S-curve is determined by PU and PD:
PU = P11 + α(P12 − P11),
PD = P01 + α(P02 − P01),
where α is the parameter controlling the bending direction of the S-curve: when α < 0.5 a forward S-curve that increases local contrast is obtained, when α > 0.5 an inverse S-curve that decreases local contrast is obtained, and α = 0.5 degenerates into a straight line. In the experiments α is set within [0.4, 0.6] and its value is searched continuously to find a suitable curve, so that the contrast of the foreground person matches that of the scene image;
the brightness, saturation, correlated color temperature and local contrast of the image are taken as adjustment targets, and the statistics of each index are computed and recorded for the object and for the scene image (for example, the correlated color temperature statistic of the scene image);
the local contrast of the object is adjusted by means of the S-curve;
the brightness of the object is adjusted so that the brightness of the object and of the scene image are consistent;
the correlated color temperature of the object is adjusted so that the correlated color temperatures of the object and of the scene image coincide;
the saturation of the object is adjusted so that the saturation of the object and of the scene image is consistent;
And re-rendering the adjusted synthesizable position of the object in the scene image to obtain an adjusted synthesized image.
In the above object and scene composition method, the contrast adjustment step is placed before the other adjustment steps so that the other indices are not affected by a later contrast adjustment.
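The adjustment formulas themselves appear in the original only as images that are not reproduced here, so the sketch below uses simple stand-ins: Weber-law log2 representations for luminance and saturation, the mired conversion for correlated color temperature, mean matching for the statistic adjustments, and a piecewise quadratic Bézier-style S-curve whose α behaves as described above (α < 0.5 increases local contrast).

```python
import numpy as np

EPS = 3.03e-4   # the epsilon used above to keep the logarithm defined

def to_log2(x):
    """Weber-law representation of luminance or saturation values in [0, 1]."""
    return np.log2(np.clip(x, EPS, 1.0))

def kelvin_to_mired(kelvin):
    """Correlated color temperature expressed in mired (micro reciprocal degrees)."""
    return 1.0e6 / np.asarray(kelvin, dtype=float)

def match_mean(obj_vals, scene_vals):
    """Shift the object's values so its mean statistic matches the scene's."""
    return obj_vals + (np.mean(scene_vals) - np.mean(obj_vals))

def s_curve(l, l_mean, alpha=0.45):
    """Piecewise quadratic Bezier-style S-curve around the mean luminance:
    alpha < 0.5 -> forward S-curve (more local contrast),
    alpha > 0.5 -> inverse S-curve (less), alpha = 0.5 -> straight line."""
    l = np.asarray(l, dtype=float)
    out = np.empty_like(l)
    for sel, lo, hi, ctrl in ((l < l_mean, 0.0, l_mean, alpha),
                              (l >= l_mean, l_mean, 1.0, 1.0 - alpha)):
        span = max(hi - lo, EPS)
        t = (l[sel] - lo) / span
        bent = 2.0 * (1.0 - t) * t * ctrl + t ** 2   # Bezier through 0, ctrl, 1
        out[sel] = lo + bent * span
    return out
```

Mirroring the note above, the S-curve contrast adjustment would be applied before the mean matching of the other statistics.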
In a preferred embodiment, as shown in figs. 1 and 12, the method further comprises:
providing a user interaction interface for the client, wherein the user interaction interface provides multiple scenes, multiple objects and multiple synthesizable positions for the client to select, and feeds back the rationality of the synthesizable position of the object selected by the client in the scene, where rationality means conformity with the object position relation constraint model, the object scale relation constraint model, the object occlusion relation constraint model and the object style relation constraint model.
The invention also provides an object and scene synthesis system based on an indoor scene, which synthesizes an object into a scene and comprises:
the image acquisition module is used for acquiring original images of the object and the scene;
The model construction module is used for constructing an object position relation constraint model, an object scale relation constraint model and an object occlusion relation constraint model, wherein the object position relation constraint model is used for constraining the synthesizable positions of the object in the scene, the object scale relation constraint model is used for constraining the scale proportion of the object relative to the scene, and the object occlusion relation constraint model is used for constraining the occlusion relation between the object and the objects in the scene;
The synthesizable position obtaining module obtains the synthesizable position of the object in the scene according to the object position relation constraint model;
the scale proportion obtaining module is used for obtaining the scale proportion of the object relative to the scene according to the object scale relation constraint model;
the occlusion relation obtaining module is used for obtaining, according to the object occlusion relation constraint model, the different occlusion relations between the object at its different synthesizable positions and the objects in the scene;
The synthesis module performs style rendering of the object, at the obtained scale proportion and according to the corresponding occlusion relation, at the corresponding synthesizable position to obtain the synthesized image of the object and the scene;
wherein, the model construction module includes:
The semantic segmentation network inputs an RGB image of an original image of a scene, outputs semantic tags corresponding to all pixel points, and the semantic tags are tags corresponding to objects in the scene;
The semantic feature extraction network, for which layout labels of the scene are set that divide the scene into corresponding layout areas, wherein the layout labels comprise the image scene (background), the wall-floor junction, the wall-wall junction and the wall-ceiling junction, the layout areas comprise the floor, the walls and the ceiling, the input of the semantic feature extraction network is the semantic labels, and the output is the layout labels;
The scene layout estimation network shares the weights of the semantic segmentation network and the semantic feature extraction network to obtain layout labels corresponding to all pixel points;
And the synthesizable position limiting unit, which limits the synthesizable positions of the object in scene synthesis according to the rationality of object placement, where rationality means that the layout area is able to provide reasonable support for the object.
In one embodiment, the model construction module further constructs an object style relation constraint model for constraining the style of the object relative to the scene; the object and scene synthesis system based on an indoor scene further comprises a style fusion module, which adjusts the color style of the original image of the object according to the object style relation constraint model so that the object and the scene fuse naturally; the synthesis module then renders the object with the adjusted color style at the selected synthesizable position, according to the scale proportion and the occlusion relation, to obtain the synthesized image of the object and the scene.
The embodiment of the object and scene synthesizing system based on the indoor scene is substantially the same as the embodiment of the object and scene synthesizing method based on the indoor scene, and will not be described herein.
The object and scene synthesis method and system based on an indoor scene combine scene understanding with indoor image synthesis. A three-dimensional structural representation of the indoor scene image is established through indoor scene layout estimation and semantic segmentation, and the probability distribution of synthesizable positions over the floor region of the scene image is computed and used as the sampling basis for synthesizable positions. On this basis, the scale and the occlusion relation of the synthesized person are constrained according to the different synthesis positions. A style relation between the synthesized person and the scene image is established, and the style of the synthesized person is adjusted to be consistent with the scene image by matching statistics such as brightness, saturation, contrast and color temperature, so that the new synthesized image is more realistic. The person model is connected to the indoor scene image synthesis task: on the one hand, a position to be synthesized can be specified, the rationality of the specified position is judged from the probability of the synthesis position, and when the specified position is reasonable the corresponding new image is synthesized; on the other hand, rendering viewpoints of the 360° person model are provided, and real-time image synthesis with the scene image is completed.
The prior art can synthesize persons that conform to the context information of a scene using common person-scene synthesis methods, with fairly realistic results in geometry and appearance. However, existing methods usually synthesize the person into the scene image only as foreground, so the occlusion relation between the person and the scene objects is always the same and the synthesis positions are not diverse enough. Therefore, through global and local understanding of the indoor scene image, a probability distribution model of the person's synthesizable positions is established, and the occlusion that arises under perspective projection is realized with the object pixel mask at the corresponding position in the scene image, so that the synthesized person and the scene objects have different occlusion relations at different positions and the synthesized image is more realistic and reasonable.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the modules is merely a logical function division, and other manners of division may be implemented in practice.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Claims (8)

CN202011313114.3A, filed 2020-11-20 (priority date 2020-11-20): Object and scene synthesis method and system based on indoor scene. Status: Active. Granted as CN112488967B (en).

Priority Applications (1)

CN202011313114.3A (granted as CN112488967B, en), priority date 2020-11-20, filing date 2020-11-20, title: Object and scene synthesis method and system based on indoor scene

Publications (2)

CN112488967A (en), published 2021-03-12
CN112488967B (en), published 2024-07-09

Family

ID: 74932622

Family Applications (1)

CN202011313114.3A, Active, granted as CN112488967B (en), priority date 2020-11-20, filing date 2020-11-20, title: Object and scene synthesis method and system based on indoor scene

Country Status (1)

CN: CN112488967B (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
