Disclosure of Invention
The present application provides an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
In a first aspect, there is provided an image processing method, the method comprising:
acquiring an image to be processed and a target image style, wherein the image to be processed comprises a target object;
and converting, while keeping the pose of the target object, the image style of the target object in the image to be processed into the target image style to obtain a target image.
In combination with any embodiment of the present application, under the condition of maintaining the pose of the target object, converting the image style of the target object in the image to be processed into the target image style, to obtain a target image, including:
extracting edge information of the target object from the image to be processed;
and under the condition that the edge information is used as an input condition of a control network, converting the image style of the image to be processed into the target image style according to the data output by the control network and the target image style to obtain the target image.
In combination with any of the embodiments of the present application, after obtaining the target image, the method further includes:
acquiring a background image;
and replacing, with the background image, the pixel region other than the target object in the target image to obtain a style image.
In combination with any one of the embodiments of the present application, the acquiring the background image includes:
and cropping the pixel region other than the target object from the image to be processed to obtain the background image.
In combination with any embodiment of the present application, the cropping, from the image to be processed, the pixel region other than the target object to obtain the background image includes:
dilating the target object in the image to be processed to obtain a dilated image;
and cropping the pixel region other than the target object from the dilated image to obtain the background image.
In combination with any embodiment of the present application, the replacing, with the background image, the pixel region other than the target object in the target image to obtain the style image includes:
eroding the target object in the target image to obtain an eroded image;
and replacing, with the background image, the pixel region other than the target object in the eroded image to obtain the style image.
In combination with any one of the embodiments of the present application, the acquiring the background image includes:
acquiring a background text describing a background;
and generating a background image according to the background text.
In combination with any one embodiment of the present application, when the edge information is used as an input condition of a control network, converting, according to data output by the control network and the target image style, the image style of the image to be processed into the target image style, to obtain the target image includes:
acquiring a target text describing the target object;
and under the condition that the edge information and the target text are used as input conditions of a control network, converting the image style of the image to be processed into the target image style according to the data output by the control network and the target image style to obtain the target image.
In combination with any one of the embodiments of the present application, the obtaining the target text describing the target object includes:
acquiring an image description generation model, wherein the image description generation model is used for generating a text describing an image;
and processing the image to be processed by using the image description generation model to generate the target text.
In combination with any of the embodiments of the present application, the target image style includes a predetermined animation style.
In connection with any of the embodiments of the present application, the target object includes a person.
In a second aspect, there is provided an image processing apparatus including:
an acquisition unit, configured to acquire an image to be processed and a target image style, where the image to be processed includes a target object;
and a conversion unit, configured to convert, while keeping the pose of the target object, the image style of the target object in the image to be processed into the target image style to obtain a target image.
In combination with any one of the embodiments of the present application, the conversion unit is configured to:
extracting edge information of the target object from the image to be processed;
and under the condition that the edge information is used as an input condition of a control network, converting the image style of the image to be processed into the target image style according to the data output by the control network and the target image style to obtain the target image.
In combination with any one of the embodiments of the present application, the acquiring unit is further configured to acquire a background image;
the image processing apparatus further includes: a replacing unit, configured to replace, with the background image, the pixel region other than the target object in the target image to obtain a style image.
In combination with any one of the embodiments of the present application, the obtaining unit is configured to:
and cropping the pixel region other than the target object from the image to be processed to obtain the background image.
In combination with any one of the embodiments of the present application, the obtaining unit is configured to:
dilating the target object in the image to be processed to obtain a dilated image;
and cropping the pixel region other than the target object from the dilated image to obtain the background image.
In combination with any one of the embodiments of the present application, the replacing unit is configured to:
eroding the target object in the target image to obtain an eroded image;
and replacing, with the background image, the pixel region other than the target object in the eroded image to obtain the style image.
In combination with any one of the embodiments of the present application, the obtaining unit is configured to:
acquiring a background text describing a background;
and generating a background image according to the background text.
In combination with any one of the embodiments of the present application, the conversion unit is configured to:
acquiring a target text describing the target object;
and under the condition that the edge information and the target text are used as input conditions of a control network, converting the image style of the image to be processed into the target image style according to the data output by the control network and the target image style to obtain the target image.
In combination with any one of the embodiments of the present application, the conversion unit is configured to:
acquiring an image description generation model, wherein the image description generation model is used for generating a text describing an image;
and processing the image to be processed by using the image description generation model to generate the target text.
In combination with any of the embodiments of the present application, the target image style includes a predetermined animation style.
In connection with any of the embodiments of the present application, the target object includes a person.
In a third aspect, an electronic device is provided, comprising: a processor and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform a method as described in the first aspect and any one of its possible implementations.
In a fourth aspect, there is provided another electronic device, comprising: a processor, a transmitting means, an input means, an output means, and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method as described in the first aspect and any one of its possible implementations.
In a fifth aspect, there is provided a computer-readable storage medium having stored therein a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method as described in the first aspect and any one of its possible implementations.
In a sixth aspect, there is provided a computer program product comprising a computer program or instructions which, when run on a computer, cause the computer to perform the method as described in the first aspect and any one of its possible implementations.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
In the present application, after acquiring the image to be processed and the target image style, the image processing apparatus converts the image style of the target object in the image to be processed into the target image style while keeping the pose of the target object, to obtain the target image. In this way, more details of the target object can be retained while its image style is converted, which reduces the probability that the style conversion distorts the target object and thus improves the fidelity of the target object.
Detailed Description
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. The described embodiments are obviously only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Changing the image style of a subject in an image can make the image more interesting and entertaining. In the prior art, the image style of a subject in an image is usually changed by an image generation technique. However, on the one hand, it is difficult for such a technique to control the strength of the style change, so that details of the subject are easily lost when the change is too strong, which distorts the subject; on the other hand, such a technique changes the image style of the whole image, that is, it changes the image style of the background as well as that of the subject.
In view of this, the embodiments of the present application provide an image processing method that retains the details of the subject while changing the image style of the subject, thereby improving the fidelity of the subject, and that leaves the image style of the background unaffected when the image style of the subject is changed.
The execution subject of the embodiments of the present application is an image processing apparatus, which may be any electronic device capable of executing the technical solutions disclosed in the method embodiments of the present application. Optionally, the image processing apparatus may be one of the following: a computer or a server.
It should be understood that the method embodiments of the present application may also be implemented by way of a processor executing computer program code. Embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. Referring to fig. 1, fig. 1 is a flowchart of an image processing method according to an embodiment of the present application.
101. Acquire an image to be processed and a target image style.
In the embodiments of the present application, the image to be processed may be any image. For example, the image to be processed may contain a person, an automobile, or both a person and an automobile; the content of the image to be processed is not limited in the present application. The image to be processed includes a target object, and the target object may be any object, for example, a person, an automobile, or a building.
In the embodiments of the present application, the target image style may be any image style, for example, a Pixar style, a traditional Chinese ("guofeng") style, or an ink-wash style.
In one implementation of acquiring the image to be processed, the image processing apparatus receives, through an input component, the image to be processed input by a user. The input component includes: a keyboard, a mouse, a touch screen, a touch pad, or an audio input device.
In another implementation manner of acquiring the image to be processed, the image processing device receives the image to be processed sent by the terminal to acquire the image to be processed. Alternatively, the terminal may be any of the following: cell phone, computer, tablet computer, server, wearable equipment.
In yet another implementation of acquiring the image to be processed, the image processing apparatus is equipped with a camera component, which optionally includes a camera. The image processing apparatus acquires the image to be processed by using the camera component.
In one implementation of acquiring a target image style, an image processing device receives a target image style input by a user through an input component.
In another implementation of acquiring the target image style, the image processing apparatus receives the target image style transmitted by the terminal to acquire the target image style.
In this embodiment of the present application, the step of acquiring the image to be processed and the step of acquiring the style of the target image may be performed by the image processing apparatus simultaneously or separately, which is not limited in this application.
102. Convert, while keeping the pose of the target object, the image style of the target object in the image to be processed into the target image style to obtain a target image.
In the embodiments of the present application, the pose of the target object includes the outline of the target object; by preserving the pose of the target object, the outline and details of the target object can be retained. In one possible implementation, the target object is a person, and the pose of the target object further includes the expression of the target object; in this case, by preserving the pose of the target object, both the outline and the expression of the target object can be retained.
In one possible implementation, the image processing apparatus converts the image style of the target object in the image to be processed to obtain an intermediate image, and migrates the pose of the target object in the image to be processed to the intermediate image to obtain the target image.
In another possible implementation, the image processing apparatus converts the image style of the target object in the image to be processed to obtain an intermediate image, and replaces the pose of the target object in the intermediate image with the pose of the target object in the image to be processed to obtain the target image.
In the embodiments of the present application, after acquiring the image to be processed and the target image style, the image processing apparatus converts the image style of the target object in the image to be processed into the target image style while keeping the pose of the target object, to obtain the target image. In this way, more details of the target object can be retained while its image style is converted, which reduces the probability that the style conversion distorts the target object and thus improves the fidelity of the target object.
In one possible implementation, the target object is a person and the target image style is a predetermined animation style. Based on the technical solutions provided by the embodiments of the present application, the image processing apparatus can convert the image style of the person in the image to be processed into the predetermined animation style while keeping the pose of the person (such as details of the person's outline and expression). Optionally, the predetermined animation style is a Pixar style.
For example, fig. 2 is a schematic diagram showing a comparison between an image to be processed and a target image. As shown in fig. 2, the image to be processed is to the left of the dividing line and includes a person, which is the target object; the target image is to the right of the dividing line, the pixel region covered by the person in the target image is in the predetermined animation style, and the pose (such as the outline) of the person in the target image is the same as the pose of the person in the image to be processed.
It should be understood that "1/3" in the upper right corner of fig. 2 indicates that there are three preview images in total and that the first of them is currently displayed. The text below the images (captions and hashtags such as "#Klein blue", "#custom avatar", and "#cartoon avatar") is label information carried by the image to be processed and the target image.
For another example, fig. 3 is a schematic diagram showing another comparison between an image to be processed and a target image. As shown in fig. 3, the small image in the upper left corner is the image to be processed, which includes a person as the target object; in the image to be processed, the target object wears an apron and holds a cup of coffee. The large image in fig. 3 is the target image; the pixel region covered by the person in the target image is in the predetermined animation style, and the pose (such as the outline) of the person in the target image is the same as the pose of the person in the image to be processed, that is, the person in the target image also wears an apron and holds a cup of coffee, and the expression of the person in the target image is the same as the expression of the person in the image to be processed.
It should be understood that the rectangular blank area in fig. 2 and the rectangular blank area in fig. 3 are used to block the image content, and should not be construed as an effect of processing the image to be processed.
As an alternative embodiment, the image processing apparatus performs the following steps in performing step 102:
201. Extract the edge information of the target object from the image to be processed.
By performing edge detection on the image to be processed, the image processing apparatus can detect the edges in the image to be processed and thereby extract the edge information of the target object from the image to be processed.
In one possible implementation, the image processing apparatus performs edge detection on the image to be processed using the Canny edge detection algorithm and extracts the edge information of the target object from the image to be processed.
In another possible implementation, the image processing apparatus performs edge detection on the image to be processed using the Sobel edge detection algorithm and extracts the edge information of the target object from the image to be processed.
In yet another possible implementation, the image processing apparatus performs edge detection on the image to be processed using a differential edge detection method and extracts the edge information of the target object from the image to be processed.
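By way of illustration only, the following is a minimal sketch of this edge extraction step using the open-source OpenCV library; the file names and threshold values are assumptions for illustration and are not prescribed by the present application.

```python
import cv2
import numpy as np

# Load the image to be processed (the path is a placeholder).
image = cv2.imread("image_to_process.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Canny edge detection; the two thresholds control edge sensitivity
# and are illustrative values only.
canny_edges = cv2.Canny(gray, 100, 200)

# Sobel edge detection as an alternative: gradient magnitude in x and y.
sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel_edges = cv2.convertScaleAbs(np.hypot(sobel_x, sobel_y))

cv2.imwrite("canny_edge_map.png", canny_edges)
```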
202. Convert, when the edge information is used as an input condition of a control network (ControlNet), the image style of the image to be processed into the target image style according to the data output by the control network and the target image style, to obtain the target image.
In the embodiments of the present application, the control network is a model capable of generating images. In one possible implementation, the control network can convert the image style of any image; in another possible implementation, the control network can generate, based on a text, an image matching the text; and in yet another possible implementation, the control network can generate, in any image and based on a text, image content matching the text.
In step 202, the image processing apparatus uses the edge information of the target object in the image to be processed as an input condition of the control network, so that, while generating an image, the control network uses this edge information as a constraint on the image to be generated. In other words, in the image generated based on the control network, the edge information of the target object matches the edge information of the target object in the image to be processed. That is, when the edge information of the target object in the image to be processed is used as an input condition of the control network, converting the image style of the target object in the image to be processed based on the control network can retain the edge information of the target object, namely the pose of the target object. Therefore, by performing step 202 to obtain the target image, the image processing apparatus can make the pose of the target object in the target image match the pose of the target object in the image to be processed, which reduces the probability that the style conversion distorts the target object and improves the fidelity of the target object.
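As one possible concrete realization of this conditioning scheme (the present application does not prescribe a specific implementation), the sketch below uses the open-source diffusers library with a Canny-conditioned ControlNet; the checkpoint names and the prompt are illustrative assumptions.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# A Canny-conditioned ControlNet attached to a base diffusion model
# (both checkpoint names are illustrative assumptions).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The edge map extracted in step 201 is the input condition, so the
# generated image keeps the pose (edges) of the target object.
edge_map = load_image("canny_edge_map.png")
target_image = pipe(
    "a person in a predetermined animation style",  # style prompt (assumed)
    image=edge_map,
    num_inference_steps=30,
).images[0]
target_image.save("target_image.png")
```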
As an alternative embodiment, the image processing apparatus further performs the following steps after obtaining the target image:
301. Acquire a background image.
In the embodiments of the present application, the background image contains the background for the target object in the target image; that is, the background of the target object in the target image can be determined based on the background image.
The background image may include any image content. In one possible implementation, the background image includes the image content of the image to be processed other than the pixel region covered by the target object. In this case, the background of the target object in the target image is that image content; that is, compared with the image to be processed, only the image style of the target object is transformed.
In another possible implementation, the image content of the background image is different from the image content of the image to be processed other than the pixel region covered by the target object. In this case, the background of the target object in the target image is different from the background of the target object in the image to be processed; that is, compared with the image to be processed, both the image style of the target object and the background of the target object are transformed.
In one implementation of acquiring the background image, the image processing apparatus crops the pixel region other than the target object from the image to be processed to obtain the background image. In this case, the background image is the background of the target object in the image to be processed.
As an alternative embodiment, the image processing apparatus obtains the background image by performing the following steps: dilating the target object in the image to be processed to obtain a dilated image, and cropping the pixel region other than the target object from the dilated image to obtain the background image.
By dilating the target object in the image to be processed, the image processing apparatus enlarges the area of the pixel region covered by the target object. In this embodiment, before cropping the pixel region other than the target object from the image to be processed, the image processing apparatus dilates the target object in the image to be processed to obtain the dilated image, which reduces the probability that the pixel region other than the target object contains pixels whose semantics are the target object; the pixel region other than the target object is then cropped from the dilated image as the background image, so that the probability that the background image contains pixels whose semantics are the target object is reduced.
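A minimal sketch of this dilate-then-crop operation with OpenCV follows; the mask source and the kernel size are illustrative assumptions.

```python
import cv2
import numpy as np

image = cv2.imread("image_to_process.jpg")
# Binary mask of the target object (e.g., produced by a subject
# segmentation model); assumed to be a 0/255 single-channel image.
subject_mask = cv2.imread("subject_mask.png", cv2.IMREAD_GRAYSCALE)

# Dilate the subject mask so it safely covers the whole target object;
# the 15x15 kernel is an illustrative choice.
kernel = np.ones((15, 15), np.uint8)
dilated_mask = cv2.dilate(subject_mask, kernel, iterations=1)

# Keep only the pixels outside the dilated subject region as background.
background = cv2.bitwise_and(image, image, mask=cv2.bitwise_not(dilated_mask))
cv2.imwrite("background_image.png", background)
```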
In another implementation of acquiring the background image, the image processing apparatus acquires a background text describing a background and generates the background image according to the background text, where the image content of the background image matches the background text. For example, if the background text is "a sunny day", the image content of the background image is a sunny day; if the background text is "a rainy day", the image content of the background image is a rainy day; and if the background text is "an iron tower", the image content of the background image includes an iron tower.
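One way to realize this text-to-background generation is with an off-the-shelf text-to-image diffusion pipeline, as sketched below; the checkpoint name and the prompt are assumptions.

```python
import torch
from diffusers import StableDiffusionPipeline

# Base text-to-image model (checkpoint name is an illustrative assumption).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The background text drives the content of the generated background image.
background_image = pipe("a sunny day, blue sky and white clouds").images[0]
background_image.save("background_image.png")
```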
In yet another implementation of acquiring a background image, the image processing apparatus receives a background image input by a user through the input component.
In still another implementation manner of obtaining the background image, the image processing apparatus receives the background image sent by the terminal to obtain the background image.
302. Replace, with the background image, the pixel region other than the target object in the target image to obtain a style image.
By executing step 302, the image processing apparatus obtains the style image, in which the background of the target object is the image content of the background image.
In this embodiment, after acquiring the background image, the image processing apparatus replaces the pixel region other than the target object in the target image with the background image, thereby replacing the background of the target object in the target image with the image content of the background image.
In an alternative embodiment, the image processing apparatus performs the following steps in performing step 302:
401. Erode the target object in the target image to obtain an eroded image.
The image processing apparatus erodes the target object in the target image, thereby reducing the area of the pixel region covered by the target object.
402. Replace, with the background image, the pixel region other than the target object in the eroded image to obtain the style image.
In this embodiment, the image processing apparatus first erodes the target object in the target image to shrink the pixel region covered by the target object, obtaining the eroded image; this improves the degree of matching between the size of the target object and the size of the background in the background image. The pixel region other than the target object in the eroded image is then replaced with the background image to obtain the style image, so that the size of the target object matches the size of the background better and the style image looks more natural.
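The erode-and-replace operation of steps 401 and 402 can be sketched with OpenCV as follows; the mask source and the kernel size are again illustrative assumptions, and the background image is assumed to have the same resolution as the target image.

```python
import cv2
import numpy as np

target_image = cv2.imread("target_image.png")
background = cv2.imread("background_image.png")
# Binary mask of the target object in the target image (assumed input).
subject_mask = cv2.imread("target_subject_mask.png", cv2.IMREAD_GRAYSCALE)

# Erode the subject mask to shrink the pixel region kept from the
# target image; the 9x9 kernel is an illustrative choice.
kernel = np.ones((9, 9), np.uint8)
eroded_mask = cv2.erode(subject_mask, kernel, iterations=1)

# Style image: subject pixels come from the target image, all other
# pixels come from the background image.
foreground = cv2.bitwise_and(target_image, target_image, mask=eroded_mask)
back_part = cv2.bitwise_and(background, background,
                            mask=cv2.bitwise_not(eroded_mask))
style_image = cv2.add(foreground, back_part)
cv2.imwrite("style_image.png", style_image)
```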
As an alternative embodiment, the image processing apparatus performs the following steps in performing step 202:
501. Acquire a target text describing the target object.
In the embodiments of the present application, the target text is a text describing the target object. In one possible implementation, the target text may describe attributes of the target object; for example, the target object is a person and the target text describes the hair, glasses, and mouth shape of the target object. For another example, the target object is a person and the target text is "a young man wearing black-framed glasses and holding a mobile phone".
In one implementation of obtaining target text of a target object, an image processing device receives target text entered by a user through an input component.
In another implementation manner of acquiring the target text of the target object, the image processing device receives the target text sent by the terminal to acquire the target text.
In yet another implementation of acquiring the target text of the target object, the image processing apparatus generates the target text describing the target object based on the target object in the image to be processed. Optionally, the image processing apparatus acquires an image description generation model, where the image description generation model has the capability of generating, from an image, a text describing the image; that is, the image description generation model is used for generating a text describing an image. The image to be processed is processed using the image description generation model to generate the target text describing the target object. Optionally, the image description generation model is a vision-language pre-training model (BLIP-2).
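As an example of such an image description generation model, the sketch below queries an open-source BLIP-2 checkpoint through the transformers library; the specific checkpoint name is an assumption.

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# BLIP-2 vision-language model (checkpoint name is illustrative).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("image_to_process.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)

# Generate a caption that serves as the target text for the target object.
generated_ids = model.generate(**inputs, max_new_tokens=50)
target_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True
)[0].strip()
print(target_text)  # e.g. "a young man wearing glasses holding a phone"
```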
502. Convert, when the edge information and the target text are used as input conditions of the control network, the image style of the image to be processed into the target image style according to the data output by the control network and the target image style, to obtain the target image.
In step 502, the image processing apparatus uses the edge information of the target object in the image to be processed and the target text as input conditions of the control network, so that, while generating an image, the control network uses both the edge information and the target text as constraints on the image to be generated. In other words, in the image generated based on the control network, the edge information of the target object matches the edge information of the target object in the image to be processed, and the attributes of the target object match the target text. That is, when the edge information of the target object in the image to be processed and the target text are used as input conditions of the control network, converting the image style of the target object in the image to be processed based on the control network can retain the edge information of the target object (i.e., the pose of the target object) and make the attributes of the target object match the target text. Therefore, by performing step 502 to obtain the target image, the image processing apparatus can make the pose of the target object in the target image match the pose of the target object in the image to be processed and make the attributes of the target object match the target text, which reduces the probability that the style conversion distorts the target object and improves the fidelity of the target object.
Based on the technical solutions provided by the embodiments of the present application, the embodiments of the present application further provide a possible implementation. Referring to fig. 4, fig. 4 is a flowchart of another image processing method according to an embodiment of the present application. As shown in fig. 4, the input of the image processing method includes an input text, an input background text, and an input image, where the input text is the above target text, that is, a text describing the target object; the input background text is a text describing the background of the target object in the target image; and the input image is the above image to be processed.
If the user inputs a text to the image processing apparatus through the input component, the image processing apparatus takes that text as the input text (i.e., the target text). If the input text is empty, the image processing apparatus processes the input image (i.e., the image to be processed) using a caption generator to generate the input text, where the caption generator is the image description generation model described above.
After acquiring the input image, the image processing apparatus detects the edges of the input image using a Canny detector (i.e., the Canny edge detection algorithm described above) to obtain a Canny edge map. The image processing apparatus uses the Canny edge map, the input image, and the input text as inputs to a conditional control network (ControlNet), which is the control network described above. The Canny edge map, the input image, and the input text are processed by the conditional control network to obtain a conditional control feature.
The image processing apparatus generates a style feature through a style embedding network (LoRA), where the image style characterized by the style feature is the target image style. After the conditional control feature and the style feature are obtained, they are input into a base generation network, which is used for generating images. The base generation network generates an image according to the conditional control feature and the style feature to obtain a generated image, where the generated image includes the target object, the edge information of the target object in the generated image matches the edge information of the target object in the image to be processed, and the attributes of the target object match the input text. That is, the pose of the target object in the generated image matches the pose of the target object in the image to be processed, and the attributes of the target object match the input text.
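One way to combine a LoRA style embedding with the conditional control network and the base generation network is sketched below using diffusers; the LoRA weight file, the checkpoint names, and the prompt are all illustrative assumptions.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# LoRA weights encoding the target image style (the file name is a
# placeholder for a style-specific LoRA).
pipe.load_lora_weights("animation_style_lora.safetensors")

# The Canny edge map constrains the pose; the input text constrains
# the attributes of the target object.
edge_map = load_image("canny_edge_map.png")
generated = pipe(
    "a young man wearing black-framed glasses holding a mobile phone",
    image=edge_map,
    num_inference_steps=30,
).images[0]
generated.save("generated_image.png")
```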
After the generated image is obtained, the image processing apparatus processes the generated image using a subject segmentation model to determine, from the generated image, the pixel region covered by the target object, where the subject segmentation model is used for segmenting, from an image, the pixel region covered by the target object.
It should be appreciated that, by training the subject segmentation model with different training data, the subject segmentation model can be made to segment different segmentation objects as the target object. For example, when the segmentation object in the training data is a person, training the subject segmentation model with that training data enables it to segment the pixel region covered by a person from an image, and that pixel region can be taken as the pixel region covered by the target object. When the segmentation object in the training data is an automobile, training the subject segmentation model with that training data enables it to segment the pixel region covered by an automobile from an image, and that pixel region is taken as the pixel region covered by the target object.
In one possible implementation, when the number of segmentation objects segmented from the input image by the subject segmentation model is one or more, the image processing apparatus may take all of the segmentation objects as target objects; the number of target objects is then one or more.
In another possible implementation, when the number of segmentation objects segmented from the input image by the subject segmentation model is one or more, the image processing apparatus may take the segmentation object with the largest area as the target object; the number of target objects is then one. For example, the subject segmentation model segments a segmentation object a and a segmentation object b from the input image, where the area of the pixel region covered by segmentation object a is larger than that covered by segmentation object b; segmentation object a is then the segmentation object with the largest area, so the image processing apparatus determines segmentation object a as the target object. In this possible implementation, when there is one or more segmentation objects, the segmentation object with the largest area is very likely to be the subject of the image, so taking it as the target object increases the probability that the target object is the subject of the image to be processed.
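A minimal sketch of selecting the segmentation object with the largest area follows; the masks are assumed to be binary arrays produced by some subject segmentation model.

```python
import numpy as np

def pick_largest_subject(masks: list[np.ndarray]) -> np.ndarray:
    """Return the binary mask covering the largest pixel area.

    `masks` is assumed to hold one 0/1 (or 0/255) array per
    segmentation object output by the subject segmentation model.
    """
    areas = [int(np.count_nonzero(m)) for m in masks]
    return masks[int(np.argmax(areas))]
```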
After the pixel region covered by the target object is determined from the generated image through the subject segmentation model, the image processing apparatus erodes that pixel region to obtain a subject binary mask, where the subject binary mask is the binary mask of the target object in the generated image. Specifically, in the subject binary mask, the pixels inside the pixel region covered by the target object all have the same pixel value, the pixels outside that region all have the same pixel value, and the two pixel values are different. That is, based on the subject binary mask, the pixels of the generated image that belong to the pixel region covered by the target object can be determined. Further, the image processing apparatus generates a background binary mask from the subject binary mask, where the background binary mask is the mask of the pixels of the generated image that are not covered by the target object.
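The construction of the subject binary mask and the background binary mask described here can be sketched with OpenCV as follows; the raw segmentation mask is an assumed input and the kernel size is illustrative.

```python
import cv2
import numpy as np

# Raw 0/255 mask of the target object in the generated image
# (assumed output of the subject segmentation model).
raw_mask = cv2.imread("raw_subject_mask.png", cv2.IMREAD_GRAYSCALE)

# Erode the region covered by the target object to obtain the subject
# binary mask of the generated image.
kernel = np.ones((9, 9), np.uint8)
subject_binary_mask = cv2.erode(raw_mask, kernel, iterations=1)

# The background binary mask marks the pixels not covered by the subject.
background_binary_mask = cv2.bitwise_not(subject_binary_mask)
cv2.imwrite("subject_binary_mask.png", subject_binary_mask)
cv2.imwrite("background_binary_mask.png", background_binary_mask)
```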
After acquiring the input background text, the image processing apparatus processes the input background text using the base generation network to generate a background map, where the background map matches the input background text.
After acquiring the input image, the image processing apparatus also processes the input image using the subject segmentation model to determine, from the input image, the pixel region covered by the target object. After that region is determined through the subject segmentation model, the image processing apparatus dilates it to obtain an original-image subject binary mask, which increases the probability that the mask of the target object fully covers the target object; the original-image subject binary mask is the binary mask of the target object in the image to be processed. Specifically, in the original-image subject binary mask, the pixels inside the pixel region covered by the target object all have the same pixel value, the pixels outside that region all have the same pixel value, and the two pixel values are different. That is, based on the original-image subject binary mask, the pixels of the image to be processed that belong to the pixel region covered by the target object can be determined.
After obtaining the original-image subject binary mask, the image processing apparatus can determine, according to the original-image subject binary mask, the pixels of the image to be processed that lie in the pixel region covered by the target object, and erase those pixels using an inpainting model to obtain a subject-eliminated image.
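One way to realize this erase step is with an off-the-shelf inpainting pipeline, sketched below; the checkpoint name and the prompt are assumptions.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

input_image = load_image("image_to_process.jpg")
# White pixels of the mask mark the dilated subject region to erase.
subject_mask = load_image("original_subject_mask.png")

# Fill the subject region with plausible background content.
subject_eliminated = pipe(
    prompt="background only, no person",
    image=input_image,
    mask_image=subject_mask,
).images[0]
subject_eliminated.save("subject_eliminated.png")
```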
After the subject-eliminated image, the background map, and the background binary mask of the generated image are obtained, the background of the target object in the generated image can be determined based on any one of the three. Specifically, the image processing apparatus may use the subject-eliminated image as the background image, may use the background map as the background image, or may determine, based on the background binary mask of the generated image, the background of the target object in the generated image as the background image.
After the background image is obtained, the target image can be obtained by using the background image as the background of the target object in the generated image. Specifically, the image processing apparatus may crop, according to the subject binary mask, the target object whose image style is the target image style from the generated image, and then fuse the target object with the background image to obtain the target image.
That is, when the image processing apparatus determines the background image based on the subject-eliminated image, the background of the target object in the target image is the background of the target object in the input image. When the image processing apparatus determines the background image based on the background map, the background of the target object in the target image is a background matching the input background text; that is, the user can customize the background by inputting a background text to the image processing apparatus, for example, "blue sky and white clouds", "a rainy day", or "starry sky". When the image processing apparatus determines the background image based on the background binary mask of the generated image, the background of the target object in the target image is the background of the target object in the generated image; that is, the image style of the background of the target object in the target image is also the target image style.
In one possible application scenario, fig. 5 shows an input image; when the input background text is "a cloudy day" and the target image style is the predetermined animation style, the target image shown in fig. 6 can be generated based on the input image. Fig. 7 shows a comparison of the input image of fig. 5 with the target image of fig. 6, where the input image of fig. 5 is to the left of the dividing line and the target image of fig. 6 is to the right. It should be understood that the rectangular blank areas in fig. 5 and fig. 7 are used to mask a human face and should not be interpreted as image content of the input image.
In another possible application scenario, fig. 5 shows an input image; when the input background text is "blue sky and white clouds" and the target image style is the predetermined animation style, the target image shown in fig. 8 can be generated based on the input image. Fig. 9 shows a comparison of the input image of fig. 5 with the target image of fig. 8, where the input image of fig. 5 is to the left of the dividing line and the target image of fig. 8 is to the right. It should be understood that the rectangular blank areas in fig. 9 are used to mask a human face and should not be interpreted as image content of the input image.
As can be seen from fig. 7 and fig. 9, the background in the target image is different from the background in the input image; that is, the image style of the target object in the input image is converted into the target image style and the background of the target object is also replaced. By replacing the background in this way, a virtual ("cloud") travel effect can be achieved for the cartoon avatar of the target object.
In yet another possible application scenario, fig. 5 shows an input image; when the background of the target object in the input image is kept unchanged and the target image style is the predetermined animation style, the target image shown in fig. 10 can be generated based on the input image. In yet another possible application scenario, when the background of the target object in the input image is changed to an iron tower and the target image style is the predetermined animation style, the target image shown in fig. 11 can be generated based on the input image of fig. 5. In yet another possible application scenario, when the image style of the background of the target object in the input image is also converted into the target image style and the target image style is the predetermined animation style, the target image shown in fig. 12 can be generated based on the input image of fig. 5.
It will be appreciated by those skilled in the art that, in the methods of the above specific embodiments, the order in which the steps are written does not imply a strict execution order; the specific execution order of the steps should be determined by their functions and possible internal logic.
If the technical solutions of the present application involve personal information, a product applying the technical solutions of the present application clearly informs the user of the personal information processing rules and obtains the individual's separate consent before processing the personal information. If the technical solutions of the present application involve sensitive personal information, a product applying the technical solutions of the present application obtains the individual's separate consent before processing the sensitive personal information and, at the same time, meets the requirement of "express consent". For example, a clear and conspicuous sign is set up at a personal information collection device, such as a camera, to inform the individual that he or she has entered the personal information collection range and that personal information will be collected; if the individual voluntarily enters the collection range, it is deemed that he or she consents to the collection of his or her personal information. Alternatively, on a device that processes personal information, where the personal information processing rules are communicated through conspicuous signs or information, personal authorization is obtained by means of a pop-up window or by asking the individual to upload his or her personal information. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information to be processed.
The foregoing details the method of embodiments of the present application, and the apparatus of embodiments of the present application is provided below.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus 1 includes an acquisition unit 11 and a conversion unit 12; optionally, the image processing apparatus 1 further includes a replacing unit 13. Specifically:
an acquisition unit 11, configured to acquire an image to be processed and a target image style, where the image to be processed includes a target object;
and a conversion unit 12, configured to convert an image style of the target object in the image to be processed into the target image style, to obtain a target image, while maintaining the pose of the target object.
In combination with any of the embodiments of the present application, the conversion unit 12 is configured to:
extracting edge information of the target object from the image to be processed;
and under the condition that the edge information is used as an input condition of a control network, converting the image style of the image to be processed into the target image style according to the data output by the control network and the target image style to obtain the target image.
In combination with any one of the embodiments of the present application, the acquiring unit 11 is further configured to acquire a background image;
the image processing apparatus 1 further includes: a replacing unit 13, configured to replace, with the background image, the pixel region other than the target object in the target image to obtain a style image.
In combination with any one of the embodiments of the present application, the obtaining unit 11 is configured to:
and cropping the pixel region other than the target object from the image to be processed to obtain the background image.
In combination with any one of the embodiments of the present application, the obtaining unit 11 is configured to:
dilating the target object in the image to be processed to obtain a dilated image;
and cropping the pixel region other than the target object from the dilated image to obtain the background image.
In combination with any embodiment of the present application, the replacing unit 13 is configured to:
eroding the target object in the target image to obtain an eroded image;
and replacing, with the background image, the pixel region other than the target object in the eroded image to obtain the style image.
In combination with any one of the embodiments of the present application, the obtaining unit 11 is configured to:
acquiring a background text describing a background;
and generating a background image according to the background text.
In combination with any of the embodiments of the present application, the conversion unit 12 is configured to:
acquiring a target text describing the target object;
and under the condition that the edge information and the target text are used as input conditions of a control network, converting the image style of the image to be processed into the target image style according to the data output by the control network and the target image style to obtain the target image.
In combination with any of the embodiments of the present application, the conversion unit 12 is configured to:
acquiring an image description generation model, wherein the image description generation model is used for generating a text describing an image;
and processing the image to be processed by using the image description generation model to generate the target text.
In combination with any of the embodiments of the present application, the target image style includes a predetermined animation style.
In connection with any of the embodiments of the present application, the target object includes a person.
In the embodiments of the present application, after acquiring the image to be processed and the target image style, the image processing apparatus converts the image style of the target object in the image to be processed into the target image style while keeping the pose of the target object, to obtain the target image. In this way, more details of the target object can be retained while its image style is converted, which reduces the probability that the style conversion distorts the target object and thus improves the fidelity of the target object.
In some embodiments, functions or modules included in the apparatus provided in the embodiments of the present application may be used to perform the methods described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
Fig. 14 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application. The electronic device 2 comprises a processor 21 and a memory 22. Optionally, the electronic device 2 further comprises an input device 23 and an output device 24. The processor 21, the memory 22, the input device 23, and the output device 24 are coupled through connectors, which include various interfaces, transmission lines, buses, and the like; this is not limited in the embodiments of the present application. It should be understood that, in the various embodiments of the present application, coupling means interconnection in a particular manner, including direct connection or indirect connection through other devices, for example, through various interfaces, transmission lines, buses, and the like.
The processor 21 may comprise one or more processors, for example one or more central processing units (central processing unit, CPU); in the case of a CPU, the CPU may be a single-core CPU or a multi-core CPU. Alternatively, the processor 21 may be a processor group constituted by a plurality of CPUs coupled to each other through one or more buses. Alternatively, the processor may be another type of processor; this is not limited in the embodiments of the present application.
The memory 22 may be used to store computer program instructions and various types of computer program code for executing the solutions of the present application. Optionally, the memory includes, but is not limited to, a random access memory (random access memory, RAM), a read-only memory (read-only memory, ROM), an erasable programmable read-only memory (erasable programmable read only memory, EPROM), or a portable read-only memory (compact disc read-only memory, CD-ROM), which is used for storing related instructions and data.
The input means 23 are for inputting data and/or signals and the output means 24 are for outputting data and/or signals. The input device 23 and the output device 24 may be separate devices or may be an integral device.
It will be appreciated that in the embodiments of the present application, the memory 22 may be used to store not only relevant instructions, but also relevant data, and the embodiments of the present application are not limited to the data specifically stored in the memory.
It will be appreciated that fig. 14 shows only a simplified design of an electronic device. In practical applications, the electronic device may further include other necessary elements, including but not limited to any number of input/output devices, processors, memories, etc., and all electronic devices that may implement the embodiments of the present application are within the scope of protection of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein. It will be further apparent to those skilled in the art that the descriptions of the various embodiments herein are provided with emphasis, and that the same or similar parts may not be explicitly described in different embodiments for the sake of convenience and brevity of description, and thus, parts not described in one embodiment or in detail may be referred to in the description of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (digital subscriber line, DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (digital versatile disc, DVD)), or a semiconductor medium (e.g., a solid state disk (solid state disk, SSD)), or the like.
Those of ordinary skill in the art will appreciate that all or part of the flows in the above method embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The aforementioned storage medium includes: a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, an optical disc, or the like.