Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention provides an image processing method and an image processing device, wherein the image processing device can be specifically integrated in equipment such as a server.
For example, referring to fig. 1a, when a user needs to process a certain image, the user may send an image processing request to the server through a terminal, the request indicating information such as the image to be processed and the type of element to be replaced. After receiving the request, the server may obtain a semantic segmentation model corresponding to that element type (the model being trained by a deep neural network), and then predict, according to the model, the probability that each pixel in the image belongs to the element type, to obtain an initial probability map. The server may then optimize the initial probability map, for example by using a conditional random field, to obtain a finer segmentation result (i.e., a segmentation effect map), and fuse the image with preset element materials according to that result: a first color portion (e.g., the white portion) of the segmentation effect map may be combined with a replaceable element material by a fusion algorithm, a second color portion (e.g., the black portion) may be combined with the original image, the two combination results may then be synthesized, and the processed image may be provided to the terminal, and so on.
Detailed descriptions are provided below. The numbering of the following embodiments is not intended to indicate any order of preference among them.
Embodiment I
The present embodiment will be described from the viewpoint of an image processing apparatus which can be specifically integrated in a server or the like.
An image processing method, comprising: receiving an image processing request, the request indicating an image to be processed and an element type to be replaced; obtaining a semantic segmentation model corresponding to the element type, the model being obtained by training a deep neural network; predicting, according to the semantic segmentation model, the probability that each pixel in the image belongs to the element type, to obtain an initial probability map; optimizing the initial probability map based on a conditional random field to obtain a segmentation effect map; and fusing the image with preset element materials according to the segmentation effect map to obtain a processed image.
As shown in fig. 1b, the specific flow of the image processing method may be as follows:
101. An image processing request is received.
For example, an image processing request sent by a terminal or other network-side device may be specifically received, and so on. The image processing request may indicate information such as an image to be processed and an element type to be replaced.
The element type refers to a category of elements, and an element refers to a basic element that can carry visual information, for example, if the image processing request indicates that the type of the element that needs to be replaced is "sky", it indicates that all sky parts in the image need to be replaced; for another example, if the image processing request indicates that the element type that needs to be replaced is "portrait," this indicates that all portrait portions in the image need to be replaced, and so on.
102. And acquiring a semantic segmentation model corresponding to the element type, wherein the semantic segmentation model is obtained by training a deep neural network.
For example, if the received image processing request indicates in step 101 that the element type requiring replacement is "sky", a semantic segmentation model corresponding to "sky" may be acquired, and if the received image processing request indicates in step 101 that the element type requiring replacement is "portrait", a semantic segmentation model corresponding to "portrait" may be acquired, and so on.
Optionally, the semantic segmentation model may be pre-stored in the image processing apparatus or other storage devices, and acquired by the image processing apparatus when needed, or the semantic segmentation model may be built by the image processing apparatus, that is, before the step "acquiring the semantic segmentation model corresponding to the element type", the image processing method may further include:
establishing a semantic segmentation model corresponding to the element type; for example, the model may be established as follows:
and training a preset semantic segmentation initial model by using a deep neural network according to the training data to obtain a semantic segmentation model corresponding to the element type.
For example, taking the example of establishing a semantic segmentation model corresponding to "sky", a certain number (for example, 8000) of pictures including sky may be collected, and then, according to the pictures, a preset semantic segmentation initial model is adjusted (fine tune) by using a deep neural network, and the finally obtained model is the semantic segmentation model corresponding to "sky".
It should be noted that the preset semantic segmentation initial model may be preset according to the requirements of practical applications, for example, a pre-trained semantic segmentation model for 20 categories of a general scene may be adopted.
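The training step above can be illustrated with a deliberately tiny stand-in. Instead of a deep neural network, the sketch below fine-tunes a per-pixel logistic classifier over RGB features, optionally starting from a pre-trained weight vector (mimicking the fine-tune of a preset initial model). All names, data, and hyperparameters are illustrative, not the patent's actual model.

```python
import numpy as np

def finetune_pixel_classifier(X, y, w=None, lr=0.1, epochs=200):
    """Toy stand-in for fine-tuning: logistic regression over per-pixel
    RGB features. X: (n_pixels, 3) floats in [0, 1]; y: (n_pixels,) 0/1
    labels (belongs to the element type or not). `w` lets training start
    from pre-trained weights, mimicking the fine-tune step."""
    X = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient-descent step
    return w

# Blue-ish pixels labelled "sky" (1), brown-ish labelled "not sky" (0).
X = np.array([[0.2, 0.4, 0.9], [0.3, 0.5, 1.0],
              [0.5, 0.3, 0.1], [0.6, 0.4, 0.2]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w = finetune_pixel_classifier(X, y)
probs = 1.0 / (1.0 + np.exp(-(np.hstack([X, np.ones((4, 1))]) @ w)))
print(probs.round(2))  # higher for the two blue-ish (sky) pixels
```

The per-pixel probabilities produced here play the role of the semantic segmentation model's output in step 103.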
103. Predicting the probability of each pixel in the image belonging to the element type according to the semantic segmentation model to obtain an initial probability map; for example, the following may be specifically mentioned:
(1) and importing the image into the semantic segmentation model to predict the probability that each pixel in the image belongs to the element type.
For example, if the element type is "sky", then at this time, the image may be imported into a semantic segmentation model corresponding to "sky" to predict the probability that each pixel in the image belongs to "sky".
For another example, if the element type is "portrait", then at this time, the image may be imported into a semantic segmentation model corresponding to the "portrait", so as to predict the probability that each pixel in the image belongs to the "portrait", and so on.
(2) And setting the color of the corresponding pixel on the preset mask according to the probability to obtain an initial probability map.
For example, it may be determined whether the probability is greater than a preset threshold; if so, the color of the corresponding pixel on a preset mask is set to a first color, and if not, it is set to a second color. After the mask colors corresponding to all pixels of the image have been set, the mask with the colors set is output as the initial probability map.
That is, a mask including a first color and a second color may be obtained, where the first color in the mask indicates that the probability that the corresponding pixel belongs to the element type is relatively high, and the second color indicates that the probability that the corresponding pixel belongs to the element type is relatively low.
For example, if the probability that a certain pixel a belongs to the "sky" is greater than 80%, the color of the pixel a on the preset mask may be set to be a first color, otherwise, if the probability that the pixel a belongs to the "sky" is less than or equal to 80%, the color of the pixel a on the preset mask may be set to be a second color, and so on.
The first color and the second color may also be determined according to the requirements of practical applications, for example, the first color may be set to white, and the second color may be set to black, or the first color may also be set to pink, and the second color may also be set to green, and so on. For convenience of description, in the embodiments of the present invention, the first color is white, and the second color is black.
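The thresholding rule above can be sketched in a few lines, using grayscale values 255 and 0 to stand in for the first (white) and second (black) colors and the 80% threshold from the example; the function name and defaults are illustrative.

```python
import numpy as np

def probability_to_mask(prob_map, threshold=0.8, first=255, second=0):
    """Set each mask pixel to the first color (white, 255) when the
    predicted probability exceeds the threshold, else to the second
    color (black, 0)."""
    return np.where(prob_map > threshold, first, second).astype(np.uint8)

probs = np.array([[0.95, 0.10],
                  [0.81, 0.79]])
mask = probability_to_mask(probs)
print(mask)  # [[255 0] [255 0]]: only probabilities above 0.8 turn white
```

The resulting mask is the initial probability map that step 104 refines.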
104. The initial probability map is optimized based on a conditional random field (CRF) to obtain a segmentation effect map.
For example, the pixels in the initial probability map may be mapped to nodes in the conditional random field, the similarity of edge constraints between the nodes is determined, and the segmentation result of the pixels in the initial probability map is adjusted according to the similarity of the edge constraints, so as to obtain a segmentation effect map.
A conditional random field is a discriminative probabilistic model and a type of random field. Like a Markov random field, it is an undirected graphical model: the nodes (i.e., vertices) represent random variables, and the edges between nodes represent dependencies between those variables. A conditional random field can express long-distance dependencies and overlapping features, handles labeling (classification) bias well, and normalizes over all features globally to obtain a globally optimal solution; the initial probability map can therefore be optimized with a conditional random field to refine the segmentation result.
It should be noted that, since the segmentation effect map is optimized from the initial probability map, the segmentation effect map is also a mask including the first color and the second color.
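A full dense-CRF optimizer is beyond a short sketch, but the neighborhood intuition above (neighboring pixels with similar constraints should receive similar labels) can be illustrated with a toy majority-vote relabelling. This is an explicitly simplified stand-in, not the CRF inference the text describes: a real CRF would weight neighbors by color and position similarity and optimize a global energy.

```python
import numpy as np

def smooth_labels(mask, iters=1):
    """Toy stand-in for CRF refinement: each pixel is relabelled to the
    majority vote of its 4-neighbourhood, removing isolated
    mispredictions; ties keep the current label."""
    m = mask.astype(np.int32)
    for _ in range(iters):
        p = np.pad(m, 1, mode="edge")
        votes = (p[:-2, 1:-1] + p[2:, 1:-1] +   # up + down neighbours
                 p[1:-1, :-2] + p[1:-1, 2:])    # left + right neighbours
        m = np.where(votes >= 3, 1, np.where(votes <= 1, 0, m))
    return m

noisy = np.ones((4, 4), dtype=int)
noisy[1, 1] = 0                     # isolated misprediction inside "sky"
smoothed = smooth_labels(noisy)
print(int(smoothed.sum()))  # 16: the speck is absorbed by its neighbours
```

The effect mirrors the optimization in step 104: isolated segmentation errors are corrected by consistency with surrounding pixels.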
105. Fusing the image with preset element materials according to the segmentation effect graph to obtain a processed image; for example, the following may be specifically mentioned:
(1) and acquiring the replaceable element material according to a preset strategy.
The preset policy may be set according to requirements of actual applications, for example, a material selection instruction triggered by a user may be received, and then, corresponding materials are obtained from a material library according to the material selection instruction, and the corresponding materials are used as replaceable element materials.
Optionally, in order to increase the diversity of the element material, the element material may be obtained in a random interception manner, that is, the step "obtaining the replaceable element material according to the preset policy" may also include:
acquiring a candidate image, randomly intercepting the candidate image, and taking the intercepted image as a replaceable element material, and the like.
The candidate image may be obtained over the network, or may be uploaded by the user, or may even be directly captured on a terminal screen or a web page by the user and then provided to the image processing apparatus, and so on, which are not described herein again.
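The random-interception strategy above can be sketched as a random window over a candidate image array; the function and parameter names are illustrative.

```python
import numpy as np

def random_crop(candidate, out_h, out_w, rng=None):
    """Randomly intercept an out_h x out_w patch from a candidate image
    (an H x W x C array) to use as a replaceable element material."""
    rng = rng or np.random.default_rng()
    h, w = candidate.shape[:2]
    top = rng.integers(0, h - out_h + 1)    # random top-left corner
    left = rng.integers(0, w - out_w + 1)
    return candidate[top:top + out_h, left:left + out_w]

candidate = np.arange(10 * 10 * 3).reshape(10, 10, 3)
patch = random_crop(candidate, 4, 4)
print(patch.shape)  # (4, 4, 3)
```

Each call yields a different patch, which is what gives the element materials their diversity.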
(2) And combining the first color part in the segmentation effect graph with the acquired element materials through a fusion algorithm to obtain a first combination graph.
Because the probability that the pixels of the first color part belong to the element type to be replaced is high, at this time, the part can be combined with the acquired element materials through a fusion algorithm, that is, the pixels of the part can be replaced by the acquired element materials.
(3) And combining the second color part in the segmentation effect map with the image through a fusion algorithm to obtain a second combination map.
Since the probability that the pixels of the second color portion belong to the element type to be replaced is low, the portion can be combined with the original image through a fusion algorithm, that is, the pixels of the portion are retained.
Optionally, in order to improve the fusion effect or implement other special effect effects, before the second color portion is combined with the image, the image may be further subjected to certain preprocessing, such as color transformation, contrast adjustment, brightness adjustment, saturation adjustment, and/or adding other special effect masks, and then the second color portion is combined with the preprocessed image by a fusion algorithm to obtain a second combination diagram.
(4) And synthesizing the first combination diagram and the second combination diagram to obtain a processed image.
Therefore, the element that needs to be replaced in the image can be replaced by the element material, for example, the "sky" in the image is replaced by the "space", and the like, and details are not repeated here.
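Steps (2)-(4) above can be sketched as a single mask-driven composite, one common way to implement this kind of fusion: the segmentation effect map selects material pixels where it is white and original pixels where it is black, and the two selections are synthesized in one pass. The names are illustrative.

```python
import numpy as np

def fuse(image, material, mask):
    """Where `mask` (the segmentation effect map) is white (255), take
    the element material (first combination map); where it is black (0),
    keep the original image (second combination map); the weighted sum
    synthesizes the two into the processed image."""
    m = (mask.astype(np.float32) / 255.0)[..., None]   # (H, W, 1) in [0, 1]
    return (m * material + (1.0 - m) * image).astype(np.uint8)

image = np.full((2, 2, 3), 10, np.uint8)       # original image
material = np.full((2, 2, 3), 200, np.uint8)   # e.g. a "space" material
mask = np.array([[255, 0], [0, 255]], np.uint8)
out = fuse(image, material, mask)
print(out[0, 0], out[0, 1])  # material pixel vs. retained original pixel
```

With a grayscale (rather than strictly binary) mask, the same formula blends the boundary smoothly, which is why the boundary-smoothing steps below improve the result.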
Optionally, in order to make the fusion result more realistic and to avoid noise or loss caused by inaccurate probability prediction, the segmentation effect map may be further processed before fusion so that its segmentation boundary is smoother and the color transition at the boundary of the replaced region is more natural. That is, before the step of "fusing the image with the preset element material according to the segmentation effect map to obtain the processed image", the image processing method may further include:
and performing appearance model (Appearance Model) algorithm processing and/or image morphological operations on the segmentation effect map to obtain a processed segmentation effect map.
Then, the step "fusing the image with preset element materials according to the segmentation effect map to obtain a processed image" may include: and according to the processed segmentation effect graph, fusing the image with preset element materials, such as transparency (Alpha) fusion, to obtain a processed image.
The appearance model algorithm is a feature point extraction method widely used in pattern recognition; it statistically models texture and fuses the two statistical models of shape and texture into a single appearance model. The image morphology operations may include noise reduction and/or connected-component analysis. After the segmentation effect map is processed by the appearance model algorithm or morphological operations, the segmentation boundary is smoother and the color transition at the boundary of the replaced region is more natural.
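As a minimal illustration of the morphological clean-up (the appearance-model step is omitted here), the sketch below implements a 3x3 binary opening (erosion then dilation) in plain NumPy, which removes noise specks smaller than the structuring element. Real pipelines would typically use a library such as OpenCV or scipy.ndimage; this hand-rolled version is just for exposition.

```python
import numpy as np

def _neighborhood_max(m, pad=0):
    """Maximum over each pixel's 3x3 neighbourhood (i.e. dilation)."""
    p = np.pad(m, 1, constant_values=pad)
    h, w = m.shape
    return np.max([p[i:i + h, j:j + w]
                   for i in range(3) for j in range(3)], axis=0)

def opening(mask):
    """Morphological opening with a 3x3 square: erosion (computed as the
    complement of a dilation of the complement) followed by dilation."""
    eroded = 1 - _neighborhood_max(1 - mask, pad=1)
    return _neighborhood_max(eroded)

mask = np.zeros((5, 5), dtype=int)
mask[1:4, 1:4] = 1      # a solid 3x3 block survives opening intact
mask[0, 4] = 1          # a lone noise speck is removed
cleaned = opening(mask)
print(int(cleaned.sum()))  # 9: the block remains, the speck is gone
```

Applied to the segmentation effect map, this kind of operation is what makes the boundary of the replaced region cleaner before fusion.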
It should be noted that "Alpha fusion" in the embodiments of the present invention refers to fusion based on alpha values, where alpha specifies the transparency level of a pixel. In general, 8 bits may be reserved for the alpha component of each pixel, so the valid range of alpha is [0, 255], corresponding to opacity from 0% to 100%. A pixel with alpha 0 is completely transparent, a pixel with alpha 128 is about 50% transparent, and a pixel with alpha 255 is completely opaque.
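The alpha convention just described corresponds to the standard per-pixel blend out = (alpha/255) * foreground + (1 - alpha/255) * background, sketched below with illustrative names.

```python
import numpy as np

def alpha_blend(fg, bg, alpha):
    """Alpha fusion: alpha in [0, 255] per pixel, where 0 is fully
    transparent (background shows through) and 255 is fully opaque
    (foreground covers the background)."""
    a = alpha.astype(np.float32)[..., None] / 255.0
    return (a * fg + (1.0 - a) * bg).astype(np.uint8)

fg = np.full((1, 2, 3), 255, np.uint8)       # white foreground
bg = np.zeros((1, 2, 3), np.uint8)           # black background
alpha = np.array([[255, 128]], np.uint8)     # opaque, then ~50% transparent
out = alpha_blend(fg, bg, alpha)
print(out)  # first pixel fully foreground; second roughly mid-gray
```

Using the segmentation effect map (possibly smoothed) as the alpha channel is one way to realize the transparency (Alpha) fusion mentioned above.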
As can be seen from the above, after receiving an image processing request, this embodiment obtains, according to the request, a semantic segmentation model corresponding to the element type to be replaced; predicts, according to the model, the probability that each pixel in the image belongs to that element type, to obtain an initial probability map; optimizes the initial probability map based on a conditional random field; and fuses the image with a preset element material using the optimized segmentation effect map, thereby replacing the corresponding element-type portion of the image with the preset material. Because the semantic segmentation model in this scheme is trained by a deep neural network, the prediction of the probability that each pixel belongs to the element type does not rely only on information such as color and position, so false detections and missed detections can be greatly reduced compared with existing schemes. In addition, the scheme optimizes the segmented initial probability map with a conditional random field, which yields a finer segmentation result, greatly improves segmentation accuracy, reduces image distortion, and improves the image fusion effect.
Embodiment II
The method described in the first embodiment is further illustrated by way of example.
In this embodiment, the image processing apparatus is specifically integrated in a server, and the element to be replaced is "sky" as an example.
As shown in fig. 2a and 2d, a specific flow of an image processing method may be as follows:
201. the terminal sends an image processing request to the server, wherein the image processing request can indicate information such as images needing to be processed and element types needing to be replaced.
The image processing request may be triggered in various ways, for example, by clicking or sliding a trigger key on a web page or a client interface, or by inputting a preset instruction.
For example, taking triggering by clicking a trigger key as an example, referring to fig. 2b, when a user needs to replace the sky part in picture a with another element, such as a "space" element, or to add a "cloud", the user may upload picture a and click the trigger key "play once" to trigger generation of an image processing request and send it to the server, where the image processing request indicates that the image to be processed is image a and the type of the element to be replaced is "sky".
It should be noted that, in this embodiment, the element to be replaced is taken as "sky" for example, and it should be understood that the type of the element to be replaced may also be other types, such as "portrait", "eye", or "plant", and the like, and the implementation thereof is similar to this, and is not described herein again.
202. After receiving the image processing request, the server acquires a semantic segmentation model corresponding to the sky, wherein the semantic segmentation model is formed by training a deep neural network.
Optionally, the semantic segmentation model may be pre-stored in the image processing apparatus or another storage device and acquired by the image processing apparatus when it needs to be used, or the model may be built by the image processing apparatus itself. For example, training data containing the element type may be acquired, e.g., a certain number of pictures containing the sky may be collected, and then a preset semantic segmentation initial model may be trained with a deep neural network according to that training data (i.e., the pictures containing the sky) to obtain the semantic segmentation model corresponding to "sky".
It should be noted that the preset semantic segmentation initial model may be preset according to the requirements of practical applications, for example, a pre-trained semantic segmentation model for 20 categories of a general scene may be adopted.
203. The server imports the image into the semantic segmentation model to predict the probability that each pixel in the image belongs to the "sky".
For example, if the image processing request received by the server indicates that the image to be processed is image a, image a may be imported into the semantic segmentation model corresponding to "sky" as a three-channel color image to predict the probability that each pixel in image a belongs to "sky", and then step 204 is executed.
204. And the server sets the color of the corresponding pixel on the preset mask according to the probability to obtain an initial probability map.
For example, it may be determined whether the probability is greater than a preset threshold; if so, the color of the corresponding pixel on a preset mask is set to a first color, and if not, it is set to a second color. After the mask colors corresponding to all pixels of the image have been set, the mask with the colors set is output as the initial probability map.
For example, if the probability that a certain pixel K belongs to the "sky" is greater than 80%, the color of the pixel K on the preset mask may be set to be a first color, otherwise, if the probability that a certain pixel K belongs to the "sky" is less than or equal to 80%, the color of the pixel K on the preset mask may be set to be a second color, and so on.
The first color and the second color may also be determined according to the requirements of practical applications, for example, the first color may be set to white, and the second color may be set to black, or the first color may also be set to pink, and the second color may also be set to green, and so on.
For example, if the first color is set to white and the second color is set to black, the initial probability map shown in fig. 2c can be obtained after the picture a is imported into the semantic segmentation model.
205. And the server optimizes the initial probability map based on the conditional random field to obtain a segmentation effect map.
For example, the server may map pixels in the initial probability map to nodes in the conditional random field, determine similarity of edge constraints between the nodes, and adjust a segmentation result of the pixels in the initial probability map according to the similarity of the edge constraints to obtain a segmentation effect map.
Because the conditional random field is an undirected graphical model, each pixel in the image can correspond to a node in the field. Prior information including parameters such as color, texture, and position is preset so that nodes whose edge constraints are similar receive similar segmentation results; the segmentation results of the pixels in the initial probability map can therefore be adjusted according to the similarity of the edge constraints, making the sky segmentation finer. For example, referring to fig. 2c, after the initial probability map is optimized based on the conditional random field, a segmentation effect map with a finer segmentation result can be obtained.
206. The server performs appearance model algorithm and/or image morphology operation processing on the segmentation effect map to obtain a processed segmentation effect map, and then executes step 207.
The image morphology operation processing may include processing such as noise reduction processing and/or connected component analysis. By the segmentation effect graph after processing such as an appearance model algorithm or image morphology operation, the segmentation boundary can be smoother, and the color transition at the joint of the replacement region can be more natural.
It should be noted that step 206 is optional. If step 206 is not executed, step 207 may be executed directly after step 205, and in step 208 the (unprocessed) segmentation effect map, the image, and the element material are fused by the fusion algorithm to obtain the processed image.
207. And the server acquires the replaceable element material according to a preset strategy.
The preset policy may be set according to requirements of actual applications, for example, a material selection instruction triggered by a user may be received, and then, corresponding materials are obtained from a material library according to the material selection instruction, and the corresponding materials are used as replaceable element materials.
Optionally, in order to increase the diversity of the element material, the element material may also be obtained by a random interception method, for example, the server may obtain a candidate image, then perform random interception on the candidate image, and use the intercepted image as a replaceable element material, and so on.
The candidate image may be obtained over the network, or may be uploaded by the user, or may even be directly captured on a terminal screen or a web page by the user and then provided to the image processing apparatus, and so on, which are not described herein again.
208. And the server fuses the processed segmentation effect graph, the processed image and the element material through a fusion algorithm to obtain the processed image.
For example, the first color is white, and the second color is black, in this case, the server may combine the white portion in the segmentation effect map with the acquired element material by using a fusion algorithm to obtain a first combination map, combine the black portion in the segmentation effect map with the image a by using a fusion algorithm to obtain a second combination map, and then combine the first combination map and the second combination map to obtain a processed image.
Because the probability that the pixels of the white portion belong to the "sky" is high, those pixels may be replaced with the acquired element material through the fusion algorithm; and because the probability that the pixels of the black portion belong to the "sky" is low, those pixels may be combined with the original image a through the fusion algorithm, that is, the pixels of the black portion are retained. Therefore, after the first combination map and the second combination map are synthesized, the "sky" in the original image a is replaced with the corresponding element material, for example, the "sky" in image a is replaced with "night sky in christmas", and the like, see fig. 2d, which is not described herein again.
It should be noted that, optionally, as shown in fig. 2d, in order to improve the fusion effect or implement other special effect effects, before combining the black portion (i.e., the second color portion) with the image a, a certain preprocessing may be performed on the image a, such as performing color transformation, contrast adjustment, brightness adjustment, saturation adjustment, and/or adding other special effect masks, and then, the black portion is combined with the preprocessed image a by using a fusion algorithm to obtain a second combined diagram, which is not described herein again.
209. And the server sends the processed image to the terminal.
For example, the processed image may be displayed on an interface of the corresponding client. Optionally, the server may further provide a corresponding saving path and/or sharing interface for the user to save and/or share the image; for example, the processed image may be saved in the cloud or locally (i.e., in the terminal), shared to a microblog or friend circle, and/or inserted into a chat conversation interface of an instant chat tool, and so on, which are not described herein again.
As can be seen from the above, after an image processing request is received, a semantic segmentation model corresponding to "sky" can be obtained according to the request; the probability that each pixel in the image belongs to "sky" is predicted according to the model to obtain an initial probability map; the initial probability map is then optimized based on a conditional random field; and the image and a preset element material are fused using the optimized segmentation effect map, thereby replacing the "sky" part of the image with the preset element material. Because the semantic segmentation model in this scheme is trained by a deep neural network, the prediction of the probability that each pixel belongs to the element type does not rely only on information such as color and position, so false detections and missed detections can be greatly reduced compared with existing schemes. In addition, the scheme optimizes the segmented initial probability map with a conditional random field, which yields a finer segmentation result, greatly improves segmentation accuracy, reduces image distortion, and improves the image fusion effect.
Embodiment III
In order to better implement the above method, an embodiment of the present invention further provides an image processing apparatus, which may be specifically integrated in a server or the like.
As shown in fig. 3a, the image processing apparatus includes a receiving unit 301, an obtaining unit 302, a predicting unit 303, an optimizing unit 304, and a fusion unit 305, as follows:
(1) A receiving unit 301;
a receiving unit 301 configured to receive an image processing request indicating information such as an image that needs to be processed and an element type that needs to be replaced.
(2) An obtaining unit 302;
an obtaining unit 302, configured to obtain a semantic segmentation model corresponding to the element type, where the semantic segmentation model is trained by a deep neural network.
For example, if the image processing request received by the receiving unit 301 indicates that the element type requiring replacement is "sky", the obtaining unit 302 may obtain a semantic segmentation model corresponding to "sky"; and if the image processing request received by the receiving unit 301 indicates that the element type requiring replacement is "portrait", the obtaining unit 302 may obtain a semantic segmentation model corresponding to "portrait", and so on, which are not listed here.
Optionally, the semantic segmentation model may be pre-stored in the image processing apparatus or other storage devices and acquired by the image processing apparatus when needed, or the semantic segmentation model may be built by the image processing apparatus; that is, as shown in fig. 3b, the image processing apparatus may further include a model establishing unit 306, as follows:
the model establishing unit 306 may be configured to establish a semantic segmentation model corresponding to the element type, for example, as follows:
and training a preset semantic segmentation initial model by using a deep neural network according to the training data to obtain a semantic segmentation model corresponding to the element type.
The preset semantic segmentation initial model may be preset according to the requirements of practical applications, for example, a pre-trained semantic segmentation model for 20 categories of a general scene may be adopted.
(3) A predicting unit 303;
the predicting unit 303 is configured to predict, according to the semantic segmentation model, a probability that each pixel in the image belongs to the element type, so as to obtain an initial probability map.
For example, theprediction unit 303 may include a prediction subunit and a setting subunit, as follows:
and a prediction subunit, configured to import the image into the semantic segmentation model to predict a probability that each pixel in the image belongs to the element type.
For example, if the element type is "sky", then at this time, the prediction subunit may introduce the image into a semantic segmentation model corresponding to "sky" to predict the probability that each pixel in the image belongs to "sky".
A setting subunit, configured to set the color of the corresponding pixel on a preset mask according to the probability, so as to obtain an initial probability map.
For example, the setting subunit may be specifically configured to determine whether the probability is greater than a preset threshold; if so, set the color of the corresponding pixel on a preset mask to a first color; if not, set the color of the corresponding pixel on the preset mask to a second color; and after the colors of all pixels of the image have been set on the preset mask, output the preset mask with the colors set, so as to obtain an initial probability map.
The preset threshold may be set according to the requirement of the actual application, and the first color and the second color may also be determined according to the requirement of the actual application, for example, the first color may be set to white, the second color may be set to black, and so on.
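As an illustrative sketch (not part of the patent text), the thresholding described above can be expressed in a few lines of Python; the function name, the 0.5 threshold, and the 255/0 values standing in for white and black are assumptions for illustration only:

```python
def probability_to_initial_map(prob_map, threshold=0.5,
                               first_color=255, second_color=0):
    """Set each mask pixel to the first color (e.g. white) when the
    predicted probability exceeds the threshold, otherwise to the
    second color (e.g. black), yielding the initial probability map."""
    return [[first_color if p > threshold else second_color
             for p in row] for row in prob_map]
```

For a 1x2 probability map `[[0.9, 0.2]]`, the sketch returns `[[255, 0]]`: the confident "sky" pixel becomes white and the other becomes black.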
(4) An optimization unit 304;
the optimization unit 304 is configured to optimize the initial probability map based on a conditional random field to obtain a segmentation effect map.
For example, the optimization unit 304 may be specifically configured to map the pixels in the initial probability map to nodes in the conditional random field, determine the edge-constraint similarity between the nodes, and adjust the segmentation result of the pixels in the initial probability map according to that similarity, so as to obtain a segmentation effect map.
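A full conditional random field is beyond a short sketch, but the underlying idea of the pairwise edge constraints, pulling each pixel's label toward the labels of its neighbours, can be illustrated with a simple majority-vote smoothing over the 4-neighbourhood. This is a deliberately simplified, hypothetical stand-in, not the CRF inference the patent describes:

```python
def refine_initial_map(mask, iterations=1):
    """Simplified stand-in for CRF refinement: each pixel adopts the
    label held by a strict majority of its 4-neighbours, which removes
    isolated misclassified pixels from the initial probability map."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [row[:] for row in mask]
        for y in range(h):
            for x in range(w):
                nbrs = [mask[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x),
                                       (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w]
                for label in (0, 255):
                    if nbrs.count(label) * 2 > len(nbrs):
                        out[y][x] = label  # strict neighbour majority
        mask = out
    return mask
```

A real CRF would additionally weight each pairwise term by appearance similarity (color, position), so that smoothing stops at genuine object boundaries rather than averaging across them.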
(5) A fusion unit 305;
the fusion unit 305 is configured to fuse the image with a preset element material according to the segmentation effect map, so as to obtain a processed image.
For example, the fusion unit 305 may include a material obtaining subunit, a first fusion subunit, a second fusion subunit, and a composition subunit, as follows:
the material obtaining subunit is configured to obtain a replaceable element material according to a preset policy.
The preset policy may be set according to requirements of actual applications, for example, the material obtaining subunit may be specifically configured to receive a material selection instruction triggered by a user, obtain a corresponding material from a material library according to the material selection instruction, and use the material as a replaceable element material.
Optionally, in order to increase the diversity of the element material, the element material may also be obtained in a random interception manner, that is:
the material obtaining subunit is specifically configured to obtain a candidate image, randomly intercept the candidate image, and use the intercepted image as a replaceable element material.
The candidate image may be obtained over the network, or may be uploaded by the user, or may even be directly captured on a terminal screen or a web page by the user and then provided to the image processing apparatus, and so on, which are not described herein again.
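The random interception above can be sketched as a random crop of the candidate image; the function name, the crop size parameters, and the use of a seedable `random.Random` are illustrative assumptions:

```python
import random

def random_intercept(candidate, crop_h, crop_w, rng=None):
    """Randomly cut a crop_h x crop_w window out of the candidate
    image (a 2-D list of pixels) to use as replaceable element
    material, increasing the diversity of the materials."""
    rng = rng or random.Random()
    h, w = len(candidate), len(candidate[0])
    top = rng.randrange(h - crop_h + 1)   # random vertical offset
    left = rng.randrange(w - crop_w + 1)  # random horizontal offset
    return [row[left:left + crop_w]
            for row in candidate[top:top + crop_h]]
```

Passing a seeded `random.Random` makes the interception reproducible, which is convenient for testing; in production an unseeded generator gives a different material on each request.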
The first fusion subunit may be configured to combine, through a fusion algorithm, the first color portion in the segmentation effect map with the acquired element material to obtain a first combination map.
The second fusion subunit may be configured to combine, through the fusion algorithm, the second color portion in the segmentation effect map with the image to obtain a second combination map.
The composition subunit may be configured to combine the first combination map and the second combination map to obtain a processed image.
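The three subunits can be sketched together as one mask-driven composition. Here a hard per-pixel selection stands in for whatever fusion algorithm the patent intends; a real implementation would typically feather or alpha-blend near the boundary rather than switch abruptly:

```python
def fuse_by_effect_map(image, material, effect_map, first_color=255):
    """Take the element material wherever the segmentation effect map
    shows the first color (e.g. white, the region to replace) and keep
    the original image pixel wherever it shows the second color."""
    h, w = len(effect_map), len(effect_map[0])
    # First combination map: material pixels in the first-color region.
    # Second combination map: image pixels everywhere else.
    # Combining the two reduces to a per-pixel selection.
    return [[material[y][x] if effect_map[y][x] == first_color
             else image[y][x]
             for x in range(w)] for y in range(h)]
```

For example, with `image = [[1, 2]]`, `material = [[9, 9]]`, and `effect_map = [[255, 0]]`, the result is `[[9, 2]]`: the white pixel is replaced by material, the black pixel keeps the original value.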
Optionally, in order to make the fusion result more realistic and avoid noise or loss caused by inaccurate probability prediction, the segmentation effect map may be further processed before fusion, so that its segmentation boundary is smoother and the color transition at the joint of the replacement region is more natural; that is, as shown in fig. 3b, the image processing apparatus may further include a preprocessing unit 307, as follows:
the preprocessing unit 307 may be configured to apply an appearance model algorithm and/or image morphological operation processing to the segmentation effect map to obtain a processed segmentation effect map.
Then, the fusion unit 305 may be specifically configured to fuse the image with the preset element material according to the processed segmentation effect map to obtain a processed image.
The image morphological operation processing may include processing such as noise reduction and/or connected-domain analysis, which is not described herein again.
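For instance, a morphological opening (erosion followed by dilation) removes small noise specks from a binary segmentation effect map while leaving larger regions mostly intact. This pure-Python 3x3 version is purely illustrative; a practical system would use a library routine such as OpenCV's `morphologyEx`:

```python
def _neighborhood(mask, y, x):
    """All values in the 3x3 neighborhood of (y, x), clipped at borders."""
    h, w = len(mask), len(mask[0])
    return [mask[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))]

def erode(mask, fg=255):
    # A pixel stays foreground only if its whole neighborhood is foreground.
    return [[fg if all(v == fg for v in _neighborhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def dilate(mask, fg=255):
    # A pixel becomes foreground if any neighbor is foreground.
    return [[fg if any(v == fg for v in _neighborhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def opening(mask):
    """Noise reduction: erosion followed by dilation deletes isolated
    foreground specks that erosion alone would have removed."""
    return dilate(erode(mask))
```

The dual operation, closing (dilation followed by erosion), fills small holes inside the segmented region instead; which of the two to apply depends on whether the prediction errors are spurious foreground or spurious background.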
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, in this embodiment, after an image processing request is received, the obtaining unit 302 obtains, according to the instruction of the request, a semantic segmentation model corresponding to the element type to be replaced; the prediction unit 303 predicts, according to the model, the probability that each pixel in the image belongs to the element type to obtain an initial probability map; the optimization unit 304 then optimizes the initial probability map based on a conditional random field; and the fusion unit 305 fuses the image and a preset element material by using the segmentation effect map obtained after the optimization, so as to replace a certain element type portion of the image with the preset element material. Because the semantic segmentation model in this scheme is trained by a deep neural network, the probability that each pixel belongs to the element type is not predicted based only on information such as color and position, so the probability of false detection and missed detection can be greatly reduced compared with existing schemes. In addition, this scheme optimizes the segmented initial probability map by using the conditional random field, so a finer segmentation result can be obtained, the segmentation accuracy is greatly improved, image distortion is reduced, and the image fusion effect is improved.
Example four,
An embodiment of the present invention further provides a server, as shown in fig. 4, which shows a schematic structural diagram of the server according to the embodiment of the present invention, specifically:
the server may include components such as aprocessor 401 of one or more processing cores,memory 402 of one or more computer-readable storage media, apower supply 403, and aninput unit 404. Those skilled in the art will appreciate that the server architecture shown in FIG. 4 is not meant to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is the control center of the server; it connects the various parts of the entire server using various interfaces and lines, and performs the various functions of the server and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the server as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the data storage area may store data created according to the use of the server, and the like. Further, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The server further includes a power supply 403 for supplying power to each component. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that charging, discharging, and power-consumption management are implemented through the power management system. The power supply 403 may also include one or more DC or AC power sources, recharging systems, power-failure detection circuits, power converters or inverters, power status indicators, and other such components.
The server may also include an input unit 404, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
Although not shown, the server may further include a display unit and the like, which will not be described in detail herein. Specifically, in this embodiment, the processor 401 in the server loads the executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application programs stored in the memory 402, thereby implementing various functions as follows:
the method comprises the steps of receiving an image processing request, wherein the image processing request indicates an image to be processed and an element type to be replaced, obtaining a semantic segmentation model corresponding to the element type, the semantic segmentation model is formed by training a deep neural network, predicting the probability that each pixel in the image belongs to the element type according to the semantic segmentation model to obtain an initial probability map, optimizing the initial probability map based on a conditional random field to obtain a segmentation effect map, and fusing the image and preset element materials according to the segmentation effect map to obtain a processed image.
For example, a replaceable element material may be obtained according to a preset policy, then a first color portion in the segmentation effect map is combined with the obtained element material through a fusion algorithm to obtain a first combination map, a second color portion in the segmentation effect map is combined with the image through the fusion algorithm to obtain a second combination map, and then the first combination map and the second combination map are combined to obtain a processed image.
Optionally, the semantic segmentation model may be pre-stored in the image processing apparatus or another storage device and acquired by the image processing apparatus when needed, or the semantic segmentation model may be built by the image processing apparatus; that is, the processor 401 may further run an application program stored in the memory 402, so as to implement the following function:
training a preset initial semantic segmentation model with a deep neural network according to training data, to obtain the semantic segmentation model corresponding to the element type.
The preset semantic segmentation initial model may be preset according to the requirements of practical applications, for example, a pre-trained semantic segmentation model for 20 categories of a general scene may be adopted.
Optionally, in order to make the fusion result more realistic and avoid noise or loss caused by inaccurate probability prediction, the segmentation effect map may be further processed before fusion, so that its segmentation boundary is smoother and the color transition at the joint of the replacement region is more natural; that is, the processor 401 may also run an application program stored in the memory 402, thereby implementing the following functions:
the segmentation effect graph is subjected to an appearance model algorithm and/or image morphological operation processing to obtain a processed segmentation effect graph, so that during subsequent fusion, the image and preset element materials can be fused according to the processed segmentation effect graph to obtain a processed image, which is detailed in the foregoing embodiment and is not repeated herein.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, after receiving an image processing request, the server in this embodiment may obtain, according to the instruction of the request, a semantic segmentation model corresponding to the element type to be replaced; predict, according to the model, the probability that each pixel in the image belongs to the element type, to obtain an initial probability map; optimize the initial probability map based on a conditional random field; and fuse the image with a preset element material by using the segmentation effect map obtained after the optimization, thereby replacing a certain element type portion of the image with the preset element material. Because the semantic segmentation model in this scheme is trained by a deep neural network, the probability that each pixel belongs to the element type is not predicted based only on information such as color and position, so the probability of false detection and missed detection can be greatly reduced compared with existing schemes. In addition, this scheme optimizes the segmented initial probability map by using the conditional random field, so a finer segmentation result can be obtained, the segmentation accuracy is greatly improved, image distortion is reduced, and the image fusion effect is improved.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The foregoing detailed description is directed to an image processing method and apparatus according to an embodiment of the present invention, and the principles and embodiments of the present invention are described herein by using specific examples, which are merely used to help understand the method and the core idea of the present invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.