Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of them. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without creative effort, shall fall within the protection scope of the present application.
It should be noted that like reference numbers and letters refer to like items in the following figures; thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
With the development of information technology and internet technology, image processing technology has been successfully applied in fields including disaster relief, weather prediction, photo entertainment, face recognition, quick payment in shopping, and the like. However, during camera acquisition, storage, transmission, processing, and imaging, images are easily affected by factors such as rainy or foggy weather, poor lighting conditions, and camera shake, so that the captured images are unclear. In order to ensure the imaging effect, an unclear image needs to be restored to a clear image; that is, the unclear image needs to undergo image denoising processing.
Traditional image denoising algorithms use image prior models for denoising, such as non-local self-similarity models, gradient models, sparse dictionary models, and Markov random field models. The classical block-matching and 3D filtering (BM3D) algorithm and its extension to color images, CBM3D, are mainly based on three-dimensional non-local similar-block matching. Unlike the traditional non-local means (NLM) approach, three-dimensional non-local similar-block matching searches for similar blocks using a hard-threshold linear transformation, then processes the resulting three-dimensional arrays with collaborative filtering, and finally returns the processed result to the original image through an inverse transformation to obtain the denoised image. Traditional image denoising algorithms involve complex optimization problems, the denoising process often requires tedious parameter tuning, and the results in terms of noise removal and detail preservation are unsatisfactory.
In recent years, neural-network-based image denoising passes a noisy image through a denoising model to obtain a noise-free image. At present, the process of training such a noise reduction model usually includes: designing a noise distribution in advance, where the noise distribution is simulated by a computer (for example, Gaussian noise); obtaining an original image, which is an acquired image of the real world; adding the preset noise distribution to the original image to form a noisy image; and training the noise reduction model through supervised learning, with the noisy image as input and the noise-reduced picture as output.
However, the inventors have found in their research that the image samples required for noise reduction model training are all generated by artificially adding noise (e.g., white Gaussian noise) to an original image, and the noise reduction model is trained on these samples. The artificially added noise cannot faithfully reflect the noise of a real environment, so the training of the noise reduction model is not accurate enough, and the noise reduction effect of the model is not ideal.
Therefore, in order to overcome the above drawbacks, embodiments of the present application provide an image processing method, an apparatus, an electronic device, and a computer-readable medium. Through the noise extraction operation of the denoising model, the region of the to-be-processed noise in the to-be-processed image can be determined, thereby achieving noise extraction; then, according to the image of a second region, that is, the region other than the first region in the target image, the to-be-processed noise in the first region is removed. The denoising model is trained based on a noisy sample image, and the noise contained in the noisy sample image is noise collected by an image collection device when collecting the image. Therefore, the noise in the samples on which the denoising model is based is not artificially added, but is noise collected by the image collection device when collecting a real-world image, so that the training result of the denoising model is more accurate and its denoising can be more ideal.
As an implementation manner, the embodiment of the present application may be applied to a user terminal; that is, the user terminal may serve as the execution subject of the image processing method of the present application. Specifically, the execution subject may be an application program installed in the user terminal, and then both the training process of the denoising model and the process of denoising an image according to the denoising model are executed by the user terminal.
As another implementation manner, the embodiment of the present application may be applied to a server, where the server may serve as the execution subject of the image processing method of the present application, and both the training process of the denoising model and the process of denoising an image according to the denoising model are executed by the server. Thus, the server may obtain an image to be processed uploaded by a user terminal and perform denoising processing on the image to be processed based on the denoising model.
The embodiment of the application can be applied to an image processing system. As shown in fig. 1, the image processing system includes a server 10 and a user terminal 20; the server 10 and the user terminal 20 are located in a wireless network or a wired network, and data interaction can be performed between the server 10 and the user terminal 20. The server 10 may be an individual server, a server cluster, a local server, or a cloud server.
In one embodiment, the user terminal 20 may be a terminal used by a user to browse images, and may be a device used by the user to capture images; in some embodiments, an image capturing device is disposed in the user terminal. The server 10 may store pictures from the user terminal 20. In some embodiments, the server 10 may be configured to train the model or algorithm involved in the embodiment of the present application, and may also migrate the trained model or algorithm to the user terminal; of course, the user terminal 20 may also directly train the model or algorithm involved in the embodiment of the present application. The execution subject of each method step in the embodiments of the present application is not limited.
Referring to fig. 2, fig. 2 shows an image processing method provided in an embodiment of the present application, where an execution subject of the method may be the server or the user terminal, and specifically, the method includes: s201 to S203.
S201: acquiring a target image to be processed, wherein the target image contains noise to be processed.
As an embodiment, the image to be processed may be an image captured by the user terminal within a camera application. Specifically, a user acquires an image using a camera application in the user terminal, and after shooting is completed, the shot image is taken as the image to be processed, so that when the user finishes shooting, the shot image can automatically undergo denoising processing according to the image processing method of the present application, and the denoised image is obtained and stored. In one embodiment, the denoised image may be stored together with the target image to be processed, that is, the image before the denoising processing.
As another implementation manner, the image to be processed may also be an image requested for display by the user terminal. Specifically, the image to be processed may be an image downloaded by the user terminal; when the image is rendered, it is denoised based on the image processing method of the embodiment of the present application, and the denoised image is displayed. The downloaded image may be an image taken by a device other than the user terminal.
A digital image is often affected by noise interference from the imaging device and the external environment during digitization and transmission. Specifically, due to the physical constraints of the image acquisition device itself and the limitations of the external lighting environment, noise inevitably exists in the acquired image.
S202: and determining a distribution region of the to-be-processed noise in the target image based on a pre-trained denoising model as a first region.
As an embodiment, the denoising model may be a deep-learning-based neural network model, for example, a convolutional neural network (CNN). In particular, the feature layers of the neural network model may be set according to the characteristics of noise within an image, so that the model can determine the noise region within the image according to the feature layers, i.e., perform the operation of extracting noise within the image. The parameters in the trained denoising model (such as the weight of each feature layer) are reasonable, so that the denoising model can determine the noise region in an image more accurately.
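For illustration only, a minimal sketch of such a convolutional noise-extraction network is given below, assuming a PyTorch environment; the layer depth, channel widths, and the threshold used to mark the noise region are illustrative assumptions, not parameters from the present application.

```python
import torch
import torch.nn as nn

class NoiseExtractor(nn.Module):
    """Sketch of a CNN whose feature layers estimate a noise map from
    which the noise region (first region) can be thresholded."""
    def __init__(self, channels=3, features=64, depth=5):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Returns an estimated noise map with the same shape as the input.
        return self.body(x)

model = NoiseExtractor()
image = torch.rand(1, 3, 128, 128)            # stand-in noisy image
noise = model(image)                          # extracted noise (first noise data)
first_region = noise.abs().mean(dim=1) > 0.1  # assumed threshold for the noise region
```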
In one embodiment, the denoising model is trained based on a noisy sample image, and the noise contained in the noisy sample image is noise collected when the image collection device collects an image. Specifically, when an initial denoising model, namely an untrained denoising model, is set, the network parameters in the denoising model may not be reasonable enough, and the noise region determined by the initial denoising model is not accurate enough. Through continuous learning of noisy sample images by the initial denoising model, the network parameters of the denoising model are continuously optimized, so that the noise region determined by the trained denoising model is more accurate and the extracted noise is more reasonable.
In some embodiments, the noise contained in the noisy sample image is the noise collected when the image collecting device collects the image, where the image collecting device may be the camera of the user terminal. As an embodiment, the image to be processed may be an image captured by the camera of the user terminal, in which case the noise contained in the noisy sample image is also noise captured by the camera of the user terminal. As an implementation manner, the image acquisition device may acquire a plurality of images in different shooting environments as a sample image set; each image contains noise, and the noise of each image is related to its shooting environment, where the shooting environment may include lighting conditions, weather, shooting equipment, shooting parameters, and the like. A denoising model trained based on such a sample image set then has more accurate noise extraction and removal capabilities for different shooting environments.
In the embodiment of the application, compared with artificially added noise simulated by a computer, the denoising model can be trained with the noise collected when the image collecting device collects an image. Since artificially added noise is simulated by a computer, it generally satisfies some mathematical relation (for example, a Gaussian function); such noise belongs to a known mathematical distribution and is ordered noise, which is different from the real noise captured when an image acquisition device acquires a real-world image, since real noise is often unordered and difficult to describe with a mathematical relation. Therefore, because the noise contained in the noisy sample image is collected by the image collecting device when collecting the image, training the denoising model with real noise makes the noise extraction of the trained denoising model more accurate than training with artificially added noise. The training process of the denoising model is described in the following embodiments.
S203: and removing the to-be-processed noise in the first region according to the image in the second region in the target image to obtain the denoised target image.
The second region is the region outside the first region in the target image. Specifically, the first region corresponds to the to-be-processed noise in the to-be-processed image; that is, the distribution region of the to-be-processed noise is the first region. As shown in fig. 3, the first region 301 is the distribution region of the to-be-processed noise, and it can be seen that the noise in the first region is caused by defects such as insufficient illumination when the image was acquired. Note that the other distribution areas of noise are not labeled in fig. 3.
Therefore, after determining the distribution area of the to-be-processed noise within the image to be processed, the area where the noise is not distributed, that is, the second area, can be determined, and the to-be-processed noise in the first area can be removed using the image within the second area. As an embodiment, the image in the first region may be generated from the image in the second region, and the image in the first region may be replaced with the generated image. Specifically, the image data in the second area of the image to be processed has data similar or identical to the image data in the first area, where the image data may be pixel data corresponding to a pixel unit. As shown in fig. 3, the noise in the first area 301 is noise caused by lighting; specifically, the image in the first area 301 is too dark, that is, the color of the shadow of the object is too dark. An image of the shadow of the object also exists in the second area, that is, an area with a dark image exists there, and the object should have a shadow under the light, so the image to be processed should contain a shadow. Therefore, the pixel data in the first area can be updated based on the pixel data of the image in the part of the second area closer to the first area; of course, the pixel data in the first area can also be updated based on the pixel data in a sub-area of the second area matching the image type of the first area.
As an implementation manner, the first region is divided into a plurality of sub-regions, where each sub-region may be formed by N pixel points, N being a positive integer; a sub-region in the first region is named a first sub-region. The second region may also be divided into a plurality of sub-regions, named second sub-regions, and the size of a first sub-region is the same as that of a second sub-region; that is, the first and second sub-regions are formed by the same number of pixel points. The second sub-region corresponding to each first sub-region is determined based on the distance between each first sub-region and each second sub-region; please refer to the following embodiments for details.
As another embodiment, the pixel data within the first region may be updated according to the pixel data within a sub-region of the second region that matches the image category of the first region. Consider that, in the same shooting scene, some regions of the image contain noise and some do not, and regions may belong to different categories; for example, one image may contain multiple target objects, where some regions of a target object contain noise and some do not. For example, in fig. 3, under the influence of the illumination conditions, noise exists at the shadow of a certain object in the first region, while no noise exists at the shadow of such an object in the second region; the image at the shadow in the first region may then be modified according to the image at the shadow in the second region.
In some embodiments, the noise in the first region is noise corresponding to the illumination shadow of an object, for example, the shadow is too dark or contains noise. The noise corresponding to the illumination shadow in the first region may be named to-be-processed shadow noise, and the type of the noise in the first region may be determined to be the illumination shadow class. According to the illumination shadow class, the image data corresponding to a designated illumination shadow region may be determined in the second region and used as reference pixel data, and the pixel data at the illumination shadow in the first region is modified according to the reference pixel data, so that the image at the illumination shadow in the first region is consistent with the image at the illumination shadow in the second region, and the noise in the image at the illumination shadow in the first region is removed. As an embodiment, the designated illumination shadow area may be an illumination shadow area within the second area that matches the size of the to-be-processed shadow noise.
As another implementation manner, the designated illumination shadow area can be determined according to the category of the projection object corresponding to the to-be-processed shadow noise. The projection object is an object that blocks light propagation and thereby produces the to-be-processed shadow noise. Specifically, light shines on a projection object, and a shadow corresponding to the projection object appears behind it along the light direction; noise exists in that shadow, so the shadow becomes the to-be-processed shadow noise. The category of the projection object is determined, where the categories include human bodies, animals, buildings, furniture, electric appliances, and the like. For example, when the projection object is an animal, the contour of the projection object and characteristic information such as ears, horns, and limbs can be collected. When the projection object is a human body, face feature extraction may be performed on the projection object, where the method of extracting face features may include a knowledge-based characterization algorithm or a characterization method based on algebraic features or statistical learning. After the designated category of the projection object is determined, a designated object belonging to the designated category is searched for in the second area, and the illumination shadow area corresponding to the designated object is used as the designated illumination shadow area.
In some embodiments, if there are multiple designated objects, a reference object needs to be determined among the designated objects, and the illumination shadow area corresponding to the reference object is taken as the designated illumination shadow area. Specifically, one embodiment of determining the reference object among the plurality of designated objects may be: determining the shadow parameter of the illumination shadow area corresponding to each designated object, determining the shadow parameter that meets a preset shadow condition as the target shadow parameter, and taking the illumination shadow area corresponding to the target shadow parameter as the designated illumination shadow area. The shadow parameter may be the brightness of the shadow, the size of the shadow, or the like. The illumination shadow area corresponding to the to-be-processed shadow noise also has a shadow parameter, recorded as the designated shadow parameter. One embodiment of determining the shadow parameter that meets the preset shadow condition as the target shadow parameter may be: among the illumination shadow areas corresponding to the designated objects, searching for the shadow parameter that matches the designated shadow parameter, where matching the designated shadow parameter is the preset shadow condition. In one embodiment, the shadow parameter is the size of the shadow: the shadow size of the illumination shadow area corresponding to each designated object is determined, and the illumination shadow area whose size differs from the shadow size of the to-be-processed shadow noise by less than a specified value is searched for as the designated illumination shadow area.
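A minimal sketch of this shadow-size matching is given below; representing each illumination shadow area as an (identifier, shadow size) pair and breaking ties by the smallest difference are assumptions for illustration.

```python
def pick_designated_shadow_area(candidates, designated_size, specified_value):
    """candidates: (area_id, shadow_size) pairs, one per designated object's
    illumination shadow area. Returns the area whose shadow size differs from
    the to-be-processed shadow noise's size (designated_size) by less than
    specified_value, preferring the smallest difference."""
    best_id, best_diff = None, None
    for area_id, size in candidates:
        diff = abs(size - designated_size)
        if diff < specified_value and (best_diff is None or diff < best_diff):
            best_id, best_diff = area_id, diff
    return best_id
```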
Therefore, the distribution region of the to-be-processed noise in the target image is determined based on the pre-trained denoising model and used as the first region. Through the noise extraction operation of the denoising model, the region of the to-be-processed noise in the to-be-processed image can be determined, thereby achieving noise extraction. Then, the to-be-processed noise in the first region is removed according to the second region, namely the image of the region outside the first region in the target image. Since the second region lies outside the noise region, the image in the second region is theoretically a noise-free image, and removing the noise in the first region using this noise-free image removes the noise in the to-be-processed image. In addition, the denoising model is trained based on a noisy sample image, and the noise contained in the noisy sample image is noise collected by the image collecting device when collecting the image; therefore, the noise in the samples on which the denoising model is based is not artificially added but is noise collected by the image collecting device when collecting a real-world image, so that the training result of the denoising model is more accurate and its denoising can be more ideal. As shown in fig. 4, compared to fig. 3, the image in the first area 301 is no longer dark as in fig. 3, and the color is more realistic, i.e., the noise caused by the lighting conditions has been eliminated.
Referring to fig. 5, fig. 5 shows an image processing method provided in an embodiment of the present application, where an execution subject of the method may be the server or the user terminal, and specifically, the method includes: s510 to S570.
S510: and denoising the noise-containing sample image through a denoising model to be trained to obtain first noise data and a first image.
Wherein the first noise data comprises a plurality of noise values and a pixel coordinate corresponding to each noise value, and the first noise data is the noise collected when the image collecting apparatus collected the sample image, for example, noise caused by the illumination conditions of a real environment.
The first noise data refers to the noise data extracted from the noisy sample image by the to-be-trained denoising model, and the first image refers to the data other than the first noise data in the original image data corresponding to the noisy sample image. As an embodiment, the original image data may be a pixel matrix, referred to as the original image matrix; the matrix corresponds to a plurality of pixel data, and each pixel data corresponds to a pixel coordinate, where the pixel data may be the RGB value of the pixel point at the corresponding pixel coordinate. Specifically, assuming the size of the original image is M × N, where M and N are both positive integers, the width of the original image is M pixels and the height is N pixels, so the corresponding original image matrix is a matrix with N rows and M columns. For example, if M is 6 and N is 5, the original image matrix R is a matrix with 5 rows and 6 columns, which may specifically be:

$$R = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \end{bmatrix} \quad (1)$$

Each element in equation (1) corresponds to a pixel coordinate point in the original image, and the value of each element is the pixel value of the pixel coordinate point corresponding to that element; for example, a11 is the pixel value corresponding to the (1, 1) pixel coordinate point.
The process of denoising the noisy sample image by the to-be-trained denoising model may be as follows: the denoising model extracts the first noise data from the noisy sample image, so that a noisy region can be determined in the noisy sample image; further, the pixel coordinates corresponding to the noisy region can be determined, and the pixel values at those pixel coordinates are recorded as noise values, thereby determining a plurality of noise values and the pixel coordinate corresponding to each noise value.
As an embodiment, the elements in the matrix corresponding to the first noise data, denoted as noise elements, may be determined from the original image matrix; the position of a noise element in the matrix characterizes the pixel coordinate of the first noise data, and the value of the noise element is the pixel value at that pixel coordinate, i.e., the noise value. As an embodiment, the first noise data may also be a matrix, named the noise matrix. Taking the original image matrix with 5 rows and 6 columns as an example, the noise matrix may be:

$$V = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & a_{23} & a_{24} & 0 & 0 \\ 0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad (2)$$

In equation (2), the elements that are not set to zero are the noise values, their positions give the pixel coordinate points of the first noise data, and the region of the first noise data in the original image can thereby be determined. In one embodiment, the zeroing operation may be replaced with other data, which is not limited herein.
The first image is the image in the area outside the area corresponding to the first noise data in the original image. For example, the matrix corresponding to the first image is denoted as the noiseless image matrix; taking the original image matrix with 5 rows and 6 columns as an example, the noiseless image matrix is:

$$T = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ a_{21} & a_{22} & 0 & 0 & a_{25} & a_{26} \\ a_{31} & 0 & 0 & 0 & a_{35} & a_{36} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \end{bmatrix} \quad (3)$$

As can be seen by comparing equations (2) and (3), the elements corresponding to the first noise data are set to zero in equation (3), and the matrix obtained by superimposing equations (2) and (3) is the matrix of equation (1), i.e., the original image matrix.
The more accurately the denoising model is trained, the more accurate the extracted first noise data is, and the less noise the first image contains, possibly even being a purely noise-free image.
S520: and obtaining a plurality of second noise data based on at least part of noise values of the first noise data and pixel coordinates corresponding to at least part of noise values.
As an embodiment, at least part of the noise values of the first noise data may be determined as candidate noise values, and the pixel coordinate corresponding to each candidate noise value may be determined and denoted as a noise coordinate; the second noise data is then determined based on the candidate noise values and the noise coordinate corresponding to each candidate noise value. Since the first noise data is not artificially added, that is, it is not ordered noise generated according to a preset mathematical function but noise caused by photographing under the illumination conditions of a real environment, the second noise data determined from the candidate noise values and their noise coordinates can also reflect noise in the real environment. Moreover, generating second noise data from the first noise data increases the number of sample images; a larger number of samples can increase the accuracy of noise extraction by the trained denoising model, while avoiding the labor cost of collecting a larger noisy sample image set.
In some embodiments, the noise coordinate of each candidate noise value may be kept unchanged while the candidate noise value itself is changed to obtain the second noise data; for example, a portion of the candidate noise values are zeroed or otherwise altered to other values.
In other embodiments, the noise coordinates corresponding to the candidate noise values may also be scrambled. For example, if the pixel coordinate of noise value A is (x1, y1) and the pixel coordinate of noise value B is (x2, y2), the positions of the two may be interchanged, that is, the pixel coordinate of noise value A becomes (x2, y2) and the pixel coordinate of noise value B becomes (x1, y1).
In one embodiment, the region of the first noise data corresponding to the noisy sample image is denoted as the noise region, and the plurality of second noise data are located within the noise region. This prevents second noise data from being added to a region outside the noise region, which would pollute the first image with noise and thereby worsen the result of training the denoising model.
Specifically, one embodiment of obtaining the plurality of second noise data based on at least part of the noise values of the first noise data and the pixel coordinates corresponding to those noise values may be: performing a plurality of pixel position interchange operations on the first noise data to obtain the plurality of second noise data, where the plurality of pixel position interchange operations are not all the same, and a pixel position interchange operation interchanges the pixel coordinates corresponding to at least some of the plurality of noise values in the first noise data.
Taking the noise matrix V as an example, a pixel position interchange operation is performed on the noise matrix V, and the resulting noise matrix V′ of the second noise data may be:

$$V' = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & a_{34} & a_{32} & 0 & 0 \\ 0 & a_{24} & a_{33} & a_{23} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

It can be seen that in the noise matrix V′ of the second noise data, a23 and a34 have interchanged positions, and a32 and a24 have interchanged positions. As an embodiment, after one position interchange operation is performed, a further interchange may be performed based on the interchanged positions; that is, a certain pixel coordinate point undergoes at least two position interchanges. For example, after a23 and a34 interchange positions, a33 may then be interchanged with the new position of a23.
A plurality of second noise data may be obtained through the above position interchange operation, and each second noise data is obtained from the first noise data; that is, the noise values in each second noise data belong to the noise values in the first noise data, but the distribution positions of the noise values are not all the same.
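A sketch of the pixel position interchange operation in NumPy is shown below; the concrete noise values and the use of a random permutation within the noise region are illustrative assumptions.

```python
import numpy as np

V = np.zeros((5, 6))
V[1, 2:4] = [7.0, 9.0]        # stand-ins for a23, a24
V[2, 1:4] = [4.0, 6.0, 8.0]   # stand-ins for a32, a33, a34
mask = V != 0                 # the noise region itself stays fixed

def pixel_position_interchange(V, mask, rng):
    coords = np.argwhere(mask)                       # noise coordinates
    permuted = coords[rng.permutation(len(coords))]  # interchanged positions
    V2 = np.zeros_like(V)
    V2[tuple(permuted.T)] = V[tuple(coords.T)]       # same values, new coordinates
    return V2

rng = np.random.default_rng(0)
second_noise = [pixel_position_interchange(V, mask, rng) for _ in range(4)]
```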
S530: a plurality of second images are derived from a plurality of the second noise data and the first image.
Since the first image is regarded as a noise-free image and the second noise data is noise data generated from the first noise data, with both the first and second noise data located within the noise area, the second noise data may be superimposed with the first image to obtain a plurality of second images. Specifically, one embodiment of obtaining a plurality of second images from a plurality of the second noise data and the first image may be: superimposing each of the second noise data with the first image to obtain a plurality of the second images.
As an embodiment, for each noise value in the second noise data, the corresponding pixel coordinate in the first image may be recorded as a designated pixel coordinate, and the pixel value at the designated pixel coordinate in the first image may be replaced with the noise value corresponding to that designated pixel coordinate in the second noise data, thereby fusing the second noise data into the first image to obtain the plurality of second images.
As another embodiment, the image data of the first image is referred to as first image data; the first image data includes a plurality of first pixel data and the pixel coordinate corresponding to each first pixel data, and the second noise data includes a plurality of second noise values and the pixel coordinate corresponding to each second noise value. The first image data and the second noise data are superimposed as follows: for each pixel coordinate, the first pixel data and the second noise value corresponding to that pixel coordinate are added, the sum is used as the second pixel data of that pixel coordinate, and the pixel coordinates together with the second pixel data constitute the second image data, thereby obtaining the second image.
As described above, the matrix T′ of the second image data, obtained by adding the noiseless image matrix T corresponding to the first image and the noise matrix V′ of the second noise data, is:

$$T' = T + V' = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ a_{21} & a_{22} & a_{34} & a_{32} & a_{25} & a_{26} \\ a_{31} & a_{24} & a_{33} & a_{23} & a_{35} & a_{36} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \end{bmatrix}$$

Comparing the matrix R of equation (1) with the matrix T′ shows that the original image data differs from the second image data: the second image data is also an image containing noise data, and the area of its noise distribution is unchanged, but the arrangement of the noise differs from that in the original image data. Thus, although the generated second image data is also computer-synthesized, the noise it contains can characterize real-world noise.
S540: and training the denoising model to be trained based on the distribution difference between the noisy sample image and each second image to obtain a trained denoising model.
As an implementation manner, a characteristic of noise is that noise remains noise regardless of where it appears in the image, whereas the distribution of non-noise is usually logical. If the pixel values in a non-noise region are randomly scrambled, the distribution of the non-noise is disturbed, and the logically disordered pixel values become noise; thus, more noise may appear in the non-noise region.
As an embodiment, as shown in fig. 6, in an extreme case, after the noisy sample image is denoised by the to-be-trained denoising model, the obtained first noise data is exactly the noise-free area; that is, the pixel data of the noise-free area in the original image is taken as the first noise data, and the noise area in the original image is taken as the first image data. When the operation of obtaining a plurality of second images from the plurality of second noise data and the first image is then performed, the distribution of noise in the obtained second images may differ greatly from that in the original image, and the second images are polluted by noise to a greater extent.
Therefore, if the first noise data obtained when denoising the noisy sample image through the to-be-trained denoising model is accurate and reasonable enough, that is, the noise values in the region of the original image that actually contains noise are accurately extracted, then the degree of noise pollution of the obtained second image data should differ little from that of the original image; that is, the distribution gap between the two is small.
Therefore, after the distribution gap between the noisy sample image and each second image is obtained, the denoising model is trained continually according to the distribution gap; once the distribution gap converges, the denoising model training is completed, and the trained denoising model can accurately extract the noise in a noisy image, that is, accurately find the noise distribution area in the noisy image. The distribution gap is the mean square error between the noisy sample image and each second image. As an embodiment, convergence of the distribution gap may mean that the distribution gap is smaller than a specified gap, that several consecutive distribution gaps are smaller than the specified gap, and that there is no obvious downward trend.
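A small sketch of such a convergence test; the window length and the tolerance standing in for "no obvious downward trend" are assumptions.

```python
def gap_converged(gaps, specified_gap, window=5, tolerance=0.05):
    """gaps: history of distribution gaps, most recent last. Converged when
    the last `window` gaps are all below specified_gap and the window shows
    no obvious downward trend."""
    if len(gaps) < window:
        return False
    recent = gaps[-window:]
    below = all(g < specified_gap for g in recent)
    no_downtrend = (recent[0] - recent[-1]) < tolerance * specified_gap
    return below and no_downtrend
```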
As an implementation manner, when determining the distribution gap between the noisy sample image and each second image, in order to compare the two groups of images at the same scale, scale transformation processing needs to be performed on the noisy sample image and each second image. Specifically, this may be implemented by: performing a scale transformation operation on both the noisy sample image and each second image; acquiring the distribution gap between the scale-transformed noisy sample image and each scale-transformed second image; and training the to-be-trained denoising model based on each distribution gap.
The noisy sample image and the second images are scaled into the same scale space by performing the scale transformation operation, for example, through a linear transformation or a neural network. The distribution gaps between the scale-transformed noisy sample image and each scale-transformed second image are obtained, yielding a plurality of distribution gaps, and the to-be-trained denoising model is then trained based on each distribution gap.
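A hedged sketch of this training loop, reusing the NoiseExtractor sketch from above; the stand-in data, the optimizer, the learning rate, and the shuffle_noise helper (which, for brevity, permutes all values rather than only those inside the noise region) are assumptions, not the embodiment itself.

```python
import torch
import torch.nn.functional as F

model = NoiseExtractor()                                # sketch network from above
loader = [torch.rand(1, 3, 64, 64) for _ in range(8)]   # stand-in noisy samples

def shuffle_noise(noise):
    # Stand-in for the pixel position interchange of S520.
    flat = noise.flatten()
    return flat[torch.randperm(flat.numel())].reshape(noise.shape)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for noisy in loader:
    noise = model(noisy)              # first noise data
    clean = noisy - noise             # first image
    gaps = [F.mse_loss(noisy, clean + shuffle_noise(noise)) for _ in range(4)]
    gap = torch.stack(gaps).mean()    # distribution gap (mean square error)
    optimizer.zero_grad()
    gap.backward()
    optimizer.step()
```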
S550: acquiring a target image to be processed, wherein the target image contains noise to be processed.
S560: and determining a distribution region of the to-be-processed noise in the target image based on a pre-trained denoising model as a first region.
S570: and removing the to-be-processed noise in the first region according to the image in the second region in the target image to obtain the denoised target image.
Where the first region includes a plurality of first sub-regions and the second region includes a plurality of second sub-regions, the implementation of S570 may be as shown in fig. 7, and S570 may include: S571 and S572.
S571: and determining a second sub-area corresponding to each first sub-area based on the distance between each first sub-area and each second sub-area.
In one embodiment, the first and second sub-regions are each formed by a specified number of pixels, and the first and second sub-regions have the same shape; in particular, both may be rectangular. The first sub-region has a first position point and the second sub-region has a second position point, and the distance between the first sub-region and the second sub-region may be the distance between the first position point of the first sub-region and the second position point of the second sub-region, that is, the distance between the coordinates of the first position point and the coordinates of the second position point. The first position point may be the position of a certain pixel in the first sub-area, and the first coordinates are the pixel coordinates of that pixel.
As an embodiment, the first location point may be a vertex of the first sub-region. In some embodiments, the origin of the pixel coordinate of the image to be processed is located at the upper left corner of the image to be processed, and the first position point may be a vertex in the first sub-region that is closest to the origin of the pixel coordinate of the image to be processed, for example, the first position point may be a vertex at the upper left corner of the first sub-region. Similarly, the second position point of the second sub-region may also be a vertex in the second sub-region closest to the origin of the pixel coordinates of the image to be processed.
The distance between the coordinates of the first position point within the first sub-area and the coordinates of the second position point within the second sub-area is thereby determined as the distance between the first sub-area and the second sub-area, and in this way the distance between each first sub-area and each second sub-area is determined.
As an embodiment, when determining the second sub-region corresponding to each first sub-region based on the distances between the first sub-regions and the second sub-regions, the second sub-region at the shortest of the distances corresponding to a first sub-region may be used as the second sub-region corresponding to that first sub-region.
Specifically, one sub-region is taken in turn from the plurality of first sub-regions as the target sub-region, the distance between the target sub-region and each second sub-region is determined, and the second sub-region closest to the target sub-region is searched for as the second sub-region corresponding to the target sub-region. In some embodiments, it is also determined whether the target sub-region has a corresponding alternative first sub-region, where an alternative first sub-region is one whose image has already been modified, e.g., modified using the image within a second sub-region. If the target sub-region has a corresponding alternative first sub-region, the distance between the target sub-region and each alternative first sub-region is determined, and the closest alternative first sub-region is searched for as the first sub-region to be confirmed; then a designated sub-region is determined from the second sub-region corresponding to the target sub-region and the first sub-region to be confirmed, and the designated sub-region is taken as the finally determined second sub-region corresponding to the target sub-region.
Specifically, one embodiment of determining the designated sub-region from the second sub-region corresponding to the target sub-region and the first sub-region to be confirmed may be: determining the distance between the second sub-region corresponding to the target sub-region and the target sub-region, recorded as the first distance; determining the distance between the first sub-region to be confirmed and the target sub-region, recorded as the second distance; determining the minimum of the first distance and the second distance; and taking the sub-region corresponding to the minimum distance as the designated sub-region. If the first distance and the second distance are the same, one sub-region may be chosen at random as the designated sub-region.
As shown in fig. 8, the gray area in fig. 8 is a first sub-area and the white areas are second sub-areas; the position point P1 is the first position point of the first sub-area, and the position point P2 is the second position point of a second sub-area. The distances between the first sub-area 701 and the second sub-areas 702 and 703 are the same and are the minimum; that is, the second sub-areas at the minimum distance from the first sub-area 701 are the second sub-area 702 and the second sub-area 703, so one of the second sub-area 702 and the second sub-area 703 can be randomly selected as the second sub-area corresponding to the first sub-area 701.
S572: and modifying the image in the first sub-area corresponding to each second sub-area according to the image in the second sub-area to obtain the denoised target image.
As an implementation manner, modifying the image in the first sub-region corresponding to each second sub-region according to the image in the second sub-region may be implemented by modifying the image in the first sub-region into the image in the second sub-region corresponding to that first sub-region, that is, replacing the image in the corresponding first sub-region with the image in each second sub-region. As shown in fig. 9, if the second sub-region corresponding to the first sub-region 701 is the second sub-region 703, the image of the first sub-region 701 is replaced by the image in the second sub-region 703.
As an embodiment, after the image in each first sub-region is replaced by the image of its corresponding second sub-region, the replaced pixel data of the first sub-regions may be superimposed with the first image to form the target image, i.e., the denoised image.
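A minimal NumPy sketch of S571 and S572; the sub-region edge length, the corner lists, and the use of top-left corners as position points are illustrative assumptions (ties here fall to the first candidate rather than a random one).

```python
import numpy as np

def denoise_by_nearest_subregion(image, first_corners, second_corners, size=8):
    """Replace each first (noisy) sub-region, given by its top-left corner,
    with the second (clean) sub-region whose top-left corner is nearest."""
    out = image.copy()
    for fr, fc in first_corners:
        dists = [np.hypot(fr - sr, fc - sc) for sr, sc in second_corners]
        sr, sc = second_corners[int(np.argmin(dists))]   # shortest distance wins
        out[fr:fr + size, fc:fc + size] = image[sr:sr + size, sc:sc + size]
    return out
```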
In the above process, the second sub-region corresponding to each first sub-region is determined based on the distance between each first sub-region and each second sub-region, and the image in the corresponding first sub-region is modified according to the image in each second sub-region to obtain the denoised target image. The second sub-region closest to each first sub-region may be determined in a convolution manner, and the pixel data in the first sub-region replaced with the pixel data in the closest second sub-region. However, the spatial scale of the images of the plurality of second sub-regions is not uniform, so the images cannot be convolved directly. The whole image to be processed can instead be abstracted into a graph, where the second sub-region corresponding to each first sub-region serves as a graph node and the distance between the second sub-region and the first sub-region serves as a graph edge, so that the second sub-region closest to each first sub-region is determined in a convolution manner over the graph, and the pixel data in the first sub-region is replaced with the pixel data in that closest second sub-region.
Referring to fig. 10, fig. 10 shows an image processing method provided in this embodiment of the present application, where the execution subject of the method may be the server or the user terminal. In this embodiment, the method is applied to a user terminal, and the user terminal may be a head-mounted display device capable of implementing Augmented Reality (AR) or Virtual Reality (VR). Specifically, the method includes: S1001 to S1004.
S1001: and acquiring an image rendered by the renderer as a target image to be processed.
After video data is decoded, the decoded video data is sent to a renderer, rendered and synthesized by the renderer, and then displayed on a display screen. The renderer may be SurfaceFlinger, which is an independent service; it receives all windows' Surfaces as input, calculates the position of each Surface in the final composite image according to parameters such as ZOrder, transparency, size, and position, and then hands the result to HWComposer or OpenGL to generate the final display buffer, which is then displayed on a specific display device.
A renderer needs considerable time and effort to render high-quality (noise-free) pictures, and is therefore difficult to apply directly to real-time rendered scenes (such as games, VR, etc.). If the renderer consumes too much time rendering a high-quality image, the image displayed by the display device may stutter; for example, in a head-mounted display device rendering a VR scene, stutter or high delay in the displayed image may make the user feel dizzy and reduce the user experience, while abandoning the high-quality image and using a low-quality image instead greatly reduces the user's visual experience.
Therefore, the renderer can first render a lower-quality image and record it as the target image to be processed; this image contains more to-be-processed noise. The denoised target image is then obtained through the subsequent steps and displayed, so that a high-quality image can be shown while avoiding the excessive time the renderer would spend rendering a noise-free image, achieving the goal of rendering realistic game scenes and VR scenes in real time.
Specifically, the renderer may generate a lower-quality image, that is, an image containing to-be-processed noise, using a first strategy, and display the denoised target image after denoising it with the image processing method of the embodiment of the present application. The renderer may also generate a high-quality image, i.e., a noise-free image, using a second strategy.
In particular, the choice between the first and second strategies may be determined according to the real-time property of the image to be displayed. Specifically, the real-time property of the image to be displayed may correspond to the application program requesting to display the image; that is, the application program corresponding to the image to be displayed is determined, and the real-time property of the image to be displayed is determined according to that application program.
As an implementation manner, the identifier of the application program corresponding to the image to be displayed is determined, and then the real-time level of the image to be displayed is determined according to the identifier of the application program. Specifically, the identifier of the application program that sends the play request for the image to be displayed is determined, and the type of the application program corresponding to that identifier is determined. As an embodiment, the real-time levels may be preset, and different categories of applications correspond to different real-time levels; for example, the real-time level of the game category is J1, that of the video category is J2, and that of the audio category is J3, where J1 is the highest and J2 and J3 decrease in order.
It is then judged whether the real-time level corresponding to the image to be displayed meets a preset level; if so, the renderer renders the image using the first strategy, and otherwise the renderer renders the image using the second strategy. The preset level is a preset real-time level and can be set by the user according to requirements, for example, J2 and above. If the real-time level corresponding to the image to be displayed is J1, the real-time level of the image to be displayed meets the preset level; that is, for an image to be displayed with a high real-time requirement, the second strategy is not executed, and instead a lower-quality, noisy image is generated and then denoised according to the method of the embodiment of the present application, so as to avoid affecting the user experience by spending too much time rendering a high-quality image.
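A sketch of this strategy selection; the category-to-level mapping and the preset level are assumptions consistent with the example above.

```python
REALTIME_LEVEL = {"game": 1, "video": 2, "audio": 3}   # J1, J2, J3

def choose_render_strategy(app_category, preset_level=2):
    # A smaller number means a higher real-time requirement (J1 highest).
    level = REALTIME_LEVEL.get(app_category, 3)
    # High real-time requirement: render fast and noisy, then denoise
    # (first strategy); otherwise render noise-free directly (second strategy).
    return "first" if level <= preset_level else "second"
```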
S1002: acquiring a target image to be processed, wherein the target image contains noise to be processed.
S1003: and determining a distribution region of the to-be-processed noise in the target image based on a pre-trained denoising model as a first region.
S1004: and removing the to-be-processed noise in the first region according to the image in the second region in the target image to obtain the denoised target image.
In this way, the renderer first renders a lower-quality image recorded as the target image to be processed, which contains more to-be-processed noise; the denoised target image is then obtained through the above steps and displayed. A high-quality image can thus be displayed while avoiding the excessive time the renderer would spend rendering a noise-free image, achieving real-time rendering of realistic game scenes and VR scenes.
Referring to fig. 11, a block diagram of an image processing apparatus 1100 according to an embodiment of the present disclosure is shown, where the apparatus may include: an acquisition unit 1101, a determination unit 1102, and a processing unit 1103.

The acquisition unit 1101 is configured to acquire a target image to be processed, where the target image includes noise to be processed.

The determination unit 1102 is configured to determine, as a first region, the distribution region of the to-be-processed noise in the target image based on a pre-trained denoising model, where the denoising model is trained based on a noisy sample image, and the noise contained in the noisy sample image is noise acquired by an image acquisition device when acquiring the image.

The processing unit 1103 is configured to remove, according to the image in a second region of the target image, the to-be-processed noise in the first region to obtain the denoised target image, where the second region is the region outside the first region in the target image.
Further, the processing unit 1103 is further configured to determine, based on the distance between each first sub-area and each second sub-area, the second sub-area corresponding to each first sub-area, and to modify the image in the first sub-area corresponding to each second sub-area according to the image in the second sub-area, to obtain the denoised target image.

Further, the processing unit 1103 is further configured to use the second sub-area corresponding to the shortest distance among the plurality of distances corresponding to each first sub-area as the second sub-area corresponding to that first sub-area.
Further, the image processing apparatus 1100 further includes a training unit, where the training unit is configured to: perform denoising processing on the noisy sample image through a to-be-trained denoising model to obtain first noise data and a first image, where the first noise data includes a plurality of noise values and the pixel coordinate corresponding to each noise value; obtain a plurality of second noise data based on at least part of the noise values of the first noise data and the pixel coordinates corresponding to those noise values; obtain a plurality of second images from the plurality of second noise data and the first image; and train the to-be-trained denoising model based on the distribution gap between the noisy sample image and each second image to obtain a trained denoising model, where the distribution gap obtained by the trained denoising model is smaller than a specified value.
Further, the training unit is further configured to superimpose each of the second noise data with the first image to obtain a plurality of second images.
Further, the training unit is further configured to perform a plurality of pixel position interchanging operations on the first noise data to obtain a plurality of second noise data, where the plurality of pixel position interchanging operations are not all the same, and the pixel position interchanging operation is to interchange pixel coordinates corresponding to at least some of the plurality of noise values in the first noise data.
Further, the training unit is further configured to: perform a scaling operation on both the noisy sample image and the plurality of second images; acquire the distribution gap between the scale-transformed noisy sample image and each second image; and train the to-be-trained denoising model based on each distribution gap, where the distribution gap is the mean square error of the noisy sample image and each second image.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Referring to fig. 12, a block diagram of an electronic device according to an embodiment of the present disclosure is shown. The electronic device 100 may be the user terminal or the server described above. The user terminal may be an image capturing apparatus (e.g., a camera, a video camera, etc.), a terminal in which an image capturing apparatus is installed (e.g., a mobile terminal), or a device with the function of rendering and displaying images, e.g., a head-mounted display device capable of implementing AR or VR, a mobile terminal, or a computer device.
As an embodiment, the electronic device is a camera, and the electronic device can implement the image processing method. Thus, even if the lens quality of the camera is not high or the exposure is poor, the disordered highlight noise generated by a poorly exposed lens can be removed to a certain extent by this method. Even with poor camera hardware, the image acquired by the camera can exceed the image quality level implied by the camera's hardware parameters, reducing the requirements on the camera hardware.
With this method, the renderer can first render a lower-quality image, recorded as the target image to be processed and containing more to-be-processed noise; the denoised target image is obtained through the subsequent steps and displayed, so that a high-quality image can be shown while avoiding the excessive time the renderer would spend rendering a noise-free image, achieving real-time rendering of realistic game scenes and VR scenes. Obtaining high-quality pictures in this way is therefore faster and cheaper than spending more rendering overhead.
The electronic device 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and one or more applications, where the one or more applications may be stored in the memory 120 and configured to be executed by the one or more processors 110, and the one or more programs are configured to perform the method described in the foregoing method embodiments.
The processor 110 may include one or more processing cores. The processor 110 connects various parts within the overall electronic device 100 using various interfaces and lines, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and calling data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware using at least one of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 110 may integrate one or more of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like, where the CPU mainly handles the operating system, the user interface, application programs, and so on; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It is understood that the modem may not be integrated into the processor 110 but implemented by a separate communication chip.
The memory 120 may include a random access memory (RAM) or a read-only memory (ROM). The memory 120 may be used to store instructions, programs, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data created by the electronic device 100 in use, such as a phonebook, audio and video data, and chat log data.
Referring to fig. 13, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable medium 1300 has stored therein program code that can be called by a processor to execute the method described in the above-described method embodiments.
The computer-readable storage medium 1300 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 1300 includes a non-volatile computer-readable storage medium. The computer readable storage medium 1300 has storage space forprogram code 1310 for performing any of the method steps of the method described above. The program code can be read from or written to one or more computer program products. Theprogram code 1310 may be compressed, for example, in a suitable form.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.