Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
To help those skilled in the art better understand the technical solutions provided by the embodiments of the present application, the first data format and the second data format described in the present application are briefly described below.
The first data format is the raw data format obtained when the image sensor converts the captured light source signal into a digital signal; the raw data is sensing data containing light from one or more spectral bands.
Illustratively, the raw data may include sensing data sampled from optical signals in spectral bands with wavelengths in the range of 380 nm to 780 nm and/or 780 nm to 2500 nm.
For example, the RAW (unprocessed) image signal produced by an RGB sensor.
Illustratively, the imaging device collects the light source signal, converts it into an analog signal, and converts the analog signal into a digital signal. The digital signal is input into the processing chip for processing (which may include bit width clipping, image processing, encoding and decoding processing, and the like) to obtain data in the second data format, and the data in the second data format is transmitted to the display device for display or to other devices for processing.
Therefore, an image in the first data format is the image obtained when the imaging device converts the acquired light source information into a digital signal. Because it has not been processed by the processing chip, its bit width is high, and it contains richer image information than an image in the second data format, which has undergone bit width clipping, image processing, and encoding and decoding processing.
In order to make the aforementioned objects, features and advantages of the embodiments of the present application more comprehensible, embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a schematic flowchart of an image processing method provided in an embodiment of the present application is shown. The image processing method may be applied to a video monitoring front-end device, such as an IPC (Internet Protocol Camera). As shown in fig. 1, the image processing method may include the following steps:
For convenience of description and understanding, the following description takes the IPC as the execution subject of steps S100 to S140.
Step S100, collecting and caching an image frame in a first data format; the first data format is the raw data format obtained after the image sensor converts the captured light source signal into a digital signal.
In this embodiment, the IPC may convert the captured optical signal into raw data of a digital signal through the image sensor to obtain image data in the first data format.
For example, the IPC may buffer the captured image data in the first data format.
Step S110, processing the image frame in the first data format to obtain an image frame in a second data format, where the second data format is an image format suitable for display or transmission.
In the embodiment of the present application, for the image data in the first data format corresponding to any frame of image (referred to herein as an image frame in the first data format), the IPC may convert it into the second data format suitable for display or transmission.
Illustratively, the second data format may include, but is not limited to, an RGB format, a YUV format, a JPEG format, or the like.
In one example, the processing of the image frames in the first data format may include:
performing preset operation processing on the image frame in the first data format;
The preset operation processing includes one or more of the following:
white balance correction, color interpolation, curve mapping.
For example, for any image frame in the first data format, the IPC may convert it into an image frame in the second data format through one or more of white balance correction, color interpolation, curve mapping, and the like (referred to herein as preset operation processing). A specific implementation is described in detail below with reference to a specific example and is not repeated here.
Step S120, performing target detection on the image frame in the second data format to determine target position information in the image frame in the second data format.
In the embodiment of the application, when the IPC obtains an image frame in the second data format, it may perform target detection on the image frame to detect a target of interest (such as a pedestrian, a vehicle, an animal, or a license plate).
For any image frame in the second data format, when the IPC detects a target in the image frame, the position information of the target in the image frame (referred to herein as target position information) may be determined.
Step S130, based on the target position information, performing ROI area interception on the cached image frame in the first data format to obtain an ROI area image in the first data format.
In the embodiment of the application, when the IPC determines the target position information in the image frame in the second data format, it may perform ROI (Region Of Interest) interception, based on the target position information, on the cached image frame in the first data format corresponding to that image frame in the second data format; that is, the image of the ROI is cut out of the original image, so as to obtain an ROI area image in the first data format.
Step S140, transmitting the ROI area image in the first data format to a server, so that the server performs enhancement processing on the ROI area image in the first data format to obtain an ROI area image in a second data format.
In the embodiment of the application, consider the case where the IPC processes the image frame in the first data format into an image frame in the second data format and transmits the latter to the server: when the server performs enhancement processing, the enhancement is limited because the data bit width has been clipped and information has been lost during image processing. Therefore, in order to improve the image enhancement effect, the IPC may transmit the ROI area image in the first data format to the server, and the server performs image enhancement based on the ROI area image in the first data format.
For example, when the IPC intercepts the ROI area image in the first data format in the manner described in the above step, the ROI area image in the first data format may be transmitted to the server.
When receiving the ROI area image in the first data format, the server may perform enhancement processing on the ROI area image in the first data format to obtain an ROI area image in a second data format.
For example, when the server obtains the ROI area image in the second data format, the server may perform target identification on the ROI area image in the second data format, or perform other processing on the ROI area image in the second data format according to a preset policy, and specific implementation thereof will be described below with reference to specific examples, which is not described herein again in this embodiment of the present application.
As a possible implementation manner, the transmitting the ROI area image in the first data format to the server may include:
carrying the image information of the ROI area image in the first data format in the image frame in the second data format and transmitting it to the server.
For example, to improve data transmission efficiency, the image information of the ROI area image in the first data format may be carried in the image frame in the second data format corresponding to that ROI area image and transmitted to the server; that is, the ROI area image in the first data format and its corresponding image frame in the second data format are integrated, compressed and encoded, and transmitted to the server through the network.
Illustratively, the image information of the ROI area image in the first data format is raw data (data that has not been subjected to encoding processing) of the ROI area image in the first data format.
In one example, the transmitting the image information of the ROI area image in the first data format to the server in the image frame in the second data format may include:
carrying the image information of the ROI area image in the first data format in a reserved field of the file header of the image frame in the second data format and transmitting it to the server.
Illustratively, in order to reduce the information loss of the ROI area image in the first data format and to avoid the transmission of that image affecting the display of the image frame in the second data format, when the IPC intercepts the ROI area image in the first data format, it may copy the image information of the ROI area image into a reserved field of the file header of the corresponding image frame in the second data format and transmit it to the server through the network. In this way, the ROI area image in the first data format can be transmitted to the server without encoding and compression, so that the information loss in the transmission process is reduced to the maximum extent; and because the image information is carried in a reserved field of the file header, the normal display of the image frame in the second data format is not affected.
When the server receives the image frame in the second data format, it can parse the file header of the image frame, obtain the image information of the ROI area image in the first data format carried in the reserved field of the header, recover the ROI area image in the first data format, and perform enhancement processing based on it, thereby improving the enhancement effect.
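As an illustration of the header-field mechanism, the following is a minimal Python sketch, assuming the payload fits in one segment; the APP9 marker choice and the embed_raw_roi helper name are illustrative assumptions, since the application only specifies "a reserved field of the file header":

```python
import struct

def embed_raw_roi(jpeg_bytes: bytes, roi_payload: bytes, app_marker: int = 0xE9) -> bytes:
    # An APPn segment is: 0xFF, the marker byte, then a 2-byte big-endian
    # length that counts itself plus the payload (so payload <= 65533 bytes).
    if len(roi_payload) > 0xFFFF - 2:
        raise ValueError("payload too large for a single APPn segment")
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    segment = bytes([0xFF, app_marker]) + struct.pack(">H", len(roi_payload) + 2) + roi_payload
    # Splice the segment right after the SOI marker; decoders skip unknown
    # APPn segments, so normal display of the JPEG frame is unaffected.
    return jpeg_bytes[:2] + segment + jpeg_bytes[2:]
```

A real raw ROI would usually exceed 64 KB and need to be split across several APPn segments; placing the segment after the existing APP0/APP1 segments would also be more conformant.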
It should be noted that, in the embodiment of the present application, the IPC may instead compress and encode the ROI area image in the first data format separately and transmit it to the server, that is, transmit both the ROI area image in the first data format and its corresponding image frame in the second data format to the server. Alternatively, after the IPC acquires the image frame in the first data format, it may skip ROI interception and directly transmit the image frame in the first data format together with the image frame in the second data format obtained by processing it; that is, the full image of the image frame in the first data format is transmitted to the server, and the server performs image enhancement based on that full image. Details are not repeated here.
In addition, in the embodiment of the present application, if there are a plurality of different targets in one frame of image, ROI area images in the first data format may be intercepted separately for the different targets and transmitted to the server in the manner described above.
For example, for a plurality of ROI region images in the first data format corresponding to a plurality of targets in the same frame of image, the ROI region images may be carried in the same frame of image in the second data format and transmitted to the server.
In one example, to improve the security of data transmission, the ROI area image in the first data format transmitted by the IPC to the server may be an encrypted ROI area image in the first data format, and a specific encryption rule of the ROI area image may be determined by negotiation between the IPC and the server or configured in advance in the IPC and the server.
For example, the encrypted image information of the ROI area image in the first data format may be carried in a reserved field of a header of an image frame in the second data format and transmitted to the server.
It can be seen that, in the method flow shown in fig. 1, the ROI area image in the first data format is intercepted from the image frame in the first data format acquired by the IPC and transmitted to the server, and the server performs image enhancement based on it. Compared with the second data format image, whose bit width has been clipped, the first data format image has a high bit width, and its original information is unaffected by image processing, encoding and decoding processing, and the like, which is beneficial to improving the effect of the image enhancement processing and the quality of the image output after enhancement. In addition, because the first data format image is transmitted to the server for enhancement processing, the enhancement can be performed on a server with stronger computing power, so an enhancement algorithm of higher complexity and better effect can be used to further improve the image enhancement result. Moreover, since only the ROI area image in the first data format is transmitted, the transmission pressure is reduced compared with transmitting the whole image frame in the first data format, improving realizability.
In order to enable those skilled in the art to better understand the technical solutions provided in the embodiments of the present application, the following describes the technical solutions provided in the embodiments of the present application with reference to specific application scenarios.
Referring to fig. 2, a schematic structural diagram of a specific application scenario provided in the embodiment of the present application is shown. As shown in fig. 2, the image processing system may include a front-end camera and a server (one server is taken as an example).
Illustratively, the front-end camera may include an image acquisition module, a first processing module, a second processing module, an ROI area intercepting module, and a third processing module.
The server may include a fourth processing module, a fifth processing module, and a sixth processing module.
The image processing flow of the present application will be described below with reference to the functions of the respective modules.
I. Front-end camera
1. Image acquisition module
The image acquisition module is used for acquiring image data in a first data format and caching the acquired image data in the first data format.
2. First processing module
The first processing module is used for processing the first data format image into a second data format image.
In one example, as shown in fig. 3A, the first processing module may include a white balance correction unit, a color interpolation unit, and a curve mapping unit.
Illustratively, the white balance correction unit is configured to perform white balance correction on the image in the first data format to remove the color cast introduced by ambient light during imaging and restore the original color information of the image, generally by using gain factors R_gain, G_gain, B_gain to adjust the corresponding R, G, B components:
R' = R * R_gain
G' = G * G_gain
B' = B * B_gain
where R, G, B are the color components of the first data format image IMG_in input to the white balance correction unit, and R', G', B' are the color components of the image IMG_awb output by the white balance correction unit.
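A minimal sketch of this gain adjustment in Python/NumPy, assuming the first data format image is a Bayer mosaic with an RGGB layout (the layout is an assumption; the application does not fix one):

```python
import numpy as np

def white_balance_rggb(raw, r_gain, g_gain, b_gain):
    # Apply R' = R * R_gain, G' = G * G_gain, B' = B * B_gain on the
    # Bayer mosaic, assuming an RGGB 2x2 cell layout.
    out = raw.astype(np.float32).copy()
    out[0::2, 0::2] *= r_gain  # R sites
    out[0::2, 1::2] *= g_gain  # G sites in even rows
    out[1::2, 0::2] *= g_gain  # G sites in odd rows
    out[1::2, 1::2] *= b_gain  # B sites
    return out
```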
The color interpolation unit is used for converting the single-channel image into an RGB three-channel image.
Taking nearest neighbor interpolation as an example, for a single-channel image (e.g., IMG_awb), the pixel positions missing a given color are filled directly with the nearest pixel of that color, so that each pixel contains all three RGB color components. The specific interpolation process may be as shown in fig. 3B, and the interpolated image IMG_cfa is obtained.
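A possible sketch of this nearest-neighbor filling under the same assumed RGGB layout, with even image dimensions assumed and, for brevity, only one of the two green samples used:

```python
import numpy as np

def demosaic_nearest_rggb(raw):
    # Each 2x2 RGGB cell holds one R, two G, and one B sample; nearest-
    # neighbor interpolation replicates a cell's samples to all four of
    # its pixel positions (np.kron tiles each sample over a 2x2 block).
    h, w = raw.shape
    cell = np.ones((2, 2), dtype=np.float32)
    rgb = np.empty((h, w, 3), dtype=np.float32)
    rgb[..., 0] = np.kron(raw[0::2, 0::2], cell)  # R
    rgb[..., 1] = np.kron(raw[0::2, 1::2], cell)  # G
    rgb[..., 2] = np.kron(raw[1::2, 1::2], cell)  # B
    return rgb
```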
The curve mapping unit is used for performing curve mapping on the image output by the color interpolation unit to enhance the brightness and contrast of the image. A common curve mapping method is Gamma curve mapping, in which the image is mapped through a Gamma table, converting the color-interpolated image into the second data format image.
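A minimal sketch of Gamma-table mapping, assuming 8-bit data and a gamma of 1/2.2 (both assumptions):

```python
import numpy as np

def gamma_map(img_8bit, gamma=1.0 / 2.2):
    # Build the Gamma table once, then map every pixel through it.
    lut = (255.0 * (np.arange(256) / 255.0) ** gamma).astype(np.uint8)
    return lut[img_8bit]  # img_8bit assumed to be a uint8 array
```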
3. Second processing module
The second processing module is used for detecting an interested target, positioning the target after detecting the target and outputting the target position information after positioning.
By way of example, common objects of interest include, but are not limited to, pedestrians, vehicles, animals, or license plates, etc.
In one example, the second processing module may be implemented by a neural network that directly outputs the target coordinates. As shown in fig. 3C, the neural network implementing the second processing module may include a convolutional layer (Conv layer), a pooling layer (Pool layer), a fully-connected layer (FC layer), and bounding box regression (BBR).
For example, the operation of a convolutional layer may be represented by the following formula:
YC_i(I) = g(W_i * YC_{i-1}(I) + B_i)
where YC_i(I) is the output of the current convolutional layer, YC_{i-1}(I) is the input of the current convolutional layer, * denotes the convolution operation, W_i and B_i are the weights and offsets of the current convolutional layer, and g() represents the activation function; when the activation function is ReLU, g(x) = max(0, x).
The pooling layer is a special down-sampling layer: the feature map obtained by convolution is reduced with a reduction window of size N x N. With max pooling, the maximum value within each N x N window is taken as the value of the corresponding point of the new image, specifically:
YP_j(I) = maxpool(YP_{j-1}(I))
where YP_{j-1}(I) is the input of the jth pooling layer and YP_j(I) is the output of the jth pooling layer.
The fully-connected layer FC can be regarded as a convolutional layer with a 1 x 1 filter window and is implemented similarly to convolution filtering; its expression is as follows:
YF_k(I) = g( Σ_{i=1}^{R} Σ_{j=1}^{C} (W_ij * F_kI(I)(i,j) + B_ij) )
where F_kI(I) is the input of the kth fully-connected layer, YF_k(I) is the output of the kth fully-connected layer, R and C are the width and height of F_kI(I), W_ij and B_ij are respectively the connection weights and offsets of the fully-connected layer, and g() represents the activation function.
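To make the three layer types concrete, the following is a toy single-channel forward-pass sketch in Python/NumPy and SciPy; the shapes and helper names are illustrative assumptions, not the application's implementation:

```python
import numpy as np
from scipy.signal import correlate2d

def relu(x):
    return np.maximum(0.0, x)

def conv_layer(x, w, b):
    # YC_i(I) = g(W_i * YC_{i-1}(I) + B_i), for one 2-D feature map.
    return relu(correlate2d(x, w, mode="valid") + b)

def max_pool(x, n=2):
    # YP_j(I) = maxpool(YP_{j-1}(I)): max over each N x N window.
    h, w = (x.shape[0] // n) * n, (x.shape[1] // n) * n
    return x[:h, :w].reshape(h // n, n, w // n, n).max(axis=(1, 3))

def fc_layer(x, w, b):
    # Weighted sum over the whole input plus offset, then activation.
    return relu(np.sum(w * x) + b)
```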
The bounding box regression BBR seeks a mapping such that the window P output by the fully-connected layer is mapped to a window G' closer to the real window G. The regression is typically performed by a translation or scaling transformation of the window P.
Let the coordinates of the window P output by the fully-connected layer be (x1, x2, y1, y2) and the coordinates after the window transformation be (x3, x4, y3, y4). For a translation transformation with translation scale (Δx, Δy), the coordinate relationship before and after the translation is:
x3=x1+Δx
x4=x2+Δx
y3=y1+Δy
y4=y2+Δy
For a scaling transformation with scaling factors dx and dy in the X and Y directions, the coordinate relationship before and after the transformation is:
x4-x3=(x2-x1)*dx
y4-y3=(y2-y1)*dy
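The two window transformations can be sketched as follows; anchoring the scaling at the window's starting corner is one choice consistent with the relations above, not the only one:

```python
def translate_window(x1, x2, y1, y2, dx_shift, dy_shift):
    # (x3, x4, y3, y4) = (x1 + Δx, x2 + Δx, y1 + Δy, y2 + Δy)
    return x1 + dx_shift, x2 + dx_shift, y1 + dy_shift, y2 + dy_shift

def scale_window(x1, x2, y1, y2, dx, dy):
    # Satisfies x4 - x3 = (x2 - x1) * dx and y4 - y3 = (y2 - y1) * dy.
    return x1, x1 + (x2 - x1) * dx, y1, y1 + (y2 - y1) * dy
```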
4. ROI area intercepting module
The ROI area intercepting module intercepts the ROI area image in the first data format from the cached image in the first data format according to the target position information output by the second processing module.
In one example, assume that the target position information output by the second processing module is [x, y, h, w], where x and y are the coordinates of the starting point of the detection frame and h and w are the height and width of the detection frame. Denoting the first data format image by fus_raw, the cut-out ROI area image fus_t_raw in the first data format is:
fus_t_raw = fus_raw(y+1 : y+h, x+1 : x+w)
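In 0-based NumPy indexing, the same cut is simply (a sketch; crop_roi is an illustrative helper name):

```python
def crop_roi(fus_raw, x, y, h, w):
    # fus_t_raw = fus_raw(y+1:y+h, x+1:x+w) in the 1-based notation above.
    return fus_raw[y:y + h, x:x + w].copy()
```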
5. Third processing module
The third processing module is used for integrating the information of the first data format image and the second data format image and outputting the information to the transmission module for network transmission.
In one example, assuming that the second data format is JPEG, for any frame of image in the first data format, the third processing module may parse the file header information of the corresponding JPEG image frame, find the APPn field memory of the JPEG image, and copy the image information of the ROI area image in the first data format, which was cut from that image in the first data format, into that field.
For example, before copying, the image information of the ROI area image in the first data format may be encrypted.
For example, all pixels of the ROI area image in the first data format may be randomly scrambled, a position information map recording each point's position before scrambling may be stored, and then both the position information map and the scrambled ROI area image in the first data format are copied into the APPn field (the front-end camera may negotiate with the server in advance on the meaning of the contents copied into the APPn field).
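A minimal sketch of this scrambling scheme in NumPy, where the position information map is realized as the permutation array itself (one possible realization):

```python
import numpy as np

def scramble_roi(roi, rng):
    # Randomly permute all pixels; 'perm' is the position information map
    # that the server needs to restore the original order.
    flat = roi.ravel()
    perm = rng.permutation(flat.size)
    return flat[perm].reshape(roi.shape), perm

def unscramble_roi(scrambled, perm):
    # Invert the permutation: pixel i of the scrambled image came from
    # position perm[i] of the original.
    flat = np.empty_like(scrambled.ravel())
    flat[perm] = scrambled.ravel()
    return flat.reshape(scrambled.shape)
```

The unscramble_roi half corresponds to the restoration performed by the fourth processing module of the server, described below.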
II. Server
1. Fourth processing module
The fourth processing module is used for analyzing the image frame of the second data format received by the server to obtain the ROI area image of the first data format contained in the image frame.
In an example, still taking JPEG format as an example, the fourth processing module first parses header information of a JPEG image, finds an APPn field memory, and copies data in the APPn field memory.
For example, if the third processing module transmitted encrypted image information of the ROI area image in the first data format, the fourth processing module needs to decrypt the data when copying it from the APPn field memory.
For example, taking the above encryption manner as an example, the fourth processing module may obtain the position information map from the data copied from the APPn field, determine the original position of each pixel before the ROI area image in the first data format was scrambled, and restore the scrambled data to the original data based on that position information.
2. Fifth processing module
The fifth processing module is used for performing enhancement processing on the acquired ROI area image in the first data format and converting it into a high-quality ROI area image in the second data format for subsequent display and/or intelligent identification.
For example, the enhancement processing operation (which may be referred to as an enhancement operation) may include, but is not limited to, one or more of demosaicing, denoising, deblurring, defogging, resolution enhancement, brightness adjustment, color restoration, contrast enhancement, dynamic range enhancement, sharpness enhancement, etc. of the image.
In one example, as shown in fig. 3D, the fifth processing module may include a correction processing unit and an image enhancement processing unit.
Illustratively, the correction processing unit may perform one or more of the following processes on the ROI area image in the first data format:
1) Sensor correction: in the production process of a sensor, certain physical defects exist due to process limitations, so problems such as black level, dead pixels, fixed pattern noise, and G1/G2 green channel imbalance can occur during imaging. The sensor correction process can correct one or more of these imaging problems, so that the corrected ROI area image in the first data format is free of the imaging artifacts of different sensor production processes; this eliminates device correlation and enables a subsequent convolutional neural network to adapt to sensors of different models.
1.1. The black level correction method may include:
out = in - blackVal
where out is the black level correction result, in is the input image, and blackVal is the black level value of the sensor; sensors of different models have black level values of different sizes.
1.2. The dead pixel correction method may include: median filtering.
1.3. The fixed pattern noise correction method may include: manually calibrating the fixed pattern noise positions and replacing the noisy pixels at those positions with values interpolated from the surrounding pixels.
1.4. The green channel imbalance correction method may include: mean filtering of the G channel. (A code sketch of corrections 1.1 to 1.4 is given after this list.)
2) White balance correction: see the white balance correction implementation of the first processing module.
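A possible combined sketch of corrections 1.1 to 1.4 in Python/NumPy and SciPy; the RGGB layout, even image dimensions, 3 x 3 median window, and the calibrated dead-pixel mask input are all assumptions:

```python
import numpy as np
from scipy.ndimage import median_filter

def correct_sensor(raw, black_val, dead_pixel_mask):
    # 1.1 Black level: out = in - blackVal, clipped at zero.
    out = np.clip(raw.astype(np.float32) - black_val, 0.0, None)
    # 1.2 / 1.3 Dead pixels and calibrated fixed-pattern-noise positions:
    # replace flagged pixels with the median of their 3x3 neighborhood.
    med = median_filter(out, size=3)
    out[dead_pixel_mask] = med[dead_pixel_mask]
    # 1.4 G1/G2 imbalance: average the two green samples of each 2x2
    # RGGB cell and write the mean back to both green sites.
    g_mean = 0.5 * (out[0::2, 1::2] + out[1::2, 0::2])
    out[0::2, 1::2] = g_mean
    out[1::2, 0::2] = g_mean
    return out
```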
3. Sixth processing module
The sixth processing module is used for performing target identification on the ROI area image in the second data format obtained after the enhancement processing of the fifth processing module.
In one example, as shown in fig. 3E, the sixth processing module may include a target feature vector extraction unit and a feature vector comparison unit.
Illustratively, the target feature vector extraction unit may be implemented using a DeepID network and output a 160-dimensional target feature vector.
The feature vector comparison unit may calculate the Euclidean distance between the target feature vector and a registered standard feature vector and compare it with a set threshold to determine whether the target is recognized.
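For illustration, a minimal sketch (the threshold value itself is application-specific):

```python
import numpy as np

def is_recognized(target_feat, registered_feat, threshold):
    # Euclidean distance between the 160-dimensional feature vectors,
    # compared against the set threshold.
    return float(np.linalg.norm(target_feat - registered_feat)) <= threshold
```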
In the embodiment of the application, image frames in the first data format are collected and cached, processed to obtain image frames in the second data format, and target detection is performed on the image frames in the second data format to determine the target position information therein. Then, ROI interception is performed on the cached image frame in the first data format based on the target position information to obtain an ROI area image in the first data format, and the ROI area image in the first data format is transmitted to the server, so that the server performs enhancement processing on it to obtain an ROI area image in the second data format, improving the image enhancement effect.
The methods provided herein are described above. The following describes the apparatus provided in the present application:
Referring to fig. 4, a schematic structural diagram of an image processing apparatus according to an embodiment of the present application is shown. As shown in fig. 4, the image processing apparatus may include:
an acquisition unit 410, configured to acquire image frames in a first data format; the first data format is the raw data format obtained after the image sensor converts the captured light source signal into a digital signal;
a buffer unit 420, configured to buffer the image frames in the first data format;
a processing unit 430, configured to process the image frames in the first data format to obtain image frames in a second data format; the second data format is an image format suitable for display or transmission;
a detection unit 440, configured to perform target detection on the image frames in the second data format to determine target position information in the image frames in the second data format;
an intercepting unit 450, configured to perform region-of-interest (ROI) interception on the cached image frames in the first data format based on the target position information, to obtain an ROI area image in the first data format;
a transmission unit 460, configured to transmit the ROI area image in the first data format to a server, so that the server performs enhancement processing on the ROI area image in the first data format to obtain an ROI area image in a second data format.
In an optional implementation manner, the transmission unit 460 is specifically configured to carry the image information of the ROI area image in the first data format in the image frame in the second data format and transmit it to the server.
In an optional implementation manner, the transmission unit 460 is specifically configured to carry the image information of the ROI area image in the first data format in a reserved field of the file header of the image frame in the second data format and transmit it to the server.
In an optional implementation manner, the transmission unit 460 is specifically configured to carry the encrypted image information of the ROI area image in the first data format in a reserved field of the file header of the image frame in the second data format and transmit it to the server.
In an optional implementation manner, the processing unit 430 is specifically configured to perform preset operation processing on the image frame in the first data format to obtain an image frame in a second data format;
the preset operation treatment comprises one or more of the following steps:
white balance correction, color interpolation, curve mapping.
Fig. 5 is a schematic diagram of a hardware structure of an image processing apparatus according to an embodiment of the present application. The image processing apparatus may include a processor 501 and a machine-readable storage medium 502 storing machine-executable instructions. The processor 501 and the machine-readable storage medium 502 may communicate via a system bus 503. Also, the processor 501 may perform the image processing method described above by reading and executing, from the machine-readable storage medium 502, the machine-executable instructions corresponding to the image processing logic.
The machine-readable storage medium 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), a similar storage medium, or a combination thereof.
Embodiments of the present application also provide a machine-readable storage medium, such as the machine-readable storage medium 502 in fig. 5, comprising machine-executable instructions that are executable by the processor 501 in an image processing apparatus to implement the image processing method described above.
Referring to fig. 6, a schematic structural diagram of an image processing system according to an embodiment of the present application is shown. As shown in fig. 6, the image processing system may include a video monitoring front-end device 610 and a server 620, wherein:
the video monitoring front-end device 610 is configured to collect and cache image frames in a first data format, the first data format being a spectral band sensing data format; process the image frame in the first data format to obtain an image frame in a second data format, the second data format being an image format suitable for display or transmission; perform target detection on the image frame in the second data format to determine target position information in the image frame in the second data format; perform region-of-interest (ROI) interception on the cached image frame in the first data format based on the target position information to obtain an ROI area image in the first data format; and transmit the ROI area image in the first data format to the server;
the server 620 is configured to perform enhancement processing on the ROI area image in the first data format to obtain an ROI area image in a second data format.
In an optional implementation manner, the server 620 is specifically configured to perform a preset enhancement operation on the ROI area image in the first data format;
the predetermined enhancement operation includes one or more of:
demosaicing, denoising, deblurring, defogging, resolution improvement, brightness adjustment, color restoration, contrast enhancement, dynamic range enhancement and definition enhancement.
In an optional implementation manner, the server 620 is further configured to: if the image information of the ROI area image in the first data format is encrypted image information, decrypt the image information according to a preset encryption rule to obtain the ROI area image in the first data format;
the server 620 is specifically configured to perform enhancement processing on the decrypted ROI area image in the first data format.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.