CN119810394A


Info

Publication number: CN119810394A
Application number: CN202510297710.3A
Authority: CN
Inventors: 艾斌; 张坤; 刘明镜
Original assignee: Chengdu Chuangshi Hansa Technology Co ltd
Current assignee: Chengdu Chuangshi Hansa Technology Co ltd
Priority date: 2025-03-13
Filing date: 2025-03-13
Publication date: 2025-04-11
Anticipated expiration: 2045-03-13
Also published as: CN119810394B

Abstract

The invention discloses a data processing method and system based on image acquisition equipment, relating to the technical field of computer vision. The method comprises: constructing a CNN-VAE to generate low-light images from multi-modal images, and expanding the training data set with a GAN model that has an anti-attention module and a dual-discriminator structure, combined with parallel dilated convolution; enhancing the images in the training data set through a dual-branch U-Net; enhancing image details with local-variance adaptive Gaussian filtering; classifying pixel gradients with a support vector machine and performing edge detection; and extracting features with ResNet and accurately locating candidate boxes by combining an RPN with non-maximum suppression. In this way, high-precision object detection and recognition are achieved, and the accuracy and robustness of image processing are significantly improved.

Description

Data processing method and system based on image acquisition equipment

Technical Field

The invention relates to the technical field of computer vision, in particular to a data processing method and system based on image acquisition equipment.

Background

With the wide application of image acquisition equipment, image processing technology has developed rapidly in many fields. In medical imaging, automatic driving, security monitoring, remote sensing image analysis and similar fields, the accurate processing and analysis of image data has become a core technology. In recent years, image processing methods based on deep learning have gradually become dominant. In particular, the wide application of convolutional neural networks (CNN) and generative adversarial networks (GAN) has brought significant progress in tasks such as image classification, detection and enhancement. As a powerful feature extraction tool, the CNN has been widely applied to image recognition and segmentation tasks, and the GAN has shown great potential in image generation and enhancement by virtue of its generative capability. Meanwhile, the variational autoencoder (VAE) has been increasingly applied to image processing because of its advantages as a generative model and in data reconstruction. The combination of these technologies provides a more flexible solution for multi-modal image processing.

However, existing image processing technology still faces many challenges when handling multi-modal images. Multi-modal images usually come from different sensor devices, for example RGB images and infrared images, whose illumination conditions, resolution and rendering of detail differ greatly; how to fuse such images effectively while ensuring processing accuracy remains a difficult problem. Existing image enhancement methods, such as those based on simple convolution operations, often have difficulty fully extracting and retaining the detail information of an image, and detail is frequently lost, especially in low-illumination and shadow regions. Although a generative adversarial network (GAN) can effectively improve image quality, the generated images still have quality problems under complex conditions such as uneven illumination and noise, and in particular are insufficiently accurate for edge detection and object positioning. In addition, existing object detection methods based on deep learning often fail to combine the features of different modalities effectively in the joint processing and analysis of multi-modal data, which limits the accuracy and robustness of object recognition.

Disclosure of Invention

The present invention has been made in view of the above-described problems occurring in the prior art.

Therefore, the invention provides a data processing method and a system based on image acquisition equipment, which solve the problems of insufficient image fusion, detail loss, inaccurate object detection and the like in the existing image processing technology when processing multi-mode images;

in order to solve the technical problems, the invention provides the following technical scheme:

in a first aspect, the present invention provides a data processing method based on an image acquisition apparatus, comprising:

acquiring multi-mode images of the image acquisition equipment and preprocessing the multi-mode images;

the multi-modal image includes an RGB image and an infrared image;

Integrating the multi-modal images into a training data set, constructing a CNN-VAE to generate low-light images of the multi-modal images, and expanding the training data set by adopting a GAN model with an anti-attention module and a dual-discriminator structure, combined with parallel dilated convolution;

Enhancing the image in the training data set through a double-branch U-Net, enhancing the image details through local variance self-adaptive Gaussian filtering, classifying pixel gradients through a support vector machine, and performing edge detection;

Extracting features with ResNet, accurately locating candidate boxes by combining an RPN with non-maximum suppression, performing object detection and recognition through classification and regression, and displaying the results visually;

All data is stored in a database and managed.

As a preferred scheme of the data processing method based on image acquisition equipment according to the invention, constructing the CNN-VAE to generate low-light images of the multi-modal images, and expanding the training data set by adopting the GAN model with an anti-attention module and a dual-discriminator structure combined with parallel dilated convolution, comprises the following steps:

Simulating different orientations of the preprocessed images by random rotation, adjusting the brightness of the multi-modal images through histogram equalization, and integrating them into a training data set; mapping the training data set to a latent space through a variational autoencoder, and sampling from the latent space to obtain latent space data; the variational autoencoder learns the distribution of the latent space by maximizing the variational lower bound and generates low-light images by minimizing a loss function, and the generated low-light images are used as new samples to expand the training data set;

Expanding the training data set with a generative adversarial network: defining an anti-attention module, constructing a dual-discriminator structure to extract global features and local features of the image and adding them to each layer of the generator, encoding the spatial information of the multi-modal images in the training data set using global average pooling, applying a nonlinear transformation through a Sigmoid activation function to generate an anti-attention map, and suppressing unnecessary regions through an inverse (anti) operation to generate the image;

Integrating a parallel dilated convolution module into the generative adversarial network, enlarging the receptive field of the network by applying convolutions with different dilation rates in parallel, performing image enhancement, and outputting a final enhanced data set.

The method for processing the data based on the image acquisition equipment is characterized in that the construction of the double-discriminant structure to extract the global features and the local features of the image comprises the following steps:

The anti-attention module extracts global features and local features of the image, adjusts the weight of a feature channel according to the importance of the global features and the local features, adopts PatchGAN as a local discriminator, combines the global discriminator for training, uses the real image and the generated image for training the discriminator, calculates the discriminator loss, and uses an Adam optimizer for training a model until convergence.

As a preferable scheme of the data processing method based on the image acquisition device, the method for enhancing the image in the training data set through the double-branch U-Net comprises the following steps:

The method comprises the steps of using a U-Net framework as a joint learning framework, integrating an anti-attention module for each convolution layer, respectively inputting an RGB image and an infrared image in a final enhanced data set into two parallel U-Net branches to generate an enhanced image, retaining local features of the image through jump connection, updating parameters of the U-Net by using a joint loss function, defining an optimizer, calculating gradients and updating, optimizing the U-Net model, and obtaining the final enhanced image.

The method for processing data based on image acquisition equipment according to the invention is characterized in that the step of enhancing image details by using local variance adaptive Gaussian filtering, and the step of classifying pixel gradients and performing edge detection by using a support vector machine comprises the following steps:

Converting the enhanced image into a gray image, carrying out fixed-size blocking processing on the gray image to obtain a plurality of local subareas, calculating local gray variance values in each local subarea, taking the gray variance values of all the local subareas as one-dimensional distribution, and determining an optimal gray variance threshold value through calculating a gray histogram of the gray image and an Otsu algorithm formula;

determining the unique local Gaussian smoothing window size of each sub-region according to the optimal gray variance threshold, performing deterministic Gaussian filtering operation on the gray value of each pixel position in each corresponding region in the original image, replacing the original gray value with the filtered gray value, filtering each sub-region, splicing the sub-region back to the complete image of the original size according to the original image position certainty, calculating the gradient value of each pixel of the image, calculating the gradient amplitude and direction by a Canny edge detection algorithm, and performing edge detection on each pixel;

Selecting a support vector machine algorithm, using RBF kernels as kernel functions, extracting characteristics of each pixel from the enhanced image as input characteristics of a training data set, marking edge detection conditions of each pixel in the input characteristics, using the training data set with the marks, training a model by minimizing a loss function of the support vector machine, and solving to obtain an optimal hyperplane and bias by using a gradient descent method;

the characteristics of each pixel of the image are input into a trained support vector machine model for classification, and the edge probability of each pixel in the image is adjusted based on the output of the support vector machine model.

As a preferred scheme of the data processing method based on image acquisition equipment according to the invention, extracting features with ResNet, accurately locating candidate boxes by combining an RPN with non-maximum suppression, performing object detection and recognition through classification and regression, and displaying the results visually comprises:

using a pre-trained ResNet model as a feature extractor;

Pooling the features of each channel by using global average pooling to obtain one-dimensional feature vectors, generating object candidate frames by using Region Proposal Network of Faster R-CNN, selecting an optimal candidate frame from a plurality of object candidate frames by using a non-maximum suppression algorithm, classifying each candidate frame by using a Softmax classifier, judging the class of each candidate frame by setting a threshold value, carrying out regression operation on each candidate frame, predicting the specific position and size of an object in the frame, visualizing object detection and identification results, and generating final output.
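The candidate-box selection step above can be illustrated with a minimal sketch of greedy non-maximum suppression. The source does not give the overlap criterion or its threshold, so the intersection-over-union (IoU) measure, the threshold of 0.5, the `(x1, y1, x2, y2)` box format and the function names `iou` and `nms` are all assumptions made for illustration only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much.

    Returns the indices of the kept boxes, ordered by decreasing score.
    The threshold value is an assumption, not taken from the source.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping candidates and one distant candidate reduce to two kept boxes: the higher-scoring member of the overlapping pair and the distant box.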

As a preferable scheme of the data processing method based on the image acquisition device, the storing and managing all data in the database comprises:

Selecting a relational database to manage the data and the analysis results, designing database table structures to store the different types of data, setting up a periodic backup task to back up all data in the database, managing the permissions of database users, and storing static data in encrypted form.

In a second aspect, the present invention provides a data processing system based on an image acquisition device, comprising:

the data acquisition module is used for acquiring multi-mode images of the image acquisition equipment and preprocessing the multi-mode images;

The model construction module is used for integrating the multi-modal images into a training data set, constructing a CNN-VAE to generate low-light images of the multi-modal images, and expanding the training data set by adopting a GAN model with an anti-attention module and a dual-discriminator structure, combined with parallel dilated convolution;

The image enhancement module is used for enhancing the images in the training data set through the double-branch U-Net;

the edge detection module is used for enhancing image details by utilizing local variance self-adaptive Gaussian filtering, classifying pixel gradients through a support vector machine and carrying out edge detection;

The feature extraction module is used for extracting features with ResNet and accurately locating candidate boxes by combining an RPN with non-maximum suppression;

The target recognition module is used for carrying out object detection recognition and visual display through classification regression;

and the data storage module is used for storing all the data into the database and managing the data.

In a third aspect, the invention provides a computer device comprising a memory and a processor, the memory storing a computer program, wherein the computer program when executed by the processor implements any of the steps of the image acquisition device based data processing method according to the first aspect of the invention.

In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements any of the steps of the image acquisition device based data processing method according to the first aspect of the present invention.

The beneficial effects of the invention are as follows: the training data is expanded through a CNN-VAE and an anti-attention GAN model, image details are enhanced with a dual-branch U-Net and local-variance adaptive Gaussian filtering, edge detection is optimized through a support vector machine, features are extracted with ResNet, and object candidate boxes are located by combining an RPN with non-maximum suppression, so that high-precision object detection and recognition are achieved and the accuracy and robustness of image processing are significantly improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a data processing method based on an image acquisition device in the present invention.

Fig. 2 is a schematic diagram of a data processing system based on an image acquisition device in the present invention.

FIG. 3 is a flow chart of the present invention for expanding a training data set.

FIG. 4 is a flow chart of the support vector machine for classifying pixel gradients and performing edge detection in the present invention.

Detailed Description

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.

Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

An embodiment, referring to fig. 1 to 4, provides a data processing method based on an image acquisition device, including the steps of:

S1, acquiring multi-mode images of an image acquisition device and preprocessing the multi-mode images;

Specifically, according to task requirements, using image acquisition equipment (such as low-light photographing equipment and an infrared sensor) to synchronously acquire multi-mode images of a target area in a low-light environment (such as at night or in cloudy days, etc.), and assuming that a target scene is remote sensing monitoring;

the RGB image is a three-channel image of red, green and blue, the data of each channel is a matrix, and the value of each pixel represents the color information of RGB;

the infrared image is a single-channel gray level image, and the acquisition time stamps are aligned to ensure that the multi-mode image corresponds to the same area;

The preprocessing comprises the steps of carrying out spatial alignment and resolution standardization on the multi-mode image, up-sampling the infrared image to the resolution of the RGB image, calculating new pixel values by using a bilinear interpolation method, respectively applying a non-local mean denoising algorithm to the RGB image and the infrared image, suppressing noise in the equipment acquisition process, checking the missing area of the multi-mode image, filling the missing area by using a low-rank approximation method based on matrix filling, and carrying out pixel value normalization on the multi-mode image.
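Two of the preprocessing steps above, bilinear up-sampling of the infrared image and pixel value normalization, can be sketched as follows. This is a minimal illustration on plain Python lists rather than the implementation of the invention; the corner-aligned sampling grid, the min-max form of the normalization and the function names `bilinear_upsample` and `normalize` are assumptions, since the source does not specify them.

```python
def bilinear_upsample(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation.

    Uses a corner-aligned grid: the first and last output pixels coincide
    with the first and last input pixels (an assumed convention).
    """
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Fractional source row for this output row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def normalize(img):
    """Min-max normalize pixel values to [0, 1]; a constant image maps to zeros."""
    lo = min(min(r) for r in img)
    hi = max(max(r) for r in img)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in r] for r in img]
    return [[(v - lo) / span for v in r] for r in img]
```

In this sketch a 2×2 infrared patch up-sampled to 3×3 receives a center value equal to the average of its four corners, and normalization brings both modalities to a common [0, 1] scale.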

The preprocessing process ensures that the infrared image and the RGB image have the same resolution through spatial alignment and resolution standardization, ensures the spatial consistency of the images, is convenient for subsequent analysis and fusion, effectively inhibits noise possibly generated in the acquisition process by adopting a non-local mean denoising algorithm, improves the definition of the images, enhances the image quality especially in a weak light environment, fills the missing area in the multi-mode image through a matrix complement low-rank approximation method, ensures the integrity of data, eliminates the brightness difference under different image sources and acquisition conditions through pixel value normalization, ensures the uniform scale of the images, and is beneficial to improving the training effect of a subsequent deep learning model, thereby improving the accuracy and the robustness of remote sensing monitoring tasks.

S2, integrating the multi-modal images into a training data set, constructing a CNN-VAE to generate low-light images of the multi-modal images, and expanding the training data set by adopting a GAN model with an anti-attention module and a dual-discriminator structure, combined with parallel dilated convolution;

Specifically, constructing the CNN-VAE to generate low-light images of the multi-modal images, and expanding the training data set by adopting the GAN model with an anti-attention module and a dual-discriminator structure combined with parallel dilated convolution, comprises:

Using random rotation to simulate different directions of the preprocessed image, carrying out brightness adjustment on the multi-mode image through histogram equalization and integrating the multi-mode image into a training data set;
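The brightness adjustment by histogram equalization mentioned above can be sketched as follows for a single-channel image. This is a minimal illustration of the standard cumulative-histogram mapping on plain Python lists; the function name `equalize_histogram` and the 256-level integer input are assumptions, and the source does not state how the step is applied to the RGB channels.

```python
def equalize_histogram(gray, levels=256):
    """Histogram-equalize a 2-D image of integer gray levels in [0, levels - 1]."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to spread out.
        return [row[:] for row in gray]
    # Map each level so that the cumulative distribution becomes roughly linear.
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in gray]
```

Applied to a dark image whose values occupy only a narrow band, the mapping stretches them over the full range of gray levels.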

To increase the diversity of the training data set and the robustness of the model, the training data set is mapped to the latent space by a variational autoencoder (VAE). Its structure includes an encoder designed as a 5-layer convolutional neural network (CNN), which outputs the mean and variance of the latent space (dimension set to 128), and a decoder designed as a 5-layer deconvolution network. The latent space data $z$ is sampled from the latent space by the formula:

$$z = \mu + \sigma \odot \epsilon,$$

Wherein, $\mu$ represents the mean of the latent space, $\sigma^2$ represents the variance of the latent space (with $\sigma$ its element-wise square root), and $\epsilon$ represents random noise drawn from a standard normal distribution;
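The sampling step can be sketched as follows, as a minimal illustration of the reparameterization form $z = \mu + \sigma \odot \epsilon$. It is assumed here that the encoder outputs the logarithm of the variance, which is common practice but is not stated in the source; the function name `sample_latent` is likewise hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1).

    mu and log_var are equal-length lists (128 entries in the embodiment);
    sigma is recovered as exp(0.5 * log_var), an assumed parameterization.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]
```

Writing the sample as a deterministic function of the mean, the variance and independent noise is what allows gradients to flow back to the encoder outputs during training.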

By maximizing the evidence lower bound (ELBO), the variational autoencoder learns the distribution of the latent space, and low-light images are generated by minimizing the loss function, which provides more samples for the model and enhances its generalization ability. The loss function is:

$$\mathcal{L}_{VAE} = -\mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] + D_{KL}\left(q(z|x)\,\|\,p(z)\right),$$

Wherein, $\mathcal{L}_{VAE}$ represents the total loss function value of the variational autoencoder; $q(z|x)$ is the output of the encoder and represents the variational posterior of the latent space data $z$ given the input image $x$; $p(x|z)$ is the output of the decoder and represents the probability distribution of generating the input image $x$ given the latent space data $z$; $\mathbb{E}$ represents the expectation operator; $D_{KL}$ represents the KL divergence, which measures the difference between the encoder distribution $q(z|x)$ and the prior distribution $p(z)$; $\mathbb{E}_{q(z|x)}[\log p(x|z)]$ is the expected-value term, namely the log-likelihood of reconstructing the input image $x$ given the latent space data $z$, which measures the ability of the decoder to reconstruct the input image $x$ from $z$, and by maximizing this expected value the model is able to generate an output image that is similar to the input image; $p(z)$ represents the prior distribution of the latent space data $z$, for which the variational autoencoder typically uses a standard normal distribution;
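The two terms of the loss can be sketched numerically as follows. This is an illustration under stated assumptions rather than the loss used by the invention: the reconstruction term is taken as half the squared error (the negative log-likelihood of a unit-variance Gaussian decoder, up to a constant), the encoder is assumed to output a diagonal Gaussian parameterized by mean and log-variance, and the function names are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO: reconstruction term plus KL regularizer.

    x and x_recon are flattened images; half the squared error stands in
    for -log p(x|z) under an assumed unit-variance Gaussian decoder.
    """
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)
```

The KL term vanishes exactly when the encoder distribution equals the standard normal prior, and the reconstruction term vanishes when the decoder reproduces the input.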

Taking the generated low-light image as a new sample expansion training data set;

To further optimize image quality, a generative adversarial network (GAN) is used to expand the training data set: an anti-attention module (AAB) is defined; a dual-discriminator structure is constructed to extract global and local features of the image, which are added to each layer of the generator; the spatial information of the multi-modal images in the training data set is encoded using global average pooling; a nonlinear transformation is applied through a Sigmoid activation function to generate an anti-attention map; and unwanted regions are suppressed by an inverse (anti) operation, which suppresses light-polluted regions and emphasizes the dark details of the image, to generate the image;
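The sequence of global average pooling, Sigmoid transformation and inverse operation can be sketched as follows. The source does not define the inverse operation or the shape of the anti-attention map, so this sketch assumes a per-channel weight of one minus the Sigmoid of the pooled value, so that channels with strong (bright) responses are attenuated; the function name `anti_attention` is hypothetical.

```python
import math


def anti_attention(features):
    """Reweight feature channels with an inverted attention weight.

    features: list of channels, each a 2-D list of floats.
    Returns (reweighted features, per-channel anti-attention weights).
    """
    weights = []
    for channel in features:
        count = sum(len(row) for row in channel)
        pooled = sum(sum(row) for row in channel) / count  # global average pooling
        attention = 1.0 / (1.0 + math.exp(-pooled))        # Sigmoid activation
        weights.append(1.0 - attention)                    # assumed inverse operation
    reweighted = [[[v * w for v in row] for row in channel]
                  for channel, w in zip(features, weights)]
    return reweighted, weights
```

Under this assumption a channel dominated by bright responses receives a weight near zero, while a channel with low average response keeps a weight near one half or above.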

A parallel dilated convolution module is integrated into the generative adversarial network. Convolutions with different dilation rates (such as dilation rates of 1, 3 and 5) are applied in parallel to enlarge the receptive field of the network, perform image enhancement and capture more context information, and a final enhanced data set is output (comprising the original images, the VAE-enhanced images and the high-quality images generated by the GAN). The formula is as follows:

$$I_{enh} = \mathrm{Concat}\big(\mathrm{DConv}_{r_1}(I),\ \mathrm{DConv}_{r_2}(I),\ \mathrm{DConv}_{r_3}(I)\big),$$

Wherein, $I_{enh}$ represents the enhanced image, $\mathrm{Concat}$ represents the concatenation (stitching) operation on the images, $\mathrm{DConv}_{r}(I)$ represents the dilated convolution operation on the image $I$, and $r$ represents the dilation rate, so that features can be extracted at multiple scales.
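The parallel branches can be sketched as follows for a single-channel image. This is a minimal illustration with zero padding and a shared kernel across branches, both of which are assumptions; the function names `dilated_conv2d` and `parallel_dilated` are hypothetical. For a 3×3 kernel, dilation rates of 1, 3 and 5 cover windows of 3×3, 7×7 and 11×11 pixels respectively.

```python
def dilated_conv2d(img, kernel, rate):
    """Same-size 2-D dilated convolution (cross-correlation) with zero padding.

    img: 2-D list of floats; kernel: square 2-D list with odd side length;
    rate: spacing between the sampled input pixels.
    """
    h, w = len(img), len(img[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spread out by the dilation rate.
                    y = i + (a - half) * rate
                    x = j + (b - half) * rate
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * img[y][x]
            out[i][j] = acc
    return out


def parallel_dilated(img, kernel, rates=(1, 3, 5)):
    """Run one branch per dilation rate and stack the results as channels."""
    return [dilated_conv2d(img, kernel, r) for r in rates]
```

Stacking the branch outputs as channels plays the role of the concatenation in the formula: each output position then carries responses gathered at several spatial scales.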

In summary, the proposed scheme combines several techniques to increase the diversity and image quality of the training data set, and thereby the generalization capability and robustness of the model. Random rotation and histogram equalization are used for augmentation and brightness adjustment of the multi-modal images. The data is then mapped to a latent space through a variational autoencoder (VAE) to generate low-light images, which expand the data set and improve the adaptability of the model to low-light conditions. The generative adversarial network (GAN) with the anti-attention module (AAB) further optimizes the generated images, emphasizing dark details and suppressing unwanted regions such as light pollution, so that the low-light images are clearer and more natural. The integrated parallel dilated convolution module extracts multi-scale context information through convolutions with different dilation rates, enlarging the receptive field and improving the quality of the generated images. Through these steps, the generated enhanced images improve the performance of the model under low-light conditions and help improve its accuracy under different environmental conditions.

Further, constructing the dual arbiter structure to extract global features and local features of the image includes:

The anti-attention module extracts global features (namely information such as the overall brightness and contrast of the image) and local features (namely shadows in the image or textures of dark parts) of the image, ensures that an antagonistic network is generated to pay attention to the overall perception of the image, can process dark part details, adjusts the weight of a feature channel according to the importance of the global features and the local features, and particularly strengthens the weight of a dark part area in the process, so that the dark part details can be fully recovered in the image generation process;

PatchGAN is adopted as a local discriminator and is combined with a global discriminator to train, a real image and a generated image are used for training the discriminator, the loss of the discriminator is calculated, an Adam optimizer is used for training a model until convergence, the global discriminator is responsible for evaluating the authenticity of the whole generated image and outputting a classification result to indicate whether the image is authentic, the structure is generally based on a Full Convolution Network (FCN), the aim is to evaluate the large-scale characteristics (such as textures, hues and the like) of the image, the local discriminator is focused on the local details of the image, a plurality of local areas (such as 32 multiplied by 32) of the image are processed by adopting smaller convolution kernels, the details of the local areas of the image can be thinned in the generation process, such as edges, textures and the like, and the generated image can be ensured to achieve the sense of reality in the detail level;

By combining the global discriminant with the local discriminant, the dual discriminant structure can enhance the focus of the generator on the details of the image, so that the generated image not only looks true on the whole, but also is finer in detail, and the detail processing capability of the local discriminant can avoid distortion of the generated image in certain local areas (such as shadows or highlights).

The dual-discriminant structure is combined with the anti-attention module, so that the performance of the generated countermeasure network (GAN) in the image generation process can be remarkably improved, the anti-attention module extracts global features (such as brightness, contrast and the like) and local features (such as shadow or dark textures), the weights of feature channels are dynamically adjusted according to the importance of the features, particularly the weights of dark areas are enhanced, the dark details are fully recovered, patchGAN is adopted as a local discriminant, the global discriminant based on a Full Convolution Network (FCN) is combined, the dual-discriminant structure enables the generator to pay attention to the overall perception (such as textures and hues) of an image, and can also finely process local details (such as edges, textures and the like), the structure effectively improves the sense of reality of the generated image on the detail level, avoids distortion of the local areas, particularly in shadow and high-light parts, and finally enables the generated image to achieve higher quality on the whole and detail, and improves the sense of reality and visual effect of the image.

S3, enhancing the image in the training data set through a double-branch U-Net, enhancing the image details through local variance self-adaptive Gaussian filtering, classifying pixel gradients through a support vector machine, and performing edge detection;

specifically, enhancing the image in the training dataset through the dual-branch U-Net includes:

A U-Net architecture is used as the joint learning framework, and an anti-attention module is integrated into each convolution layer to strengthen the dark and detailed regions of the images; with the aid of the anti-attention modules, the U-Net can generate clearer images under low-light and shadow conditions. The RGB images and the infrared images in the final enhanced data set are input into two parallel U-Net branches respectively to generate enhanced images, and the local features of the images are retained through skip connections. The parameters of the U-Net are updated using a joint loss function: an optimizer is defined, gradients are calculated and applied, and the U-Net model is optimized to obtain the final enhanced images;

The joint loss function comprises an exposure consistency loss, a color consistency loss and a spatial consistency loss. The exposure consistency loss ensures that the exposure of the image is within a reasonable range and prevents the image from being too dark or too bright; it calculates the difference between the brightness of each local region of the image and the target exposure range. The loss function $L_{exp}$ is:

$$L_{exp} = \frac{1}{M}\sum_{k=1}^{M}\left|Y_k - E\right|,$$

Wherein, $Y_k$ represents the brightness value of the $k$-th local region of the enhanced image in the training data set, $E$ represents the ideal exposure range (the brightness of the image is generally expected to be within this range, indicating moderate exposure), and $M$ represents the total number of local regions in the image;

The color consistency loss is used to avoid color distortion of the generated image and to ensure that the RGB channels of the image remain similar in color; it measures the color difference between the red, green and blue channels of the image. The loss function $L_{col}$ is:

$$L_{col} = (J_R - J_G)^2 + (J_R - J_B)^2 + (J_G - J_B)^2,$$

Wherein, $J_R$, $J_G$ and $J_B$ respectively represent the average intensity values of the red, green and blue channels after image enhancement;

The spatial consistency loss keeps the generated image consistent with the original image in spatial structure and avoids the loss of image details; it calculates the intensity difference of local regions of the image, $L_{spa}$, which reflects how well the image structure is retained:

$$L_{spa} = \frac{1}{M}\sum_{k=1}^{M}\left(Y_k - I_k\right)^2,$$

where $Y_k$ and $I_k$ represent the average intensity values of the enhanced image and the original image, respectively, in the $k$-th local region, and $M$ is the total number of local regions;

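The joint loss function $L_{total}$ is calculated as:

$$L_{total} = \lambda_1 L_{exp} + \lambda_2 L_{col} + \lambda_3 L_{spa},$$

where $L_{exp}$, $L_{col}$ and $L_{spa}$ are the exposure consistency loss, the color consistency loss and the spatial consistency loss, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ represent the weights of the respective loss terms.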
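The three loss terms and their weighted sum can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the embodiment: images are nested lists with intensities in [0, 1], the exposure term uses a mean absolute difference, the spatial term uses a mean squared difference, and the names `region_means`, `target`, `size` and `weights` are hypothetical.

```python
from statistics import mean


def region_means(img, size):
    """Average intensity of each non-overlapping size x size region of a 2-D list."""
    h, w = len(img), len(img[0])
    means = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            block = [img[y][x] for y in range(r, r + size) for x in range(c, c + size)]
            means.append(mean(block))
    return means


def exposure_loss(enhanced_gray, target=0.6, size=2):
    """Mean absolute gap between regional brightness and the target exposure (assumed form)."""
    return mean(abs(y - target) for y in region_means(enhanced_gray, size))


def color_loss(r, g, b):
    """Sum of squared differences between the mean intensities of the three channels."""
    mr, mg, mb = (mean(v for row in ch for v in row) for ch in (r, g, b))
    return (mr - mg) ** 2 + (mr - mb) ** 2 + (mg - mb) ** 2


def spatial_loss(enhanced_gray, original_gray, size=2):
    """Mean squared gap between regional means of enhanced and original images (assumed form)."""
    ys = region_means(enhanced_gray, size)
    xs = region_means(original_gray, size)
    return mean((y - x) ** 2 for y, x in zip(ys, xs))


def joint_loss(enh_rgb, orig_gray, weights=(1.0, 1.0, 1.0), target=0.6, size=2):
    """Weighted sum of the exposure, color and spatial consistency terms."""
    r, g, b = enh_rgb
    gray = [[(r[y][x] + g[y][x] + b[y][x]) / 3 for x in range(len(r[0]))]
            for y in range(len(r))]
    w_exp, w_col, w_spa = weights
    return (w_exp * exposure_loss(gray, target, size)
            + w_col * color_loss(r, g, b)
            + w_spa * spatial_loss(gray, orig_gray, size))
```

In a training loop this scalar would be the quantity whose gradient updates the U-Net parameters; the weights are tuning parameters whose values the source does not state.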

By processing RGB images and infrared images in parallel and retaining local image features through skip connections, the method avoids losing important details during enhancement. The joint loss function comprises the exposure consistency loss, the color consistency loss and the spatial consistency loss, which respectively constrain the exposure, color and spatial structure of the image, so that over-dark images, color distortion and detail loss are avoided. By optimizing the joint loss function, the model generates more realistic and clearer images under different environmental conditions, which improves the diversity and quality of the training dataset and ultimately strengthens the adaptability of the model to low-light and complex environments.

Further, enhancing image detail using local variance adaptive Gaussian filtering, and classifying pixel gradients and performing edge detection by a support vector machine, includes:

The enhanced image is converted into a grayscale image, and the grayscale image is divided into fixed-size blocks: a fixed square window of 16×16 pixels scans and segments the grayscale image region by region to obtain a plurality of local sub-regions. The local gray variance is calculated within each local sub-region, and the gray variance values of all local sub-regions are treated as a one-dimensional distribution that measures the complexity of regional detail. A current candidate gray variance threshold is set, and the optimal gray variance threshold is determined from the gray histogram of the grayscale image and the Otsu algorithm formula, so that edges can be detected effectively under different illumination conditions. The formula is:

$$T^{*} = \arg\max_{T}\; N_0(T)\,N_1(T)\,\left[\mu_0(T) - \mu_1(T)\right]^2,$$

where $T^{*}$ represents the optimal gray variance threshold; $N_0$ and $N_1$ represent the numbers of pixels whose gray variance is, respectively, less than and greater than or equal to the current candidate gray variance threshold; $\mu_0$ and $\mu_1$ represent the average gray values of the regions whose gray variance is, respectively, less than and greater than or equal to the current candidate gray variance threshold; and $T$ represents the candidate gray variance threshold over which the optimum is sought, used to distinguish the detail richness of the regions;
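The blocking and threshold search can be sketched as below. This is a hedged illustration: it applies an Otsu-style between-class criterion directly to the list of block variances, taking class means over the variance values themselves (the standard one-dimensional form), whereas the text describes the class means as average gray values of the two groups. The function names and the dictionary layout are assumptions.

```python
from statistics import mean, pvariance


def block_variances(gray, size=16):
    """Map (block_row, block_col) to the gray variance of each size x size block."""
    h, w = len(gray), len(gray[0])
    out = {}
    for r in range(0, h, size):
        for c in range(0, w, size):
            block = [gray[y][x]
                     for y in range(r, min(r + size, h))
                     for x in range(c, min(c + size, w))]
            out[(r // size, c // size)] = pvariance(block)
    return out


def otsu_threshold(values):
    """Return the candidate threshold that maximises n0 * n1 * (mu0 - mu1)^2."""
    best_t, best_score = None, -1.0
    for t in sorted(set(values)):
        low = [v for v in values if v < t]    # class below the candidate threshold
        high = [v for v in values if v >= t]  # class at or above the candidate threshold
        if not low or not high:
            continue  # a split with an empty class carries no information
        score = len(low) * len(high) * (mean(low) - mean(high)) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

For example, `otsu_threshold(list(block_variances(gray).values()))` would give a single threshold for the whole image; it returns `None` when all blocks share the same variance.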

The unique local Gaussian smoothing window size of each sub-region is determined according to the optimal gray variance threshold, using the formula:

$$w_{i,j} = \begin{cases} 3, & \sigma_{i,j}^{2} \ge T^{*} \\ 7, & \sigma_{i,j}^{2} < T^{*} \end{cases},$$

where $w_{i,j}$ represents the size of the Gaussian filter window for the region in row $i$, column $j$ of the grayscale image, $\sigma_{i,j}^{2}$ is the local gray variance of that region, and $T^{*}$ is the optimal gray variance threshold. If $\sigma_{i,j}^{2} \ge T^{*}$, the region is rich in detail and a Gaussian filter window of 3×3 pixels is selected to preserve image details as much as possible; if $\sigma_{i,j}^{2} < T^{*}$, the region is smooth and a Gaussian filter window of 7×7 pixels is selected to achieve better noise reduction;

Based on the unique local Gaussian smoothing window size of each sub-region, a deterministic Gaussian filtering operation is performed on the gray value at each pixel position in the corresponding region of the original image, and the original gray value is replaced with the filtered gray value;

After filtering, the sub-regions are deterministically stitched back, according to their positions in the original image, into a complete image of the original size;
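The window selection, per-block filtering and stitching steps can be sketched as follows. This is a minimal sketch under several assumptions that the text does not specify: the Gaussian standard deviation is set to one sixth of the window size, image borders are handled by replicating edge pixels, and block variances arrive as a dictionary keyed by (block row, block column). Writing each filtered value to its original coordinates in the output performs the stitching.

```python
import math


def window_size(variance, threshold):
    """3x3 window for detail-rich blocks, 7x7 window for smooth blocks."""
    return 3 if variance >= threshold else 7


def gaussian_kernel(size, sigma):
    """Normalised size x size Gaussian kernel."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def adaptive_filter(gray, variances, threshold, block=16):
    """Filter every block with its own window and write results back in place."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for (bi, bj), var in variances.items():
        size = window_size(var, threshold)
        kernel = gaussian_kernel(size, sigma=size / 6.0)  # sigma rule is an assumption
        half = size // 2
        for y in range(bi * block, min((bi + 1) * block, h)):
            for x in range(bj * block, min((bj + 1) * block, w)):
                acc = 0.0
                for dy in range(-half, half + 1):
                    for dx in range(-half, half + 1):
                        yy = min(max(y + dy, 0), h - 1)  # replicate border pixels
                        xx = min(max(x + dx, 0), w - 1)
                        acc += kernel[dy + half][dx + half] * gray[yy][xx]
                out[y][x] = acc
    return out
```

Because every output pixel is computed from the unfiltered input, the result does not depend on the order in which blocks are visited, which is one way to read the "deterministic" requirement.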

Calculating the gradient value of each pixel of an image (namely, the brightness change speed of the image), calculating the gradient amplitude and the gradient direction through a Canny edge detection algorithm, carrying out edge detection on each pixel, and judging whether each pixel is an edge or a non-edge;
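The gradient stage of this step can be illustrated with Sobel operators, which are the usual choice for the gradient computation inside Canny edge detection. The sketch below covers only gradient magnitude and direction plus a single illustrative threshold; the non-maximum suppression and hysteresis stages of a full Canny detector are omitted, and the function names and the threshold parameter are assumptions.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradients(gray):
    """Return per-pixel gradient magnitude and direction (radians); borders stay 0."""
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)
    return mag, ang


def edge_map(mag, threshold):
    """Mark a pixel as edge when its gradient magnitude reaches the threshold."""
    return [[m >= threshold for m in row] for row in mag]
```

The magnitude and direction arrays produced here are the kind of per-pixel features that a downstream classifier could take as input.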

Selecting a Support Vector Machine (SVM) algorithm, using RBF kernels (radial basis kernels) as kernel functions, extracting features (e.g., gradient magnitude and direction) of each pixel from the enhanced image as input features of a training data set, marking edge detection conditions (edges or non-edges) of each pixel in the input features, using the labeled training data set as target output of the SVM classifier, training a model by minimizing a loss function of the support vector machine, and solving for optimal hyperplane and bias by using a gradient descent method so as to minimize classification errors, thereby obtaining an optimal classification model;

The features of each pixel of the image are input into the trained support vector machine model for classification; the SVM model judges whether each pixel belongs to an edge region, and the edge probability of each pixel in the image is adjusted based on the output of the support vector machine model, so that the detected edges are more accurate;

The SVM-refined output improves the accuracy of edge detection and reduces false detections and missed detections, helping to identify true edges more reliably, especially in regions where the edge transition is blurred or the noise level is high.
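Training a margin classifier by gradient descent can be sketched as below. Note the substitution: for brevity this sketch uses a linear kernel with batch subgradient descent on the regularised hinge loss, not the RBF kernel named in the text, so it illustrates only the "minimise the SVM loss to obtain a hyperplane and bias" part. The toy features stand for standardised (zero-mean) gradient magnitude and direction; the data, learning rate, regularisation strength and epoch count are all assumptions.

```python
def train_linear_svm(features, labels, lr=0.1, reg=0.01, epochs=200):
    """Batch subgradient descent on the regularised hinge loss; labels are +1 / -1."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [reg * wi for wi in w], 0.0  # gradient of the L2 penalty
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the hinge gradient
                for i in range(dim):
                    gw[i] -= y * x[i] / n
                gb -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Return +1 (edge) or -1 (non-edge) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, hand-made data: [standardised gradient magnitude, gradient direction]
features = [[1.0, 0.2], [0.8, -0.1], [1.2, 0.0],
            [-1.0, 0.1], [-0.9, -0.2], [-1.1, 0.0]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(features, labels)
```

An RBF-kernel SVM is normally trained in its dual form with a dedicated solver; a library implementation would be the practical route for the configuration the text describes.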

The method performs image detail enhancement and edge detection by combining local variance adaptive Gaussian filtering with support vector machine (SVM) classification, which markedly improves the accuracy of edge recognition. During detail enhancement, the local gray variance measures the detail complexity of each sub-region, so that the most suitable Gaussian smoothing window is assigned to each region, preserving details while suppressing noise. The SVM uses gradient magnitude and direction as features for edge classification; by minimizing the SVM loss function, the model distinguishes edge pixels from non-edge pixels accurately and reduces false and missed detections, particularly in regions with blurred edge transitions or heavy noise. The method can still identify image edges stably and effectively under complex illumination conditions and is therefore more robust.

S4, extracting features with ResNet50, combining the RPN with non-maximum suppression to accurately locate candidate boxes, performing object detection and recognition through classification and regression, and visually displaying the results;

Specifically, extracting features with ResNet50, combining the RPN with non-maximum suppression to accurately locate candidate boxes, performing object detection and recognition through classification and regression, and visually displaying the results includes:

A pre-trained ResNet50 model is used as the feature extractor. ResNet50 is a convolutional neural network with 50 layers and strong image feature extraction capability; it extracts multi-level features from images through multi-layer convolution, pooling and residual connections;

The features of each channel are pooled using global average pooling to obtain a one-dimensional feature vector that describes the objects in the image. Object candidate boxes are generated using the Region Proposal Network (RPN) of Faster R-CNN: a small window slides over the convolutional feature map, candidate boxes are predicted at each location, and each candidate box is scored to represent its probability of containing an object. A non-maximum suppression (NMS) algorithm selects the optimal candidate boxes from the plurality of object candidate boxes. Each candidate box is classified using a Softmax classifier, and its class is judged against a set threshold: if the probability of a class exceeds the threshold, the object is considered to belong to that class; otherwise the object is considered absent or assigned to the "background" class. A regression operation is performed on each candidate box to predict the specific location and size of the object within the box. Finally, the object detection and recognition results are visualized and the final output is generated.
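The candidate-box selection and thresholded classification can be sketched as follows. This is an illustrative sketch only: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the IoU threshold of 0.5 and class-probability threshold of 0.5 are assumed values that the text does not specify.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap too much with any box already kept
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, class_names, threshold=0.5):
    """Return the top class if its probability exceeds the threshold, else 'background'."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best] if probs[best] > threshold else "background"
```

For instance, two heavily overlapping boxes with scores 0.9 and 0.8 collapse to the higher-scoring one, while a distant third box survives.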

By combining a pre-trained ResNet50 feature extractor, the RPN and the non-maximum suppression algorithm, the method accurately extracts multi-level features from an image, generates high-quality object candidate boxes and performs accurate classification and regression. ResNet50 provides strong image feature extraction capability, and global average pooling reduces the number of parameters and improves model efficiency. The RPN generates potential object regions in a sliding-window manner and assigns a confidence score to each candidate box, while non-maximum suppression refines the set of candidate boxes and removes the interference of duplicate boxes. The model can therefore identify object classes accurately and predict the specific positions of objects precisely. Combining these techniques not only improves detection accuracy but also allows the positions and classes of objects to be displayed clearly in the image, achieving efficient and accurate object detection and visualization.

S5, storing all the data into a database and managing the data;

Specifically, storing all data in a database and managing it includes:

All data are stored in a database and managed. A relational database effectively manages different types of data and the relations among them, ensuring the consistency, integrity and queryability of the data; storage and retrieval performance can be optimized, which improves data management efficiency. Regular backup tasks ensure data safety, so that the system can be restored quickly after a fault and data loss is prevented. Database user permission management ensures that only authorized users can access sensitive data, which improves data security. Encrypted storage of data at rest effectively prevents data leakage and preserves the confidentiality of the data during storage. Together, these measures improve the security, reliability and management efficiency of the data.
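A relational layout with a backup step can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the text does not name a database product, and the table names, columns and the in-memory backup target are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create one table per data type, linked by a foreign key."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS images (
            id INTEGER PRIMARY KEY,
            modality TEXT NOT NULL CHECK (modality IN ('rgb', 'infrared')),
            path TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS detections (
            id INTEGER PRIMARY KEY,
            image_id INTEGER NOT NULL REFERENCES images(id),
            label TEXT NOT NULL,
            score REAL NOT NULL,
            x1 REAL, y1 REAL, x2 REAL, y2 REAL
        );
    """)


def store_detection(conn, image_id, label, score, box):
    """Insert one detection result for an existing image row."""
    conn.execute(
        "INSERT INTO detections (image_id, label, score, x1, y1, x2, y2) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (image_id, label, score, *box),
    )


def backup(conn):
    """Copy the whole database into a fresh in-memory connection."""
    target = sqlite3.connect(":memory:")
    conn.backup(target)
    return target


conn = sqlite3.connect(":memory:")
create_schema(conn)
cur = conn.execute("INSERT INTO images (modality, path) VALUES (?, ?)",
                   ("rgb", "example.png"))
store_detection(conn, cur.lastrowid, "car", 0.93, (0, 0, 10, 10))
conn.commit()
copy = backup(conn)
```

The standard `sqlite3` module provides neither per-user permissions nor encryption at rest, so those two measures would need a server-based database or additional tooling.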

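The embodiment also provides a data processing system based on the image acquisition device, including: a data acquisition module for acquiring multimodal images from the image acquisition device and preprocessing them; a model building module for integrating the multimodal images into a training dataset and expanding it with the CNN-VAE and the GAN model; an image enhancement module for enhancing the images in the training dataset through the dual-branch U-Net; an edge detection module for enhancing image details using local variance adaptive Gaussian filtering and performing edge detection by the support vector machine; a feature extraction module for extracting features with ResNet50 and locating candidate boxes by combining the RPN with non-maximum suppression; a target recognition module for performing object detection and recognition through classification and regression and displaying the results; and a data storage module for storing all data in a database and managing it.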

The embodiment also provides computer equipment, which is suitable for the situation of the data processing method based on the image acquisition equipment, and comprises a memory and a processor, wherein the memory is used for storing computer executable instructions, and the processor is used for executing the computer executable instructions to realize the data processing method based on the image acquisition equipment.

The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen and input means connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

The present embodiment also provides a storage medium having a computer program stored thereon which, when executed by a processor, implements the data processing method based on the image acquisition device as set forth in the above embodiment. The storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk or an optical disk.

It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims

1. A data processing method based on an image acquisition device, characterized by comprising:

acquiring multimodal images from the image acquisition device and preprocessing them;

the multimodal images including RGB images and infrared images;

integrating the multimodal images into a training dataset, constructing a CNN-VAE to generate low-light images of the multimodal images, and expanding the training dataset using a GAN model with an anti-attention module and a dual-discriminator structure combined with parallel dilated convolution;

enhancing the images in the training dataset through a dual-branch U-Net, enhancing image details using local variance adaptive Gaussian filtering, and classifying pixel gradients and performing edge detection by a support vector machine;

extracting features with ResNet50, combining an RPN with non-maximum suppression to accurately locate candidate boxes, performing object detection and recognition through classification and regression, and visually displaying the results;

storing all data in a database and managing it.

2. The data processing method based on an image acquisition device according to claim 1, characterized in that constructing the CNN-VAE to generate low-light images of the multimodal images, and expanding the training dataset using the GAN model with an anti-attention module and a dual-discriminator structure combined with parallel dilated convolution, comprises:

using random rotation to simulate different orientations of the preprocessed images, adjusting the brightness of the multimodal images through histogram equalization and integrating them into the training dataset, mapping the training dataset to a latent space through a variational autoencoder, sampling from the latent space to obtain latent space data, having the variational autoencoder learn the distribution of the latent space by maximizing the variational lower bound and generate low-light images by minimizing the loss function, and using the generated low-light images as new samples to expand the training dataset;

expanding the training dataset using a generative adversarial network, defining the anti-attention module, constructing the dual-discriminator structure to extract global features and local features of the image and adding them to each layer of the generator, encoding the spatial information of the multimodal images in the training dataset using global average pooling, performing a nonlinear transformation through a Sigmoid activation function, generating an anti-attention map and suppressing unwanted regions through an inverse operation, and generating images;

integrating a parallel dilated convolution module into the generative adversarial network, expanding the receptive field of the generative adversarial network and performing image enhancement by using convolutions with different dilation rates in parallel, and outputting the final enhanced dataset.

3. The data processing method based on an image acquisition device according to claim 2, characterized in that constructing the dual-discriminator structure to extract the global features and local features of the image comprises:

the anti-attention module extracting the global features and local features of the image and adjusting the weights of the feature channels according to the importance of the global and local features; adopting PatchGAN as the local discriminator and training it together with a global discriminator; using real images and generated images to train the discriminators; calculating the discriminator loss; and training the model to convergence using the Adam optimizer.

4. The data processing method based on an image acquisition device according to claim 1, characterized in that enhancing the images in the training dataset through the dual-branch U-Net comprises:

using a U-Net architecture as the joint learning framework and integrating an anti-attention module into each convolution layer; inputting the RGB images and infrared images in the final enhanced dataset into two parallel U-Net branches, respectively, to generate enhanced images; retaining the local features of the images through skip connections; updating the parameters of the U-Net using a joint loss function; defining an optimizer, calculating and applying gradients, and optimizing the U-Net model to obtain the final enhanced images.

5. The data processing method based on an image acquisition device according to claim 1, characterized in that enhancing image details using local variance adaptive Gaussian filtering, and classifying pixel gradients and performing edge detection by a support vector machine, comprises:

converting the enhanced image into a grayscale image, dividing the grayscale image into fixed-size blocks to obtain a plurality of local sub-regions, calculating the local gray variance within each local sub-region and treating the gray variance values of all local sub-regions as a one-dimensional distribution, and determining the optimal gray variance threshold from the gray histogram of the grayscale image and the Otsu algorithm formula;

determining the unique local Gaussian smoothing window size of each sub-region according to the optimal gray variance threshold, performing a deterministic Gaussian filtering operation on the gray value at each pixel position in the corresponding region of the original image, replacing the original gray value with the filtered gray value, deterministically stitching the filtered sub-regions back into a complete image of the original size according to their positions in the original image, calculating the gradient value of each pixel of the image, calculating the gradient magnitude and direction through the Canny edge detection algorithm, and performing edge detection on each pixel;

selecting a support vector machine algorithm with an RBF kernel as the kernel function, extracting the features of each pixel from the enhanced image as the input features of a training dataset, labeling the edge detection result of each pixel in the input features, training the model on the labeled training dataset by minimizing the loss function of the support vector machine, and solving for the optimal hyperplane and bias using gradient descent;

inputting the features of each pixel of the image into the trained support vector machine model for classification, and adjusting the edge probability of each pixel in the image based on the output of the support vector machine model.

6. The data processing method based on an image acquisition device according to claim 1, characterized in that extracting features with ResNet50, combining the RPN with non-maximum suppression to accurately locate candidate boxes, performing object detection and recognition through classification and regression, and visually displaying the results comprises:

using a pre-trained ResNet50 model as the feature extractor;

pooling the features of each channel using global average pooling to obtain a one-dimensional feature vector, generating object candidate boxes using the Region Proposal Network of Faster R-CNN, selecting the optimal candidate boxes from the plurality of object candidate boxes using a non-maximum suppression algorithm, classifying each candidate box using a Softmax classifier, judging the class of each candidate box against a set threshold, performing a regression operation on each candidate box to predict the specific position and size of the object within the box, visualizing the object detection and recognition results, and generating the final output.

7. The data processing method based on an image acquisition device according to claim 1, characterized in that storing all data in the database and managing it comprises:

selecting a relational database to manage the data and the relationship analysis results, designing a database table structure to store different types of data, setting up regular backup tasks to back up all data in the database, managing the permissions of database users, and storing data at rest in encrypted form.

8. A data processing system based on an image acquisition device, based on the data processing method based on an image acquisition device according to any one of claims 1 to 7, characterized by comprising:

a data acquisition module for acquiring multimodal images from the image acquisition device and preprocessing them;

a model building module for integrating the multimodal images into a training dataset, constructing a CNN-VAE to generate low-light images of the multimodal images, and expanding the training dataset using a GAN model with an anti-attention module and a dual-discriminator structure combined with parallel dilated convolution;

an image enhancement module for enhancing the images in the training dataset through a dual-branch U-Net;

an edge detection module for enhancing image details using local variance adaptive Gaussian filtering, and classifying pixel gradients and performing edge detection by a support vector machine;

a feature extraction module for extracting features with ResNet50 and combining an RPN with non-maximum suppression to accurately locate candidate boxes;

a target recognition module for performing object detection and recognition through classification and regression and visually displaying the results;

a data storage module for storing all data in a database and managing it.

9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the data processing method based on an image acquisition device according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the data processing method based on an image acquisition device according to any one of claims 1 to 7.