Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an image processing method and system based on artificial intelligence, which are used for solving the problems in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
in a first aspect, an embodiment of the present invention provides an image processing method based on artificial intelligence, including the steps of:
S1, performing preliminary extraction of image characteristic information based on an image input signal;
S2, optimizing image quality based on the extracted preliminary characteristic information, removing noise and improving image definition, and obtaining optimized image data;
S3, training a model and performing target detection by using the optimized image data, and identifying a target area in the image and providing an identifier;
S4, after target detection is completed, local feature alignment is carried out on the target area, and the target area is classified and marked according to semantic information;
S5, performing enhancement and restoration processing on the marked target region, and performing image enhancement and image restoration to obtain processed image data;
S6, customizing an output format of the processed image data according to the application scene.
In the step S1, preliminary feature extraction is performed on the image through a deep learning model based on a convolutional neural network (CNN), and low-level features in the image, including edges, textures and color distribution, are identified to form a preliminary image description;
In the primary extraction process, the convolution kernel is used for extracting local features of the image, dimension reduction is carried out by combining pooling operation, and space hierarchical structure information of the image is extracted.
In step S2, the adaptive filter is used to perform noise suppression, the adaptive filter dynamically adjusts parameters of the filter according to local features of the image, and suppresses noise of different types to different extents according to noise features of different areas in the image.
Further optimizing the technical scheme, in the step S3, a model trained by using the optimized image data is as follows:
L_total = (1/N) Σ_{i=1}^{N} [ (1 − IoU(B_i, B'_i)) − λ Σ_{c=1}^{C} y_{i,c} log(p_{i,c}) ];
Wherein,
L_total is the total loss function of the training model;
N is the number of training samples;
B_i and B'_i are the coordinate sets of the real frame and the predicted frame;
IoU(B_i, B'_i) is the intersection-over-union ratio, measuring the degree of overlap between the predicted frame and the real frame;
λ is the loss weight coefficient, used for balancing the position loss and the category loss;
C is the total number of categories;
y_{i,c} is the real label of sample i on category c;
p_{i,c} is the predicted probability of sample i on category c.
Further optimizing the technical scheme, in the step S3, when performing target detection, a multi-scale target detection model is constructed, where the multi-scale target detection model is as follows:
D = F(X, W) = ∪_{l ∈ S} f_l(X_l, W_l);
Wherein,
D is the detected target set;
F is the overall inference function of the model;
X is the feature map set of the input image;
W is the weight parameter set of the model;
S is the multi-scale layer set, representing the resolution levels of the different feature maps;
f_l is the target detection function of the l-th layer scale feature map;
X_l and W_l are the l-th layer feature map and its corresponding weights.
Further optimizing the technical scheme, in the step S4, performing local feature alignment on the target area by using a local feature matching algorithm includes:
detecting key points;
extracting feature descriptors;
matching features;
estimating the geometric transformation;
and aligning the targets.
Further optimizing the technical scheme, in the step S4, classifying and labeling the target area according to the semantic information by using the deep semantic network includes:
generating an image description;
embedding semantic information;
refining and classifying;
target labeling and relationship reasoning;
and outputting the labels.
In step S5, an adaptive enhancement algorithm is used for image enhancement, an adaptive enhancement loss function is constructed, and adaptive enhancement is performed on the target area, wherein the adaptive enhancement loss function is as follows:
L_AE = (1/M) Σ_{i=1}^{M} [ α (C'_i − C_i)² + β (B'_i − B_i)² + γ S(R'_i) ];
Wherein,
L_AE is the total adaptive enhancement loss;
M is the number of target regions;
C_i is the original contrast value of target region i;
C'_i is the contrast value of target region i after enhancement;
B_i is the original luminance value of target region i;
B'_i is the luminance value of target region i after enhancement;
S(R'_i) is the sharpening measure of target region i;
α, β and γ are weight coefficients, used for balancing the optimization of contrast, luminance and sharpening degree;
R'_i is the feature of target region i after enhancement.
Further optimizing the technical scheme, in the step S5, image restoration is performed by adopting a generative adversarial network (GAN); the GAN generates and optimizes the image through a generator and a discriminator so as to ensure the naturalness and authenticity of the target area;
the loss function between the discriminator and the generator of the generative adversarial network is designed as follows:
min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))];
Wherein,
V(D, G) is the GAN total loss;
D(x) is the discriminator's judgment of the real image x;
G(z) is the image generated by the generator from the noise vector z;
p_data(x) is the real data distribution;
p_z(z) is the noise data distribution.
An image processing system based on artificial intelligence is constructed based on the image processing method based on artificial intelligence, and functional modules of the system comprise a preliminary feature extraction module, a data preprocessing module, a target area detection module, a target classification labeling module, an image enhancement restoration module and an image format conversion module.
In a second aspect, embodiments of the present invention provide a computer device, including a memory and a processor, where the memory stores a computer program, and where the computer program instructions, when executed by the processor, implement the steps of the artificial-intelligence-based image processing method according to the first aspect of the present invention.
In a third aspect, embodiments of the present invention provide a computer-readable storage medium having a computer program stored thereon, where the computer program instructions, when executed by a processor, implement the steps of the artificial-intelligence-based image processing method according to the first aspect of the present invention.
Compared with the prior art, the invention provides an image processing method and system based on artificial intelligence, which have the following beneficial effects:
According to the image processing method and system based on artificial intelligence, efficient positioning, optimization and restoration of image target areas are realized by combining an adaptive image enhancement algorithm, target detection and feature extraction techniques, and a generative adversarial network (GAN) for image restoration. The method offers notable advantages in noise removal and target detection precision, and by dynamically adjusting the enhancement strategy it avoids the distortion caused by excessive processing, thereby ensuring that the target region is both prominent and naturally fused into the image. Compared with traditional image processing technology, the method effectively improves image quality and detail recovery capability, has stronger robustness and adaptability, can process more complex image scenes, and has broad application prospects.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in ways other than those described here, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Embodiment one:
referring to fig. 1 to 3, in a first embodiment of the present invention, an image processing method based on artificial intelligence is provided, which includes the following steps:
S1, performing preliminary extraction of image characteristic information based on an image input signal.
In this embodiment, a deep learning model based on a convolutional neural network (CNN) is used to carry out preliminary feature extraction on the image and identify low-level features, including edges, textures and color distribution, forming a preliminary image description; these low-level features provide a basis for the subsequent image enhancement, target detection and classification tasks.
In the primary extraction process, the convolution kernel is used for extracting local features of the image, dimension reduction is carried out by combining pooling operation, and space hierarchical structure information of the image is extracted. The feature extracted dataset provides not only rough contour information for image understanding, but also effective input for subsequent reasoning and recognition.
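As a minimal, pure-Python sketch of this step (the 3×3 edge kernel and toy image below are illustrative assumptions, not the patented model), convolution followed by 2×2 max pooling can be expressed as:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) of a grayscale image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = sum(image[i + u][j + v] * kernel[u][v]
                    for u in range(kh) for v in range(kw))
            row.append(s)
        out.append(row)
    return out

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2, reducing the spatial dimension."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A Sobel-like vertical-edge kernel (illustrative only).
edge_kernel = [[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]]

# Toy 6x6 image with a vertical brightness step in the middle.
img = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
feat = conv2d(img, edge_kernel)   # 4x4 edge-response map
pooled = max_pool2x2(feat)        # 2x2 map after dimension reduction
```

The convolution responds strongly at the brightness step, and pooling keeps the strongest local response while halving each spatial dimension.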
S2, optimizing image quality, removing noise and improving image definition based on the extracted primary characteristic information, and obtaining optimized image data.
In this embodiment, an adaptive filter is used to perform noise suppression, and the adaptive filter dynamically adjusts parameters of the filter according to local features of the image, and suppresses noise of different types to different extents according to noise features of different areas in the image.
A conventional fixed filter often performs poorly on complex scenes, whereas the adaptive filter can autonomously optimize the filtering process according to the noise characteristics of different areas in the image. This not only reduces the influence of noise on image quality, but also preserves important detail information in the image and avoids excessive blurring, thereby improving the precision of subsequent image analysis tasks.
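A Lee-style variance-gated filter is one classical way to realize such adaptation; the sketch below (pure Python, with a hypothetical noise-variance parameter) smooths flat patches strongly and leaves high-variance edge patches nearly intact:

```python
def adaptive_mean_filter(img, noise_var=4.0):
    """Lee-style adaptive filter on a grayscale image (list of lists).

    Flat patches (local variance comparable to the noise variance) are
    replaced by the local mean; high-variance patches (edges, texture)
    are weighted toward the raw pixel and thus barely changed."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = [img[i + di][j + dj]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            mean = sum(patch) / 9.0
            var = sum((p - mean) ** 2 for p in patch) / 9.0
            # Gate: k -> 0 in flat regions, k -> 1 where detail dominates noise.
            k = max(var - noise_var, 0.0) / var if var > 0 else 0.0
            out[i][j] = mean + k * (img[i][j] - mean)
    return out
```

On a flat patch with one noisy pixel the outlier is pulled to the local mean, while a strong step edge passes through almost unchanged.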
And S3, training a model and detecting a target by using the optimized image data, and identifying a target area in the image and providing an identifier.
In this embodiment, a model trained using the optimized image data is as follows:
L_total = (1/N) Σ_{i=1}^{N} [ (1 − IoU(B_i, B'_i)) − λ Σ_{c=1}^{C} y_{i,c} log(p_{i,c}) ];
Wherein,
L_total is the total loss function of the training model;
N is the number of training samples;
B_i and B'_i are the coordinate sets of the real frame and the predicted frame;
IoU(B_i, B'_i) is the intersection-over-union ratio, measuring the degree of overlap between the predicted frame and the real frame;
λ is the loss weight coefficient, used for balancing the position loss and the category loss;
C is the total number of categories;
y_{i,c} is the real label of sample i on category c;
p_{i,c} is the predicted probability of sample i on category c.
The model aims to jointly optimize target position detection and class classification. The first term, (1 − IoU(B_i, B'_i)), calculates the error of the target position, ensuring that the model can accurately predict the bounding box of the object; the second term, −λ Σ_{c=1}^{C} y_{i,c} log(p_{i,c}), is a cross-entropy-based classification loss used to evaluate the accuracy of class prediction. During training, L_total is minimized, gradually adjusting the weights of the deep network so that it learns the salient features of the targets in the data optimized in step S2.
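Under the joint-loss reading above, the loss for toy data can be evaluated as follows (the boxes, labels and probabilities are made up for illustration):

```python
import math

def iou(a, b):
    """Intersection over union of axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def detection_loss(samples, lam=1.0):
    """Mean of (1 - IoU) position loss plus lambda-weighted cross-entropy."""
    total = 0.0
    for true_box, pred_box, y, p in samples:
        position = 1.0 - iou(true_box, pred_box)
        category = -sum(yc * math.log(pc) for yc, pc in zip(y, p))
        total += position + lam * category
    return total / len(samples)

# One toy sample: perfect box overlap, confident correct class.
sample = ((0, 0, 2, 2), (0, 0, 2, 2), [1, 0], [0.9, 0.1])
loss = detection_loss([sample])
```

With perfect overlap the position term vanishes, so the remaining loss is just the cross-entropy −log(0.9) of the correct class.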
Further, when target detection is performed, a multi-scale target detection model is constructed, and the multi-scale target detection model is as follows:
D = F(X, W) = ∪_{l ∈ S} f_l(X_l, W_l);
Wherein,
D is the detected target set;
F is the overall inference function of the model;
X is the feature map set of the input image;
W is the weight parameter set of the model;
S is the multi-scale layer set, representing the resolution levels of the different feature maps;
f_l is the target detection function of the l-th layer scale feature map;
X_l and W_l are the l-th layer feature map and its corresponding weights.
The core of the model is multi-scale target detection: each function f_l detects targets on the feature map at one resolution, and the detection results of the different scales are fused by the union operation ∪, ensuring accurate detection of targets of all sizes. The detection function f_l of each layer extracts region features by convolution operations, and repeated detection boxes are filtered out in combination with non-maximum suppression (NMS) to generate the final target set D.
The multi-scale design can effectively solve the problem of target scale difference. For example, small objects rely more on high resolution feature map detection, while large objects are more captured by low resolution feature maps. In actual use, the detection process is carried out step by step:
1. Input the optimized image feature maps.
2. Process the feature map of each scale in turn.
3. Combine the detection results of all scales to generate the final target set.
The model is first trained: the image data optimized in step S2 is fed into the training framework, and the model weights are adjusted through the loss function so that advanced features in the image are learned.
And (3) inputting the feature map optimized in the step (S2) into a multi-scale target detection model to generate a plurality of resolution feature levels (such as capturing small targets with high resolution and capturing large targets with low resolution), and processing the feature maps with different scales layer by layer to generate candidate targets.
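The layer-by-layer multi-scale flow can be sketched as follows; the per-scale detector functions here are stand-ins for f_l, not the actual trained model:

```python
def multi_scale_detect(feature_maps, detectors):
    """Run one detector per scale and fuse the results with a set union.

    feature_maps: dict scale -> feature data; detectors: dict scale -> fn
    returning a set of (label, box) tuples."""
    targets = set()
    for scale, fmap in feature_maps.items():
        targets |= detectors[scale](fmap)  # union across scales
    return targets

# Hypothetical detectors: the high-resolution map finds small targets,
# the low-resolution map finds large ones.
detectors = {
    "high": lambda f: {("small_obj", (5, 5, 8, 8))},
    "low":  lambda f: {("large_obj", (0, 0, 60, 40))},
}
feature_maps = {"high": None, "low": None}
result = multi_scale_detect(feature_maps, detectors)
```

The set union plays the role of the fusion operation: targets found at any scale all appear in the final target set.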
And S4, after target detection is completed, carrying out local feature alignment on the target area, and classifying and labeling the target area according to semantic information.
In this embodiment, local feature alignment is performed on the target area by using a local feature matching algorithm (e.g., SIFT, SURF), including:
Detecting key points;
first, key points in an image are detected by SIFT or SURF algorithms. Key points are typically areas in the image with significant local features, such as edges, corner points, where texture changes are significant.
Extracting a feature descriptor;
And extracting local feature descriptors from the area around each key point. SIFT and SURF generate descriptors (e.g., 128-dimensional vectors generated by SIFT and 64-dimensional vectors generated by SURF), respectively, which can effectively capture texture, color and shape information of a local region, with rotation and scale invariance.
Feature matching;
And matching key point descriptors in different images by using a nearest-neighbor search algorithm (such as brute-force matching or a k-d tree) to find the most similar feature point pairs. These matching point pairs will help to register the images at a later stage.
Estimating geometric transformation;
The geometric transformation (such as a homography) between the images is estimated from the matched key-point pairs, and one image is transformed into the coordinate system of the other. A common method is RANSAC (random sample consensus), which rejects mismatched points.
Aligning the targets;
after the geometric transformation, the target areas in the image will be precisely aligned. These aligned image regions may provide accurate input data for subsequent target recognition, classification, and labeling.
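The matching stage (feature matching, above) can be illustrated with brute-force nearest-neighbor search plus Lowe's ratio test; the 3-D descriptors below are toy stand-ins for real 128-dimensional SIFT vectors:

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Brute-force nearest-neighbor matching with Lowe's ratio test.

    Returns index pairs (i, j) where descriptor i of image A matches
    descriptor j of image B and the best distance is clearly better
    than the second best (ambiguous matches are rejected)."""
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) > 1 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy descriptors: A[0] clearly matches B[1]; A[1] is ambiguous
# (nearly equidistant from B[2] and B[3]) and is discarded.
desc_a = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0)]
desc_b = [(0.0, 1.0, 0.0), (1.0, 0.05, 0.0),
          (0.55, 0.5, 0.0), (0.5, 0.55, 0.0)]
matches = match_descriptors(desc_a, desc_b)
```

The surviving pairs are what a later RANSAC step would consume to estimate the geometric transformation.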
Classifying and labeling the target area according to semantic information by adopting a deep semantic network (such as BERT), and comprises the following steps:
Generating an image description;
After keypoint matching and target alignment of the image, the system generates a preliminary image description from the contextual information of the image (e.g., background around the target, spatial relationship between objects, etc.). This description may be a short sentence in natural language outlining the main objects in the image and their relationships.
Embedding semantic information;
The image description is input into a pre-trained semantic network (e.g., BERT). The deep semantic network will transform the input text into a higher level semantic representation through multiple levels of semantic understanding modules. Through contextual relationships, the network can understand the semantic connections between different objects.
Refining and classifying;
Based on the semantic representation of the deep semantic network, the system can refine and classify the targets in the image. That is, the model not only identifies the general class of the object (e.g., "car" or "person"), but also further describes the attributes of the object (e.g., "red car", "person sitting in a chair"), improving the accuracy and detail of the classification.
Target labeling and relationship reasoning;
After the refinement classification is completed, the deep semantic network can infer the relationship between targets. For example, the model may infer that a relationship between "person" and "car" is "standing aside" or "driving", further improving the semantic understanding level of the image.
Outputting labels;
Finally, the network outputs the identified targets and their attributes (e.g., category, location, relationship) in the form of text labels. This information may be used as a basis for further processing, presentation or storage.
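The text-label output might be structured as in the following sketch (the schema and field names are hypothetical; the patent does not fix an output format):

```python
# Hypothetical structured label for one detected, semantically refined target.
label = {
    "category": "car",
    "attributes": ["red"],
    "box": (120, 80, 360, 220),            # (x1, y1, x2, y2) in pixels
    "relations": [("person", "driving")],  # inferred subject-relation pairs
}

def to_text(lbl):
    """Render the structured label as a short natural-language tag."""
    attrs = " ".join(lbl["attributes"])
    rels = "; ".join(f"{s} {r} it" for s, r in lbl["relations"])
    return f"{attrs} {lbl['category']} at {lbl['box']} ({rels})"

text_tag = to_text(label)
```

Keeping both the structured dict and the rendered text lets downstream modules choose whichever form suits processing, presentation or storage.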
S5, performing enhancement and restoration processing on the marked target region, and performing image enhancement and image restoration to obtain processed image data.
In this embodiment, the image enhancement adopts an adaptive enhancement algorithm, an adaptive enhancement loss function is constructed, and the adaptive enhancement is performed on the target area, where the adaptive enhancement loss function (AELF for short) is as follows:
L_AE = (1/M) Σ_{i=1}^{M} [ α (C'_i − C_i)² + β (B'_i − B_i)² + γ S(R'_i) ];
Wherein,
L_AE is the total adaptive enhancement loss;
M is the number of target regions;
C_i is the original contrast value of target region i;
C'_i is the contrast value of target region i after enhancement;
B_i is the original luminance value of target region i;
B'_i is the luminance value of target region i after enhancement;
S(R'_i) is the sharpening measure of target region i;
α, β and γ are weight coefficients, used for balancing the optimization of contrast, luminance and sharpening degree;
R'_i is the feature of target region i after enhancement.
The AELF loss function optimizes the enhancement effect through three parts:
Contrast optimization: the first term, α (C'_i − C_i)², measures the change in contrast of the target region, ensuring that the contrast is improved after enhancement without excessive distortion.
Luminance optimization: the second term, β (B'_i − B_i)², adjusts the brightness of the target region so that the target remains clear and adapts to complex backgrounds.
Sharpening optimization: the third term, γ S(R'_i), enhances the sharpness of the edges of the object. Details of the target region are enhanced by sharpening operations (e.g., the Laplace operator or sharpening convolution kernels in a deep learning model) so that the target is more prominent.
In the enhancement process, the weight coefficients α, β and γ can be dynamically adjusted according to the specific content of the target region, so as to avoid detail loss or information distortion caused by excessive processing.
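Assuming the three-term form described above, the loss can be computed for toy region statistics like so (field names and values are illustrative):

```python
def aelf_loss(regions, alpha=1.0, beta=1.0, gamma=1.0):
    """Adaptive enhancement loss averaged over target regions.

    Each region is a dict with original/enhanced contrast and luminance
    values and a sharpening measure of the enhanced region."""
    total = 0.0
    for r in regions:
        contrast = (r["contrast_enh"] - r["contrast_orig"]) ** 2
        luminance = (r["lum_enh"] - r["lum_orig"]) ** 2
        total += alpha * contrast + beta * luminance + gamma * r["sharpness"]
    return total / len(regions)

# One toy region: moderate contrast boost, slight brightness change.
region = {"contrast_orig": 0.4, "contrast_enh": 0.6,
          "lum_orig": 0.5, "lum_enh": 0.55, "sharpness": 0.1}
loss = aelf_loss([region])
```

The weights alpha, beta and gamma are the tuning knobs: raising one makes the optimizer care more about that aspect of the enhancement.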
Image restoration is performed by adopting a generative adversarial network (GAN); the GAN generates and optimizes the image through a generator and a discriminator so as to ensure the naturalness and authenticity of the target area;
the loss function between the discriminator and the generator of the generative adversarial network is designed as follows:
min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))];
Wherein,
V(D, G) is the GAN total loss;
D(x) is the discriminator's judgment of the real image x;
G(z) is the image generated by the generator from the noise vector z;
p_data(x) is the real data distribution;
p_z(z) is the noise data distribution.
The discriminator D is trained to judge whether an input image x comes from the real data distribution or is counterfeit data produced by the generator. The generator G accepts a random noise vector z and generates an image that is as realistic as possible.
Through this adversarial training, the generator can fill in the missing details in the target area, repairing details and improving image quality during the enhancement process.
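The minimax objective can be evaluated empirically from discriminator scores on a mini-batch; the scores below are illustrative numbers, not outputs of a real network:

```python
import math

def gan_value(d_real, d_fake):
    """Empirical GAN objective: E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator scores on real images; d_fake: scores on
    generated images. The discriminator maximizes this value while
    the generator minimizes the second (fake) term."""
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident discriminator: real scores near 1, fake scores near 0.
v = gan_value([0.9, 0.95], [0.1, 0.05])
```

As the generator improves, its outputs push the fake scores toward 0.5 and the objective toward its equilibrium value.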
In this step, based on the detection and calibration in step S4, features of the target region in the image are extracted, including the boundary, brightness, contrast, sharpness, and the like of the target. These features will serve as inputs to the enhancement process.
The contrast, brightness and sharpness of the target area are optimized by AELF loss functions using an adaptive enhancement loss function. And (3) adaptively adjusting an enhancement strategy according to the local characteristics of the target area to ensure that the target detail is promoted and the background area is not influenced excessively.
After enhancement, the missing parts or details in the target area are further repaired using GAN generation countermeasure network. The generator generates missing details according to the characteristics of the target area, and the discriminator verifies the authenticity of the image to ensure the naturalness and consistency of the supplementary content of the target area.
Finally, the enhanced target area is optimized through a deep learning model, so that the target can be more prominent under a complex background, and the visual effect of the image and the usability of practical application are enhanced.
S6, customizing an output format of the processed image data according to the application scene.
In this embodiment, the image is intelligently optimized and output according to the target application scene. Combining the specific requirements of the application scene, for example whether the image is to be used in medical diagnosis, automatic driving or industrial detection, images of different formats and resolutions are output. The storage format of the image (such as JPG, PNG or TIFF) is chosen with the compatibility and processing efficiency of subsequent use in mind. The optimization algorithm automatically selects the best output parameters according to the specific requirements of the application and performs final compression, format conversion and data storage. This deep combination of artificial intelligence with the specific application scene ensures that the processed image delivers the greatest effect in practical application.
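One hypothetical way to map application scenes to output parameters (the profiles below are examples; the patent does not fix such a table):

```python
# Hypothetical scene profiles: lossless formats for diagnosis/inspection,
# compact lossy output for bandwidth-constrained driving pipelines.
SCENE_PROFILES = {
    "medical_diagnosis":     {"format": "TIFF", "lossless": True,  "max_side": 4096},
    "autonomous_driving":    {"format": "JPG",  "lossless": False, "max_side": 1280},
    "industrial_inspection": {"format": "PNG",  "lossless": True,  "max_side": 2048},
}

def output_params(scene):
    """Return the export parameters for a scene, defaulting to lossless PNG."""
    return SCENE_PROFILES.get(
        scene, {"format": "PNG", "lossless": True, "max_side": 2048})

params = output_params("medical_diagnosis")
```

A table like this keeps the scene-specific policy in one place, so adding a new application only means adding one profile entry.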
Embodiment two:
Referring to fig. 4, in a second embodiment of the present invention, an image processing system based on artificial intelligence is provided, constructed on the basis of the image processing method based on artificial intelligence according to the first embodiment, where the functional modules of the system include a preliminary feature extraction module, a data preprocessing module, a target area detection module, a target classification labeling module, an image enhancement restoration module, and an image format conversion module.
The modules are correspondingly constructed based on the image processing method based on artificial intelligence according to the first embodiment, so as to realize the process of the method.
Embodiment III:
This embodiment also provides a computer device suitable for carrying out the image processing method based on artificial intelligence described above, including a memory and a processor, where the memory is used for storing computer-executable instructions and the processor is used for executing the computer-executable instructions, thereby implementing the image processing method based on artificial intelligence provided by the above embodiments.
The present embodiment also provides a storage medium having a computer program stored thereon, which, when executed by a processor, implements the image processing method based on artificial intelligence as set forth in the above embodiments.
The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen and input means connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present invention. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include an electrical connection (an electronic device) having one or more wires, a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of techniques known in the art, discrete logic circuits with logic gates for implementing logic functions on data signals, application specific integrated circuits with appropriate combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.