CN116993982A - Infrared image segmentation method and system - Google Patents

Infrared image segmentation method and system

Info

Publication number
CN116993982A
Authority
CN
China
Prior art keywords
image
spatial domain
neural network
frequency domain
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310960057.5A
Other languages
Chinese (zh)
Inventor
Guo Yaoguang (郭耀光)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Intelligent Network Technology Co ltd
Original Assignee
China Telecom Intelligent Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Intelligent Network Technology Co ltd
Priority to CN202310960057.5A
Publication of CN116993982A
Legal status: Pending

Abstract

The application discloses an infrared image segmentation method and an infrared image segmentation system. The method comprises the following steps: acquiring an infrared image to be segmented; extracting spatial domain features of the infrared image by adopting a spatial domain neural network in a joint learning model; extracting frequency domain features of the infrared image by adopting a frequency domain neural network in the joint learning model; and segmenting the infrared image according to the spatial domain features and the frequency domain features. The application solves the technical problems of the prior art that spatial-domain-based deep learning methods for infrared image segmentation are affected by noise and limited in their handling of frequency information.

Description

Infrared image segmentation method and system
Technical Field
The application relates to the field of image processing and artificial intelligence, in particular to an infrared image segmentation method and an infrared image segmentation system.
Background
Image segmentation is the technique of dividing an image into several regions whose internal features or properties are similar, while the differences between regions are significant. Image segmentation algorithms play an important role in infrared target detection. At present, infrared image segmentation techniques mainly fall into threshold-based, region-based, edge-based and deep-learning-based methods, each of which has limitations, such as insufficient segmentation when processing noisy images and limited ability to process frequency information.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides an infrared image segmentation method and an infrared image segmentation system, which at least solve the technical problems of the prior art that spatial-domain-based deep learning methods for infrared image segmentation are affected by noise and limited in their handling of frequency information.
According to an aspect of an embodiment of the present application, there is provided an infrared image segmentation method, including: acquiring an infrared image to be segmented; extracting spatial domain features of the infrared image by adopting a spatial domain neural network in a joint learning model; extracting frequency domain features of the infrared image by adopting a frequency domain neural network in the joint learning model; and segmenting the infrared image according to the spatial domain features and the frequency domain features.
Optionally, before the spatial domain neural network in the joint learning model is used for extracting the spatial domain features of the infrared image, the method further comprises establishing the joint learning model by: filtering a sample image in the spatial domain to obtain a spatial domain image; training a first initial neural network by adopting the spatial domain image to obtain the spatial domain neural network, wherein the sample data are infrared images; performing frequency domain conversion on the sample image to obtain a frequency domain image; training a second initial neural network by adopting the frequency domain image to obtain the frequency domain neural network; and performing joint training on the spatial domain neural network and the frequency domain neural network to obtain the joint learning model.
Optionally, the spatial domain neural network includes an encoder and a decoder, wherein the encoder is used for weighting the spatial domain image to obtain a first weighted feature map; the decoder processes the first weighted feature map to obtain a second weighted feature map, and the second weighted feature map is taken as the spatial domain feature.
Optionally, weighting the spatial domain image by the encoder to obtain the first weighted feature map includes: extracting an original spatial domain feature map of the spatial domain image by adopting a batch normalization layer and a linear rectification function layer in the encoder; performing average pooling or maximum pooling on the original spatial domain feature map by adopting a first global pooling layer in the encoder to obtain global statistical information of each channel; convolving the global statistical information by adopting a convolution layer in the encoder to obtain a first attention weight; and determining the first weighted feature map according to the first attention weight and the original feature map.
Optionally, processing the first weighted feature map with the decoder to obtain the second weighted feature map includes: up-sampling the first weighted feature map by adopting a deconvolution layer in the decoder to obtain a deconvolution feature map; processing the deconvolution feature map sequentially through a second global pooling layer, a second convolution layer and a second activation layer to obtain an attention feature map; and determining the second weighted feature map according to the attention feature map and the deconvolution feature map, and taking the second weighted feature map as the spatial domain feature.
Optionally, in the process of performing joint training on the spatial domain neural network and the frequency domain neural network, the method further includes: determining a first loss function and a second loss function corresponding to the spatial domain neural network and the frequency domain neural network respectively; and determining a joint loss function of the joint learning model according to the first loss function and the second loss function, and training the joint learning model according to the joint loss function.
According to another aspect of the embodiment of the present application, there is also provided a model training method, including: filtering a sample image in the spatial domain to obtain a spatial domain image; training a first initial neural network by adopting the spatial domain image to obtain a spatial domain neural network, wherein the sample data are infrared images; performing frequency domain conversion on the sample image to obtain a frequency domain image; training a second initial neural network by adopting the frequency domain image to obtain a frequency domain neural network; and performing joint training on the spatial domain neural network and the frequency domain neural network to obtain a joint learning model.
According to another aspect of the embodiment of the present application, there is also provided an infrared image segmentation apparatus, including: a first processing module, used for continuously collecting the spatial domain and frequency domain information of a target image in the process of collecting target image information and performing image noise reduction; a second processing module, used for performing feature extraction on the collected target image data, including spatial domain feature extraction and frequency domain feature extraction; and a third processing module, used for jointly training the spatial domain features and the frequency domain features into a parametric model and performing infrared image segmentation with the trained model.
According to another aspect of the embodiment of the present application, there is further provided a non-volatile storage medium in which a program is stored, wherein when the program runs, a device in which the non-volatile storage medium is located is controlled to execute any one of the above infrared image segmentation methods.
According to another aspect of the embodiment of the present application, there is also provided an electronic device, including a memory and a processor, wherein the processor is used for running a program stored in the memory, and the program, when running, executes any one of the above infrared image segmentation methods.
In the embodiment of the application, an improved infrared image segmentation method and system are adopted. By combining the spatial domain features and the frequency domain features of the image, the noise affecting the infrared image is suppressed, thereby improving the accuracy and stability of infrared image segmentation and solving the technical problems of the prior art that spatial-domain-based deep learning methods for infrared image segmentation are affected by noise and limited in their handling of frequency information.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an alternative computer architecture according to an embodiment of the application;
FIG. 2 is a flow chart of an alternative infrared image segmentation method in accordance with an embodiment of the present application;
FIG. 3 is an overall architecture diagram of an alternative spatial domain network according to an embodiment of the present application;
FIG. 4 is an alternative frequency domain network overall architecture diagram according to an embodiment of the present application;
FIG. 5 is a flowchart of an alternative model training method according to an embodiment of the present application;
fig. 6 is a block diagram of an alternative infrared image segmentation apparatus in accordance with an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to better understand the embodiments of the present application, technical terms related to the embodiments of the present application are explained as follows:
Neural network model: a mathematical method for simulating real biological neural networks; such artificial neural networks are now commonly referred to simply as neural networks. Neural networks have broad and attractive prospects in system identification, pattern recognition, intelligent control and other fields. In intelligent control in particular, the self-learning capability of neural networks is of special interest and is regarded as one of the keys to solving the problem of controller adaptability in automatic control.
Spatial domain: the spatial domain, also called the pixel domain; processing in the spatial domain is processing at the pixel level, such as pixel-level image superposition. Applying a Fourier transform to a spatial domain image yields the spectrum of the image, representing the energy gradient of the image.
Frequency domain: in the frequency domain, any waveform can be decomposed into a sum of sine waves, each with its own frequency and amplitude; thus any waveform signal has its own set of frequencies and amplitudes. The frequency domain representation of a signal is obtained by applying a Fourier transform to its spatial domain representation.
Loss function: the loss function (loss function) is an operation function for measuring the difference degree between the predicted value f (x) and the true value Y of the model, and is a non-negative real value function, generally expressed by L (Y, f (x)), and the smaller the loss function is, the better the robustness of the model is.
In the related art, spatial-domain-based deep learning methods for infrared image segmentation do not consider the influence of frequency domain information, so infrared image noise affects segmentation accuracy. To solve this problem, corresponding solutions are provided in the embodiments of the present application and described in detail below.
In accordance with an embodiment of the present application, an infrared image segmentation method embodiment is provided. It should be noted that the steps illustrated in the flowchart of the figures may be performed in a computer system, such as one executing a set of computer-executable instructions, and, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The method embodiments provided by the embodiments of the present application may be performed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing an infrared image segmentation method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n), which may include but are not limited to a microprocessor MCU or a processing device such as a programmable logic device FPGA, a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, it may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the BUS), a network interface, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 1 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
It should be noted that the one or more processors 102 and/or other data processing circuits described above may be referred to generally herein as "data processing circuits". The data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any other combination. Furthermore, the data processing circuit may be a single stand-alone processing module, or incorporated, in whole or in part, into any of the other elements in the computer terminal 10 (or mobile device). As referred to in embodiments of the application, the data processing circuit acts as a processor control (e.g., selection of the path of the variable resistor termination connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the infrared image segmentation method in the embodiment of the present application. The processor 102 executes the software programs and modules stored in the memory 104, thereby executing various functional applications and data processing, that is, implementing the above-mentioned infrared image segmentation method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is arranged to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
In the above operating environment, the embodiment of the present application provides an infrared image segmentation method, as shown in fig. 2, including the following steps:
step S202, acquiring an infrared image to be segmented;
The infrared image may be acquired in the following ways:
1. Using an infrared camera: an infrared camera is a device dedicated to capturing infrared images. It converts infrared radiation into visible light signals, which are then captured and recorded by an image sensor. Specialized infrared cameras may be purchased or leased to acquire infrared images.
2. Using an infrared filter: an infrared filter mounted on an ordinary digital camera or mobile phone camera can filter out visible light signals and retain only infrared radiation. Images taken this way are not as clear as those from a professional infrared camera, but can still achieve a certain infrared effect.
3. Using a thermal infrared imager: a thermal infrared imager is a device that displays the temperature distribution of an object's surface. It converts the infrared radiation emitted by the object into a heat map displayed on a screen, from which an infrared image of the object's surface can be obtained.
4. Using an infrared sensor: some electronic devices are equipped with infrared sensors that can be used to acquire infrared images. For example, an infrared sensor in a smartphone may be used to capture infrared images, though this requires a specific application.
In the scheme provided in step S202, it should be noted that, after the infrared image to be segmented is acquired, image data processing is required, where the data processing includes processing the infrared image to be detected at the spatial domain and frequency domain levels.
Specifically, the infrared image to be detected is processed at the spatial domain level: Gaussian noise and background noise are eliminated through a filtering operation, yielding the processed and denoised spatial domain image. The spatial domain is the pixel domain; features with small gray-level differences in an infrared image are usually difficult to identify, so a filtering method is adopted to exploit the gray-level differences of the pixel domain, highlighting the hard-to-identify features and removing the interference of surrounding noise. According to an embodiment of the present application, the filtering method may be Gaussian filtering, guided filtering, bilateral filtering, non-local means filtering, and so on.
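As a minimal sketch of this denoising step (assuming OpenCV is used; the kernel size and filter parameters below are illustrative choices, not values specified by the application):

```python
# Illustrative spatial-domain denoising sketch; parameter values are assumptions.
import cv2

# Load the infrared image to be segmented as a single-channel grayscale image.
infrared = cv2.imread("infrared.png", cv2.IMREAD_GRAYSCALE)

# Gaussian filtering suppresses Gaussian noise.
gaussian_denoised = cv2.GaussianBlur(infrared, (5, 5), sigmaX=1.0)

# Bilateral filtering removes background noise while preserving edges, which
# helps highlight low-contrast features in the pixel domain.
bilateral_denoised = cv2.bilateralFilter(infrared, d=9, sigmaColor=75, sigmaSpace=75)
```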
Specifically, the infrared image to be detected is processed at the frequency domain level: the spatial domain image obtained after spatial-domain processing is transformed to obtain the corresponding frequency domain image. According to the specific embodiment of the application, the spatial domain image is subjected to a Fourier transform to obtain its spectrum, namely the frequency domain image.
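Continuing the sketch above, the Fourier transform step with NumPy (stacking the real and imaginary parts as two channels is an assumed convention for feeding the network, not one stated in the application):

```python
import numpy as np

# 2-D Fourier transform of the denoised spatial domain image; fftshift moves
# the zero-frequency component to the center of the spectrum.
spectrum = np.fft.fftshift(np.fft.fft2(gaussian_denoised.astype(np.float32)))

# The complex spectrum is the frequency domain image; real and imaginary
# parts are kept separately for the complex-valued frequency domain network.
freq_image = np.stack([spectrum.real, spectrum.imag], axis=0)
```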
Step S204, extracting the spatial domain features of the infrared image by adopting the spatial domain neural network in the joint learning model;
In the solution provided in step S204, it should be noted that, before the spatial domain neural network in the joint learning model is used to extract the spatial domain features of the infrared image, the joint learning model needs to be established: training a first initial neural network by adopting the spatial domain image obtained in step S202 to obtain the spatial domain neural network; training a second initial neural network by adopting the frequency domain image converted from the spatial domain image in step S202 to obtain the frequency domain neural network; and performing joint training on the spatial domain neural network and the frequency domain neural network to obtain the joint learning model.
Specifically, the spatial domain neural network in the joint learning model is adopted to extract the spatial domain features of the infrared image, wherein the spatial domain neural network comprises an encoder-decoder framework: the spatial domain image is weighted by the encoder to obtain a first weighted feature map, the first weighted feature map is processed by the decoder to obtain a second weighted feature map, and the second weighted feature map is used as the spatial domain feature.
Specifically, the spatial domain neural network adopts an encoder-decoder structure to reconstruct the original neural network. In a specific embodiment of the present application, the spatial domain neural network structure is shown in fig. 3. The encoder divides the network into 4 blocks, each block containing five layers: a batch normalization layer (Batch Normalization, BN layer), a linear rectification function layer (Rectified Linear Unit, ReLU layer), a first global pooling layer, a 3×3 convolution layer and a sigmoid activation layer; the encoder extracts features of different levels through the blocks in sequence. The decoder likewise divides the network into 4 blocks, each of which also contains the five layers of a batch normalization layer, a linear rectification function layer, a first global pooling layer, a 3×3 convolution layer and a sigmoid activation layer.
In the embodiment of the present application, the specific operations of the encoder and the decoder for extracting spatial features are as follows:
Weighting the spatial domain image with the encoder to obtain the first weighted feature map includes the following steps: extracting an original spatial domain feature map of the spatial domain image by adopting the batch normalization layer and the linear rectification function layer in the encoder; performing average pooling or maximum pooling on the original spatial domain feature map by adopting the first global pooling layer in the encoder to obtain global statistical information of each channel; convolving the global statistical information by adopting the convolution layer in the encoder to obtain a first attention weight; and determining the first weighted feature map according to the first attention weight and the original feature map.
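A minimal PyTorch sketch of one such encoder block, assuming the attention weight multiplicatively reweights the original feature map (the channel count and the use of average pooling are illustrative):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One encoder block: BN + ReLU feature extraction, then channel attention
    built from global pooling, a 3x3 convolution and a sigmoid."""
    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.AdaptiveAvgPool2d(1)      # global statistics per channel
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.relu(self.bn(x))              # original spatial domain feature map
        stats = self.pool(feat)                   # global statistics, shape (N, C, 1, 1)
        weight = self.sigmoid(self.conv(stats))   # first attention weight
        return feat * weight                      # first weighted feature map
```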
Processing the first weighted feature map with the decoder to obtain the second weighted feature map includes: up-sampling the first weighted feature map by adopting a deconvolution layer in the decoder to obtain a deconvolution feature map; processing the deconvolution feature map sequentially through the second global pooling layer, the second convolution layer and the second activation layer to obtain an attention feature map; and determining the second weighted feature map according to the attention feature map and the deconvolution feature map, and taking the second weighted feature map as the spatial domain feature.
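A matching sketch of one decoder block under the same assumptions (the 2× deconvolution stride is illustrative):

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder block: deconvolution up-sampling followed by the
    pooling -> convolution -> sigmoid attention path."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_channels, out_channels,
                                         kernel_size=2, stride=2)
        self.pool = nn.AdaptiveAvgPool2d(1)       # second global pooling layer
        self.conv = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.sigmoid = nn.Sigmoid()               # second activation layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        deconv_feat = self.deconv(x)                                  # deconvolution feature map
        attention = self.sigmoid(self.conv(self.pool(deconv_feat)))  # attention feature map
        return deconv_feat * attention                                # second weighted feature map
```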
It should be noted that, in the process of performing the joint training on the spatial domain neural network and the frequency domain neural network, determining a first loss function and a second loss function respectively corresponding to the spatial domain neural network and the frequency domain neural network, that is, the spatial domain neural network corresponds to the first loss function and the frequency domain neural network corresponds to the second loss function; and determining a joint loss function of the joint learning model according to the first loss function and the second loss function, and training the joint learning model according to the joint loss function.
The process of determining the joint loss function includes: determining the first loss function corresponding to the spatial domain neural network, wherein the first loss function consists of a loss function L1 and a loss function L2; the loss function L1 is sensitive to noise data, while the loss function L2 is easily affected by outliers; the first loss function is obtained by adding the loss function L1 and the loss function L2 with certain weights, and the weights can be adjusted so that the first loss function adapts to various different situations; and determining the second loss function corresponding to the frequency domain neural network, and determining the joint loss function by combining the first loss function and the second loss function.
It should be noted that the loss function is used in the training stage of the joint learning model: after each batch of training data is fed to the model, a predicted value is output through forward propagation, and the loss function then computes the difference between the predicted value and the true value, that is, the loss value. After the loss value is obtained, the model updates its parameters through back propagation to reduce the loss between the true value and the predicted value, so that the predicted values generated by the model approach the true values. As training rounds increase, the fluctuation of the loss curve becomes smaller and smaller and the model tends toward a stable state; at that point training ends and a converged joint learning model is obtained.
Step S206, extracting frequency domain features of the infrared image by adopting a frequency domain neural network in the joint learning model;
Specifically, before the frequency domain neural network in the joint learning model is adopted to extract the frequency domain features of the infrared image, the joint learning model is built and trained. The frequency domain neural network structure is shown in fig. 4 and includes 3 residual blocks (Complex_BN, Complex_Conv and Complex_Relu layers, collectively referred to as a block), a 1×1 convolution layer, and a sigmoid activation layer. The flow of extracting frequency domain features with the frequency domain neural network is as follows: a frequency domain image is input and passes through the three blocks in sequence, the frequency domain image information is passed to the next convolution layer, and the final frequency domain features are generated through the 1×1 convolution layer and the sigmoid activation layer.
Step S208, dividing the infrared image according to the airspace characteristics and the frequency domain characteristics;
After the target infrared image is segmented by the joint learning model, the segmentation result further needs post-processing, such as removing segmentation noise and filling holes, to improve the accuracy and completeness of the segmentation result. In addition, the segmentation result can be evaluated: the segmentation performance of the joint learning model can be assessed with evaluation indexes such as precision, recall and F1 score.
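A small sketch of such an evaluation on binary segmentation masks (the epsilon terms guarding against division by zero are an implementation choice):

```python
import numpy as np

def segmentation_scores(pred_mask: np.ndarray, true_mask: np.ndarray):
    """Precision, recall and F1 score for binary segmentation masks."""
    tp = np.logical_and(pred_mask == 1, true_mask == 1).sum()
    fp = np.logical_and(pred_mask == 1, true_mask == 0).sum()
    fn = np.logical_and(pred_mask == 0, true_mask == 1).sum()
    precision = tp / (tp + fp + 1e-8)
    recall = tp / (tp + fn + 1e-8)
    f1 = 2 * precision * recall / (precision + recall + 1e-8)
    return precision, recall, f1
```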
According to the embodiment of the application, the infrared image is segmented according to the spatial domain features and the frequency domain features, wherein the spatial domain image and the frequency domain image obtained by processing and converting the infrared image to be detected are segmented by the trained joint learning model combining the spatial domain neural network and the frequency domain neural network.
Through the above steps, feature information can be extracted more comprehensively and the spatial information of the image is better preserved, preventing loss of detail; the method therefore has strong robustness and reliability for segmenting infrared images with noise and complex backgrounds.
According to the embodiment of the application, the functions involved in building the frequency domain neural network are supplemented as follows:
1. Batch normalization layer: Complex_BN(input), where the input has real and imaginary parts and α and β are two learnable parameters (the defining formula appears as an image in the original and is not reproduced here; a sketch of an assumed realization follows the list of effects below);
in the image processing, complex_bn (input) represents a Complex batch normalization (Batch Normalization) operation on the input image input. This operation has the following effects:
(1) Data normalization: the input image may be normalized by Complex_BN(input) so that the pixel values of the image lie in a small range, typically between 0 and 1, or with mean 0 and variance 1. This helps to improve the stability and training speed of the model.
(2) Accelerating convergence: by normalizing the input image, Complex_BN(input) can accelerate the convergence of the deep neural network. Normalization reduces the problems of gradient vanishing and gradient explosion, making it easier for the network to learn effective features.
(3) Improving generalization: Complex_BN(input) can improve the generalization ability of the model by reducing the correlation between features, reducing the risk of overfitting. The normalization operation helps make the model more robust, so it can handle changes in scale, illumination conditions and the like.
(4) Enhancing image contrast: the normalization operation can enhance the contrast of the image, so that the image is clearer and fuller. This helps to improve the effect of image processing tasks such as object detection, image classification, etc.
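Since the defining formula of Complex_BN is not recoverable from the text, the following PyTorch sketch assumes the common per-part realization, normalizing the real and imaginary parts separately by analogy with the Complex_Relu formula below; this is an assumption, not the patent's exact definition:

```python
import torch
import torch.nn as nn

class ComplexBN(nn.Module):
    """Assumed realization of Complex_BN: batch-normalize the real and
    imaginary parts separately; the learnable affine parameters of each
    BatchNorm2d play the role of the alpha and beta parameters."""
    def __init__(self, channels: int):
        super().__init__()
        self.bn_real = nn.BatchNorm2d(channels)
        self.bn_imag = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is a complex-valued tensor; normalize each part and recombine.
        return torch.complex(self.bn_real(x.real), self.bn_imag(x.imag))
```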
2. Linear rectification function layer:
Complex_Relu(input) = Relu(input.real) + i·(input.imag)
wherein Complex_Relu(input) denotes the complex ReLU operation on a complex input. Specifically, the ReLU is applied to the real part of the input, i.e., input.real; the imaginary part of the input, input.imag, is taken as the imaginary component; and the resulting real and imaginary parts are combined to obtain the final result. In summary, Complex_Relu(input) splits an input complex number into a real part and an imaginary part, performs the ReLU operation on the real part, keeps the imaginary part as the imaginary value, and combines the results into the final complex value.
In image processing, a complex number is generally used to represent a pixel value in an image. The pixel value may be divided into a real part and an imaginary part, wherein the real part represents the luminance of the pixel and the imaginary part represents the color of the pixel. The image can be processed non-linearly by the Complex_Relu function to enhance the feature expression capability and robustness of the image. Specifically, the Complex_Relu function may respectively perform nonlinear processing on luminance information and color information in an image to enhance contrast and detail of the image. By performing the Relu operation on the luminance information, all pixel values smaller than zero can be set to zero, thereby enhancing the light-dark contrast of the image. And the color information is subjected to the Relu operation, so that negative values in the color can be removed, and the color vividness of the image can be enhanced. In short, the Complex_Relu function can play roles in enhancing image characteristics and improving image quality in image processing.
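A direct transcription of the Complex_Relu formula into PyTorch, assuming complex-valued tensors:

```python
import torch

def complex_relu(x: torch.Tensor) -> torch.Tensor:
    """Complex_Relu(input) = Relu(input.real) + i*(input.imag): rectify the
    real (luminance) part, keep the imaginary (color) part unchanged."""
    return torch.complex(torch.relu(x.real), x.imag)
```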
3. Convolution layer:
Complex convolution, where the input includes a real part (Real) and an imaginary part (Imag):
Ifreq = Real + i·Imag
Convolution kernel: W = X + i·Y
Convolution process: W * Ifreq = (X*Real − Y*Imag) + i·(Y*Real + X*Imag)
This layer realizes filtering operations in the frequency domain when processing the image. The convolution operates on the frequency domain representation of the input image, where X and Y denote the real and imaginary parts of the convolution kernel W, respectively. By convolving the frequency domain representation, filtering operations in the frequency domain, such as denoising, sharpening and edge detection, may be performed on the image.
In particular, the convolution function may achieve different filtering effects by multiplying by a frequency domain filter (W). The selection of the filter W determines the response to different frequency components in the frequency domain, thereby achieving different processing effects on the image. By adjusting parameters of the filter, operations such as smoothing, enhancing, edge detection and the like can be realized on the image.
In addition, the combination of real and imaginary parts in the convolution, i.e., (X*Real − Y*Imag) + i·(Y*Real + X*Imag), can be used to represent the amplitude and phase information of the image. In the frequency domain, the amplitude information indicates the intensities of the different frequency components in the image, and the phase information indicates the relative positions and correlations of the different frequency components. By processing the amplitude and phase information, specific frequency components of the image can be enhanced or suppressed, thereby realizing different processing effects on the image.
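The complex convolution above can be realized with two real-valued convolutions holding X and Y; a PyTorch sketch (the kernel size is an assumption):

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution W * Ifreq = (X*Real - Y*Imag) + i*(Y*Real + X*Imag),
    realized with two real convolutions holding X and Y respectively."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_x = nn.Conv2d(in_channels, out_channels, kernel_size, padding=pad)  # X
        self.conv_y = nn.Conv2d(in_channels, out_channels, kernel_size, padding=pad)  # Y

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        real = self.conv_x(x.real) - self.conv_y(x.imag)   # X*Real - Y*Imag
        imag = self.conv_y(x.real) + self.conv_x(x.imag)   # Y*Real + X*Imag
        return torch.complex(real, imag)
```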
The first loss function of the spatial domain neural network involved in the model training process is defined as follows; it comprises a loss function L1 and a loss function L2:

L1 = Σi |Yi − Zi|

wherein Yi represents the true value and Zi represents the output value of the network; this constraint better strengthens the edge information;

wherein the means and standard deviations of Y and Z, the covariance cov(Y, Z), and constants M1, M2 that prevent the denominator from being 0 are involved (the defining formula of L2 appears as an image in the original and is not reproduced here);

Lspatial = λL1 + (1 − λ)L2

By combining the loss function L1 and the loss function L2, the loss function of the spatial domain neural network, namely the first loss function, is obtained.
The L1 loss function refers to the absolute difference between the predicted value and the true value; it is insensitive to outliers and can effectively reduce the influence of outliers on model training. However, the loss function L1 is sensitive to noise data and susceptible to its interference. The loss function L2 refers to the squared difference between the predicted value and the true value; it is relatively sensitive to outliers and subject to their interference. However, the L2 loss function is relatively insensitive to noise data and can effectively reduce the impact of noise data on model training.
By combining the L1 and L2 loss functions, their respective advantages and disadvantages can be considered comprehensively, improving the robustness and generalization ability of the model. By adjusting the value of the parameter λ, the weights of the two loss functions can be balanced, thereby obtaining better training results. When the value of λ approaches 1, the model relies more on the L1 loss function; when the value of λ approaches 0, the model relies more on the L2 loss function.
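A sketch of this weighted combination in PyTorch, taking L2 as the squared-error form described above (λ = 0.5 is an assumed default; the application leaves the weight adjustable):

```python
import torch
import torch.nn.functional as F

def spatial_loss(pred: torch.Tensor, target: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """Lspatial = lam * L1 + (1 - lam) * L2."""
    l1 = F.l1_loss(pred, target)    # absolute difference: robust to outliers
    l2 = F.mse_loss(pred, target)   # squared difference: robust to noise
    return lam * l1 + (1 - lam) * l2
```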
The second loss function of the frequency domain neural network involved in the model training process is defined as follows:
Lfreq = BCE(Y.real, Z.real) + BCE(Y.imag, Z.imag)
The loss function of the frequency domain neural network model, namely the second loss function, is determined by a BCE calculation, wherein Y.real represents the real part of the true value, Y.imag represents the imaginary part of the true value, Z.real represents the real part of the network output, and Z.imag represents the imaginary part of the network output. It should be noted that this loss function measures the difference in spatial information between the generated image and the real image. It is based on the binary cross entropy (BCE) calculation, which evaluates the differences between the real parts and between the imaginary parts of the generated image and the real image separately. By calculating these differences, the degree of similarity between the generated image and the real image can be evaluated, which helps optimize the generative model so that it better generates images resembling real images. In addition, it pushes the generative model to learn the spatial information of the real image, including features such as outline, texture and shape. By minimizing this loss function, the generative model gradually approaches the spatial information of the real image, thereby generating more realistic and accurate images.
The first loss function and the second loss function are combined to obtain the joint loss function; the spatial domain neural network model and the frequency domain neural network model trained with it are combined to obtain the joint learning model. The joint loss function is defined as follows:

Ltotal = Lspatial + Lfreq

wherein Lspatial is the first loss function, Lfreq is the second loss function, and Ltotal is the joint loss function.
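Continuing the sketch, the frequency domain BCE term and the total loss might be computed as follows, reusing spatial_loss from above and assuming both network outputs and targets already lie in [0, 1], as produced by the sigmoid output layers:

```python
import torch
import torch.nn.functional as F

def joint_loss(pred: torch.Tensor, target: torch.Tensor,
               pred_freq: torch.Tensor, target_freq: torch.Tensor,
               lam: float = 0.5) -> torch.Tensor:
    """Ltotal = Lspatial + Lfreq, with Lfreq as BCE on real and imaginary parts."""
    l_freq = (F.binary_cross_entropy(pred_freq.real, target_freq.real)
              + F.binary_cross_entropy(pred_freq.imag, target_freq.imag))
    return spatial_loss(pred, target, lam) + l_freq
```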
The embodiment of the application also provides a model training method, comprising the following steps: step S502, filtering a sample image in the spatial domain to obtain a spatial domain image, and training a first initial neural network by adopting the spatial domain image to obtain a spatial domain neural network, wherein the sample data are infrared images; step S504, performing frequency domain conversion on the sample image to obtain a frequency domain image, and training a second initial neural network by adopting the frequency domain image to obtain a frequency domain neural network; and step S506, performing joint training on the spatial domain neural network and the frequency domain neural network to obtain a joint learning model.
Specifically, the first initial neural network is trained with the spatial domain image to obtain the spatial domain neural network; the second initial neural network is trained with the frequency domain image converted from the spatial domain image to obtain the frequency domain neural network; and the spatial domain neural network and the frequency domain neural network are jointly trained to obtain the joint learning model. Training applies a loss function to the input data: after each batch of training data is fed to the model, a predicted value is output through forward propagation, and the loss function then computes the difference between the predicted value and the true value, namely the loss value. After the loss value is obtained, the model updates its parameters through back propagation to reduce the loss between the true value and the predicted value, so that the predicted values generated by the model approach the true values.
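A sketch of the joint training loop under these definitions (the optimizer choice, learning rate, epoch count and the data layout yielded by the loader are all assumptions):

```python
import torch

def train_joint(spatial_net, freq_net, loader, epochs: int = 50, lr: float = 1e-3):
    """Jointly train the spatial domain and frequency domain networks by
    minimizing the joint loss until the loss curve stabilizes."""
    params = list(spatial_net.parameters()) + list(freq_net.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for spatial_img, freq_img, mask, freq_mask in loader:
            pred = spatial_net(spatial_img)       # forward propagation
            pred_freq = freq_net(freq_img)
            loss = joint_loss(pred, mask, pred_freq, freq_mask)
            optimizer.zero_grad()
            loss.backward()                       # back propagation
            optimizer.step()                      # parameter update
```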
The embodiment of the application provides an infrared image segmentation apparatus comprising three processing modules, as shown in fig. 6: a first processing module 60, used for continuously acquiring the spatial domain information of a target image in the process of acquiring target image information, generating frequency domain information by transforming the spatial domain information, and performing image noise reduction; a second processing module 62, configured to perform feature extraction on the collected target image data, including spatial domain feature extraction and frequency domain feature extraction; and a third processing module 64, configured to jointly train the spatial domain features and the frequency domain features into a parametric model and perform infrared image segmentation with the trained model.
Note that each module in the above infrared image segmentation apparatus may be a program module (for example, a set of program instructions for implementing a specific function) or a hardware module; for the latter, it may take the following form, but is not limited thereto: each module is realized as a processor, or the functions of the modules are realized by one processor.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the related art or all or part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (10)

CN202310960057.5A — priority date 2023-07-31, filed 2023-07-31 — Infrared image segmentation method and system — Pending — CN116993982A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310960057.5A | 2023-07-31 | 2023-07-31 | CN116993982A (en) Infrared image segmentation method and system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310960057.5A | 2023-07-31 | 2023-07-31 | CN116993982A (en) Infrared image segmentation method and system

Publications (1)

Publication Number | Publication Date
CN116993982A | 2023-11-03

Family

ID=88533346

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310960057.5A | CN116993982A (en) Infrared image segmentation method and system (Pending) | 2023-07-31 | 2023-07-31

Country Status (1)

Country | Link
CN (1) | CN116993982A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119160440A * | 2024-09-19 | 2024-12-20 | Harbin Institute of Technology (哈尔滨工业大学) | Infrared guided landing device and landing method
CN119313769A * | 2024-12-16 | 2025-01-14 | Naval Aviation University of the Chinese People's Liberation Army (中国人民解放军海军航空大学) | Infrared simulation image generation method under ocean background


Similar Documents

Publication | Title
JP7413400B2 (en) Skin quality measurement method, skin quality classification method, skin quality measurement device, electronic equipment and storage medium
CN111407245B (en) Non-contact heart rate and body temperature measuring method based on camera
CN116403123B (en) Remote sensing image change detection method based on deep convolutional network
Zhu et al. A fast single image haze removal algorithm using color attenuation prior
Zhang et al. Global and local saliency analysis for the extraction of residential areas in high-spatial-resolution remote sensing image
CN116993982A (en) Infrared image segmentation method and system
CN111079764B (en) Low-illumination license plate image recognition method and device based on deep learning
CN109345494B (en) Image fusion method and apparatus based on latent low-rank representation and structure tensor
CN118865042A (en) An efficient robot vision system based on deep learning and multimodal fusion
CN111062891A (en) Image processing method, device, terminal and computer readable storage medium
CN115311186A (en) Cross-scale attention confrontation fusion method for infrared and visible light images and terminal
CN112651333A (en) Silence living body detection method and device, terminal equipment and storage medium
CN110363072B (en) Tongue picture identification method, tongue picture identification device, computer equipment and computer readable storage medium
Guo et al. An underwater image quality assessment metric
CN117853817B (en) A smart community garbage classification alarm management method based on image recognition
CN113256733B (en) Camera spectral sensitivity reconstruction method based on confidence voting convolutional neural network
Chen et al. Attentive generative adversarial network for removing thin cloud from a single remote sensing image
Srikrishna et al. Realization of human eye pupil detection system using canny edge detector and circular Hough transform technique
CN116935457A (en) Method and device for detecting human face living body and electronic equipment
Wang et al. A spatiotemporal satellite image fusion model with autoregressive error correction (AREC)
Hepburn et al. Enforcing perceptual consistency on generative adversarial networks by using the normalised laplacian pyramid distance
Yu et al. Joint self-supervised enhancement and denoising of low-light images
CN113344987A (en) Infrared and visible light image registration method and system for power equipment under complex background
CN113221842A (en) Model training method, image recognition method, device, equipment and medium
Singh et al. A Review on Computational Low-Light Image Enhancement Models: Challenges, Benchmarks, and Perspectives

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
