CN119110084A - A high compression ratio image compression method based on optimal transmission mapping - Google Patents

A high compression ratio image compression method based on optimal transmission mapping

Publication number
CN119110084A
Authority
CN
China
Prior art keywords
image
mapping
layer
optimal transmission
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411210997.3A
Other languages
Chinese (zh)
Inventor
章敏
蓝锦青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Higher Research Institute Of Shanghai Zhejiang University
Original Assignee
Higher Research Institute Of Shanghai Zhejiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Higher Research Institute Of Shanghai Zhejiang University
Priority to CN202411210997.3A
Publication of CN119110084A

Abstract


The present invention discloses a high-compression-rate image compression method based on optimal transmission mapping. It is suitable for application scenarios such as network transmission, image storage and image transmission, and can significantly improve image transmission efficiency and save network bandwidth and storage resources. Compression can be based on either generative adversarial learning or a diffusion model. The former uses generative adversarial learning to train an image compression model based on optimal transmission mapping. The model includes a feature extraction layer, a mapping layer, a quantization coding layer and a decoding reconstruction layer, and the mapping layer maps and optimizes image features through an optimal transmission mapping based on a convex mapping. The image compression model of the latter likewise includes a feature extraction layer, a mapping layer, a quantization coding layer and a decoding reconstruction layer. The feature extraction layer encodes the preprocessed image into a low-dimensional feature representation; the denoising network of the diffusion model restores the quantized features to image features, which are then decoded into a high-quality compressed image close to the original image.

Description

High-compression-rate image compression method based on optimal transmission mapping
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a high-compression-rate image compression method based on optimal transmission mapping.
Background
Image compression is a key technique for reducing the size of image files to save storage space and reduce transmission costs. Common image compression methods include lossy compression and lossless compression. Lossy compression achieves higher compression rates by sacrificing a degree of image quality, while lossless compression maintains the integrity of the image, but the compression rate is relatively low.
High-compression-rate image compression methods aim to reduce the size of image files further while maintaining image quality. They improve the compression rate by exploiting redundant information in the image or by transforming the image appropriately, through optimized coding algorithms, data compression techniques, image transforms and similar means. The research and application of high-compression-rate image compression methods are of great significance in fields such as digital image processing, computer vision and image transmission.
Rate distortion optimization is a key issue in image compression. Rate-distortion optimization aims at minimizing the distortion of the compressed image by optimizing the coding scheme or algorithm for a given compression rate. Distortion can be understood as a loss of image quality, so the goal of rate-distortion optimization is to minimize the degree of distortion of the image while maintaining the compression rate requirements. This involves the study of image quality assessment and optimization algorithms to balance the trade-off between compression rate and image quality.
In the current information age, the generation and transmission of images are becoming more and more common and frequent. From digital photography to medical imaging, from video transmission to social media sharing, there is an increasing need for image compression. Conventional compression methods have achieved some success, but with the rise of high-definition images and large-scale image data, the demand for higher compression rates and better image quality has also grown.
Based on the above-mentioned demands for higher compression rate and better image quality, the prior art has the following technical problems:
[1] Distortion: conventional image compression methods such as JPEG introduce significant distortion at high compression rates, particularly in fine detail and edges. Such distortion is unacceptable in certain application scenarios, such as medical images and satellite images. One technical problem to be solved by the invention is therefore how to achieve a high compression rate while reducing image distortion, thereby improving image quality.
[2] Trade-off between compression rate and image quality: existing image compression methods typically trade compression rate against image quality. Higher compression rates tend to be accompanied by lower image quality, while better image quality may sacrifice some compression rate. The invention aims to maintain good image quality while achieving a high compression rate.
[3] Processing speed and computational complexity: some existing image compression methods require complex computation and a large amount of computational resources, resulting in long compression times, which is unfavorable for real-time applications or large-scale image processing. One technical problem to be solved by the invention is therefore how to increase the processing speed of image compression and reduce the computational complexity so as to meet the requirements of practical applications.
[4] Applicability across application fields: existing image compression methods differ in applicability from field to field. Some methods perform well on certain fields or types of images but poorly on others. The invention aims to provide a general high-compression-rate image compression method that is suitable for various application fields and various types of images.
Disclosure of Invention
An object of an embodiment of the present application is to provide a high compression rate image compression method based on an optimal transmission map, so as to solve the above-mentioned problems in the related art.
According to a first aspect of an embodiment of the present application, there is provided a high compression rate image compression method based on an optimal transmission map, including:
acquiring an original image to be compressed and preprocessing the original image;
training, on the preprocessed original image, an image compression model based on optimal transmission mapping by means of generative adversarial learning or a diffusion model, wherein the image compression model based on the optimal transmission mapping comprises a feature extraction layer, a mapping layer, a quantization coding layer and a decoding reconstruction layer, and the mapping layer adopts the optimal transmission mapping based on convex mapping to realize feature mapping;
and acquiring an image to be compressed, and compressing the image by using a trained image compression model.
Further, the preprocessing includes denoising, image enhancement, and color space conversion.
Further, in the image compression model based on the optimal transmission mapping:
The feature extraction layer is used for extracting features of the preprocessed image;
The mapping layer is used for mapping the extracted image features onto feature space distribution based on optimal transmission mapping;
The quantization coding layer is used for quantizing and coding the mapped characteristics;
the decoding and reconstructing layer is used for decoding and reconstructing the coded features to obtain a compressed image.
Further, in the mapping layer, a transmission matrix is constructed by utilizing convex mapping, and the transmission matrix is optimized, wherein the optimization problem is as follows:
$$\min_{T}\ \sum_{i=1}^{N}\sum_{j=1}^{M} T(x_i, y_j)\,c(x_i, y_j) \;+\; \lambda \sum_{i=1}^{N}\sum_{j=1}^{M} T(x_i, y_j)\,\log T(x_i, y_j)$$
where λ is a regularization parameter, c(x, y) represents the transmission cost from point x to point y, x and y are points in the source and target feature spaces, respectively, T(x, y) represents the transmission amount from x to y, and N and M represent the number of samples of the source and target distributions, respectively.
Further, in the decoding reconstruction layer, decoding reconstruction is realized by a denoising network of a diffusion model or performing inverse operation of a mapping layer and a quantized coding layer.
Further, the training process of the image compression model based on the optimal transmission mapping comprises the following steps:
S21, constructing an initial transmission matrix;
S22, a feature extraction layer performs feature extraction on the preprocessed original image, a mapping layer performs optimization of a transmission matrix by using the extracted features, maps the features by using an optimal transmission matrix obtained by optimization, and a quantization coding layer performs quantization and coding on the mapped features by using the optimal transmission matrix obtained by optimization to obtain a compressed image feature representation, and a decoding reconstruction layer performs decoding reconstruction on the compressed image feature representation to obtain a compressed image;
S23, calculating training loss based on the compressed image and the corresponding preprocessed original image, and updating parameters of the image compression model through a back propagation algorithm;
And S24, repeating the steps S22-S23 until the model converges.
Further, in step S23, if the decoding reconstruction layer adopts the inverse operations of the mapping layer and the quantization coding layer, training is performed by generative adversarial learning: the image compression model acts as the generator and is trained adversarially against a discriminator, the training loss is calculated using the adversarial loss, and the model parameters are updated using the back propagation algorithm;
If the decoding reconstruction layer adopts the denoising network of a diffusion model, noise is added to the compressed image feature representation through the forward process of the diffusion model, denoising is performed through the reverse process of the diffusion model, the training loss is calculated based on the mean square error of the denoising target, and the model parameters are updated using the back propagation algorithm.
According to a second aspect of an embodiment of the present application, there is provided a high compression rate image compression apparatus based on an optimal transmission map, including:
The preprocessing module is used for acquiring an original image to be compressed and preprocessing the original image;
the training module is used for training, on the preprocessed original image, an image compression model based on optimal transmission mapping by means of generative adversarial learning or a diffusion model, wherein the image compression model based on the optimal transmission mapping comprises a feature extraction layer, a mapping layer, a quantization coding layer and a decoding reconstruction layer, and the mapping layer realizes feature mapping by adopting the optimal transmission mapping based on convex mapping;
and the compression module is used for acquiring an image to be compressed and compressing the image by using the trained image compression model.
According to a third aspect of an embodiment of the present application, there is provided an electronic apparatus including:
one or more processors;
A memory for storing one or more programs;
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
According to a fourth aspect of embodiments of the present application there is provided a computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
[1] The invention adopts a technical scheme based on optimal transmission mapping and realizes the feature mapping by optimizing the transmission matrix, so that the image can reach a higher compression rate while keeping higher quality. Compared with traditional methods, the method and the device can significantly reduce the data volume of the compressed image and improve the compression efficiency.
[2] The invention can effectively maintain the visual quality and detail information of the image under high compression rate through the steps of preprocessing, feature extraction, reconstruction and the like. The application of the optimal transmission mapping enables the reconstructed image to have better visual effect and image fidelity, and compared with the traditional compression method, the method can reduce the problems of image distortion, artifacts, blocking effect and the like, and provide a clearer and truer compressed image.
[3] The invention utilizes the optimal transmission mapping principle to carry out feature mapping and carries out quantization coding on the mapped features, thereby reducing redundancy and repeatability of image data in the transmission and storage processes. Compared with the traditional method, the method can obviously improve the image transmission efficiency, save network bandwidth and storage resources, and is suitable for application scenes such as network transmission, image storage, image transmission and the like.
[4] The technical scheme of the invention can adapt to images with different types and characteristics, and has certain adaptability and flexibility. By optimizing the transmission matrix and the feature extraction process, the invention can optimize and adjust the characteristics of different images so as to realize better compression effect and image quality.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating a high compression rate image compression method based on an optimal transmission map according to an exemplary embodiment.
Fig. 2 is a block diagram illustrating a high compression rate image compression apparatus based on an optimal transmission map according to an exemplary embodiment.
Fig. 3 is a schematic diagram of an electronic device shown according to an example embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
Fig. 1 is a flowchart illustrating a high compression rate image compression method based on an optimal transmission map according to an exemplary embodiment, and the method is applied to a terminal as shown in fig. 1, and may include the steps of:
S1, acquiring an original image to be compressed and preprocessing the original image;
Specifically, the preprocessing includes steps such as denoising, image enhancement and color space conversion. In one embodiment, the denoising step may use a convolutional autoencoder to learn a noise model of the image and use the learned model to remove noise from the image. Image enhancement can use an adaptive histogram equalization algorithm, which enhances the contrast and visual effect of an image by adjusting the distribution of its pixel values. Color space conversion converts the image from RGB space to YCbCr space to separate luminance and chrominance information.
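The color space conversion step can be illustrated with a short sketch. The source does not say which YCbCr variant is intended; the code below assumes the full-range BT.601 coefficients used by JPEG/JFIF, and the function names `rgb_to_ycbcr` and `ycbcr_to_rgb` are hypothetical helpers, not part of the described method.

```python
import numpy as np

# Full-range BT.601 coefficients (JPEG/JFIF convention). This choice is an
# assumption: the source only says "RGB space to YCbCr space".
_RGB_TO_YCBCR = np.array([
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
])
_OFFSET = np.array([0.0, 128.0, 128.0])


def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) RGB image with values in [0, 255] to float YCbCr."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb @ _RGB_TO_YCBCR.T + _OFFSET


def ycbcr_to_rgb(ycbcr):
    """Invert rgb_to_ycbcr, then round and clip back to uint8."""
    centered = np.asarray(ycbcr, dtype=np.float64) - _OFFSET
    rgb = centered @ np.linalg.inv(_RGB_TO_YCBCR).T
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


# A 1x2 image: one mid-gray pixel and one pure red pixel.
image = np.array([[[128, 128, 128], [255, 0, 0]]], dtype=np.uint8)
ycc = rgb_to_ycbcr(image)      # channel 0 is luminance, 1 and 2 are chrominance
restored = ycbcr_to_rgb(ycc)   # inverse of the preprocessing step
```

Keeping the inverse alongside the forward conversion matters here because the decoding stage later applies the inverse of the preprocessing steps.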
S2, training, on the preprocessed original image, an image compression model based on optimal transmission mapping by means of generative adversarial learning or a diffusion model, wherein the image compression model based on the optimal transmission mapping comprises a feature extraction layer, a mapping layer, a quantization coding layer and a decoding reconstruction layer, and the mapping layer adopts the optimal transmission mapping based on convex mapping to realize feature mapping;
Specifically, the image compression model based on the optimal transmission mapping includes:
(1) The feature extraction layer is used for extracting features of the preprocessed image;
In one embodiment, a pre-trained CNN model (e.g., ResNet) may be used to extract features of the image. The role of the CNN differs between the GAN variant and the diffusion variant, as will be described in more detail below.
(2) Mapping layer, based on optimal transmission mapping, mapping the extracted image feature to feature space distribution;
Specifically, optimal transmission mapping (optimal transport) solves the problem of finding the best mapping between two probability distributions. It can be pictured as moving a pile of points from one distribution onto another so that the total cost of the movement is minimized. In image processing, the optimal transmission mapping can realize optimal deformation or alignment of images by optimizing the distribution of pixels, so that the image quality and the visual effect are improved.
The optimal transmission mapping minimizes the transmission cost (typically measured in terms of a "distance metric", such as a Wasserstein distance) between the source and target features by optimizing the transmission matrix. The transmission matrix represents the probability transitions from the source distribution to the target distribution. In particular, each element of the transmission matrix represents a transmission probability from the source feature to the target feature. In the application, an adaptive transmission matrix is designed, and the optimization is carried out according to the importance and the correlation of the characteristics. This means that the construction process of the transmission matrix takes into account the correlation between the features and their extent of contribution to the final image quality. For example, some features may be more important to the visual effect of the image and therefore may be given higher weight in the transmission matrix. In particular, the optimal transmission mapping between features is achieved using convex mapping in optimal transmission theory. A convex map is a special map that maintains a convex structure that can maintain distance relationships in feature space, thereby better preserving structural information of the features.
The construction and optimization process of the transmission matrix usually involves numerical optimization methods, such as iterative methods or gradient descent methods. In the optimization process, the distance between the source and target features, and the correlation and importance between the features are considered, so that the elements of the transmission matrix are adjusted to minimize the transmission cost.
In the application, the construction and optimization process of the transmission matrix is as follows:
Assume that the source feature distribution is P and the target feature distribution is Q. These two distributions can be represented by probability density functions p(x) and q(y) in the feature space, where x and y are points in the source and target feature spaces, respectively.
The optimal transmission problem can be described as finding a transmission matrix T that minimizes the transmission cost from the source distribution P to the target distribution Q. In general, this cost may be represented by a Wasserstein distance or another metric:
$$W(P, Q) = \inf_{\pi \in \Pi(P, Q)} \int_{X \times Y} c(x, y)\,\mathrm{d}\pi(x, y)$$
where c(x, y) represents the cost of transmission from point x to point y (e.g., the squared Euclidean distance), X and Y represent the source and target feature spaces, respectively, T(x, y) represents the amount of transmission from x to y, and π(x, y) is a joint probability distribution that represents the probability transition from the source distribution to the target distribution; π(x, y) is defined as the amount of "mass" or "probability" transmitted between source point x and target point y, and Π(P, Q) denotes the set of all such joint distributions. π must satisfy the marginal constraints:
for each source point x, the integral of π(x, y) over all y should equal the probability density of the source distribution at x, i.e. $\int_Y \pi(x, y)\,\mathrm{d}y = p(x)$;
for each target point y, the integral of π(x, y) over all x should equal the probability density of the target distribution at y, i.e. $\int_X \pi(x, y)\,\mathrm{d}x = q(y)$.
The key to constructing the transmission matrix using convex mapping is to guarantee the convexity of the mapping process. Specifically, the convex map φ: X → Y satisfies the following condition:
$$\varphi\big(\lambda x_1 + (1-\lambda)\,x_2\big) \le \lambda\,\varphi(x_1) + (1-\lambda)\,\varphi(x_2), \quad \text{for all } x_1, x_2 \in X \text{ and } \lambda \in [0, 1]$$
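In the optimization process of the transmission matrix T, the following optimization problem needs to be solved:
$$\min_{T}\ \sum_{i=1}^{N}\sum_{j=1}^{M} T(x_i, y_j)\,c(x_i, y_j)$$
where N and M represent the number of samples of the source distribution and the target distribution, respectively.
The constraint conditions are as follows:
$$\sum_{j=1}^{M} T(x_i, y_j) = p(x_i)\ \ (i = 1, \dots, N), \qquad \sum_{i=1}^{N} T(x_i, y_j) = q(y_j)\ \ (j = 1, \dots, M), \qquad T(x_i, y_j) \ge 0$$
where p(x_i) and q(y_j) represent the weights (i.e., probability densities) of the source and target features, respectively, at the respective locations.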
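To make the discrete problem concrete, the sketch below builds a squared-Euclidean cost matrix, checks whether a candidate transmission matrix satisfies the marginal constraints, and evaluates the objective. The helper names and the two-point example are illustrative assumptions, not taken from the application.

```python
import numpy as np


def squared_euclidean_cost(xs, ys):
    """Cost matrix C[i, j] = ||x_i - y_j||^2 for xs of shape (N, d), ys (M, d)."""
    diff = xs[:, None, :] - ys[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def is_feasible(T, p, q, tol=1e-9):
    """True if T is non-negative with row sums p and column sums q."""
    return bool(
        np.all(T >= -tol)
        and np.allclose(T.sum(axis=1), p, atol=tol)
        and np.allclose(T.sum(axis=0), q, atol=tol)
    )


def transport_cost(T, C):
    """Objective value: sum over i, j of T[i, j] * C[i, j]."""
    return float(np.sum(T * C))


# Two source points and two target points on the real line, equal weights.
xs = np.array([[0.0], [1.0]])
ys = np.array([[0.0], [1.0]])
p = np.array([0.5, 0.5])
q = np.array([0.5, 0.5])
C = squared_euclidean_cost(xs, ys)

independent_plan = np.outer(p, q)  # feasible, but spreads mass everywhere
identity_plan = np.diag(p)         # feasible, keeps every point where it is
```

Both plans satisfy the constraints, but only `identity_plan` reaches the minimum cost of zero in this example, which is the distinction the optimization is meant to capture.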
Since the optimal transmission problem is typically a high-dimensional optimization problem, directly solving for the transmission matrix can be very complex. The solution may be obtained using iterative methods such as the Sinkhorn-Knopp algorithm. The Sinkhorn-Knopp algorithm converts the optimization problem into a series of small solvable problems by introducing an entropy regularization term:
$$\min_{T}\ \sum_{i=1}^{N}\sum_{j=1}^{M} T(x_i, y_j)\,c(x_i, y_j) \;+\; \lambda \sum_{i=1}^{N}\sum_{j=1}^{M} T(x_i, y_j)\,\log T(x_i, y_j)$$
where λ is the regularization parameter.
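A minimal NumPy sketch of the Sinkhorn-Knopp iteration follows, assuming the standard entropy-regularized formulation in which the solution has the form diag(u) · exp(-C/λ) · diag(v). The function name `sinkhorn`, the parameter `reg` (playing the role of λ), and the stopping rule are assumptions for illustration; the application does not specify these details.

```python
import numpy as np


def sinkhorn(p, q, C, reg=0.1, n_iters=1000, tol=1e-9):
    """Entropy-regularized transport plan via Sinkhorn-Knopp scaling.

    p: (N,) source weights, q: (M,) target weights, C: (N, M) cost matrix,
    reg: regularization strength (lambda in the text).
    """
    K = np.exp(-C / reg)              # Gibbs kernel
    u = np.ones_like(p, dtype=np.float64)
    v = np.ones_like(q, dtype=np.float64)
    for _ in range(n_iters):
        u = p / (K @ v)               # rescale rows to match p
        v = q / (K.T @ u)             # rescale columns to match q
        row_error = np.abs(u * (K @ v) - p).max()
        if row_error < tol:           # columns match exactly after the v update
            break
    return u[:, None] * K * v[None, :]


# Three points on a line, uniform weights, squared-distance cost.
x = np.array([0.0, 1.0, 2.0])
C = (x[:, None] - x[None, :]) ** 2
p = np.full(3, 1.0 / 3.0)
q = np.full(3, 1.0 / 3.0)
T = sinkhorn(p, q, C, reg=0.1)
```

For very small `reg` the kernel `exp(-C / reg)` can underflow; practical implementations usually work in the log domain, which this sketch omits for brevity.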
After the optimization process is finished, the output transmission matrix T is the result of the optimal transmission mapping. The matrix contains the probability transition information from the source features to the target features, which can be used for subsequent feature reconstruction and image compression.
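The application does not state exactly how the transmission matrix is applied to the features. One common choice, assumed here purely for illustration, is the barycentric projection, which sends each source feature to the T-weighted average of the target features. The name `barycentric_map` is hypothetical.

```python
import numpy as np


def barycentric_map(T, ys):
    """Map source point i to sum_j T[i, j] * y_j / sum_j T[i, j].

    T: (N, M) transmission matrix with positive row sums,
    ys: (M, d) target features. Returns an (N, d) array.
    """
    row_mass = T.sum(axis=1, keepdims=True)
    return (T @ ys) / row_mass


# Hand-written plan: source 0 goes entirely to target 0,
# source 1 splits its mass evenly between the two targets.
T = np.array([[0.5, 0.0],
              [0.25, 0.25]])
ys = np.array([[0.0, 0.0],
               [2.0, 2.0]])
mapped = barycentric_map(T, ys)
```

In this example the second source feature lands at the midpoint of the two targets, reflecting its evenly split mass.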
In practice, the computation of the transmission matrix is typically done by numerical optimization techniques. The optimal transmission problem can be solved using an OT library in Python (e.g., the POT library), or a custom optimal transmission algorithm can be implemented manually on top of a deep learning framework (e.g., PyTorch or TensorFlow).
(3) A quantization coding layer for quantizing and coding the mapped features;
Specifically, quantization converts continuous feature values into discrete quantization indices to reduce the complexity of the data representation. Coding then represents the quantized features compactly to achieve data compression. In a specific implementation, the discrete features can be obtained by vector quantization (VQ), and coding is then performed according to the probability distribution of the discrete features using adaptive arithmetic coding, Huffman coding or arithmetic coding, so as to improve the compression efficiency.
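The sketch below illustrates the two ideas in this paragraph under simple assumptions: nearest-codeword vector quantization against a fixed codebook, and the empirical Shannon entropy of the resulting index stream, which approximates the bits per symbol an ideal entropy coder would need when symbols are treated as independent. The codebook values and function names are invented for the example; a real system would learn the codebook and use an actual Huffman or arithmetic coder.

```python
import numpy as np


def vector_quantize(features, codebook):
    """Return, for each feature row, the index of its nearest codeword."""
    diff = features[:, None, :] - codebook[None, :, :]
    squared_dist = np.sum(diff ** 2, axis=-1)
    return np.argmin(squared_dist, axis=1)


def empirical_entropy_bits(indices, n_codes):
    """Shannon entropy of the index stream, in bits per symbol."""
    counts = np.bincount(indices, minlength=n_codes).astype(np.float64)
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


# Hypothetical two-entry codebook and four 2-D feature vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
features = np.array([[0.1, 0.0], [0.9, 1.0], [0.2, 0.1], [1.1, 0.8]])

indices = vector_quantize(features, codebook)
bits_per_symbol = empirical_entropy_bits(indices, n_codes=len(codebook))
```

A skewed index distribution gives a lower entropy, which is why coding according to the probability distribution of the discrete features improves compression.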
(4) A decoding reconstruction layer for decoding and reconstructing the coded features to obtain a compressed image;
Specifically, the decoding reconstruction can be performed through a denoising network of the diffusion model, or the inverse operation can be performed according to the mapping layer and the quantized coding layer, so as to realize the decoding reconstruction, wherein the specific process of the decoding reconstruction is as follows:
Decoding: if adaptive arithmetic coding or a similar scheme was used for coding, the decoding process uses the corresponding decoding algorithm, such as adaptive arithmetic decoding;
Reconstruction: the image is reconstructed by inverting the optimal transmission mapping and the quantization, and is finally restored from the recovered features and the preprocessing information to obtain a high-quality compressed image. Inverting the optimal transmission mapping means mapping the decoded features back to the original feature space; this may involve applying the inverse of the optimal transmission mapping algorithm used during compression, so as to recover features similar to those of the original image. Inverse quantization converts the inverse-mapped features back to continuous feature values; in a specific implementation, this can be done by mapping the quantization indices back to the range of the original feature values. For example, if uniform quantization is used, the inverse of quantization maps the discrete quantization indices back to the original continuous range of feature values;
Finally, image reconstruction and restoration are performed by combining the restored features with the preprocessing information from step S1, which involves mapping the restored features back to the original image space and applying the inverse of the preprocessing steps to generate a high-quality compressed image.
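The uniform-quantization example mentioned above can be sketched as a round trip. The range `[lo, hi]`, the number of levels and the function names are assumptions chosen for illustration; the point is that dequantization maps each index back to a representative value and the reconstruction error is bounded by half a quantization step.

```python
import numpy as np


def quantize_uniform(x, lo, hi, n_levels):
    """Map values in [lo, hi] to integer indices 0 .. n_levels - 1."""
    step = (hi - lo) / (n_levels - 1)
    clipped = np.clip(x, lo, hi)
    indices = np.rint((clipped - lo) / step).astype(np.int64)
    return indices, step


def dequantize_uniform(indices, lo, step):
    """Map integer indices back to representative continuous values."""
    return lo + indices * step


values = np.array([-1.0, -0.3, 0.26, 1.0])
indices, step = quantize_uniform(values, lo=-1.0, hi=1.0, n_levels=5)
restored = dequantize_uniform(indices, lo=-1.0, step=step)
max_error = np.abs(restored - values).max()
```

The inverse is only approximate: quantization is lossy, so the restored values are the nearest representable levels rather than the original inputs.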
The model can be trained either with a diffusion model or with generative adversarial learning (GAN), as appropriate, to ensure that the generated compressed image has high quality at a high compression rate. Specifically, in generative adversarial learning, the generator (i.e., the entire image compression model) is trained adversarially against a discriminator: the discriminator attempts to distinguish the compressed image from the original image, while the generator attempts to generate a high-quality compressed image. The goal of the generator is to fool the discriminator into treating the compressed image as the original image. In diffusion model training, a mean square error (MSE) loss or a perceptual loss may be used to evaluate the difference between the compressed image and the original image.
In particular, the model training process may include:
S21, constructing an initial transmission matrix;
Specifically, an initial transmission matrix is constructed using initial parameters before training starts. This initial transmission matrix may be based on preliminary statistics of the data or on random initialization. The initial transmission matrix T_0 guides how the model performs feature mapping in the initial stage.
S22, a feature extraction layer performs feature extraction on the preprocessed original image, a mapping layer performs optimization of a transmission matrix by using the extracted features, maps the features by using an optimal transmission matrix obtained by optimization, and a quantization coding layer performs quantization and coding on the mapped features by using the optimal transmission matrix obtained by optimization to obtain a compressed image feature representation, and a decoding reconstruction layer performs decoding reconstruction on the compressed image feature representation to obtain a compressed image;
Specifically, the training process begins with the feature extraction part of the model. The feature extraction network (e.g., a pre-trained CNN model such as ResNet) extracts features from the input image to generate a latent-space representation. In the GAN variant, the feature extraction layers mainly serve generation and discrimination: the generator gradually converts low-dimensional noise into a high-resolution image, and the discriminator gradually extracts features from the image to judge its authenticity. There, the design goal of the feature extraction layers is to build up or break down image patterns step by step, generating or distinguishing image details from coarse to fine. In the diffusion variant, the feature extraction layer focuses on capturing and preserving the global semantic information of the image and on supporting the addition and removal of noise during diffusion; it is designed to encode the image into a feature representation suited to efficient denoising in a high-dimensional noise space. The parameters of the feature extraction network may be updated at this stage.
After feature extraction, an update of the current transmission matrix is calculated from the current feature representation using optimal transmission theory. The optimization of the transmission matrix may be carried out by a numerical method (e.g., the Sinkhorn-Knopp algorithm). The update considers a distance measure (e.g., the Wasserstein distance) between the currently extracted feature distribution and the target distribution, and the parameters associated with the transmission matrix are also updated in this process.
The quantization coding layer then discretizes and codes the features according to the latest transmission matrix. The output of this step is a compressed image feature representation.
The decoding reconstruction layer then decodes the compressed features to reconstruct a compressed image.
S23, calculating a training loss based on the compressed image and the corresponding preprocessed original image, and updating the parameters of the image compression model through the back-propagation algorithm;
Specifically, if the decoding reconstruction layer adopts the inverse operations of the mapping layer and the quantization coding layer, the decoding reconstruction layer can be trained by generative adversarial learning: the generator (i.e., the whole image compression model) is trained adversarially against a discriminator, where the generator tries to generate a high-quality compressed image and the discriminator tries to distinguish the compressed image from the original image. The goal of the generator is to fool the discriminator into treating the compressed image as the original image. The training loss is calculated using the adversarial loss, and the model parameters are updated with the back-propagation algorithm.
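As a rough illustration of the adversarial training loss, the sketch below computes the two losses from discriminator outputs. It assumes the common binary cross-entropy formulation with a non-saturating generator term, plus an MSE distortion term weighted by a hypothetical `lam`. The application does not specify these choices, so all function names and the weighting are assumptions.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one probability, clipped for numerical safety."""
    p = min(max(prob, 1e-7), 1.0 - 1e-7)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def discriminator_loss(d_original, d_compressed):
    """d_*: discriminator outputs in (0, 1), read as the probability of 'original'."""
    real = sum(bce(p, 1.0) for p in d_original) / len(d_original)
    fake = sum(bce(p, 0.0) for p in d_compressed) / len(d_compressed)
    return real + fake


def generator_loss(d_compressed, original, reconstructed, lam=10.0):
    """Non-saturating adversarial term plus an assumed MSE distortion term."""
    adversarial = sum(bce(p, 1.0) for p in d_compressed) / len(d_compressed)
    mse = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)
    return adversarial + lam * mse
```

The generator loss falls when the discriminator rates the compressed image as more likely original, and rises with the pixel-wise reconstruction error.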
If the decoding reconstruction layer adopts the denoising network of a diffusion model, training follows the training procedure of the diffusion model, which comprises two main processes:
1. A forward process (diffusion process), which progressively adds noise to the image features. In the present application, noise is added step by step to the encoded features (the compressed image representation), producing a series of intermediate representations that run from the original features to a simple Gaussian noise distribution.
The forward process is a Markov chain: starting from the original feature $x_0$ (i.e., the compressed image representation), noise is added step by step to generate a series of intermediate representations $x_1, x_2, \ldots, x_T$, ending with the noise $x_T$.
Given the original feature $x_0$, the noisy feature $x_t$ of step $t$ is generated from the previous step $x_{t-1}$ by:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$$
where $\beta_t$ is a predefined constant that increases with the time step $t$ (the sequence of $\beta_t$ is known as the noise schedule), typically taking values in (0, 1).
By the Markov property, the above process can be written as a single Gaussian distribution conditioned on $x_0$:
$$q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\right)$$
wherein
$$\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)$$
so that $x_t$ may be generated from $x_0$ in one step:
$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$$
where $\epsilon \sim \mathcal{N}(0, I)$ is noise sampled from a standard normal distribution.
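The one-step forward sampling can be sketched as follows. The sketch assumes the standard closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with abar_t the cumulative product of (1 - beta_s), and a linear noise schedule. The schedule endpoints and the names `linear_beta_schedule`, `alpha_bars` and `q_sample` are illustrative assumptions, not values taken from the application.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Increasing noise schedule beta_1..beta_T (assumes T >= 2)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t from q(x_t | x_0) in one step; t is 1-indexed."""
    a = abar[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
x0 = [0.5, -1.0, 2.0]          # toy stand-in for a compressed feature vector
xt, eps = q_sample(x0, 500, abar, random.Random(0))
```

Because `abar` shrinks toward zero as t grows, the signal term fades and x_T is close to pure Gaussian noise.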
2. A reverse process (denoising process), which is the decoding reconstruction process: it progressively denoises the noisy representation to recover the original image features. In the present application, it is carried out by the decoding reconstruction layer (i.e., the denoising network of the diffusion model).
The reverse process is a learned denoising process used to gradually reduce $x_T$ to $x_0$. The core task of the diffusion model is to learn the conditional probability $p_\theta(x_{t-1} \mid x_t)$, which is realized by a neural network (e.g., UNet).
The reverse process at step $t$ is approximated in the following form:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$
where $\mu_\theta(x_t, t)$ and $\Sigma_\theta(x_t, t)$ are the mean and the covariance matrix parameterized by the neural network.
In the simplest case, $\Sigma_\theta(x_t, t)$ can be assumed to be constant, so that the training goal of the reverse process is to learn a denoising network $\epsilon_\theta(x_t, t)$ that estimates the noise $\epsilon$:
$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right), \qquad \alpha_t = 1-\beta_t,\quad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$$
Here $\epsilon_\theta(x_t, t)$ is the output of the network, representing an estimate of the added noise $\epsilon$.
The loss function used in the training process is typically the mean square error (MSE) of the denoising target:
$$L(\theta) = \mathbb{E}_{x_0,\,\epsilon,\,t}\left[\left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2\right]$$
By minimizing this loss function, the denoising network $\epsilon_\theta$ predicts the noise more accurately, so that the reverse process denoises step by step and finally restores high-quality image features.
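A small numerical check of the denoising objective and of one reverse step is sketched below. It assumes the standard noise-prediction parameterization of the reverse mean, mu = (x_t - beta_t / sqrt(1 - abar_t) * eps_pred) / sqrt(1 - beta_t), and stands in a perfect noise estimate for the trained network. The function names and the toy values are assumptions made for illustration.

```python
import math


def denoising_loss(eps_true, eps_pred):
    """Mean square error between the added noise and the network's estimate."""
    return sum((e - p) ** 2 for e, p in zip(eps_true, eps_pred)) / len(eps_true)


def reverse_mean(x_t, eps_pred, beta_t, abar_t):
    """Mean of p_theta(x_{t-1} | x_t) under the noise-prediction parameterization."""
    coef = beta_t / math.sqrt(1.0 - abar_t)
    return [(x - coef * e) / math.sqrt(1.0 - beta_t) for x, e in zip(x_t, eps_pred)]


# Check at t = 1, where abar_1 = 1 - beta_1: with a perfect noise estimate
# the reverse mean lands on x_0 (up to floating-point error).
beta_1 = 0.02
x_0 = [0.5, -1.0, 2.0]
eps = [0.3, -0.7, 1.1]  # fixed "noise" so the example is deterministic
x_1 = [math.sqrt(1.0 - beta_1) * x + math.sqrt(beta_1) * e for x, e in zip(x_0, eps)]
recovered = reverse_mean(x_1, eps, beta_1, abar_t=1.0 - beta_1)
```

In training, `eps_pred` would come from the denoising network, and `denoising_loss` would be averaged over random time steps and noise draws.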
S24, repeating steps S22-S23 until the model converges, i.e., until the loss function no longer decreases significantly or the preset number of training iterations is reached.
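The stopping rule of S24 might be implemented along the following lines. This is only a sketch: `step` is a hypothetical callable that performs S22-S23 once and returns the loss, and `min_delta` and `patience` are assumed thresholds for "no longer decreases significantly".

```python
def train_until_converged(step, max_iters=1000, min_delta=1e-4, patience=5):
    """Call step() until the loss stalls or max_iters (>= 1) is reached.

    Returns (iterations_run, last_loss).
    """
    best, stale, loss = float("inf"), 0, float("inf")
    for iteration in range(1, max_iters + 1):
        loss = step()
        if best - loss > min_delta:
            best, stale = loss, 0  # meaningful improvement: reset the counter
        else:
            stale += 1
            if stale >= patience:
                return iteration, loss
    return max_iters, loss


# Toy loss sequence that plateaus at 0.4.
losses = iter([1.0, 0.5, 0.4, 0.4, 0.4, 0.4])
result = train_until_converged(lambda: next(losses), patience=2)
```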
In summary, the present application proposes two image compression schemes:
Scheme 1: image compression method based on generative adversarial learning
First, the original image to be compressed is acquired and preprocessed, including operations such as denoising, image enhancement, and color space conversion. Then, based on the preprocessed image, an image compression model based on the optimal transmission map is trained using generative adversarial learning. The model comprises a feature extraction layer, a mapping layer, a quantization coding layer, and a decoding reconstruction layer. The mapping layer realizes the mapping and optimization of image features through an optimal transmission map based on convex mapping. The trained model can compress the input image efficiently and reduce data redundancy through quantization coding. The decoding reconstruction layer maps the compressed features back to the original image space and reconstructs a high-quality image.
Scheme 2: image compression method based on diffusion model
This scheme likewise first preprocesses the image to be compressed. Then, the image compression model based on the optimal transmission map is trained in the manner of a diffusion model. The feature extraction layer is responsible for encoding the preprocessed image into a low-dimensional feature representation. The mapping layer then optimizes and maps these features in feature space by means of an optimal transmission map based on convex mapping. During compression, the quantization coding layer quantizes the mapped features to reduce data redundancy. Finally, in the decoding reconstruction stage, the quantized features are restored to image features by the reverse denoising process of the diffusion model (i.e., a step-wise denoising network) and decoded into a high-quality compressed image close to the original image.
In the present application, the optimization of the transmission matrix and the training of the other parts of the model are performed alternately. This dynamic optimization continuously adjusts the transmission matrix to better match changes in the feature space and optimizes the feature extraction and compression process. The loss function includes not only the reconstruction error but also a regularization term for the transmission cost, which helps the model learn a better feature mapping. In each iteration, the transmission matrix is recalculated or updated according to the current feature representation and the target distribution, so that the learning direction of the model is continuously optimized.
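The alternating scheme can be illustrated on a deliberately tiny problem. In this sketch the "model" is a single scale factor `w` applied to scalar inputs, the target distribution is a fixed set of points, and only the transmission-cost term of the loss is minimized (the reconstruction term is omitted for brevity). Every name, the Sinkhorn solver and the learning rate are assumptions made for the example.

```python
import math


def sinkhorn_plan(cost, a, b, eps=1.0, n_iters=200):
    """Entropy-regularized transmission matrix for a given cost matrix."""
    n, m = len(a), len(b)
    k = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * k[i][j] * v[j] for j in range(m)] for i in range(n)]


def alternating_fit(xs, ys, w=0.1, outer_iters=50, lr=0.05):
    """Alternate between updating the transmission matrix and the model weight."""
    a = [1.0 / len(xs)] * len(xs)
    b = [1.0 / len(ys)] * len(ys)
    plan = None
    for _ in range(outer_iters):
        # Step 1: model fixed, recompute the plan for the current features w * x.
        cost = [[(w * x - y) ** 2 for y in ys] for x in xs]
        plan = sinkhorn_plan(cost, a, b)
        # Step 2: plan fixed, one gradient step on sum_ij P_ij (w x_i - y_j)^2.
        grad = sum(plan[i][j] * 2.0 * (w * xs[i] - ys[j]) * xs[i]
                   for i in range(len(xs)) for j in range(len(ys)))
        w -= lr * grad
    return w, plan


w_fit, plan_fit = alternating_fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Here the targets are exactly twice the inputs, so `w_fit` ends up close to 2, slightly below it because the entropic regularization blurs the plan.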
S3, obtaining an image to be compressed, and compressing the image by using a trained image compression model;
Specifically, compression uses the feature extraction layer, the mapping layer, and the quantization coding layer of the trained image compression model. The image to be compressed should be of the same or a similar category as the original image to be compressed in S1.
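The final quantization coding stage of this compression path can be sketched as follows. The application does not fix a quantizer or an entropy coder, so the uniform scalar quantizer, the 16-bit symbol packing and the use of `zlib` are stand-ins chosen for illustration; `features` represents the output of the mapping layer.

```python
import struct
import zlib


def quantize(features, step=0.05):
    """Uniform scalar quantization: map each feature to an integer index."""
    return [round(f / step) for f in features]


def encode(symbols):
    """Pack indices as little-endian int16 (each must fit in 16 bits), then deflate."""
    raw = struct.pack(f"<{len(symbols)}h", *symbols)
    return zlib.compress(raw, 9)


def decode(payload, step=0.05):
    """Invert encode() and dequantize back to approximate feature values."""
    raw = zlib.decompress(payload)
    symbols = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [s * step for s in symbols]


features = [0.123, -0.456, 0.789, 0.0, 1.5]  # toy mapped features
payload = encode(quantize(features))
restored = decode(payload)
```

Each restored value differs from the input by at most half the quantization step, which is the distortion that the decoding reconstruction layer has to absorb.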
The present application also provides an embodiment of a high-compression-rate image compression apparatus based on the optimal transmission map, corresponding to the foregoing embodiment of the high-compression-rate image compression method based on the optimal transmission map.
Fig. 2 is a block diagram illustrating a high compression rate image compression apparatus based on an optimal transmission map according to an exemplary embodiment. Referring to fig. 2, the apparatus may include:
A preprocessing module 21, configured to acquire an original image to be compressed and preprocess the original image;
The training module 22 is configured to train, based on the preprocessed original image, an image compression model based on the optimal transmission map by means of generative adversarial learning or a diffusion model, where the image compression model comprises a feature extraction layer, a mapping layer, a quantization coding layer, and a decoding reconstruction layer, and the mapping layer implements feature mapping by adopting an optimal transmission map based on convex mapping;
The compression module 23 is configured to obtain an image to be compressed and compress the image by using the trained image compression model.
The specific manner in which the modules of the apparatus in the above embodiment perform their operations has been described in detail in the method embodiments and is not repeated here.
Since the apparatus embodiments essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the present application without creative effort.
Accordingly, the present application also provides a computer program product comprising a computer program/instructions which, when executed by a processor, implements a high compression rate image compression method based on an optimal transmission map as described above.
Correspondingly, the application further provides an electronic device comprising one or more processors, a memory and a control unit, wherein the memory is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors implement the high-compression-rate image compression method based on the optimal transmission map described above. Fig. 3 is a hardware structure diagram of an arbitrary device with data processing capability on which the image compression apparatus of this embodiment is located. In addition to the processor, the memory, and the network interface shown in Fig. 3, the device may also include other hardware according to its actual function, which is not described here.
Accordingly, the present application also provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the high-compression-rate image compression method based on the optimal transmission map described above. The computer-readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the previous embodiments. The computer-readable storage medium may also be an external storage device of that device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash card. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used for temporarily storing data that has been output or is to be output.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims (10)

CN202411210997.3A | 2024-08-30 | 2024-08-30 | A high compression ratio image compression method based on optimal transmission mapping | Pending | CN119110084A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202411210997.3A, CN119110084A (en) | 2024-08-30 | 2024-08-30 | A high compression ratio image compression method based on optimal transmission mapping

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202411210997.3A, CN119110084A (en) | 2024-08-30 | 2024-08-30 | A high compression ratio image compression method based on optimal transmission mapping

Publications (1)

Publication Number | Publication Date
CN119110084A (en) | 2024-12-10

Family

ID=93718211

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN202411210997.3A, CN119110084A (en) | Pending | 2024-08-30 | 2024-08-30 | A high compression ratio image compression method based on optimal transmission mapping

Country Status (1)

Country | Link
CN (1) | CN119110084A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120017833A (en) * | 2025-04-15 | 2025-05-16 | 北京铁力山科技股份有限公司 | Image transmission method, device, equipment and storage medium at extremely low bit rate
CN120510143A (en) * | 2025-07-18 | 2025-08-19 | 厦门工学院 | Self-adaptive feature fusion method and device based on optimal transmission

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111950635A (en) * | 2020-08-12 | 2020-11-17 | 温州大学 | A Robust Feature Learning Method Based on Hierarchical Feature Alignment
US20220122348A1 (en) * | 2020-02-04 | 2022-04-21 | University Of Shanghai For Science And Technology | Adversarial Optimization Method for Training Process of Generative Adversarial Network
CN116957045A (en) * | 2023-09-21 | 2023-10-27 | 第六镜视觉科技(西安)有限公司 | Neural network quantization method and system based on optimal transmission theory and electronic equipment
CN117998086A (en) * | 2023-12-25 | 2024-05-07 | 武汉理工大学 | Lightweight image compression method and terminal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Na Lei et al.: "Mode Collapse and Regularity of Optimal Transportation Maps", arXiv, 8 February 2019 (2019-02-08) *
Zihang Li et al.: "Image Compression Based on Importance Using Optimal Mass Transportation Map", IEEE Xplore, 18 October 2022 (2022-10-18), p. 3 *

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
