技术领域Technical Field
本发明涉及多媒体处理技术领域,具体地涉及一种图像压缩方法、图像压缩模型的训练方法及装置。The present invention relates to the field of multimedia processing technology, and in particular to an image compression method, an image compression model training method and a device.
背景技术Background technique
图像压缩是推动图像传输和存储进步的重要基础技术,在多媒体通信、医疗成像、遥感、计算机视觉等多个领域发挥着不可或缺的作用。考虑到人眼是大部分图像信息的最终接收端,相关人员提出面向人眼感知的端到端图像编码方法,通过考虑人类视觉系统的感知特性,提高压缩图像的感知质量。然而这些方法,没有考虑到利用感知先验信息去除图像中感知冗余,编码效率仍有待提高。Image compression is an important basic technology that promotes the progress of image transmission and storage, and plays an indispensable role in many fields such as multimedia communications, medical imaging, remote sensing, and computer vision. Considering that the human eye is the final receiver of most image information, relevant researchers have proposed end-to-end image coding methods for human eye perception, which improve the perceptual quality of compressed images by considering the perceptual characteristics of the human visual system. However, these methods do not consider the use of perceptual prior information to remove perceptual redundancy in the image, and the coding efficiency still needs to be improved.
发明内容Summary of the invention
鉴于上述问题,本发明提供了一种图像压缩方法、图像压缩模型的训练方法及装置。In view of the above problems, the present invention provides an image compression method, and a training method and device for an image compression model.
根据本发明的第一个方面,提供了一种图像压缩方法,包括:According to a first aspect of the present invention, there is provided an image compression method, comprising:
获取待处理图像以及与待处理图像对应的最小可察失真图像,其中,最小可察失真图像是基于可察失真预测模型处理待处理图像得到的;Acquire an image to be processed and a minimum observable distortion image corresponding to the image to be processed, wherein the minimum observable distortion image is obtained by processing the image to be processed based on the observable distortion prediction model;
将待处理图像和最小可察失真图像输入至编码器,输出图像初始特征和可察失真初始特征;Inputting the image to be processed and the image with minimum perceptible distortion into the encoder, and outputting the initial features of the image and the initial features of perceptible distortion;
将可察失真初始特征输入至可察失真图像特征压缩子模型,输出可察失真图像压缩特征,其中,可察失真图像特征压缩子模型是基于超先验图像压缩网络构建的;Inputting the initial features of the perceptible distortion into the perceptible distortion image feature compression sub-model, and outputting the perceptible distortion image compression features, wherein the perceptible distortion image feature compression sub-model is constructed based on a hyper-prior image compression network;
将可察失真图像压缩特征和图像初始特征输入至熵编码子模型 ,输出图像压缩特征;Input the perceptible distortion image compression features and the image initial features into the entropy coding sub-model, and output the image compression features;
将图像压缩特征和可察失真图像压缩特征输入至解码器,输出压缩后的重建图像。The image compression features and the perceptible distortion image compression features are input into the decoder, and the compressed reconstructed image is output.
可选的,熵编码子模型包括自回归层、量化层、算数编解码器层、反量化层;其中,将可察失真图像压缩特征和图像初始特征输入至熵编码子模型 ,输出图像压缩特征,包括:Optionally, the entropy coding sub-model includes an autoregressive layer, a quantization layer, an arithmetic codec layer, and an inverse quantization layer; wherein the perceptibly distorted image compression features and the image initial features are input into the entropy coding sub-model, and the image compression features are output, including:
将可察失真图像压缩特征和图像初始特征输入至量化层,输出量化特征;Input the perceptible distorted image compression features and the image initial features into the quantization layer, and output the quantized features;
将图像初始特征输入至自回归层,输出编码参数和解码参数;Input the initial features of the image into the autoregressive layer and output the encoding parameters and decoding parameters;
根据算数编解码器层处理编码参数、解码参数和量化特征,得到图像中间特征;Process the encoding parameters, decoding parameters and quantization features according to the arithmetic codec layer to obtain the intermediate features of the image;
将可察失真图像压缩特征和图像中间特征输入至反量化层,输出图像压缩特征。The perceptibly distorted image compression features and the image intermediate features are input into the inverse quantization layer, and the image compression features are output.
可选的,算数编解码器层包括算数编码器子层和算数解码器子层;Optionally, the arithmetic codec layer includes an arithmetic encoder sublayer and an arithmetic decoder sublayer;
其中,根据算数编解码器层处理编码参数、解码参数和量化特征,得到图像中间特征,包括:Among them, the encoding parameters, decoding parameters and quantization features are processed according to the arithmetic codec layer to obtain the intermediate features of the image, including:
根据算数编码器子层处理编码参数和量化特征,得到图像编码特征;Processing coding parameters and quantization features according to the arithmetic encoder sublayer to obtain image coding features;
根据算数解码器子层处理解码参数和图像编码特征,得到图像中间特征。The decoding parameters and image coding features are processed according to the arithmetic decoder sublayer to obtain the intermediate features of the image.
可选的,编码器包括至少一个编码子模型,编码子模型包括第一特征提取层、第二特征提取层、第三特征提取层、第四特征提取层、特征变换 层、第一注意力网络层、第二注意力网络层;Optionally, the encoder includes at least one encoding sub-model, and the encoding sub-model includes a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, a fourth feature extraction layer, a feature transformation layer, a first attention network layer, and a second attention network layer;
其中,将待处理图像和最小可察失真图像输入至编码器,输出图像初始特征和可察失真初始特征,包括:The image to be processed and the image with minimum perceptible distortion are input to the encoder, and the initial features of the image and the initial features of perceptible distortion are output, including:
根据第一特征提取层对待处理图像进行特征提取,得到图像第一特征;Perform feature extraction on the image to be processed according to the first feature extraction layer to obtain a first feature of the image;
根据第二特征提取层对最小可察失真图像进行特征提取,得到可察失真图像第一特征;Extracting features of the minimum perceptible distortion image according to the second feature extraction layer to obtain a first feature of the perceptible distortion image;
将图像第一特征和可察失真图像第一特征输入至特征变换层,输出图像第二特征和可察失真图像第二特征;Input the first feature of the image and the first feature of the perceptibly distorted image into the feature transformation layer, and output the second feature of the image and the second feature of the perceptibly distorted image;
根据第三特征提取层对图像第二特征进行特征提取,得到图像第三特征;Extract the second feature of the image according to the third feature extraction layer to obtain the third feature of the image;
根据第一注意力网络层处理图像第三特征,得到图像初始特征;Process the third feature of the image according to the first attention network layer to obtain the initial feature of the image;
根据第四特征提取层对可察失真图像第二特征进行特征提取,得到可察失真图像第三特征;Extracting the second feature of the perceptible distorted image according to the fourth feature extraction layer to obtain a third feature of the perceptible distorted image;
根据第二注意力网络层处理可察失真图像第三特征,得到可察失真初始特征。The third feature of the perceptible distortion image is processed according to the second attention network layer to obtain the initial feature of the perceptible distortion.
可选的,特征变换层包括正切激活子层、卷积子层、第一门控子层、第二门控子层;Optionally, the feature transformation layer includes a tangent activation sublayer, a convolution sublayer, a first gating sublayer, and a second gating sublayer;
其中,将图像第一特征和可察失真图像第一特征输入至特征变换层,输出图像第二特征和可察失真图像第二特征,包括:The first feature of the image and the first feature of the perceptibly distorted image are input into the feature transformation layer, and the second feature of the image and the second feature of the perceptibly distorted image are output, including:
将可察失真图像第一特征输入至卷积子层,输出可察失真图像第二特征;Input the first feature of the perceptibly distorted image into the convolution sublayer, and output the second feature of the perceptibly distorted image;
将图像第一特征输入至卷积子层,输出图像第一卷积特征;Input the first feature of the image to the convolution sublayer and output the first convolution feature of the image;
根据第一门控子层处理图像第一卷积特征和可察失真图像第二特征,得到加权特征;Processing the first convolution feature of the image and the second feature of the perceptibly distorted image according to the first gating sublayer to obtain a weighted feature;
根据第二门控子层处理图像第一卷积特征和可察失真图像第二特征,得到权重;Processing the first convolution feature of the image and the second feature of the perceptibly distorted image according to the second gating sublayer to obtain a weight;
根据正切激活子层处理加权特征、权重和可察失真图像第二特征,得到增强特征;Processing the weighted features, the weights and the second feature of the perceptibly distorted image according to the tangent activation sublayer to obtain an enhanced feature;
根据增强特征、权重和图像第一卷积特征,得到图像第二特征。The second feature of the image is obtained according to the enhanced feature, the weight and the first convolution feature of the image.
本发明的第二方面提供了一种图像压缩模型的训练方法,图像压缩模型包括:编码器、可察失真图像特征压缩子模型、熵编码子模型和解码器,训练方法包括:A second aspect of the present invention provides a method for training an image compression model, the image compression model comprising: an encoder, a perceptibly distorted image feature compression sub-model, an entropy coding sub-model and a decoder, the training method comprising:
获取训练样本,训练样本包括样本图像以及与样本图像对应的样本最小可察失真图像,其中,样本最小可察失真图像是基于可察失真预测模型处理样本图像得到的;Acquire a training sample, the training sample including a sample image and a sample minimum observable distortion image corresponding to the sample image, wherein the sample minimum observable distortion image is obtained by processing the sample image based on a perceptible distortion prediction model;
将样本图像和样本最小可察失真图像输入至编码器,输出样本图像初始特征和样本可察失真初始特征;Input the sample image and the sample minimum observable distortion image into the encoder, and output the sample image initial features and the sample observable distortion initial features;
将样本可察失真初始特征输入至可察失真图像特征压缩子模型,输出样本可察失真图像压缩特征,其中,可察失真图像特征压缩子模型是基于超先验图像压缩网络构建的;Inputting the sample's initial perceptible distortion features into the perceptible distortion image feature compression sub-model, and outputting the sample's perceptible distortion image compression features, wherein the perceptible distortion image feature compression sub-model is constructed based on a hyper-prior image compression network;
将样本可察失真图像压缩特征和样本图像初始特征输入至熵编码子模型 ,输出样本图像压缩特征;Input the sample observable distortion image compression features and the sample image initial features into the entropy coding sub-model, and output the sample image compression features;
将样本图像压缩特征和样本可察失真图像压缩特征输入至解码器,输出压缩后的样本重建图像;Input the sample image compression features and the sample observable distortion image compression features into the decoder, and output the compressed sample reconstructed image;
根据样本图像、样本重建图像和样本可察失真初始特征,训练图像压缩模型,得到训练后的图像压缩模型。The image compression model is trained according to the sample image, the sample reconstructed image and the initial features of the sample observable distortion to obtain a trained image compression model.
可选的,根据样本图像、样本重建图像和样本可察失真初始特征,训练图像压缩模型,得到训练后的图像压缩模型,包括:Optionally, training an image compression model according to the sample image, the sample reconstructed image and the sample observable distortion initial features to obtain a trained image compression model includes:
根据损失函数处理样本图像和样本重建图像,得到失真损失值;Process the sample image and the sample reconstructed image according to the loss function to obtain a distortion loss value;
根据样本图像初始特征和样本可察失真初始特征,确定码率损失值;Determine a bit rate loss value according to the sample image initial features and the sample observable distortion initial features;
根据失真损失值和码率损失值,训练图像压缩模型,得到训练后的图像压缩模型。According to the distortion loss value and the bit rate loss value, the image compression model is trained to obtain a trained image compression model.
本发明的第三方面提供了一种图像压缩装置,包括:A third aspect of the present invention provides an image compression device, comprising:
第一获取模块,用于获取待处理图像以及与待处理图像对应的最小可察失真图像,其中,最小可察失真图像是基于可察失真预测模型处理待处理图像得到的;A first acquisition module is used to acquire an image to be processed and a minimum observable distortion image corresponding to the image to be processed, wherein the minimum observable distortion image is obtained by processing the image to be processed based on the observable distortion prediction model;
第一编码模块,用于将待处理图像和最小可察失真图像输入至编码器,输出图像初始特征和可察失真初始特征;A first encoding module, used for inputting the image to be processed and the minimum perceptible distortion image into an encoder, and outputting the initial features of the image and the initial features of the perceptible distortion;
第一处理模块,用于将可察失真初始特征输入至可察失真图像特征压缩子模型,输出可察失真图像压缩特征,其中,可察失真图像特征压缩子模型是基于超先验图像压缩网络构建的;A first processing module is used to input the initial features of the perceptible distortion into the perceptible distortion image feature compression sub-model, and output the perceptible distortion image compression features, wherein the perceptible distortion image feature compression sub-model is constructed based on a hyper-prior image compression network;
第一压缩模块,用于将可察失真图像压缩特征和图像初始特征输入至熵编码子模型 ,输出图像压缩特征;The first compression module is used to input the perceptible distortion image compression feature and the image initial feature into the entropy coding sub-model, and output the image compression feature;
第一解码模块,用于将图像压缩特征和可察失真图像压缩特征输入至解码器,输出压缩后的重建图像。The first decoding module is used to input the image compression feature and the perceptible distortion image compression feature into a decoder, and output a compressed reconstructed image.
本发明的第四方面提供了一种图像压缩模型的训练装置,包括:A fourth aspect of the present invention provides a training device for an image compression model, comprising:
第二获取模块,用于获取训练样本,训练样本包括样本图像以及与样本图像对应的样本最小可察失真图像,其中,样本最小可察失真图像是基于可察失真预测模型处理样本图像得到的;A second acquisition module is used to acquire training samples, where the training samples include sample images and sample minimum observable distortion images corresponding to the sample images, wherein the sample minimum observable distortion images are obtained by processing the sample images based on the observable distortion prediction model;
第二编码模块,用于将样本图像和样本最小可察失真图像输入至编码器,输出样本图像初始特征和样本可察失真初始特征;A second encoding module is used to input the sample image and the sample minimum observable distortion image into an encoder, and output the sample image initial features and the sample observable distortion initial features;
第二处理模块,用于将样本可察失真初始特征输入至可察失真图像特征压缩子模型,输出样本可察失真图像压缩特征,其中,可察失真图像特征压缩子模型是基于超先验图像压缩网络构建的;The second processing module is used to input the sample detectable distortion initial feature into the detectable distortion image feature compression sub-model, and output the sample detectable distortion image compression feature, wherein the detectable distortion image feature compression sub-model is constructed based on the hyper-prior image compression network;
第二压缩模块,用于将样本可察失真图像压缩特征和样本图像初始特征输入至熵编码子模型 ,输出样本图像压缩特征;The second compression module is used to input the sample detectable distortion image compression feature and the sample image initial feature into the entropy coding sub-model, and output the sample image compression feature;
第二解码模块,用于将样本图像压缩特征和样本可察失真图像压缩特征输入至解码器,输出压缩后的样本重建图像;A second decoding module is used to input the sample image compression feature and the sample detectable distortion image compression feature into a decoder, and output a compressed sample reconstructed image;
训练模块,用于根据样本图像、样本重建图像和样本可察失真初始特征,训练图像压缩模型,得到训练后的图像压缩模型。The training module is used to train the image compression model according to the sample image, the sample reconstructed image and the initial features of the sample observable distortion to obtain the trained image compression model.
根据本发明提供的图像压缩方法、图像压缩模型的训练方法及装置,通过编码器处理待处理图像和最小可察失真图像,得到图像初始特征和可察失真初始特征;根据可察失真图像特征压缩子模型处理可察失真初始特征,得到可察失真图像压缩特征;再利用熵编码子模型 处理可察失真图像压缩特征和图像初始特征,得到图像压缩特征,从而生成重建图像。由于编码器可以学习待处理图像和最小可察失真图像之间的像素相关性,有效丢弃了部分图像感知不敏感区域的图像特征,并且增强感知敏感区域的图像特征,进而有效消除感知冗余;其次,在熵编码子模型 中利用可察失真图像压缩特征来动态调整图像初始特征中像素的量化步长,进一步去除了图像感知冗余,提高了图像压缩编码效率,解决了压缩图像的感知质量差和编码效率低的问题。According to the image compression method, image compression model training method and device provided by the present invention, the image to be processed and the minimum perceptible distortion image are processed by an encoder to obtain the initial image features and the initial perceptible distortion features; the initial perceptible distortion features are processed according to the perceptible distortion image feature compression sub-model to obtain the perceptible distortion image compression features; the perceptible distortion image compression features and the initial image features are processed by the entropy coding sub-model to obtain the image compression features, thereby generating a reconstructed image. Since the encoder can learn the pixel correlation between the image to be processed and the minimum perceptible distortion image, the image features of some image perception insensitive areas are effectively discarded, and the image features of the perceptually sensitive areas are enhanced, thereby effectively eliminating perceptual redundancy; secondly, the perceptible distortion image compression features are used in the entropy coding sub-model to dynamically adjust the quantization step size of the pixels in the initial image features, further removing the image perception redundancy, improving the image compression coding efficiency, and solving the problems of poor perceptual quality and low coding efficiency of compressed images.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
通过以下参照附图对本发明实施例的描述,本发明的上述内容以及其他目的、特征和优点将更为清楚,在附图中:The above contents and other objects, features and advantages of the present invention will become more apparent through the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
图1示出了根据本发明实施例的图像压缩方法的流程图;FIG1 shows a flow chart of an image compression method according to an embodiment of the present invention;
图2示出了根据本发明实施例的生成重建图像的示例示意图;FIG2 shows an example schematic diagram of generating a reconstructed image according to an embodiment of the present invention;
图3示出了根据本发明实施例的图像压缩模型的训练方法的流程图;FIG3 shows a flow chart of a method for training an image compression model according to an embodiment of the present invention;
图4示出了根据本发明实施例的图像压缩装置的结构框图;FIG4 shows a structural block diagram of an image compression device according to an embodiment of the present invention;
图5示出了根据本发明实施例的图像压缩模型的训练装置的结构框图。FIG5 shows a structural block diagram of a training device for an image compression model according to an embodiment of the present invention.
具体实施方式Detailed ways
以下,将参照附图来描述本发明的实施例。但是应该理解,这些描述只是示例性的,而并非要限制本发明的范围。在下面的详细描述中,为便于解释,阐述了许多具体的细节以提供对本发明实施例的全面理解。然而,明显地,一个或多个实施例在没有这些具体细节的情况下也可以被实施。此外,在以下说明中,省略了对公知结构和技术的描述,以避免不必要地混淆本发明的概念。Below, embodiments of the present invention will be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of the present invention. In the following detailed description, for ease of explanation, many specific details are set forth to provide a comprehensive understanding of embodiments of the present invention. However, it is apparent that one or more embodiments may also be implemented without these specific details. In addition, in the following description, descriptions of known structures and technologies are omitted to avoid unnecessary confusion of concepts of the present invention.
在此使用的术语仅仅是为了描述具体实施例,而并非意在限制本发明。在此使用的术语“包括”、“包含”等表明了特征、步骤、操作和/或部件的存在,但是并不排除存在或添加一个或多个其他特征、步骤、操作或部件。The terms used herein are only for describing specific embodiments and are not intended to limit the present invention. The terms "comprise", "include", etc. used herein indicate the existence of features, steps, operations and/or components, but do not exclude the existence or addition of one or more other features, steps, operations or components.
在此使用的所有术语(包括技术和科学术语)具有本领域技术人员通常所理解的含义,除非另外定义。应注意,这里使用的术语应解释为具有与本说明书的上下文相一致的含义,而不应以理想化或过于刻板的方式来解释。All terms (including technical and scientific terms) used herein have the meanings commonly understood by those skilled in the art unless otherwise defined. It should be noted that the terms used herein should be interpreted as having a meaning consistent with the context of this specification and should not be interpreted in an idealized or overly rigid manner.
在使用类似于“A、B和C等中至少一个”这样的表述的情况下,一般来说应该按照本领域技术人员通常理解该表述的含义来予以解释(例如,“具有A、B和C中至少一个的系统”应包括但不限于单独具有A、单独具有B、单独具有C、具有A和B、具有A和C、具有B和C、和/或具有A、B、C的系统等)。When using expressions such as "at least one of A, B, and C, etc.", they should generally be interpreted according to the meaning of the expression commonly understood by those skilled in the art (for example, "a system having at least one of A, B, and C" should include but is not limited to a system having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, C, etc.).
在实现本发明的过程中发现,现有图像压缩标准主要侧重于从信号保真度的角度优化速率失真性能,在对图像进行大压缩比编码时,可能会丢失大量的高频信息,从而导致压缩图像的感知质量显著下降。此外面向人眼感知的端到端图像编码方法,通过考虑人类视觉系统的感知特性,提高压缩图像的感知质量。然而这些方法,没有考虑到利用感知先验信息去除图像中感知冗余,编码效率仍有待提高。In the process of implementing the present invention, it is found that the existing image compression standards mainly focus on optimizing rate-distortion performance from the perspective of signal fidelity. When encoding images at a high compression ratio, a large amount of high-frequency information may be lost, resulting in a significant decrease in the perceived quality of the compressed image. In addition, end-to-end image coding methods for human eye perception improve the perceived quality of compressed images by considering the perceptual characteristics of the human visual system. However, these methods do not consider the use of perceptual prior information to remove perceptual redundancy in images, and the coding efficiency still needs to be improved.
有鉴于此,本发明的实施例提供了一种图像压缩方法、图像压缩模型的训练方法及装置。该方法包括:获取待处理图像以及与待处理图像对应的最小可察失真图像,其中,最小可察失真图像是基于可察失真预测模型处理待处理图像得到的;将待处理图像和最小可察失真图像输入至编码器,输出图像初始特征和可察失真初始特征;将可察失真初始特征输入至可察失真图像特征压缩子模型,输出可察失真图像压缩特征,其中,可察失真图像特征压缩子模型是基于超先验图像压缩网络构建的;将可察失真图像压缩特征和图像初始特征输入至熵编码子模型 ,输出图像压缩特征;将图像压缩特征和可察失真图像压缩特征输入至解码器,输出压缩后的重建图像。In view of this, an embodiment of the present invention provides an image compression method, an image compression model training method and a device. The method includes: obtaining an image to be processed and a minimum observable distortion image corresponding to the image to be processed, wherein the minimum observable distortion image is obtained by processing the image to be processed based on a perceptible distortion prediction model; inputting the image to be processed and the minimum observable distortion image into an encoder, outputting image initial features and perceptible distortion initial features; inputting the perceptible distortion initial features into a perceptible distortion image feature compression submodel, outputting perceptible distortion image compression features, wherein the perceptible distortion image feature compression submodel is constructed based on a hyper-prior image compression network; inputting perceptible distortion image compression features and image initial features into an entropy coding submodel, outputting image compression features; inputting image compression features and perceptible distortion image compression features into a decoder, and outputting a compressed reconstructed image.
在本发明的技术方案中,所涉及的用户信息(包括但不限于用户个人信息、用户图像信息、用户设备信息,例如位置信息等)和数据(包括但不限于用于分析的数据、存储的数据、展示的数据等),均为经用户授权或者经过各方充分授权的信息和数据,并且相关数据的收集、存储、使用、加工、传输、提供、公开和应用等处理,均遵守相关法律法规和标准,采取了必要保密措施,不违背公序良俗,并提供有相应的操作入口,供用户选择授权或者拒绝。In the technical solution of the present invention, the user information (including but not limited to user personal information, user image information, user device information, such as location information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved are all information and data authorized by the user or fully authorized by all parties, and the collection, storage, use, processing, transmission, provision, disclosure and application of the relevant data comply with relevant laws, regulations and standards, take necessary confidentiality measures, do not violate public order and good morals, and provide corresponding operation entrances for users to choose to authorize or refuse.
图1示出了根据本发明实施例的图像压缩方法的流程图。FIG1 shows a flow chart of an image compression method according to an embodiment of the present invention.
如图1所示,该方法100包括操作S110~操作S150。As shown in FIG. 1 , the method 100 includes operations S110 to S150 .
在操作S110,获取待处理图像以及与待处理图像对应的最小可察失真图像。In operation S110, an image to be processed and a minimum perceptible distortion image corresponding to the image to be processed are acquired.
在操作S120,将待处理图像和最小可察失真图像输入至编码器,输出图像初始特征和可察失真初始特征。In operation S120, the image to be processed and the image with minimum perceptible distortion are input to an encoder, and initial features of the image and initial features of perceptible distortion are output.
在操作S130,将可察失真初始特征输入至可察失真图像特征压缩子模型,输出可察失真图像压缩特征。In operation S130 , the perceptible distortion initial feature is input into the perceptible distortion image feature compression sub-model, and the perceptible distortion image compression feature is output.
在操作S140,将可察失真图像压缩特征和图像初始特征输入至熵编码子模型 ,输出图像压缩特征。In operation S140, the perceptible distortion image compression feature and the image initial feature are input into an entropy coding sub-model, and the image compression feature is output.
在操作S150,将图像压缩特征和可察失真图像压缩特征输入至解码器,输出压缩后的重建图像。In operation S150, the image compression feature and the perceptible distortion image compression feature are input to a decoder, and a compressed reconstructed image is output.
可选的,最小可察失真图像是基于可察失真预测模型处理待处理图像得到的。Optionally, the minimum perceptible distortion image is obtained by processing the image to be processed based on the perceptible distortion prediction model.
可选的,待处理图像表征待重建的原始图像,例如,可以为地理环境照片等。最小可察失真图像可以为最小可觉察误差(Just Noticeable Distortion,JND)图像。Optionally, the image to be processed represents an original image to be reconstructed, for example, it may be a geographical environment photo, etc. The minimum observable distortion image may be a Just Noticeable Distortion (JND) image.
在一实施例中,可察失真预测模型如公式(1)所示:In one embodiment, the perceptible distortion prediction model is shown in formula (1):
JND(x) = LA(x) + VM(x) - C · min(LA(x), VM(x))  (1);
其中,x表征待处理图像中的一个像素,JND(x)表征与像素x对应的可察失真信息,LA(x)表征像素x的亮度信息,VM(x)表征像素x的视觉遮蔽信息,C表征亮度信息和视觉遮蔽信息重叠部分的增益衰减参数,例如,C可以为0.3。Where x denotes a pixel in the image to be processed, JND(x) denotes the perceptible distortion information corresponding to pixel x, LA(x) denotes the luminance information of pixel x, VM(x) denotes the visual masking information of pixel x, and C denotes the gain attenuation parameter for the overlapping part of the luminance information and the visual masking information; for example, C may be 0.3.
可选的,利用可察失真预测模型处理待处理图像中的每个像素,得到与每个像素各自对应的可察失真信息,从而生成的最小可察失真图像。Optionally, the perceptible distortion prediction model is used to process each pixel in the image to be processed to obtain perceptible distortion information corresponding to each pixel, thereby generating an image with minimum perceptible distortion.
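As a rough illustration of how a per-pixel JND map could be produced following formula (1), the PyTorch sketch below combines a luminance-adaptation term and a visual-masking term with the overlap attenuation C = 0.3. The LA and VM estimators used here (local-mean luminance adaptation, gradient-based masking) are simplified stand-ins, not the exact predictors of this disclosure.

```python
import torch
import torch.nn.functional as F

def jnd_map(image: torch.Tensor, c: float = 0.3) -> torch.Tensor:
    """Minimal per-pixel JND sketch of formula (1):
    JND(x) = LA(x) + VM(x) - c * min(LA(x), VM(x)).

    `image` is a grayscale tensor of shape (B, 1, H, W) with values in [0, 255].
    The LA/VM estimators below are illustrative placeholders.
    """
    # LA: background-luminance adaptation, approximated from a 5x5 local mean.
    bg = F.avg_pool2d(image, kernel_size=5, stride=1, padding=2)
    la = torch.where(bg <= 127,
                     17.0 * (1.0 - torch.sqrt(bg / 127.0)) + 3.0,
                     3.0 / 128.0 * (bg - 127.0) + 3.0)

    # VM: texture/edge masking, approximated from the local gradient magnitude.
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    gx = F.conv2d(image, kx, padding=1)
    gy = F.conv2d(image, kx.transpose(2, 3), padding=1)
    vm = 0.1 * torch.sqrt(gx ** 2 + gy ** 2)

    # Formula (1): the overlap between the two effects is attenuated by c.
    return la + vm - c * torch.minimum(la, vm)
```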
可选的,对每个待处理图像和与待处理图像对应的最小可察失真图像随机划分出一个图像块组,图像块组包括来自待处理图像划分的图像块和来自最小可察失真图像相应位置划分的图像块,每个图像块分辨率可以为256的整数倍,如256*256。将图像块组输入至编码器,输出图像初始特征和可察失真初始特征。Optionally, an image block group is randomly divided from each image to be processed and the minimum observable distortion image corresponding to the image to be processed, the image block group includes image blocks from the image to be processed and image blocks from the corresponding positions of the minimum observable distortion image, and the resolution of each image block can be an integer multiple of 256, such as 256*256. The image block group is input to the encoder, and the initial image features and the initial observable distortion features are output.
可选的,图像初始特征表征基于编码器处理来自待处理图像划分的图像块得到的特征;可察失真初始特征表征基于编码器处理来自最小可察失真图像相应位置划分的图像块得到的特征。Optionally, the image initial feature representation is based on features obtained by the encoder processing image blocks divided from the image to be processed; the observable distortion initial feature representation is based on features obtained by the encoder processing image blocks divided from corresponding positions of the minimum observable distortion image.
可选的,可察失真图像特征压缩子模型可以是基于超先验图像压缩网络构建的。对可察失真初始特征进行通道拼接,得到满足可察失真图像特征压缩子模型可处理尺寸要求的可察失真初始特征,进而将通道拼接后的可察失真初始特征输入至可察失真图像特征压缩子模型,输出可察失真图像压缩特征。Optionally, the perceptible distortion image feature compression sub-model can be constructed based on a hyper-prior image compression network. Channel splicing is performed on the perceptible distortion initial features to obtain perceptible distortion initial features that meet the processable size requirements of the perceptible distortion image feature compression sub-model, and then the perceptible distortion initial features after channel splicing are input into the perceptible distortion image feature compression sub-model to output perceptible distortion image compression features.
可选的,利用可察失真图像特征压缩子模型对可察失真初始特征进行编码和解码处理,得到可察失真图像压缩特征。Optionally, the perceptible distortion image feature compression sub-model is used to encode and decode the perceptible distortion initial feature to obtain the perceptible distortion image compression feature.
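A minimal structural sketch of what a hyper-prior-style compression branch for the perceptible distortion features could look like is given below. The channel counts, strides, and the use of hard rounding for the hyper-latent are illustrative assumptions rather than the claimed design.

```python
import torch
import torch.nn as nn

class JNDFeatureCompressor(nn.Module):
    """Sketch of the perceptible-distortion feature compression sub-model:
    a hyper-encoder maps the JND initial features j to a compact latent,
    the latent is quantized, and a hyper-decoder reconstructs the JND
    compression features j_hat that later drive the quantization step of
    the image latent. Channel sizes are illustrative assumptions."""

    def __init__(self, channels: int = 192):
        super().__init__()
        self.hyper_enc = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
        self.hyper_dec = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.Softplus(),  # keep the reconstructed quantization-step map positive
        )

    def forward(self, j: torch.Tensor) -> torch.Tensor:
        z = self.hyper_enc(j)
        # Hard rounding stands in for the entropy-coded quantization of the
        # hyper-latent; training would replace it with a differentiable proxy.
        z_hat = torch.round(z)
        return self.hyper_dec(z_hat)  # j_hat: perceptible-distortion compression features
```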
可选的,在熵编码子模型 中,利用可察失真图像压缩特征调整图像初始特征量化步长,进而引导图像初始特征的量化和反量化处理过程,得到图像压缩特征。图像压缩特征表征图像初始特征压缩后的特征。Optionally, in the entropy coding sub-model, the perceptible distortion image compression feature is used to adjust the quantization step of the initial image feature, thereby guiding the quantization and dequantization process of the initial image feature to obtain the image compression feature. The image compression feature represents the feature of the initial image feature after compression.
可选的,解码器与编码器的内部结构可以是对称的,利用解码器处理图像压缩特征和可察失真图像压缩特征,得到压缩后的重建图像。Optionally, the internal structures of the decoder and the encoder may be symmetrical, and the decoder is used to process the image compression features and the perceptible distortion image compression features to obtain a compressed reconstructed image.
可选的,由于编码器可以学习待处理图像和最小可察失真图像之间的像素相关性,有效丢弃了部分图像感知不敏感区域的图像特征,并且增强感知敏感区域的图像特征,进而有效消除感知冗余;其次,在熵编码子模型 中利用可察失真图像压缩特征来动态调整图像初始特征中像素的量化步长,进一步去除了图像感知冗余,提高了图像压缩编码效率,解决了压缩图像的感知质量差和编码效率低的问题。Optionally, since the encoder can learn the pixel correlation between the image to be processed and the image with minimum detectable distortion, it effectively discards the image features of some image perceptually insensitive areas and enhances the image features of perceptually sensitive areas, thereby effectively eliminating perceptual redundancy; secondly, the perceptually distorted image compression features are used in the entropy coding sub-model to dynamically adjust the quantization step size of the pixels in the initial features of the image, further removing the image perceptual redundancy, improving the image compression coding efficiency, and solving the problems of poor perceptual quality and low coding efficiency of compressed images.
可选的,将可察失真图像压缩特征和图像初始特征输入至熵编码子模型,输出图像压缩特征,包括:Optionally, the perceptibly distorted image compression features and the image initial features are input into the entropy coding sub-model, and the image compression features are output, including:
将可察失真图像压缩特征和图像初始特征输入至量化层,输出量化特征;将图像初始特征输入至自回归层,输出编码参数和解码参数;根据算数编解码器层处理编码参数、解码参数和量化特征,得到图像中间特征;将可察失真图像压缩特征和图像中间特征输入至反量化层,输出图像压缩特征。The perceptibly distorted image compression features and the initial image features are input into the quantization layer, and the quantization features are output; the initial image features are input into the autoregressive layer, and the encoding parameters and decoding parameters are output; the encoding parameters, decoding parameters and quantization features are processed according to the arithmetic codec layer to obtain the intermediate features of the image; the perceptibly distorted image compression features and the intermediate features of the image are input into the inverse quantization layer, and the image compression features are output.
可选的,熵编码子模型包括自回归层、量化层、算数编解码器层、反量化层。Optionally, the entropy coding sub-model includes an autoregressive layer, a quantization layer, an arithmetic codec layer, and an inverse quantization layer.
可选的,在量化层利用可察失真图像压缩特征对图像初始特征进行量化,得到图像初始特征对应的量化特征。Optionally, the initial features of the image are quantized using the perceptible distortion image compression features at the quantization layer to obtain quantized features corresponding to the initial features of the image.
在一实施例中,量化特征ȳ如公式(2)所示:In one embodiment, the quantized feature ȳ is shown in formula (2):
ȳ = Q(y, ĵ) = round(y / ĵ)  (2);
其中,Q表征量化层,round表征取整函数,/表征像素间除法操作,y表征图像初始特征,ĵ表征可察失真图像压缩特征。Where Q denotes the quantization layer, round denotes the rounding function, / denotes the pixel-wise division operation, y denotes the initial image features, and ĵ denotes the perceptible distortion image compression features.
可选的,自回归层可以是基于自回归熵参数估计模型构建的。利用自回归层处理图像初始特征,得到编码参数、解码参数,编码参数和解码参数均可以为自回归层处理图像初始特征得到的均值、方差。Optionally, the autoregressive layer can be constructed based on an autoregressive entropy parameter estimation model. The autoregressive layer is used to process the initial features of the image to obtain encoding parameters and decoding parameters, and the encoding parameters and decoding parameters can both be the mean and variance obtained by processing the initial features of the image by the autoregressive layer.
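One common way (assumed here, not mandated by the text) to realize such an autoregressive entropy-parameter estimator is a causally masked convolution that predicts a mean and scale for every latent element, as sketched below.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Type-A causal mask: each output position only sees already-decoded
    latent elements (those above it, or to its left in the same row)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones_like(self.weight)
        mask[:, :, kh // 2, kw // 2:] = 0   # mask the current and right positions
        mask[:, :, kh // 2 + 1:, :] = 0     # mask all rows below
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask
        return super().forward(x)

class AutoregressiveParams(nn.Module):
    """Predicts per-element Gaussian mean/scale used by the arithmetic
    encoder and decoder sub-layers (the encoding and decoding parameters)."""

    def __init__(self, channels: int = 192):
        super().__init__()
        self.context = MaskedConv2d(channels, 2 * channels, kernel_size=5, padding=2)

    def forward(self, y: torch.Tensor):
        mean, scale = self.context(y).chunk(2, dim=1)
        return mean, nn.functional.softplus(scale)  # scale must be positive
```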
可选的,根据算数编解码器层处理编码参数、解码参数和量化特征,得到图像中间特征。Optionally, encoding parameters, decoding parameters and quantization features are processed according to an arithmetic codec layer to obtain intermediate features of the image.
在一实施例中,图像压缩特征ŷ如公式(3)所示:In one embodiment, the image compression feature ŷ is shown in formula (3):
ŷ = IQ(ȳ′, ĵ) = ȳ′ ⊙ ĵ  (3);
其中,IQ表征反量化层,⊙表征像素间乘法操作,ȳ′表征图像中间特征,ĵ表征可察失真图像压缩特征。Where IQ denotes the inverse quantization layer, ⊙ denotes the pixel-wise multiplication operation, ȳ′ denotes the intermediate image features, and ĵ denotes the perceptible distortion image compression features.
可选的,将可察失真图像压缩特征作为量化表动态引导图像初始特征的量化和反量化过程,进而确保量化步长的缩放因子在量化和反量化过程中保持一致,有效去除了图像感知冗余。此外,熵编码子模型能准确地给感知敏感度不同的区域分配不同的编码比特,从而在不明显降低感知质量的情况下节省更多的编码比特。Optionally, the perceptible distortion image compression features are used as quantization tables to dynamically guide the quantization and dequantization process of the image initial features, thereby ensuring that the scaling factor of the quantization step size remains consistent during the quantization and dequantization process, effectively removing image perceptual redundancy. In addition, the entropy coding sub-model can accurately assign different coding bits to areas with different perceptual sensitivities, thereby saving more coding bits without significantly reducing the perceptual quality.
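The quantization and inverse-quantization steps of formulas (2) and (3) reduce to two one-line operations; the sketch below is a direct transcription under the notation used above (y, ĵ, ȳ′ are assumed symbol names).

```python
import torch

def perceptual_quantize(y: torch.Tensor, j_hat: torch.Tensor) -> torch.Tensor:
    """Formula (2): the JND compression features j_hat act as a per-pixel
    quantization table; regions that tolerate more distortion get a larger
    step and hence fewer bits after rounding."""
    return torch.round(y / j_hat)

def perceptual_dequantize(y_mid: torch.Tensor, j_hat: torch.Tensor) -> torch.Tensor:
    """Formula (3): the inverse-quantization layer rescales the decoded
    intermediate features with the same j_hat, keeping the scaling factor
    consistent on both sides."""
    return y_mid * j_hat
```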
可选的,算数编解码器层包括算数编码器子层和算数解码器子层。Optionally, the arithmetic codec layer includes an arithmetic encoder sublayer and an arithmetic decoder sublayer.
可选的,根据算数编解码器层处理编码参数、解码参数和量化特征,得到图像中间特征,包括:根据算数编码器子层处理编码参数和量化特征,得到图像编码特征;根据算数解码器子层处理解码参数和图像编码特征,得到图像中间特征。Optionally, the encoding parameters, decoding parameters and quantization features are processed according to the arithmetic codec layer to obtain intermediate image features, including: processing the encoding parameters and quantization features according to the arithmetic encoder sublayer to obtain image coding features; processing the decoding parameters and image coding features according to the arithmetic decoder sublayer to obtain intermediate image features.
可选的,将编码参数和量化特征输入至算数编码器子层进行编码处理,得到图像编码特征。Optionally, the encoding parameters and quantization features are input into the arithmetic encoder sublayer for encoding processing to obtain image encoding features.
可选的,将解码参数和图像编码特征输入至算数解码器子层进行解码处理,得到图像中间特征。图像中间特征表征量化特征进行压缩后的特征。Optionally, the decoding parameters and the image coding features are input into the arithmetic decoder sublayer for decoding processing to obtain the image intermediate features. The image intermediate features represent the features after the quantized features are compressed.
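The sketch below shows the shared probability model that both arithmetic sub-layers would evaluate; the actual bitstream writing and reading of a range or arithmetic coder is omitted, and only the per-symbol probabilities (and hence the theoretical code length) are computed.

```python
import torch

def gaussian_likelihood(y_q: torch.Tensor, mean: torch.Tensor, scale: torch.Tensor,
                        eps: float = 1e-9) -> torch.Tensor:
    """Probability mass assigned to each quantized symbol y_q under the
    Gaussian parameters produced by the autoregressive layer. The arithmetic
    encoder sub-layer turns these probabilities into a bitstream, and the
    decoder sub-layer uses the same probabilities to recover y_q exactly."""
    normal = torch.distributions.Normal(mean, scale.clamp(min=1e-6))
    upper = normal.cdf(y_q + 0.5)
    lower = normal.cdf(y_q - 0.5)
    return (upper - lower).clamp(min=eps)

# Theoretical code length in bits (what the arithmetic coder approaches):
# bits = -torch.log2(gaussian_likelihood(y_q, mean, scale)).sum()
```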
可选的,编码器包括至少一个编码子模型,编码子模型包括第一特征提取层、第二特征提取层、第三特征提取层、第四特征提取层、特征变换层、第一注意力网络层、第二注意力网络层。Optionally, the encoder includes at least one encoding sub-model, and the encoding sub-model includes a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, a fourth feature extraction layer, a feature transformation layer, a first attention network layer, and a second attention network layer.
可选的,第一注意力网络层、第二注意力网络层的参数相同,第一特征提取层和第二特征提取层的参数相同,第三特征提取层和第四特征提取层的参数相同,特征提取层可以是基于图像特征提取算法构建得到的,特征变换层可以是基于特征变换模型(Feature Transform Module,FTM)构建的,第一注意力网络层和第二注意力网络层均可以是基于非局部注意力模型构建得到的。Optionally, the parameters of the first attention network layer and the second attention network layer are the same, the parameters of the first feature extraction layer and the second feature extraction layer are the same, and the parameters of the third feature extraction layer and the fourth feature extraction layer are the same. The feature extraction layers can be constructed based on an image feature extraction algorithm, the feature transformation layer can be constructed based on a Feature Transform Module (FTM), and both the first attention network layer and the second attention network layer can be constructed based on a non-local attention model.
可选的,编码器中的每个编码子模型的网络结构相同。Optionally, the network structure of each encoding sub-model in the encoder is the same.
可选的,将待处理图像和最小可察失真图像输入至编码器,输出图像初始特征和可察失真初始特征,包括:Optionally, the image to be processed and the image with minimum perceptible distortion are input to an encoder, and the initial features of the image and the initial features of perceptible distortion are output, including:
根据第一特征提取层对待处理图像进行特征提取,得到图像第一特征;根据第二特征提取层对最小可察失真图像进行特征提取,得到可察失真图像第一特征;将图像第一特征和可察失真图像第一特征输入至特征变换层,输出图像第二特征和可察失真图像第二特征;根据第三特征提取层对图像第二特征进行特征提取,得到图像第三特征;根据第一注意力网络层处理图像第三特征,得到图像初始特征;根据第四特征提取层对可察失真图像第二特征进行特征提取,得到可察失真图像第三特征;根据第二注意力网络层处理可察失真图像第三特征,得到可察失真初始特征。According to the first feature extraction layer, feature extraction is performed on the image to be processed to obtain the first feature of the image; according to the second feature extraction layer, feature extraction is performed on the minimum perceptible distortion image to obtain the first feature of the perceptible distortion image; the first feature of the image and the first feature of the perceptible distortion image are input into the feature transformation layer, and the second feature of the image and the second feature of the perceptible distortion image are output; according to the third feature extraction layer, feature extraction is performed on the second feature of the image to obtain the third feature of the image; according to the first attention network layer, the third feature of the image is processed to obtain the initial feature of the image; according to the fourth feature extraction layer, feature extraction is performed on the second feature of the perceptible distortion image to obtain the third feature of the perceptible distortion image; according to the second attention network layer, the third feature of the perceptible distortion image is processed to obtain the initial feature of the perceptible distortion.
可选的,根据第一特征提取层对来自待处理图像划分的图像块进行特征提取,得到图像第一特征;根据第二特征提取层对来自最小可察失真图像相应位置划分的图像块进行特征提取,得到可察失真图像第一特征。图像第一特征和可察失真图像第一特征表征图像块的浅层次特征。Optionally, feature extraction is performed on the image blocks divided from the image to be processed according to the first feature extraction layer to obtain the first feature of the image; and feature extraction is performed on the image blocks divided from the corresponding positions of the minimum perceptible distortion image according to the second feature extraction layer to obtain the first feature of the perceptible distortion image. The first feature of the image and the first feature of the perceptible distortion image represent the shallow features of the image block.
可选的,根据特征变换层处理可察失真图像第一特征,得到深层次的可察失真图像第二特征;在特征变换层根据可察失真图像第一特征对图像第一特征进行特征变换,得到深层次的图像第二特征。Optionally, the first feature of the perceptibly distorted image is processed according to the feature transformation layer to obtain a second feature of the perceptibly distorted image at a deeper level; and the first feature of the image is feature transformed according to the first feature of the perceptibly distorted image at the feature transformation layer to obtain a second feature of the image at a deeper level.
可选的,相较于图像第二特征,图像第三特征具有更深层次的特征;相较于可察失真图像第二特征,可察失真图像第三特征具有更深层次的特征。Optionally, compared with the second image feature, the third image feature has a deeper level of feature; compared with the second image feature with perceptible distortion, the third image feature with perceptible distortion has a deeper level of feature.
可选的,第一注意力网络层和第二注意力网络层可以学习特征通道层面上的全局重点特征。相较于图像第三特征,图像初始特征中没有感知冗余;相较于可察失真图像第三特征,可察失真初始特征具有更深层次的特征。Optionally, the first attention network layer and the second attention network layer can learn global key features at the feature channel level. Compared with the third image feature, there is no perceptual redundancy in the initial image feature; compared with the third image feature of the perceptible distortion, the initial feature of the perceptible distortion has deeper features.
例如,以上述编码器由第一编码子模型和第二编码子模型构成为例,将待处理图像和最小可察失真图像输入至编码器的第一编码子模型中,输出与第一编码子模型对应的图像初始特征和可察失真初始特征;再将与第一编码子模型对应的图像初始特征和可察失真初始特征输入至第二编码子模型,输出与第二编码子模型对应的图像初始特征和可察失真初始特征,将与第二编码子模型对应的图像初始特征和可察失真初始特征确定为编码器输出的图像初始特征和可察失真初始特征。For example, taking the above-mentioned encoder composed of the first encoding sub-model and the second encoding sub-model as an example, the image to be processed and the minimum observable distortion image are input into the first encoding sub-model of the encoder, and the image initial features and the observable distortion initial features corresponding to the first encoding sub-model are output; then the image initial features and the observable distortion initial features corresponding to the first encoding sub-model are input into the second encoding sub-model, and the image initial features and the observable distortion initial features corresponding to the second encoding sub-model are output, and the image initial features and the observable distortion initial features corresponding to the second encoding sub-model are determined as the image initial features and the observable distortion initial features output by the encoder.
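A hedged sketch of one encoding sub-model is shown below: paired feature-extraction layers for the two branches, a feature transform layer (for example the FTM sketched after the gating description further below), and a per-branch attention layer. The concrete convolution sizes and strides are assumptions, and the attention modules can be any drop-in nn.Module (nn.Identity() works for a quick smoke test).

```python
import torch
import torch.nn as nn

class EncodingSubModel(nn.Module):
    """One encoding sub-model: paired feature extraction for the image and
    its JND map, a feature-transform layer (FTM) through which the JND
    branch reshapes the image branch, and per-branch attention.
    Layer hyper-parameters are illustrative."""

    def __init__(self, in_ch: int, out_ch: int,
                 ftm: nn.Module, attn_img: nn.Module, attn_jnd: nn.Module):
        super().__init__()
        self.extract_img_1 = nn.Conv2d(in_ch, out_ch, 5, stride=2, padding=2)   # first feature extraction layer
        self.extract_jnd_1 = nn.Conv2d(in_ch, out_ch, 5, stride=2, padding=2)   # second feature extraction layer
        self.ftm = ftm                                                          # feature transform layer
        self.extract_img_2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)  # third feature extraction layer
        self.extract_jnd_2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)  # fourth feature extraction layer
        self.attn_img = attn_img                                                # first attention network layer
        self.attn_jnd = attn_jnd                                                # second attention network layer

    def forward(self, x: torch.Tensor, jnd: torch.Tensor):
        f_img_1 = self.extract_img_1(x)        # image first feature
        f_jnd_1 = self.extract_jnd_1(jnd)      # JND-image first feature
        f_img_2, f_jnd_2 = self.ftm(f_img_1, f_jnd_1)
        y = self.attn_img(self.extract_img_2(f_img_2))   # image initial feature of this sub-model
        j = self.attn_jnd(self.extract_jnd_2(f_jnd_2))   # perceptible-distortion initial feature
        return y, j

# Example wiring (attention omitted for brevity; FeatureTransformLayer is the
# FTM sketch given later in this description):
# sub = EncodingSubModel(3, 192, ftm=FeatureTransformLayer(192),
#                        attn_img=nn.Identity(), attn_jnd=nn.Identity())
```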
可选的,特征变换层包括正切激活子层、卷积子层、第一门控子层、第二门控子层。Optionally, the feature transformation layer includes a tangent activation sublayer, a convolution sublayer, a first gating sublayer, and a second gating sublayer.
可选的,正切激活子层可以是基于正切激活函数构建得到的,第一门控子层可以是基于记忆门控单元构建得到的,第二门控子层可以是基于遗忘门控单元构建得到的,卷积子层可以是基于卷积核大小为3x3的卷积网络构建得到的。Optionally, the tangent activation sublayer can be constructed based on the tangent activation function, the first gating sublayer can be constructed based on the memory gating unit, the second gating sublayer can be constructed based on the forgetting gating unit, and the convolution sublayer can be constructed based on a convolution network with a convolution kernel size of 3x3.
可选的,将图像第一特征和可察失真图像第一特征输入至特征变换层,输出图像第二特征和可察失真图像第二特征,包括:Optionally, inputting the first image feature and the first perceptibly distorted image feature into a feature transformation layer, and outputting the second image feature and the second perceptibly distorted image feature, comprises:
将可察失真图像第一特征输入至卷积子层,输出可察失真图像第二特征;将图像第一特征输入至卷积子层,输出图像第一卷积特征;根据第一门控子层处理图像第一卷积特征和可察失真图像第二特征,得到加权特征;根据第二门控子层处理图像第一卷积特征和可察失真图像第二特征,得到权重;根据正切激活子层处理加权特征、权重和可察失真图像第二特征,得到增强特征;根据增强特征、权重和图像第一卷积特征,得到图像第二特征。The first feature of the perceptibly distorted image is input into the convolution sublayer, and the second feature of the perceptibly distorted image is output; the first feature of the image is input into the convolution sublayer, and the first convolution feature of the image is output; the first convolution feature of the image and the second feature of the perceptibly distorted image are processed according to the first gating sublayer to obtain a weighted feature; the first convolution feature of the image and the second feature of the perceptibly distorted image are processed according to the second gating sublayer to obtain a weight; the weighted feature, the weight and the second feature of the perceptibly distorted image are processed according to the tangent activation sublayer to obtain an enhanced feature; the second feature of the image is obtained according to the enhanced feature, the weight and the first convolution feature of the image.
可选的,利用卷积子层对可察失真图像第一特征进行卷积运算,得到可察失真图像第二特征;利用卷积子层对图像第一特征进行卷积运算,得到图像第一卷积特征。Optionally, a convolution sublayer is used to perform a convolution operation on the first feature of the perceptibly distorted image to obtain a second feature of the perceptibly distorted image; and a convolution sublayer is used to perform a convolution operation on the first feature of the image to obtain a first convolution feature of the image.
可选的,对图像第一卷积特征和可察失真图像第二特征分别进行卷积操作后,再根据第一门控子层处理卷积操作后得到的特征,得到加权特征。Optionally, after performing convolution operations on the first convolution feature of the image and the second feature of the perceptibly distorted image respectively, features obtained after the convolution operation are processed according to the first gating sublayer to obtain weighted features.
在一实施例中,加权特征A如公式(4)所示:In one embodiment, the weighted feature A is shown in formula (4):
A = σ(Conv(F_x ⊗ F_j)) ⊙ F_j  (4);
其中,σ表征Sigmoid激活函数,Conv表征卷积子层,⊙表征像素间乘法操作,⊗表征拼接操作,F_x表征图像第一卷积特征,F_j表征可察失真图像第二特征。Where σ denotes the Sigmoid activation function, Conv denotes the convolution sub-layer, ⊙ denotes the pixel-wise multiplication operation, ⊗ denotes the concatenation operation, F_x denotes the first convolution feature of the image, and F_j denotes the second feature of the perceptibly distorted image.
可选的,对图像第一卷积特征和可察失真图像第二特征分别进行卷积操作后,再根据第二门控子层处理卷积操作后得到的特征,得到权重。Optionally, after performing convolution operations on the first convolution feature of the image and the second feature of the perceptibly distorted image respectively, the weights are obtained based on the features obtained after the convolution operation is processed by the second gating sublayer.
在一实施例中,权重W如公式(5)所示:In one embodiment, the weight W is shown in formula (5):
W = E - σ(Conv(F_x ⊗ F_j))  (5);
其中,σ表征Sigmoid激活函数,Conv表征卷积子层,E表征与可察失真图像第二特征尺寸相同的单元矩阵,⊗表征拼接操作,F_x表征图像第一卷积特征,F_j表征可察失真图像第二特征。Where σ denotes the Sigmoid activation function, Conv denotes the convolution sub-layer, E denotes a unit matrix of the same size as the second feature of the perceptibly distorted image, ⊗ denotes the concatenation operation, F_x denotes the first convolution feature of the image, and F_j denotes the second feature of the perceptibly distorted image.
在一实施例中,增强特征F_e如公式(6)所示:In one embodiment, the enhanced feature F_e is shown in formula (6):
F_e = tanh(Conv(A ⊗ F_j)) ⊙ (E - W)  (6);
其中,tanh表征正切激活函数,Conv表征卷积子层,E表征与可察失真图像第二特征尺寸相同的单元矩阵,W表征权重,A表征加权特征,F_j表征可察失真图像第二特征,⊗表征拼接操作。Where tanh denotes the tangent activation function, Conv denotes the convolution sub-layer, E denotes a unit matrix of the same size as the second feature of the perceptibly distorted image, W denotes the weight, A denotes the weighted feature, F_j denotes the second feature of the perceptibly distorted image, and ⊗ denotes the concatenation operation.
可选的,增强特征表征易于人眼察觉的特征。Optionally, the enhanced feature representations are features that are easily perceived by the human eye.
在一实施例中,图像第二特征F′_x如公式(7)所示:In one embodiment, the second feature of the image F′_x is shown in formula (7):
F′_x = W ⊙ F_x + F_e  (7);
其中,⊙表征像素间乘法操作,W表征权重,F_x表征图像第一卷积特征,F_e表征增强特征。Where ⊙ denotes the pixel-wise multiplication operation, W denotes the weight, F_x denotes the first convolution feature of the image, and F_e denotes the enhanced feature.
可选的,对图像第一卷积特征和权重进行像素间乘法操作,可以避免梯度爆炸问题。Optionally, pixel-by-pixel multiplication operations are performed on the first convolutional features and weights of the image to avoid the gradient explosion problem.
可选的,利用第一门控子层和第二门控子层来对图像第一卷积特征和可察失真图像第二特征的各部分特征区域进行掩膜。在第二门控子层中输出的权重向量可以丢弃感知不敏感区域的信息;第一门控子层可以用于保留感知敏感信息,将加权特征与可察失真图像第二特征进一步结合的结果乘以权重,生成易于人眼察觉的增强特征。Optionally, the first gating sublayer and the second gating sublayer are used to mask the respective feature regions of the first convolution feature of the image and the second feature of the perceptually distorted image. The weight vector output in the second gating sublayer can discard the information of the perceptually insensitive region; the first gating sublayer can be used to retain the perceptually sensitive information, and the result of further combining the weighted feature with the second feature of the perceptually distorted image is multiplied by the weight to generate an enhanced feature that is easily perceived by the human eye.
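Putting the gating description together, a possible FTM forward pass is sketched below. It follows formulas (4) to (7) as reconstructed above, which are themselves an inference from the listed symbols, so the exact gate wiring should be read as an assumption.

```python
import torch
import torch.nn as nn

class FeatureTransformLayer(nn.Module):
    """Sketch of the feature transform layer (FTM) following the reconstructed
    formulas (4)-(7): a memory-style gate keeps perceptually sensitive
    information, a forget-style gate down-weights insensitive regions."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv_img = nn.Conv2d(channels, channels, 3, padding=1)        # convolution sub-layer (image branch)
        self.conv_jnd = nn.Conv2d(channels, channels, 3, padding=1)        # convolution sub-layer (JND branch)
        self.gate_keep = nn.Conv2d(2 * channels, channels, 3, padding=1)   # first gating sub-layer
        self.gate_forget = nn.Conv2d(2 * channels, channels, 3, padding=1) # second gating sub-layer
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, f_img_1: torch.Tensor, f_jnd_1: torch.Tensor):
        f_x = self.conv_img(f_img_1)            # image first convolution feature
        f_j = self.conv_jnd(f_jnd_1)            # JND-image second feature
        cat = torch.cat([f_x, f_j], dim=1)
        a = torch.sigmoid(self.gate_keep(cat)) * f_j                          # (4) weighted feature
        w = 1.0 - torch.sigmoid(self.gate_forget(cat))                        # (5) weight (E - sigmoid)
        f_e = torch.tanh(self.fuse(torch.cat([a, f_j], dim=1))) * (1.0 - w)   # (6) enhanced feature
        f_img_2 = w * f_x + f_e                                               # (7) image second feature
        return f_img_2, f_j
```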
图2示出了根据本发明实施例的生成重建图像的示例示意图。FIG. 2 shows an exemplary schematic diagram of generating a reconstructed image according to an embodiment of the present invention.
如图2所示,图像压缩模型包括编码器、熵编码、解码器,编码器包括两个编码子模型,对应的,解码器包括两个解码子模型,编码子模型由特征提取层、注意力网络层和特征变换层构成,特征提取层是指第一特征提取层、第二特征提取层、第三特征提取层、第四特征提取层,注意力网络层是指第一注意力网络层和第二注意力网络层,特征变换层由正切激活子层 、卷积子层、第一门控子层 、第二门控子层构成。熵编码由熵编码子模型和可察失真图像特征压缩子模型构成,熵编码子模型 由自回归层、量化层Q、算数编解码器层(算数编码器子层AE、算数解码器子层AD)、反量化层IQ构成。As shown in Figure 2, the image compression model includes an encoder, entropy coding, and a decoder. The encoder includes two encoding sub-models. Correspondingly, the decoder includes two decoding sub-models. The encoding sub-model consists of a feature extraction layer, an attention network layer, and a feature transformation layer. The feature extraction layer refers to the first feature extraction layer, the second feature extraction layer, the third feature extraction layer, and the fourth feature extraction layer. The attention network layer refers to the first attention network layer and the second attention network layer. The feature transformation layer consists of a tangent activation sub-layer, a convolution sub-layer, a first gating sub-layer, and a second gating sub-layer. Entropy coding consists of an entropy coding sub-model and a perceptible distortion image feature compression sub-model. The entropy coding sub-model consists of an autoregressive layer, a quantization layer Q, an arithmetic encoder-decoder layer (arithmetic encoder sub-layer AE, arithmetic decoder sub-layer AD), and an inverse quantization layer IQ.
可选的,获取待处理图像以及与待处理图像对应的最小可察失真图像;将待处理图像和最小可察失真图像输入至编码器,输出图像初始特征y和可察失真初始特征j;将可察失真初始特征j输入至可察失真图像特征压缩子模型,输出可察失真图像压缩特征ĵ;将可察失真图像压缩特征ĵ和图像初始特征y输入至熵编码子模型,输出图像压缩特征ŷ;将图像压缩特征ŷ和可察失真图像压缩特征ĵ输入至解码器,输出压缩后的重建图像。Optionally, an image to be processed and a minimum perceptible distortion image corresponding to the image to be processed are acquired; the image to be processed and the minimum perceptible distortion image are input into the encoder, which outputs the initial image features y and the initial perceptible distortion features j; the initial perceptible distortion features j are input into the perceptible distortion image feature compression sub-model, which outputs the perceptible distortion image compression features ĵ; the perceptible distortion image compression features ĵ and the initial image features y are input into the entropy coding sub-model, which outputs the image compression features ŷ; the image compression features ŷ and the perceptible distortion image compression features ĵ are input into the decoder, which outputs the compressed reconstructed image.
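Tying the components of FIG. 2 together, a compress-then-reconstruct forward pass might look like the following; `encoder`, `jnd_compressor`, `ar_params`, and `decoder` are placeholders for the modules sketched earlier in this description, not a fixed API.

```python
import torch

def compress_and_reconstruct(x, jnd, encoder, jnd_compressor, ar_params, decoder):
    """End-to-end sketch of the FIG. 2 pipeline using assumed component names."""
    y, j = encoder(x, jnd)            # image / perceptible-distortion initial features
    j_hat = jnd_compressor(j)         # perceptible-distortion compression features (hyper-prior branch)
    mean, scale = ar_params(y)        # encoding / decoding parameters from the autoregressive layer
    y_bar = torch.round(y / j_hat)    # quantization layer Q, formula (2)
    # ...arithmetic-encode y_bar with (mean, scale), transmit, arithmetic-decode back to y_bar...
    y_hat = y_bar * j_hat             # inverse-quantization layer IQ, formula (3)
    return decoder(y_hat, j_hat)      # compressed reconstructed image
```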
图3示出了根据本发明实施例的图像压缩模型的训练方法的流程图。FIG3 shows a flow chart of a method for training an image compression model according to an embodiment of the present invention.
如图3所示,训练方法300包括操作S310~操作S360。As shown in FIG. 3 , the training method 300 includes operations S310 to S360 .
在操作S310,获取训练样本,训练样本包括样本图像以及与样本图像对应的样本最小可察失真图像。In operation S310, a training sample is obtained, where the training sample includes a sample image and a sample minimum observable distortion image corresponding to the sample image.
在操作S320,将样本图像和样本最小可察失真图像输入至编码器,输出样本图像初始特征和样本可察失真初始特征。In operation S320, a sample image and a sample minimum observable distortion image are input to an encoder, and a sample image initial feature and a sample minimum observable distortion initial feature are output.
在操作S330,将样本可察失真初始特征输入至可察失真图像特征压缩子模型,输出样本可察失真图像压缩特征。In operation S330 , the sample perceptible distortion initial feature is input into the perceptible distortion image feature compression sub-model, and the sample perceptible distortion image compression feature is output.
在操作S340,将样本可察失真图像压缩特征和样本图像初始特征输入至熵编码子模型,输出样本图像压缩特征。In operation S340, the sample perceptibly distorted image compression feature and the sample image initial feature are input into an entropy coding sub-model, and the sample image compression feature is output.
在操作S350,将样本图像压缩特征和样本可察失真图像压缩特征输入至解码器,输出压缩后的样本重建图像。In operation S350, the sample image compression feature and the sample observable distortion image compression feature are input to a decoder, and a compressed sample reconstructed image is output.
在操作S360,根据样本图像、样本重建图像和样本可察失真初始特征,训练图像压缩模型,得到训练后的图像压缩模型。In operation S360, an image compression model is trained according to the sample image, the sample reconstructed image, and the sample observable distortion initial features to obtain a trained image compression model.
可选的,图像压缩模型包括:编码器、可察失真图像特征压缩子模型、熵编码子模型和解码器。Optionally, the image compression model includes: an encoder, a perceptible distortion image feature compression sub-model, an entropy coding sub-model and a decoder.
可选的,样本图像可以来自Flicker 2W数据集,数据集共包含20745张样本图像,利用可察失真预测模型处理每张样本图像,得到与每张样本图像对应的样本最小可察失真图像,训练样本中包括20745张样本图像和20745张样本最小可察失真图像。Optionally, the sample images may come from the Flicker 2W dataset, which contains a total of 20,745 sample images. Each sample image is processed using the perceptible distortion prediction model to obtain a sample minimum perceptible distortion image corresponding to each sample image. The training samples include 20,745 sample images and 20,745 sample minimum perceptible distortion images.
可选的,对每个样本图像和与样本图像对应的样本最小可察失真图像随机划分出一个样本图像块组,样本图像块组包括来自样本图像划分的图像块和来自样本最小可察失真图像相应位置划分的图像块,每个图像块分辨率可以为256×256,将20745个样本图像块组输入至编码器,输出样本图像初始特征和样本可察失真初始特征。Optionally, a sample image block group is randomly divided for each sample image and the sample minimum observable distortion image corresponding to the sample image, the sample image block group includes image blocks from the sample image division and image blocks from the sample minimum observable distortion image division at corresponding positions, the resolution of each image block can be 256×256, 20745 sample image block groups are input into the encoder, and the sample image initial features and the sample observable distortion initial features are output.
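A simple paired-crop dataset for such training samples could look as follows; file loading and the JND pre-computation are assumed to have happened beforehand, so the lists of tensors are taken as given.

```python
import random
import torch
from torch.utils.data import Dataset

class PairedPatchDataset(Dataset):
    """Yields a 256x256 patch cropped at the same position from a sample
    image and its pre-computed JND map (both tensors of shape (C, H, W))."""

    def __init__(self, images, jnd_maps, patch: int = 256):
        assert len(images) == len(jnd_maps)
        self.images, self.jnd_maps, self.patch = images, jnd_maps, patch

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img, jnd = self.images[idx], self.jnd_maps[idx]
        _, h, w = img.shape
        top = random.randint(0, h - self.patch)
        left = random.randint(0, w - self.patch)
        sl = (slice(None), slice(top, top + self.patch), slice(left, left + self.patch))
        return img[sl], jnd[sl]   # one sample image block group
```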
可选的,样本最小可察失真图像是基于可察失真预测模型处理样本图像得到的。对样本可察失真初始特征进行通道拼接,得到满足可察失真图像特征压缩子模型可处理尺寸要求的样本可察失真初始特征,进而将通道拼接后的样本可察失真初始特征输入至可察失真图像特征压缩子模型,输出样本可察失真图像压缩特征。Optionally, the sample minimum perceptible distortion image is obtained by processing the sample image based on the perceptible distortion prediction model. Channel splicing is performed on the sample perceptible distortion initial features to obtain the sample perceptible distortion initial features that meet the processable size requirements of the perceptible distortion image feature compression sub-model, and then the channel spliced sample perceptible distortion initial features are input into the perceptible distortion image feature compression sub-model to output the sample perceptible distortion image compression features.
可选的,在熵编码子模型中,利用样本可察失真图像压缩特征调整图像初始特征量化步长,进而引导图像初始特征的量化和反量化处理过程,得到样本图像压缩特征。由于量化过程的取整函数不是连续可导的,因此,在训练过程采用具有可变上下限的均匀噪声来模拟量化过程,均匀噪声的变化范围与量化步长相同。Optionally, in the entropy coding sub-model, the sample observable distortion image compression feature is used to adjust the quantization step of the initial image feature, thereby guiding the quantization and dequantization process of the initial image feature to obtain the sample image compression feature. Since the rounding function of the quantization process is not continuously differentiable, uniform noise with variable upper and lower limits is used to simulate the quantization process during the training process, and the variation range of the uniform noise is the same as the quantization step.
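A differentiable training-time substitute for the rounding in formula (2), using uniform noise whose bounds track the per-pixel quantization step, could be written as:

```python
import torch

def noisy_quantize(y: torch.Tensor, j_hat: torch.Tensor) -> torch.Tensor:
    """Training surrogate for formula (2): rounding is not differentiable,
    so uniform noise with variable bounds equal to the (per-pixel)
    quantization step j_hat is added instead."""
    noise = (torch.rand_like(y) - 0.5) * j_hat   # U(-j_hat/2, +j_hat/2)
    return (y + noise) / j_hat                   # differentiable stand-in for round(y / j_hat)
```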
可选的,解码器与编码器的内部结构可以是对称的,利用解码器处理样本图像压缩特征和样本可察失真图像压缩特征,得到压缩后的样本重建图像。Optionally, the internal structures of the decoder and the encoder may be symmetrical, and the decoder is used to process the sample image compression features and the sample perceptible distortion image compression features to obtain the compressed sample reconstructed image.
例如,以上述编码器由第一编码子模型和第二编码子模型构成为例,第一编码子模型和第二编码子模型的结构相同。将样本图像和样本最小可察失真图像输入至编码器的第一编码子模型中,输出与第一编码子模型对应的样本图像初始特征和样本可察失真初始特征;再将与第一编码子模型对应的样本图像初始特征和样本可察失真初始特征输入至第二编码子模型,输出与第二编码子模型对应的样本图像初始特征和样本可察失真初始特征,将与第二编码子模型对应的样本图像初始特征和样本可察失真初始特征确定为编码器输出的样本图像初始特征和样本可察失真初始特征。For example, taking the above encoder composed of the first encoding sub-model and the second encoding sub-model as an example, the structures of the first encoding sub-model and the second encoding sub-model are the same. The sample image and the sample minimum observable distortion image are input into the first encoding sub-model of the encoder, and the sample image initial features and the sample observable distortion initial features corresponding to the first encoding sub-model are output; the sample image initial features and the sample observable distortion initial features corresponding to the first encoding sub-model are input into the second encoding sub-model, and the sample image initial features and the sample observable distortion initial features corresponding to the second encoding sub-model are output, and the sample image initial features and the sample observable distortion initial features corresponding to the second encoding sub-model are determined as the sample image initial features and the sample observable distortion initial features output by the encoder.
对应的,以解码器由第一解码子模型和第二解码子模型构成为例,第一编码子模型与第一解码子模型对称,第二编码子模型与第二解码子模型对称。将样本图像压缩特征和样本可察失真图像压缩特征输入至第二解码子模型,输出与第二解码子模型对应的样本重建图像;再将与第二解码子模型对应的样本重建图像输入至第一解码子模型,输出与第一解码子模型对应的样本重建图像,并将其确定为解码器输出的压缩后的样本重建图像。Correspondingly, taking a decoder composed of a first decoding sub-model and a second decoding sub-model as an example, the first encoding sub-model is symmetrical with the first decoding sub-model and the second encoding sub-model is symmetrical with the second decoding sub-model. The sample image compression features and the sample observable distortion image compression features are input into the second decoding sub-model, which outputs a sample reconstructed image corresponding to the second decoding sub-model; this image is then input into the first decoding sub-model, which outputs a sample reconstructed image corresponding to the first decoding sub-model, and that output is determined as the compressed sample reconstructed image output by the decoder.
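The symmetric two-stage wiring in the example above can be sketched as follows; the convolutional stand-ins for the encoding and decoding sub-models are assumptions (their real structure is not specified here), and only the image branch is shown, omitting the JND branch and the entropy models.

```python
import torch
from torch import nn

class SymmetricTwoStageCodec(nn.Module):
    """Two stacked encoding sub-models mirrored by two decoding sub-models."""

    def __init__(self, ch: int = 64):
        super().__init__()
        self.enc1 = nn.Conv2d(3, ch, 5, stride=2, padding=2)    # stand-in for the first encoding sub-model
        self.enc2 = nn.Conv2d(ch, ch, 5, stride=2, padding=2)   # stand-in for the second encoding sub-model
        self.dec2 = nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1)  # mirrors enc2
        self.dec1 = nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2, output_padding=1)   # mirrors enc1

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return self.enc2(self.enc1(x))   # first sub-model output feeds the second sub-model

    def decode(self, y: torch.Tensor) -> torch.Tensor:
        return self.dec1(self.dec2(y))   # second decoding sub-model runs before the first
```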
可选的,可以根据样本图像、样本重建图像和样本可察失真初始特征评估模型的性能,进而调整图像压缩模型的参数,迭代训练图像压缩模型。Optionally, the performance of the model can be evaluated based on the sample image, the sample reconstructed image and the initial features of the sample observable distortion, and then the parameters of the image compression model can be adjusted to iteratively train the image compression model.
可选的,根据样本图像、样本重建图像和样本可察失真初始特征,训练图像压缩模型,得到训练后的图像压缩模型,包括:Optionally, training an image compression model according to the sample image, the sample reconstructed image and the sample observable distortion initial features to obtain a trained image compression model includes:
根据损失函数处理样本图像和样本重建图像,得到失真损失值;根据样本图像初始特征和样本可察失真初始特征,确定码率损失值;根据失真损失值和码率损失值,训练图像压缩模型,得到训练后的图像压缩模型。The sample image and the sample reconstructed image are processed according to the loss function to obtain the distortion loss value; the bit rate loss value is determined according to the initial features of the sample image and the initial features of the sample observable distortion; the image compression model is trained according to the distortion loss value and the bit rate loss value to obtain the trained image compression model.
可选的,损失函数可以包括均方误差(Mean Square Error,MSE)和图像感知相似度(Learned Perceptual Image Patch Similarity,LPIPS),例如,分配比为 1:1000,损失函数由MSE+1000*LPIPS构成。根据损失函数处理样本图像和样本重建图像,得到失真损失值。Optionally, the loss function may include mean square error (MSE) and learned perceptual image patch similarity (LPIPS). For example, when the allocation ratio is 1:1000, the loss function is composed of MSE+1000*LPIPS. The sample image and the sample reconstructed image are processed according to the loss function to obtain a distortion loss value.
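A sketch of that distortion term, assuming the third-party `lpips` package as the LPIPS implementation; the backbone choice and the input scaling are assumptions, not taken from the original text.

```python
import torch
import lpips  # pip install lpips; assumed LPIPS implementation

_lpips_metric = lpips.LPIPS(net="alex")  # backbone choice is an assumption

def distortion_loss(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    """Distortion value D = MSE + 1000 * LPIPS for a 1:1000 weighting.

    x, x_hat: original and reconstructed batches in [0, 1], shape (N, 3, H, W);
    LPIPS expects inputs scaled to [-1, 1].
    """
    mse = torch.mean((x - x_hat) ** 2)
    perceptual = _lpips_metric(x * 2 - 1, x_hat * 2 - 1).mean()
    return mse + 1000.0 * perceptual
```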
可选的,码率损失值包括基于处理样本可察失真初始特征产生的码率损失和基于处理样本图像初始特征以及处理与样本图像初始特征对应的样本量化特征产生的码率损失。Optionally, the bit rate loss value includes a bit rate loss generated based on processing initial features of observable distortion of the sample and a bit rate loss generated based on processing initial features of the sample image and processing sample quantization features corresponding to the initial features of the sample image.
例如,量化层处理样本可察失真图像压缩特征和样本图像初始特征得到样本量化特征,自回归层处理样本图像初始特征得到样本自回归特征。码率损失值包括可察失真图像特征压缩子模型处理样本可察失真初始特征时产生的码率损失、自回归层处理样本图像初始特征时产生的码率损失以及算数编解码器层处理样本自回归特征和样本量化特征时产生的码率损失。For example, the quantization layer processes the sample observable distortion image compression features and the sample image initial features to obtain the sample quantization features, and the autoregression layer processes the sample image initial features to obtain the sample autoregression features. The bit rate loss value includes the bit rate loss generated when the observable distortion image feature compression sub-model processes the sample observable distortion initial features, the bit rate loss generated when the autoregression layer processes the sample image initial features, and the bit rate loss generated when the arithmetic codec layer processes the sample autoregression features and the sample quantization features.
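During training, each of these rate components can be estimated from the likelihoods produced by the corresponding entropy model. The sketch below shows only the generic bits-per-pixel accumulation; the exact factorization into the three components listed above is assumed to be handled by the surrounding model.

```python
from typing import Iterable
import torch

def rate_bpp(likelihoods: Iterable[torch.Tensor], num_pixels: int) -> torch.Tensor:
    """Estimated rate in bits per pixel.

    likelihoods: one tensor of element-wise probabilities per coded component,
    e.g. the JND hyper-prior latents and the quantized image latents under the
    autoregressive entropy model.
    """
    total_bits = sum((-torch.log2(p.clamp_min(1e-9))).sum() for p in likelihoods)
    return total_bits / num_pixels
```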
可选的,在训练不同码率点的模型时,损失函数中失真损失值的权重分别设置为0.002、0.004、0.008和0.016。图像压缩模型的优化器可以为Adaptive Moment Estimation(Adam),在最初的50000次迭代中,学习率初始设置为0.0001,然后每20000次迭代学习率降低一半,根据失真损失值和码率损失值得到模型的率失真性能,反向传播更新模型参数,直到模型的率失真性能达到预设阈值,模型可以设置固定训练50轮,服务器保存训练过程中率失真性能最优的模型及参数。Optionally, when training models at different bitrate points, the weight of the distortion loss value in the loss function is set to 0.002, 0.004, 0.008, and 0.016, respectively. The optimizer for the image compression model can be Adaptive Moment Estimation (Adam). The learning rate is initially set to 0.0001 for the first 50,000 iterations and is then halved every 20,000 iterations. The rate-distortion performance of the model is obtained from the distortion loss value and the bitrate loss value, and the model parameters are updated by backpropagation until the rate-distortion performance reaches a preset threshold; the model can also be set to train for a fixed 50 epochs, and the server saves the model and parameters with the best rate-distortion performance observed during training.
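A hypothetical training loop matching the schedule above (Adam, initial learning rate 0.0001, halved every 20,000 iterations after the first 50,000). The `model(x, jnd)` interface returning a reconstruction and a list of likelihoods, and the `rate_fn`/`dist_fn` hooks, are assumptions for illustration.

```python
import torch

def train_one_rate_point(model, loader, rate_fn, dist_fn,
                         lam: float = 0.004, base_lr: float = 1e-4) -> None:
    """Sketch of the optimization loop for one bitrate point (lam in {0.002, 0.004, 0.008, 0.016})."""
    opt = torch.optim.Adam(model.parameters(), lr=base_lr)
    for step, (x, jnd) in enumerate(loader):
        # Learning-rate schedule: 1e-4 for the first 50k iterations, then halved every 20k.
        lr = base_lr if step < 50_000 else base_lr * 0.5 ** ((step - 50_000) // 20_000 + 1)
        for group in opt.param_groups:
            group["lr"] = lr
        x_hat, likelihoods = model(x, jnd)                    # assumed model interface
        num_pixels = x.shape[0] * x.shape[-2] * x.shape[-1]
        loss = rate_fn(likelihoods, num_pixels) + lam * dist_fn(x, x_hat)
        opt.zero_grad()
        loss.backward()
        opt.step()
```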
可选的,测试图像可以来自Kodak、Tecnick和BSD500数据集,对测试图像进行处理得到测试最小可察失真图像。测试的硬件平台是有24GB显存的NVIDIA 3090Ti GPU,运行环境为Ubuntu 20.04 LTS。在三个数据集上使用图像感知相似度(Learned Perceptual Image Patch Similarity,LPIPS)、图像质量相似度(Fréchet Inception Distance,FID)和图像无偏估计(Kernel Inception Distance,KID)对图像进行评估,以比较图像压缩方法和六种现有方法的编码性能。为了精确量化编码性能的提高,利用BD-rate来评估比特率的节省,采用了Bjontegaard Delta rate - Learned Perceptual Image Patch Similarity(BD-LPIPS)、Bjontegaard Delta rate - Fréchet Inception Distance(BD-FID)和Bjontegaard Delta rate - Kernel Inception Distance(BD-KID)来评估图像质量,数值越小,编码效果越好。Optionally, the test images can come from the Kodak, Tecnick, and BSD500 datasets, and the test images are processed to obtain the corresponding minimum observable distortion images. The test hardware platform is an NVIDIA 3090Ti GPU with 24GB of video memory, running Ubuntu 20.04 LTS. Images on the three datasets are evaluated using Learned Perceptual Image Patch Similarity (LPIPS), Fréchet Inception Distance (FID), and Kernel Inception Distance (KID) to compare the coding performance of the proposed image compression method with six existing methods. To precisely quantify the improvement in coding performance, BD-rate is used to evaluate bitrate savings, and Bjontegaard Delta rate - Learned Perceptual Image Patch Similarity (BD-LPIPS), Bjontegaard Delta rate - Fréchet Inception Distance (BD-FID), and Bjontegaard Delta rate - Kernel Inception Distance (BD-KID) are used to evaluate image quality; the smaller the value, the better the coding performance.
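BD-rate here is computed in the usual Bjontegaard way: fit a cubic to (quality, log-rate) points for each codec, integrate over the overlapping quality range, and report the average rate difference as a percentage. The sketch below assumes quality values oriented so that higher is better (e.g. negated LPIPS/FID/KID); the fitting details used in the actual experiments are not stated in the text.

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test) -> float:
    """Average bitrate change (%) of the test codec versus the anchor at equal quality."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Cubic fit of log-rate as a function of quality for each curve.
    ca = np.polynomial.polynomial.polyfit(qa, log_ra, 3)
    ct = np.polynomial.polynomial.polyfit(qt, log_rt, 3)

    lo, hi = max(qa.min(), qt.min()), min(qa.max(), qt.max())
    ia = np.polynomial.polynomial.polyint(ca)
    it = np.polynomial.polynomial.polyint(ct)
    avg_a = (np.polynomial.polynomial.polyval(hi, ia) - np.polynomial.polynomial.polyval(lo, ia)) / (hi - lo)
    avg_t = (np.polynomial.polynomial.polyval(hi, it) - np.polynomial.polynomial.polyval(lo, it)) / (hi - lo)
    return float((np.exp(avg_t - avg_a) - 1.0) * 100.0)
```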
表1示意性示出了现有方法和本发明实施例提供的方法在3种数据集上的编码性能表。Table 1 schematically shows the encoding performance of the existing method and the method provided by the embodiment of the present invention on three data sets.
可选的,方法1、方法2、方法4和方法5中的模型均是基于生成对抗网络(Generative Adversarial Network,GAN)的生成式图像压缩模型构建得到的;方法3中的模型基于Transformer的图像压缩网络构建得到的;方法4在损失函数中结合对抗损失和风格损失,方法5在损失函数中增加了内容自适应损失;方法6中的模型是基于解耦架构的压缩网络构建得到的。Optionally, the models in method 1, method 2, method 4 and method 5 are all constructed based on a generative image compression model of a generative adversarial network (GAN); the model in method 3 is constructed based on an image compression network of a Transformer; method 4 combines adversarial loss and style loss in the loss function, and method 5 adds content adaptive loss to the loss function; the model in method 6 is constructed based on a compression network with a decoupled architecture.
可选的,根据表1可以看出,本发明在三个评价指标(LPIPS、FID、KID)中都取得了最佳的平均编码性能。在三个测试数据集的LPIPS 方面, 本发明的BD-LPIPS 和BD-rate分别为(-0.059,-68.979%)、(-0.034,64.139%)和(-0.056,-64.600%),实现了最优的编码性能,此外,本发明在三个测试数据集的 BD-KID和BD-rate方面的平均值分别为 (-0.008, -66.684%)、(-0.001,-61.160%) 和 (-0.033,-75.983%),编码性能最优。在三个测试数据集的 FID方面,本发明也实现了最大的BD-rate节省率和与其他对比方法相当的BD-FID,即(-43.923,-57.787%)、(-11.034,-45.333%)和(-72.960,-58.042%)。这些结果表明在以人类视觉感知为目标的图像压缩中,相较于其它六种现有方法,本发明有效去除了图像感知冗余,较大提高了图像压缩编码效率。Optionally, it can be seen from Table 1 that the present invention achieves the best average coding performance in the three evaluation indicators (LPIPS, FID, KID). In terms of LPIPS of the three test data sets, the BD-LPIPS and BD-rate of the present invention are (-0.059, -68.979%), (-0.034, 64.139%) and (-0.056, -64.600%), respectively, achieving the best coding performance. In addition, the average values of BD-KID and BD-rate of the present invention in the three test data sets are (-0.008, -66.684%), (-0.001, -61.160%) and (-0.033, -75.983%), respectively, with the best coding performance. In terms of FID of the three test data sets, the present invention also achieves the largest BD-rate saving rate and BD-FID comparable to other comparison methods, namely (-43.923, -57.787%), (-11.034, -45.333%) and (-72.960, -58.042%). These results show that in image compression targeting human visual perception, compared with the other six existing methods, the present invention effectively removes image perception redundancy and greatly improves image compression coding efficiency.
基于上述图像压缩方法,本发明还提供了一种图像压缩装置。以下将结合图4对该装置进行详细描述。Based on the above image compression method, the present invention further provides an image compression device, which will be described in detail below in conjunction with FIG. 4.
图4示出了根据本发明实施例的图像压缩装置的结构框图。FIG. 4 shows a structural block diagram of an image compression device according to an embodiment of the present invention.
如图4所示,该实施例的图像压缩装置400包括第一获取模块410、第一编码模块420、第一处理模块430、第一压缩模块440和第一解码模块450。As shown in FIG. 4 , the image compression device 400 of this embodiment includes a first acquisition module 410 , a first encoding module 420 , a first processing module 430 , a first compression module 440 and a first decoding module 450 .
第一获取模块410,用于获取待处理图像以及与待处理图像对应的最小可察失真图像,其中,最小可察失真图像是基于可察失真预测模型处理待处理图像得到的。在一实施例中,第一获取模块410可以用于执行前文描述的操作S110,在此不再赘述。The first acquisition module 410 is used to acquire the image to be processed and the minimum observable distortion image corresponding to the image to be processed, wherein the minimum observable distortion image is obtained by processing the image to be processed based on the observable distortion prediction model. In one embodiment, the first acquisition module 410 can be used to perform the operation S110 described above, which will not be repeated here.
第一编码模块420,用于将待处理图像和最小可察失真图像输入至编码器,输出图像初始特征和可察失真初始特征。在一实施例中,第一编码模块420可以用于执行前文描述的操作S120,在此不再赘述。The first encoding module 420 is used to input the image to be processed and the minimum perceptible distortion image into the encoder, and output the image initial features and the perceptible distortion initial features. In one embodiment, the first encoding module 420 can be used to perform the operation S120 described above, which will not be repeated here.
第一处理模块430,用于将可察失真初始特征输入至可察失真图像特征压缩子模型,输出可察失真图像压缩特征,其中,可察失真图像特征压缩子模型是基于超先验图像压缩网络构建的。在一实施例中,第一处理模块430可以用于执行前文描述的操作S130,在此不再赘述。The first processing module 430 is used to input the initial feature of the perceptible distortion into the perceptible distortion image feature compression sub-model, and output the perceptible distortion image compression feature, wherein the perceptible distortion image feature compression sub-model is constructed based on the hyper-prior image compression network. In one embodiment, the first processing module 430 can be used to perform the operation S130 described above, which will not be repeated here.
第一压缩模块440,用于将可察失真图像压缩特征和图像初始特征输入至熵编码子模型 ,输出图像压缩特征。在一实施例中,第一压缩模块440可以用于执行前文描述的操作S140,在此不再赘述。The first compression module 440 is used to input the perceptible distortion image compression feature and the image initial feature into the entropy coding sub-model, and output the image compression feature. In one embodiment, the first compression module 440 can be used to perform the operation S140 described above, which will not be repeated here.
第一解码模块450,用于将图像压缩特征和可察失真图像压缩特征输入至解码器,输出压缩后的重建图像。在一实施例中,第一解码模块450可以用于执行前文描述的操作S150,在此不再赘述。The first decoding module 450 is used to input the image compression feature and the perceptible distortion image compression feature into the decoder, and output the compressed reconstructed image. In one embodiment, the first decoding module 450 can be used to perform the operation S150 described above, which will not be described in detail here.
可选的,第一压缩模块440包括量化子模块、自回归子模块、压缩子模块和反量化子模块。Optionally, the first compression module 440 includes a quantization submodule, an autoregression submodule, a compression submodule and a dequantization submodule.
量化子模块,用于将可察失真图像压缩特征和图像初始特征输入至量化层,输出量化特征。The quantization submodule is used to input the compression features of the perceptibly distorted image and the initial features of the image into the quantization layer and output the quantized features.
自回归子模块,用于将图像初始特征输入至自回归层,输出编码参数和解码参数。The autoregressive submodule is used to input the initial features of the image into the autoregressive layer and output encoding parameters and decoding parameters.
压缩子模块,用于根据算数编解码器层处理编码参数、解码参数和量化特征,得到图像中间特征。The compression submodule is used to process the encoding parameters, decoding parameters and quantization features according to the arithmetic codec layer to obtain the intermediate features of the image.
反量化子模块,用于将可察失真图像压缩特征和图像中间特征输入至反量化层,输出图像压缩特征。The inverse quantization submodule is used to input the perceptible distortion image compression features and the image intermediate features into the inverse quantization layer and output the image compression features.
可选的,压缩子模块包括算数编码单元和算数解码单元。Optionally, the compression submodule includes an arithmetic encoding unit and an arithmetic decoding unit.
算数编码单元,用于根据算数编码器子层处理编码参数和量化特征,得到图像编码特征。The arithmetic coding unit is used to process the coding parameters and quantization features according to the arithmetic encoder sublayer to obtain image coding features.
算数解码单元,用于根据算数解码器子层处理解码参数和图像编码特征,得到图像中间特征。The arithmetic decoding unit is used to process the decoding parameters and the image coding features according to the arithmetic decoder sublayer to obtain the intermediate features of the image.
可选的,第一编码模块420包括第一编码子模块、第二编码子模块、第三编码子模块、第四编码子模块、第五编码子模块、第六编码子模块和第七编码子模块。Optionally, the first encoding module 420 includes a first encoding submodule, a second encoding submodule, a third encoding submodule, a fourth encoding submodule, a fifth encoding submodule, a sixth encoding submodule and a seventh encoding submodule.
第一编码子模块,用于根据第一特征提取层对待处理图像进行特征提取,得到图像第一特征。The first encoding submodule is used to extract features from the image to be processed according to the first feature extraction layer to obtain a first feature of the image.
第二编码子模块,用于根据第二特征提取层对最小可察失真图像进行特征提取,得到可察失真图像第一特征。The second encoding submodule is used to extract features of the minimum perceptible distortion image according to the second feature extraction layer to obtain a first feature of the perceptible distortion image.
第三编码子模块,用于将图像第一特征和可察失真图像第一特征输入至特征变换层,输出图像第二特征和可察失真图像第二特征。The third encoding submodule is used to input the first image feature and the first perceptibly distorted image feature into the feature transformation layer, and output the second image feature and the second perceptibly distorted image feature.
第四编码子模块,用于根据第三特征提取层对图像第二特征进行特征提取,得到图像第三特征。The fourth encoding submodule is used to extract the second feature of the image according to the third feature extraction layer to obtain the third feature of the image.
第五编码子模块,用于根据第一注意力网络层处理图像第三特征,得到图像初始特征。The fifth encoding submodule is used to process the third feature of the image according to the first attention network layer to obtain the initial feature of the image.
第六编码子模块,用于根据第四特征提取层对可察失真图像第二特征进行特征提取,得到可察失真图像第三特征。The sixth encoding submodule is used to extract the second feature of the perceptibly distorted image according to the fourth feature extraction layer to obtain the third feature of the perceptibly distorted image.
第七编码子模块,用于根据第二注意力网络层处理可察失真图像第三特征,得到可察失真初始特征。The seventh encoding submodule is used to process the third feature of the perceptible distortion image according to the second attention network layer to obtain the initial feature of the perceptible distortion.
可选的,第三编码子模块包括第一编码单元、第二编码单元、第三编码单元、第四编码单元、第五编码单元和第六编码单元。Optionally, the third encoding submodule includes a first encoding unit, a second encoding unit, a third encoding unit, a fourth encoding unit, a fifth encoding unit and a sixth encoding unit.
第一编码单元,用于将可察失真图像第一特征输入至卷积子层,输出可察失真图像第二特征。The first encoding unit is used to input the first feature of the perceptibly distorted image into the convolution sublayer, and output the second feature of the perceptibly distorted image.
第二编码单元,用于将图像第一特征输入至卷积子层,输出图像第一卷积特征。The second encoding unit is used to input the first feature of the image into the convolution sublayer and output the first convolution feature of the image.
第三编码单元,用于根据第一门控子层处理图像第一卷积特征和可察失真图像第二特征,得到加权特征。The third encoding unit is used to process the first convolution feature of the image and the second feature of the perceptibly distorted image according to the first gating sublayer to obtain a weighted feature.
第四编码单元,用于根据第二门控子层处理图像第一卷积特征和可察失真图像第二特征,得到权重。The fourth encoding unit is used to process the first convolution feature of the image and the second feature of the perceptibly distorted image according to the second gating sublayer to obtain a weight.
第五编码单元,用于根据正切激活子层处理加权特征、权重和可察失真图像第二特征,得到增强特征。The fifth encoding unit is used to process the weighted features, the weights and the second features of the perceptible distorted image according to the tangent activation sublayer to obtain enhanced features.
第六编码单元,用于根据增强特征、权重和图像第一卷积特征,得到图像第二特征。The sixth encoding unit is used to obtain the second feature of the image according to the enhanced feature, the weight and the first convolution feature of the image.
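One possible reading of the feature transformation layer described by the six encoding units above is a gated fusion between the image branch and the observable-distortion (JND) branch. The exact way the weighted feature, the weight, and the enhanced feature are combined is not given, so the sketch below uses a conventional gated-update form and should be read as an assumption rather than the disclosed structure.

```python
import torch
from torch import nn

class GatedJNDFeatureTransform(nn.Module):
    """Illustrative feature transformation layer fusing JND features into the image branch."""

    def __init__(self, ch: int):
        super().__init__()
        self.conv_img = nn.Conv2d(ch, ch, 3, padding=1)   # convolution sublayer, image branch
        self.conv_jnd = nn.Conv2d(ch, ch, 3, padding=1)   # convolution sublayer, JND branch
        self.gate1 = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.Sigmoid())  # -> weighted feature
        self.gate2 = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.Sigmoid())  # -> weight
        self.act = nn.Tanh()                               # tangent activation sublayer

    def forward(self, img_first: torch.Tensor, jnd_first: torch.Tensor):
        img_conv = self.conv_img(img_first)                # image first convolution feature
        jnd_second = self.conv_jnd(jnd_first)              # perceptibly distorted image second feature
        joint = torch.cat([img_conv, jnd_second], dim=1)
        weighted = self.gate1(joint)
        weight = self.gate2(joint)
        enhanced = self.act(weighted * jnd_second)         # enhanced feature
        img_second = weight * img_conv + (1.0 - weight) * enhanced  # image second feature (assumed combination)
        return img_second, jnd_second
```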
基于上述图像压缩模型的训练方法,本发明还提供了一种图像压缩模型的训练装置。以下将结合图5对该装置进行详细描述。Based on the above image compression model training method, the present invention further provides an image compression model training device, which will be described in detail below in conjunction with FIG. 5.
图5示出了根据本发明实施例的图像压缩模型的训练装置的结构框图。FIG. 5 shows a structural block diagram of an image compression model training device according to an embodiment of the present invention.
如图5所示,该实施例的图像压缩模型的训练装置500包括第二获取模块510、第二编码模块520、第二处理模块530、第二压缩模块540、第二解码模块550和训练模块560。As shown in FIG. 5 , the image compression model training device 500 of this embodiment includes a second acquisition module 510 , a second encoding module 520 , a second processing module 530 , a second compression module 540 , a second decoding module 550 and a training module 560 .
第二获取模块510,用于获取训练样本,训练样本包括样本图像以及与样本图像对应的样本最小可察失真图像,其中,样本最小可察失真图像是基于可察失真预测模型处理样本图像得到的。在一实施例中,第二获取模块510可以用于执行前文描述的操作S310,在此不再赘述。The second acquisition module 510 is used to acquire a training sample, wherein the training sample includes a sample image and a sample minimum observable distortion image corresponding to the sample image, wherein the sample minimum observable distortion image is obtained by processing the sample image based on the observable distortion prediction model. In one embodiment, the second acquisition module 510 can be used to perform the operation S310 described above, which will not be described in detail herein.
第二编码模块520,用于将样本图像和样本最小可察失真图像输入至编码器,输出样本图像初始特征和样本可察失真初始特征。在一实施例中,第二编码模块520可以用于执行前文描述的操作S320,在此不再赘述。The second encoding module 520 is used to input the sample image and the sample minimum observable distortion image into the encoder, and output the sample image initial features and the sample observable distortion initial features. In one embodiment, the second encoding module 520 can be used to perform the operation S320 described above, which will not be repeated here.
第二处理模块530,用于将样本可察失真初始特征输入至可察失真图像特征压缩子模型,输出样本可察失真图像压缩特征,其中,可察失真图像特征压缩子模型是基于超先验图像压缩网络构建的。在一实施例中,第二处理模块530可以用于执行前文描述的操作S330,在此不再赘述。The second processing module 530 is used to input the sample perceptible distortion initial feature into the perceptible distortion image feature compression sub-model, and output the sample perceptible distortion image compression feature, wherein the perceptible distortion image feature compression sub-model is constructed based on the hyper-prior image compression network. In one embodiment, the second processing module 530 can be used to perform the operation S330 described above, which will not be repeated here.
第二压缩模块540,用于将样本可察失真图像压缩特征和样本图像初始特征输入至熵编码子模型 ,输出样本图像压缩特征。在一实施例中,第二压缩模块540可以用于执行前文描述的操作S340,在此不再赘述。The second compression module 540 is used to input the sample perceptibly distorted image compression feature and the sample image initial feature into the entropy coding sub-model, and output the sample image compression feature. In one embodiment, the second compression module 540 can be used to perform the operation S340 described above, which will not be repeated here.
第二解码模块550,用于将样本图像压缩特征和样本可察失真图像压缩特征输入至解码器,输出压缩后的样本重建图像。在一实施例中,第二解码模块550可以用于执行前文描述的操作S350,在此不再赘述。The second decoding module 550 is used to input the sample image compression feature and the sample perceptible distortion image compression feature into the decoder, and output the compressed sample reconstructed image. In one embodiment, the second decoding module 550 can be used to perform the operation S350 described above, which will not be repeated here.
训练模块560,用于根据样本图像、样本重建图像和样本可察失真初始特征,训练图像压缩模型,得到训练后的图像压缩模型。在一实施例中,训练模块560可以用于执行前文描述的操作S360,在此不再赘述。The training module 560 is used to train the image compression model according to the sample image, the sample reconstructed image and the sample observable distortion initial features to obtain the trained image compression model. In one embodiment, the training module 560 can be used to perform the operation S360 described above, which will not be described in detail here.
可选的,训练模块560包括第一训练子模块、第二训练子模块和第三训练子模块。Optionally, the training module 560 includes a first training submodule, a second training submodule and a third training submodule.
第一训练子模块,用于根据损失函数处理样本图像和样本重建图像,得到失真损失值。The first training submodule is used to process the sample image and the sample reconstructed image according to the loss function to obtain a distortion loss value.
第二训练子模块,用于根据样本图像初始特征和样本可察失真初始特征,确定码率损失值。The second training submodule is used to determine the bit rate loss value according to the sample image initial features and the sample observable distortion initial features.
第三训练子模块,用于根据失真损失值和码率损失值,训练图像压缩模型,得到训练后的图像压缩模型。The third training submodule is used to train the image compression model according to the distortion loss value and the bit rate loss value to obtain the trained image compression model.
可选的,模块、子模块、单元、子单元中的任意多个模块可以合并在一个模块中实现,或者其中的任意一个模块可以被拆分成多个模块。或者,这些模块中的一个或多个模块的至少部分功能可以与其他模块的至少部分功能相结合,并在一个模块中实现。可选的,模块、子模块、单元、子单元中的至少一个可以至少被部分地实现为硬件电路,例如现场可编程门阵列(FPGA)、可编程逻辑阵列(PLA)、片上系统、基板上的系统、封装上的系统、专用集成电路(ASIC),或可以通过对电路进行集成或封装的任何其他的合理方式等硬件或固件来实现,或以软件、硬件以及固件三种实现方式中任意一种或以其中任意几种的适当组合来实现。或者,模块、子模块、单元、子单元中的至少一个可以至少被部分地实现为计算机程序模块,当该计算机程序模块被运行时,可以执行相应的功能。Optionally, any multiple modules among the modules, submodules, units, and subunits can be combined into one module for implementation, or any one of the modules can be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules can be combined with at least part of the functions of other modules and implemented in one module. Optionally, at least one of the modules, submodules, units, and subunits can be at least partially implemented as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application-specific integrated circuit (ASIC), or can be implemented by hardware or firmware such as any other reasonable way of integrating or packaging the circuit, or implemented in any one of the three implementation methods of software, hardware, and firmware, or in any appropriate combination of any of them. Alternatively, at least one of the modules, submodules, units, and subunits can be at least partially implemented as a computer program module, and when the computer program module is run, the corresponding function can be executed.
附图中的流程图和框图,图示了按照本发明各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。本领域技术人员可以理解,本发明的各个实施例和/或权利要求中记载的特征可以进行多种组合和/或结合,即使这样的组合或结合没有明确记载于本发明中。特别地,在不脱离本发明精神和教导的情况下,本发明的各个实施例和/或权利要求中记载的特征可以进行多种组合和/或结合。所有这些组合和/或结合均落入本发明的范围。The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions and operations of the systems, methods and computer program products according to various embodiments of the present invention. In this regard, each box in the flowchart or block diagram may represent a module, a program segment, or a part of a code, and the above-mentioned module, program segment, or a part of the code contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the box may also occur in an order different from that marked in the accompanying drawings. For example, two boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved. It should also be noted that each box in the block diagram or flowchart, and the combination of boxes in the block diagram or flowchart, can be implemented with a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions. It can be understood by those skilled in the art that the features recorded in the various embodiments and/or claims of the present invention can be combined and/or combined in various ways, even if such a combination or combination is not explicitly recorded in the present invention. In particular, without departing from the spirit and teachings of the present invention, the features described in the various embodiments and/or claims of the present invention may be combined and/or combined in a variety of ways. All of these combinations and/or combinations fall within the scope of the present invention.
以上对本发明的实施例进行了描述。但是,这些实施例仅仅是为了说明的目的,而并非为了限制本发明的范围。尽管在以上分别描述了各实施例,但是这并不意味着各个实施例中的措施不能有利地结合使用。本发明的范围由所附权利要求及其等同物限定。不脱离本发明的范围,本领域技术人员可以做出多种替代和修改,这些替代和修改都应落在本发明的范围之内。The embodiments of the present invention are described above. However, these embodiments are only for the purpose of illustration, and are not intended to limit the scope of the present invention. Although the embodiments are described above, this does not mean that the measures in the various embodiments cannot be used in combination. The scope of the present invention is defined by the appended claims and their equivalents. Without departing from the scope of the present invention, those skilled in the art may make various substitutions and modifications, which should all fall within the scope of the present invention.