CN113132755B - Method and system for encoding extensible man-machine cooperative image and method for training decoder - Google Patents


Info

Publication number
CN113132755B
Authority
CN
China
Prior art keywords
edge map
code stream
auxiliary information
image
compact representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911415561.7A
Other languages
Chinese (zh)
Other versions
CN113132755A (en)
Inventor
刘家瑛
胡越予
杨帅
王德昭
郭宗明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201911415561.7A
Publication of CN113132755A
Application granted
Publication of CN113132755B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a scalable human-machine collaborative image coding method and system. The method comprises the following steps: extracting the edge map of each sample picture and vectorizing it as a compact representation that drives machine vision tasks; extracting key points from the vectorized edge map as auxiliary information; applying lossless entropy coding to the compact representation and the auxiliary information separately to obtain two code streams; preliminarily decoding the two code streams to obtain the edge map and the auxiliary information; feeding the decoded edge map and auxiliary information into a neural network for forward computation; computing a loss function between the network output and the corresponding original picture, and back-propagating the loss to update the network weights until the network converges, which yields a dual-stream decoder; obtaining the edge map and auxiliary information of an image to be processed and encoding and compressing them into two code streams; and decoding the received streams with the dual-stream decoder to reconstruct the image.

Description

Translated from Chinese
Scalable human-machine collaborative image coding method and system, and decoder training method

Technical field

The invention belongs to the field of image coding and relates to a scalable human-machine collaborative image coding method and coding system. The invention can simultaneously improve image quality under both human vision and machine vision.

Background

In the use and distribution of digital images, lossy image compression is an indispensable key technology. Traditional lossy compression schemes transform the image into a compact representation and then quantize and entropy-code it, greatly reducing storage and transmission overhead and enabling the everyday use of digital images.

With the development of computer vision technology, more and more application scenarios must consider image quality under machine vision; that is, a lossily compressed image should still perform comparably to the lossless image on machine vision tasks. Traditional lossy compression schemes, however, are optimized only for human vision and cannot guarantee quality under machine vision. Conversely, if only the features required by machine vision tasks are compressed, without guaranteeing image reconstruction, the result cannot be viewed by the human eye.

To guarantee performance under both human vision and machine vision, the present invention proposes a scalable human-machine collaborative image coding system. On this basis, depending on the requirements, code streams of different levels can be transmitted and decoded to obtain a reconstructed image for machine vision only, or a reconstructed image for human vision.

Summary of the invention

Against this background, the present invention designs a scalable human-machine collaborative image coding method and coding system. Unlike the single code stream of traditional coding, the invention proposes a scalable coding framework that simultaneously generates two code streams, a vision-driven compact-representation stream and an auxiliary-information stream, so that decoding and reconstruction can be adapted to different task requirements. The decoder adopts a generative model and can decode streams of different levels: from the compact-representation stream alone it generates a reconstructed image for machine vision, and from the compact-representation stream together with the auxiliary-information stream it generates a reconstructed image for human vision. The overall framework is shown in Figure 1.

The technical scheme of the present invention is as follows:

A scalable human-machine collaborative image coding method, comprising the steps of:

1) extract the edge map of each sample picture;

2) vectorize the edge map with Bezier curves as a compact representation that drives machine vision tasks; then extract key points from the vectorized edge map as auxiliary information;

3) apply lossless entropy coding to the compact representation and the auxiliary information separately to obtain two code streams;

4) preliminarily decode the two code streams to obtain the edge map and the auxiliary information;

5) for the task of reconstructing images for human vision, feed the decoded edge map and auxiliary information into the generative neural network for forward computation; for the task of reconstructing images for machine vision, feed only the decoded edge map into the generative neural network for forward computation;

6) compute a loss function between the result of step 5) and the corresponding original picture, and back-propagate the loss to the network to update its weights;

7) repeat steps 2) to 6) until the network loss converges, yielding a dual-stream decoder for the human-vision reconstruction task or a compact-representation stream decoder for the machine-vision reconstruction task;

8) for an image I to be processed, obtain its edge map and auxiliary information, and encode and compress them into two code streams, denoted B_E and B_C respectively;

9) select the dual-stream decoder or the compact-representation stream decoder according to the task requirements, decode the received streams, and reconstruct the image.

Further, the key points are extracted as follows: if a vectorized line of the edge map is a straight segment, the key points are extracted in line mode; otherwise they are extracted in Bezier-curve mode.

Further, in line mode the key points are extracted as follows: if the angle between the segment and the horizontal is greater than a set angle, two color values are sampled at equal distances to the left and right of the segment midpoint along the horizontal; if it is less than or equal to the set angle, two color values are sampled at equal distances above and below the midpoint along the vertical. In Bezier-curve mode, the point where a line parallel to the chord joining the curve's start and end points is tangent to the curve is recorded; if the angle between the tangent and the horizontal is greater than the set angle, one color value inside the curve is sampled at the tangent point along the horizontal; if it is less than or equal to the set angle, one color value inside the curve is sampled at the tangent point along the vertical.

Further, the set angle is 45°.
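
The line-mode sampling rule above can be sketched as follows. This is an illustrative reading of the rule only: the 2-pixel sampling offset and the tie-breaking at exactly 45° are assumptions not fixed by the text.

```python
import math

def line_sample_points(p0, p1, offset=2, threshold_deg=45.0):
    """Return the two pixel positions whose colors are recorded for a
    straight edge segment. Steep segments (angle with the horizontal
    above the threshold) are sampled left/right of the midpoint along
    the horizontal; shallow ones above/below along the vertical."""
    mx = (p0[0] + p1[0]) / 2.0
    my = (p0[1] + p1[1]) / 2.0
    # angle between the segment and the horizontal, in [0, 90] degrees
    angle = math.degrees(math.atan2(abs(p1[1] - p0[1]), abs(p1[0] - p0[0])))
    if angle > threshold_deg:
        return (mx - offset, my), (mx + offset, my)
    return (mx, my - offset), (mx, my + offset)
```

For example, a vertical segment from (0, 0) to (0, 10) is steeper than 45°, so the two samples are taken to the left and right of its midpoint (0, 5).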

Further, for machine vision tasks, in step 8) the stream B_E corresponding to the edge map is sent to the compact-representation stream decoder, and in step 9) that decoder decodes B_E into the vectorized edge map E and performs a forward pass to obtain the decoded image Î_E. For human vision tasks, in step 8) the streams B_E and B_C are sent to the dual-stream decoder, and in step 9) that decoder decodes B_E and B_C into E and C and performs a forward pass to obtain the decoded image Î.

A method for training a dual-stream decoder, comprising the steps of:

1) extract the edge map of each sample picture;

2) vectorize the edge map with Bezier curves as a compact representation that drives machine vision tasks; then extract key points from the vectorized edge map as auxiliary information;

3) apply lossless entropy coding to the compact representation and the auxiliary information separately to obtain two code streams;

4) preliminarily decode the two code streams to obtain the edge map and the auxiliary information;

5) feed the decoded edge map and auxiliary information into the generative neural network for forward computation;

6) compute a loss function between the result of step 5) and the corresponding original picture, and back-propagate the loss to the network to update its weights;

7) repeat steps 2) to 6) until the network loss converges, yielding a dual-stream decoder for the human-vision image reconstruction task.

A method for training a compact-representation stream decoder, comprising the steps of:

1) extract the edge map of each sample picture and vectorize it as a compact representation that drives machine vision tasks;

2) apply lossless entropy coding to the compact representation to obtain one code stream;

3) preliminarily decode the code stream to obtain the edge map;

4) feed the decoded edge map into the generative neural network for forward computation;

5) compute a loss function between the result of step 4) and the corresponding original picture, and back-propagate the loss to the network to update its weights;

6) repeat steps 2) to 5) until the network loss converges, yielding a compact-representation stream decoder for the machine-vision image reconstruction task.

A scalable human-machine collaborative image coding system, characterized by comprising an encoder, a dual-stream decoder, and a compact-representation stream decoder, wherein:

the encoder extracts the edge map of a picture, vectorizes it with Bezier curves as a compact representation that drives machine vision tasks, extracts key points from the vectorized edge map as auxiliary information, and applies lossless entropy coding to the compact representation and the auxiliary information separately to obtain two code streams;

the dual-stream decoder decodes the two code streams into the edge map and the auxiliary information, then performs a forward pass on them to obtain the decoded image for the human-vision reconstruction task;

the compact-representation stream decoder decodes the stream corresponding to the edge map, then performs a forward pass on the decoded edge map to obtain the decoded image for the machine-vision reconstruction task.

The present invention extracts the image edge map, vectorizes it with Bezier curves as the compact representation that drives machine vision tasks, derives key-point coordinates from the positions and parameters of the lines and curves in the vectorized edge map, samples those key points from the original image, and encodes both into the corresponding two code streams, as shown in Figure 2.

The main steps of the method are described next.

Step 1: collect a batch of pictures and extract their edge maps; the collected pictures are kept as the target outputs of the network.

Step 2: vectorize the edge map with Bezier curves, and sample key points from the vectorized edge map as auxiliary information. (After vectorization the edge map is represented as straight segments and curves; from their positions and parameters the key-point coordinates are computed, and these coordinates are used to sample the originally captured image.) Key-point extraction has two modes, line mode and Bezier-curve mode: a straight segment uses line mode, otherwise Bezier-curve mode is used. In line mode, if the angle between the segment and the horizontal exceeds 45°, two color values are sampled at equal distances to the left and right of the segment midpoint along the horizontal; if it is at most 45°, two color values are sampled at equal distances above and below the midpoint along the vertical. In Bezier-curve mode, an edge segment is described by a Bezier curve, as shown in Figure 2(c) of the specification: the curve starts at Ps and ends at Pt; the chord PsPt is drawn, and the point where a line parallel to PsPt is tangent to the curve is taken. If the angle between the tangent and the horizontal exceeds 45°, one color value inside the curve is sampled at the tangent point along the horizontal; if it is at most 45°, one color value inside the curve is sampled at the tangent point along the vertical.
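
The tangent point used in Bezier-curve mode has a closed form if the curves are quadratic (the text does not fix the curve degree; quadratic is an assumption here): the tangent of a non-degenerate quadratic Bezier is parallel to the chord PsPt exactly at parameter t = 0.5, since the cross product of B'(t) = 2((1-t)(P1-P0) + t(P2-P1)) with the chord P2-P0 vanishes at t = 1/2.

```python
def bezier_tangent_point(p0, p1, p2):
    """Point on a quadratic Bezier curve (start p0, control p1, end p2)
    where the tangent is parallel to the chord p0-p2. For a
    non-degenerate quadratic this always occurs at t = 0.5."""
    t = 0.5
    x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
    y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
    return x, y
```

For higher-degree curves the parallel-tangent parameter would instead be found by solving for the root of the cross product numerically.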

Step 3: apply lossless entropy coding to the vectorized edge map (the compact representation) and the key-point auxiliary information to obtain two code streams.
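
A minimal sketch of producing the two lossless streams. The patent does not name a specific entropy coder or serialization; zlib and JSON below are illustrative stand-ins only.

```python
import json
import zlib

def encode_two_streams(edges, keypoints):
    """Losslessly compress the compact representation (vectorized edges)
    and the auxiliary information (key-point samples) into two separate
    byte streams B_E and B_C."""
    b_e = zlib.compress(json.dumps(edges).encode("utf-8"))
    b_c = zlib.compress(json.dumps(keypoints).encode("utf-8"))
    return b_e, b_c

def decode_two_streams(b_e, b_c):
    """Invert encode_two_streams: recover the edge map and key points."""
    edges = json.loads(zlib.decompress(b_e).decode("utf-8"))
    keypoints = json.loads(zlib.decompress(b_c).decode("utf-8"))
    return edges, keypoints
```

Keeping the two streams separate is what makes the scheme scalable: a receiver that only needs machine vision can request B_E alone.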

Step 4: preliminarily decode the two code streams to obtain the edge map and the key-point auxiliary information.

Step 5: for the dual-stream decoder, feed the edge map and the corresponding key-point auxiliary information into the corresponding generative neural network (for example a Pixel2Pixel network) for forward computation; for the decoder of the vision-driven compact-representation stream, feed only the edge map into its generative network for forward computation.

Step 6: compute the loss function between the result of step 5 and the original image.

Step 7: back-propagate the computed loss through the layers of the two neural networks to update their weights, so that the next iteration's result is closer to the target.

Step 8: repeat steps 2 to 7 until the losses of both neural networks converge. This yields the decoder network for the dual streams and the decoder network for the vision-driven compact-representation stream.

Compared with the prior art, the positive effects of the present invention are:

The present invention is a scalable lossy image compression scheme that guarantees both human visual quality and performance on machine vision tasks. Unlike traditional lossy compression methods, which output a single code stream, the scheme generates two: a vision-driven compact-representation stream and an auxiliary-information stream. Specifically, the invention represents the image's edge information with Bezier curves as the base stream, extracts key points from the image as the supplementary stream, and characterizes the image with these two streams to compress it efficiently. In addition, the invention builds the decoder from a generative neural network model: given the base stream alone it generates an image for machine vision, and given the base and supplementary streams together it generates an image for human vision, achieving excellent reconstruction quality in both cases.

The following data show the performance improvement of this method over the existing JPEG image compression method. The test measures, at a very low bit rate, the accuracy (error rate) of each method on a face landmark detection task, as well as subjective visual quality scored by human subjects:

[Table: accuracy on face landmark detection and subjective quality scores for the proposed method versus JPEG at very low bit rates]

As can be seen, the present invention achieves better performance at a lower bit rate.

Brief description of the drawings

Figure 1 shows the framework of the scalable human-machine collaborative image encoder.

Figure 2 shows the method for extracting key-point auxiliary information from the vectorized image edge map:

(a) vectorized edge map, (b) straight line (>45°), (c) straight line (<45°), (d) Bezier curve.

Detailed description

To further illustrate the technical method, the scalable human-machine collaborative image encoder of the present invention is described in more detail below with reference to the drawings and a concrete example.

This example focuses on the encoder's encoding procedure and the training of the decoder generation networks. Suppose the required decoder generation networks have already been built, and that N training images {I1, I2, …, IN} are available as training data.

1. Training process

Step 1: denote the vectorized edge map of each image in {I1, I2, …, IN} as {E1, E2, …, EN}, and the corresponding key-point auxiliary information as {C1, C2, …, CN}.

Step 2: as shown in Figure 1, feed {E1, E2, …, EN} and {C1, C2, …, CN} into the generation network for a forward pass. For the decoder generation network targeting machine vision tasks, the input is only {E1, E2, …, EN}.

Step 3: the forward pass produces the outputs {Î1, Î2, …, ÎN}; compute the loss error between these outputs and {I1, I2, …, IN}.

Step 4: after the error value is obtained, back-propagate it through the network to update the model weights.

Step 5: repeat steps 1 to 4 until the neural network converges.
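
The forward pass / loss / backward update cycle of steps 1 to 4 can be illustrated with a deliberately tiny stand-in. The real decoder is a generative network; here a single scalar weight plays its role, and the data and hyperparameters are purely illustrative.

```python
def train_decoder(samples, epochs=200, lr=0.01):
    """Toy stand-in for the training loop: forward pass, L2 loss against
    the original, gradient computation, weight update, repeated until
    (effectively) convergence."""
    w = 0.0
    for _ in range(epochs):
        grad = 0.0
        for edge, pixel in samples:
            pred = w * edge                    # forward pass
            grad += 2 * (pred - pixel) * edge  # gradient of the L2 loss
        w -= lr * grad / len(samples)          # backward update
    return w
```

On data generated by pixel = 2 * edge, the loop recovers the weight 2, mirroring how the real network's weights converge toward reproducing the original images.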

2. Encoding and decoding process

Step 1: extract the edge map of image I, and store the image obtained by vectorizing that edge map with Bezier curves, denoted E.

Step 2: extract key-point auxiliary information from the vectorized edge map by traversing all of its segments and sampling key points according to each segment's mode; denote the extracted auxiliary information C.

Step 3: encode E in the Scalable Vector Graphics (SVG) format, then entropy-code it together with C, obtaining two code streams denoted B_E and B_C respectively.
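A sketch of the SVG serialization in this step, assuming each vectorized edge is a quadratic Bezier mapped to an SVG 'Q' path command; the actual on-disk layout used by the patent is not specified.

```python
def edges_to_svg(width, height, curves):
    """Serialize vectorized edges as an SVG document. Each curve is a
    ((x0, y0), (cx, cy), (x1, y1)) quadratic Bezier, emitted as an SVG
    'Q' (quadratic Bezier) path command."""
    paths = []
    for (x0, y0), (cx, cy), (x1, y1) in curves:
        paths.append(f'<path d="M {x0} {y0} Q {cx} {cy} {x1} {y1}"/>')
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">' + "".join(paths) + "</svg>")
```

A straight segment can reuse the same scheme with an 'L' command, or with its control point placed on the chord.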

Step 4: select a decoder according to the required stream level. For machine vision tasks the decoder only needs to decode B_E into the vectorized edge map E, which is fed into the corresponding network for a forward pass to obtain the decoded image Î_E. For human vision tasks, B_E and B_C are decoded into E and C, which are fed into the corresponding generation network for a forward pass to obtain the decoded image Î.
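
The level selection in this step amounts to a simple dispatch on which streams are present. The sketch below stubs out the entropy decoding and the two generation networks with placeholder functions; all names here are hypothetical.

```python
def decode_stream(stream):
    # stand-in for the preliminary (entropy) decoding of a code stream
    return stream

def machine_vision_net(edges):
    # stand-in for the compact-representation stream decoder network
    return {"task": "machine", "edges": edges}

def human_vision_net(edges, cues):
    # stand-in for the dual-stream decoder network
    return {"task": "human", "edges": edges, "cues": cues}

def reconstruct(b_e, b_c=None):
    """Scalable decoding: the base stream alone yields the machine-vision
    reconstruction; base plus auxiliary stream yields the human-vision
    reconstruction."""
    edges = decode_stream(b_e)
    if b_c is None:
        return machine_vision_net(edges)
    return human_vision_net(edges, decode_stream(b_c))
```

A receiver thus upgrades from machine-vision to human-vision quality simply by fetching and passing the second stream, with no change to the base stream.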

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include them.

Claims (8)

1.一种可扩展人机协同图像编码方法,其步骤包括:1. A scalable human-machine collaborative image coding method, the steps comprising:1)提取各样本图片的边缘图;1) Extract the edge map of each sample image;2)利用贝塞尔曲线对边缘图进行矢量化,作为驱动机器视觉任务的紧凑表示;然后在矢量化后的边缘图中进行关键点提取,将提取的关键点作为辅助信息;2) Use the Bezier curve to vectorize the edge map as a compact representation for driving machine vision tasks; then extract key points in the vectorized edge map, and use the extracted key points as auxiliary information;3)对所述紧凑表示和所述辅助信息分别进行熵编码无损压缩,获得两路码流;3) Entropy coding and lossless compression are respectively performed on the compact representation and the auxiliary information to obtain two code streams;4)对两路码流进行初步解码,获得边缘图以及辅助信息;4) Preliminarily decode the two code streams to obtain edge maps and auxiliary information;5)对于需生成针对人眼视觉的重建图像任务,则将解码得到的边缘图以及辅助信息输入生成神经网络中,进行网络的前向计算;对于需生成针对机器视觉的重建图像任务,则将解码得到的边缘图输入生成神经网络中,进行网络的前向计算;5) For the task of generating reconstructed images for human vision, input the decoded edge map and auxiliary information into the generating neural network, and perform forward calculation of the network; for the task of generating reconstructed images for machine vision, then The decoded edge map is input into the generating neural network, and the forward calculation of the network is performed;6)根据步骤5)得到的计算结果与对应原始图片进行损失函数计算,并将计算的损失反向传播到神经网络进行网络权值更新;6) Calculate the loss function according to the calculation result obtained in step 5) and the corresponding original picture, and back-propagate the calculated loss to the neural network to update the network weights;7)重复步骤2)~6)直到神经网络的损失收敛,得到针对人眼视觉重建图像任务的双路码流解码器或针对机器视觉重建图像任务的紧凑表示码流解码器;7) Repeat steps 2) to 6) until the loss of the neural network converges, and obtain a dual-channel code stream decoder for the task of reconstructing images with human vision or a compact representation code stream decoder for the task of reconstructing images with machine 
vision;

8) For an image I to be processed, extract the edge map of I and store the edge map vectorized with Bezier curves, denoted E; extract key-point auxiliary information from the vectorized edge map, denoted C; encode E in the Scalable Vector Graphics format, then entropy-code the result together with C to obtain two code streams, denoted BE and BC respectively;

9) According to the task requirements, select the dual-channel code stream decoder or the compact representation code stream decoder to decode the received code streams and reconstruct the image.

2. The method according to claim 1, wherein the key points are extracted as follows: if a vectorized line of the edge map is a straight line segment, the straight-line mode is used to extract key points; otherwise, the Bezier-curve mode is used.

3. The method according to claim 2, wherein in the straight-line mode, if the angle between the line segment and the horizontal is greater than a set angle, two color values are sampled at equal distances to the left and right on the horizontal through the midpoint of the segment; if the angle is less than or equal to the set angle, two color values are sampled at equal distances above and below on the vertical through the midpoint; and in the Bezier-curve mode, the point at which a line parallel to the chord joining the start and end points of the Bezier curve is tangent to the curve is recorded; if the angle between the tangent and the horizontal is greater than the set angle, one color value inside the curve is sampled on the horizontal through the tangent point; if the angle is less than or equal to the set angle, one color value inside the curve is sampled on the vertical through the tangent point.

4. The method according to claim 3, wherein the set angle is 45°.

5. The method according to claim 1, wherein for a machine vision task, in step 8) the code stream BE corresponding to the edge map is sent to the compact representation code stream decoder, and in step 9) the compact representation code stream decoder decodes BE to obtain the vectorized edge map E and performs a forward pass on it to obtain the decoded image
[formula image FDA0003360606150000021];
for a human-vision task, in step 8) the code streams BE and BC are sent to the dual-channel code stream decoder, and in step 9) the dual-channel code stream decoder decodes BE and BC to obtain E and C and performs a forward pass on them to obtain the decoded image
[formula image FDA0003360606150000022].
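Claims 2 to 4 above define a purely geometric rule for where colour key points are sampled. The following is a minimal sketch of that rule, assuming points are (x, y) tuples, colours are looked up as `image[y][x]`, Bezier segments are quadratic (for which the tangent parallel to the start-end chord sits at t = 0.5), and a small sampling offset; the patent text fixes none of these representation details.

```python
import math

SET_ANGLE = 45.0  # claim 4: the set angle is 45 degrees


def segment_angle(p0, p1):
    """Angle (degrees, 0..90) between the segment p0->p1 and the horizontal."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    a = abs(math.degrees(math.atan2(dy, dx)))  # 0..180
    return min(a, 180.0 - a)                   # fold to the acute angle


def sample_line(image, p0, p1, offset=2):
    """Straight-line mode (claim 3): two colour values around the midpoint."""
    mx = (p0[0] + p1[0]) // 2
    my = (p0[1] + p1[1]) // 2
    if segment_angle(p0, p1) > SET_ANGLE:
        # steep segment: sample left and right of the midpoint on the horizontal
        return [image[my][mx - offset], image[my][mx + offset]]
    # shallow segment: sample above and below the midpoint on the vertical
    return [image[my - offset][mx], image[my + offset][mx]]


def sample_bezier(image, p0, ctrl, p1, offset=2):
    """Bezier-curve mode (claim 3), sketched for a quadratic curve: the
    tangent parallel to the chord p0->p1 touches the curve at t = 0.5."""
    tx = int(0.25 * p0[0] + 0.5 * ctrl[0] + 0.25 * p1[0])
    ty = int(0.25 * p0[1] + 0.5 * ctrl[1] + 0.25 * p1[1])
    # the curve bends toward the control point; "inside" lies on that side
    sx = 1 if ctrl[0] > tx else -1
    sy = 1 if ctrl[1] > ty else -1
    if segment_angle(p0, p1) > SET_ANGLE:
        return [image[ty][tx + sx * offset]]   # one value on the horizontal
    return [image[ty + sy * offset][tx]]       # one value on the vertical
```

With the 45° threshold of claim 4, a near-vertical stroke is sampled to its left and right and a near-horizontal stroke above and below, so the sampled colours always straddle the edge rather than run along it.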
6. A method for training and generating a dual-channel code stream decoder, comprising the steps of:

1) extracting the edge map of each sample picture;

2) vectorizing the edge map with Bezier curves to serve as the compact representation driving machine vision tasks, then extracting key points from the vectorized edge map as auxiliary information;

3) applying lossless entropy coding to the compact representation and the auxiliary information respectively to obtain two code streams;

4) preliminarily decoding the two code streams to obtain the edge map and the auxiliary information;

5) feeding the decoded edge map and auxiliary information into a generative neural network and performing a forward pass;

6) computing a loss function between the result of step 5) and the corresponding original picture, and back-propagating the loss to the network to update its weights;

7) repeating steps 2) to 6) until the loss of the neural network converges, yielding a dual-channel code stream decoder for the task of reconstructing images for human vision.

7.
A method for training and generating a compact representation code stream decoder, comprising the steps of:

1) extracting the edge map of each sample picture, and vectorizing the edge map to serve as the compact representation driving machine vision tasks;

2) applying lossless entropy coding to the compact representation to obtain one code stream;

3) preliminarily decoding the code stream to obtain the edge map;

4) feeding the decoded edge map into a generative neural network and performing a forward pass;

5) computing a loss function between the result of step 4) and the corresponding original picture, and back-propagating the loss to the network to update its weights;

6) repeating steps 2) to 5) until the loss of the neural network converges, yielding a compact representation code stream decoder for the task of reconstructing images for machine vision.

8.
A scalable human-machine collaborative image coding system, characterized in that it comprises an encoder, a dual-channel code stream decoder trained by the method of claim 6, and a compact representation code stream decoder trained by the method of claim 7, wherein:

the encoder extracts the edge map of a picture, vectorizes the edge map with Bezier curves as the compact representation driving machine vision tasks, extracts key points from the vectorized edge map as auxiliary information, and then applies lossless entropy coding to the compact representation and the auxiliary information respectively to obtain two code streams;

the dual-channel code stream decoder decodes the two code streams to obtain the edge map and the auxiliary information, then performs a forward pass on them to obtain a decoded image for the task of reconstructing images for human vision;

the compact representation code stream decoder decodes the code stream corresponding to the edge map to obtain the edge map, then performs a forward pass on the decoded edge map to obtain a decoded image for the task of reconstructing images for machine vision.
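The scalability in claims 1, 5, 8 and 9 comes down to one encoder emitting two independently decodable streams, with the receiver choosing how much of them to decode. The sketch below shows only that stream layering, using zlib as a stand-in for the patent's entropy coder; the function names and dictionary outputs are illustrative, not from the patent, and the learned generative decoders that turn the edge map into an image are omitted.

```python
import zlib


def encode(edge_svg: str, keypoints: str):
    """Encoder side (claim 1, steps 8-9): losslessly compress the vectorized
    edge map E and the key-point auxiliary information C into two streams
    BE and BC. zlib stands in for the patent's entropy coder."""
    b_e = zlib.compress(edge_svg.encode("utf-8"))
    b_c = zlib.compress(keypoints.encode("utf-8"))
    return b_e, b_c


def decode(b_e: bytes, b_c=None, task: str = "machine"):
    """Receiver side (claims 5, 9): a machine vision task decodes only the
    edge stream BE (compact representation path); a human-vision task
    additionally decodes the auxiliary stream BC (dual-channel path)."""
    edge_svg = zlib.decompress(b_e).decode("utf-8")
    if task == "machine" or b_c is None:
        return {"edge_map": edge_svg}
    keypoints = zlib.decompress(b_c).decode("utf-8")
    return {"edge_map": edge_svg, "aux": keypoints}
```

In the patented system, the decoded edge map (and, for human vision, the colour key points) would then be passed forward through the trained generative network to reconstruct the image; dropping BC degrades gracefully to the machine-vision reconstruction.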
Application CN201911415561.7A | Priority date 2019-12-31 | Filing date 2019-12-31 | Method and system for encoding extensible man-machine cooperative image and method for training decoder | Active | Granted as CN113132755B (en)

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201911415561.7A | CN113132755B (en) | 2019-12-31 | 2019-12-31 | Method and system for encoding extensible man-machine cooperative image and method for training decoder

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201911415561.7A | CN113132755B (en) | 2019-12-31 | 2019-12-31 | Method and system for encoding extensible man-machine cooperative image and method for training decoder

Publications (2)

Publication Number | Publication Date
CN113132755A (en) | 2021-07-16
CN113132755B (en) | 2022-04-01

Family

ID=76770772

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201911415561.7A | Active | CN113132755B (en) | 2019-12-31 | 2019-12-31 | Method and system for encoding extensible man-machine cooperative image and method for training decoder

Country Status (1)

Country | Link
CN (1) | CN113132755B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN113949880B (en)* | 2021-09-02 | 2022-10-14 | Peking University | Extremely-low-bit-rate man-machine collaborative image coding training method and coding and decoding method
CN116366863A (en)* | 2023-03-07 | 2023-06-30 | Shanghai University | Image feature compression and decompression method with cooperative human eye vision and machine vision

Citations (7)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN106846253A (en)* | 2017-02-14 | 2017-06-13 | Shenzhen Weiteshi Technology Co., Ltd. | Image super-resolution reconstruction method based on a back-propagation neural network
CN107610140A (en)* | 2017-08-07 | 2018-01-19 | Institute of Automation, Chinese Academy of Sciences | Near-edge detection method and device based on deep-fusion correction networks
CN108364262A (en)* | 2018-01-11 | 2018-08-03 | Shenzhen University | Blurred image restoration method, apparatus, device, and storage medium
CN109255794A (en)* | 2018-09-05 | 2019-01-22 | South China University of Technology | Deep fully convolutional edge feature detection method for standard parts
CN109920049A (en)* | 2019-02-26 | 2019-06-21 | Tsinghua University | Method and system for fine 3D face reconstruction assisted by edge information
WO2019141258A1 (en)* | 2018-01-18 | 2019-07-25 | Hangzhou Hikvision Digital Technology Co., Ltd. | Video encoding method, video decoding method, device, and system
US10482603B1 (en)* | 2019-06-25 | 2019-11-19 | Artificial Intelligence, Ltd. | Medical image segmentation using an integrated edge guidance module and object segmentation network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
US7861207B2 (en)* | 2004-02-25 | 2010-12-28 | Mentor Graphics Corporation | Fragmentation point and simulation site adjustment for resolution enhancement techniques


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Real-Time Deep Image Super-Resolution via Global Context Aggregation and Local Queue Jumping; Yueyu Hu, Jiaying Liu et al.; IEEE; 2017-12-13; full text *
Towards Image Understanding from Deep Compression Without Decoding; Robert Torfason et al.; ICLR 2018; 2018-03-16; full text *
Neural-network-based image and video coding; Jia Chuanmin et al.; Telecommunications Science; 2019-05-20 (No. 05); full text *
Image super-resolution reconstruction with an edge-enhanced deep network; Xie Zhenzhu et al.; Journal of Image and Graphics; 2018-01-16 (No. 01); full text *

Also Published As

Publication Number | Publication Date
CN113132755A (en) | 2021-07-16

Similar Documents

Publication | Title
CN110290387B (en) | Generative-model-based image compression method
CN103607591A (en) | Image compression method combining super-resolution reconstruction
CN115880762B (en) | Human-machine hybrid vision-oriented scalable face image coding method and system
CN113132727A (en) | Scalable machine vision coding method based on image generation
CN112492313B (en) | Picture transmission system based on a generative adversarial network
CN105392009B (en) | Low-bit-rate image sequence coding method based on block-adaptive sampling and super-resolution reconstruction
CN103686177B (en) | Image compression and decompression method, apparatus, and image system
CN113132755B (en) | Method and system for encoding extensible man-machine cooperative image and method for training decoder
CN114422795B (en) | Face video encoding method, decoding method and device
CN114501034B (en) | Image compression method and medium based on a discrete Gaussian mixture hyperprior and mask
CN118101972A (en) | End-to-end instance-separable semantic-image joint compression coding and decoding system and method
CN115052147A (en) | Human body video compression method and system based on a generative model
CN116980611A (en) | Image compression method, apparatus, device, computer program product, and medium
CN117692662A (en) | Point cloud encoding and decoding method based on a dual-octree structure
CN112907494A (en) | Unpaired face image translation method based on self-supervised learning
CN118864690A (en) | High-fidelity 3D image reconstruction method based on brain EEG signals
Yin et al. | Enabling translatability of generative face video coding: A unified face feature transcoding framework
CN113660386A (en) | Color image encryption, compression, and super-resolution reconstruction system and method
WO2023143101A1 (en) | Facial video encoding method and apparatus, and facial video decoding method and apparatus
CN119152051A (en) | Three-dimensional medical image compression system for human and machine vision
CN114897189A (en) | Model training method, video coding method and decoding method
CN110958417A (en) | Method for removing compression noise from video-call video based on voice cues
CN106251373A (en) | Single color image compression coding method fusing super-resolution technology and the JPEG2000 standard
CN119484858A (en) | Neural-representation video coding method based on high-frequency feature enhancement
CN112866715B (en) | Universal video compression coding system supporting human-machine hybrid intelligence

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
