Method for realizing depth video compression framework based on mobile phone platform

Info

Publication number
CN111263163A
CN111263163A
Authority
CN
China
Prior art keywords
video compression
network
net
frame
mobile phone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010104794.1A
Other languages
Chinese (zh)
Inventor
冯落落
李锐
乔廷慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Hi Tech Investment and Development Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN202010104794.1A
Publication of CN111263163A
Legal status: Pending

Abstract

The invention provides a method for realizing a depth video compression framework based on a mobile phone platform, which belongs to the fields of image classification, target detection, face recognition and the like, and comprises the following steps: S1, building the whole video compression network, training the model with videos of many different scenes to obtain a trained large network, and storing the graph model and parameter information of the network; S2, pruning and quantizing the trained model; S3, after pruning and quantization are performed for each layer, Huffman-coding the weights in the entire network and storing them. With little loss of precision, the deep video compression model is compressed by pruning, quantization and Huffman coding to about 1/100 of its original size, so that a video compression framework based on deep learning can be conveniently deployed on mobile phone devices.

Description

Method for realizing depth video compression framework based on mobile phone platform
Technical Field
The invention relates to the fields of image classification, target detection, face recognition and the like, in particular to a method for realizing a depth video compression framework based on a mobile phone platform.
Background
Video has become the primary medium for mass information dissemination. Especially with the rise of self-media, video data is growing explosively. Video compression methods based on deep learning have become the mainstream direction of recent research, and strong competitors to H.264 and H.265, the current mainstream methods.
However, video compression methods based on deep learning usually have a very large number of parameters, while mobile phone devices have limited memory and computing power, so such methods cannot be deployed on mobile phones directly. How to compress a deep-learning video compression algorithm so that it can be deployed on a mobile phone therefore becomes a key problem.
Disclosure of Invention
The technical task of the invention is to overcome the defect that existing deep-learning video compression frameworks are very large and difficult to deploy on embedded devices such as mobile phones, and to provide a method for realizing a depth video compression framework based on a mobile phone platform. With little loss of precision, the deep video compression model is compressed by pruning, quantization and Huffman coding, so that a video compression framework based on deep learning can be deployed on a mobile phone.
The technical scheme adopted by the invention for solving the technical problems is as follows:
The patent mainly proposes deploying a high-performance deep-learning video compression framework on a mobile phone platform by means of pruning, quantization and Huffman coding.
1. A method for realizing a depth video compression framework based on a mobile phone platform comprises the following steps:
S1, building the whole video compression network, training the model with 5000 videos of different scenes, iterating 1,000,000 times in total, to obtain a trained large network, and then storing the graph model and parameter information of the network;
S2, pruning and quantizing the trained model;
S3, pruning and quantization are performed for each layer separately; to further reduce storage, the weights in the entire network are Huffman-coded and then stored.
Preferably, the video compression network built with the TensorFlow framework in step S1 comprises six networks: Optical Flow Net, MV Encoder Net, MV Decoder Net, Motion Compensation Net, Residual Encoder Net and Residual Decoder Net, and its working process is as follows:
S101, splitting the video into individual frames, and inputting the current frame and the previous reconstructed frame into the optical flow network Optical Flow Net to obtain the motion vector of the current frame;
S102, encoding the motion vector with the motion vector encoding network MV Encoder Net to obtain an encoded result;
S103, quantizing the encoded result to obtain a quantized result, which is one of the two items that must be stored for the current frame;
S104, passing the quantized result through the motion vector decoding network MV Decoder Net to obtain the reconstructed motion vector of the current frame, and inputting it together with the previous reconstructed frame into the motion compensation network Motion Compensation Net to obtain the predicted frame of the current frame;
S105, subtracting the predicted frame from the real frame to obtain the residual information r_t that the predicted frame cannot capture;
S106, encoding the residual information with the residual encoding network Residual Encoder Net, quantizing (Q), entropy-coding and storing it; then decoding it with the residual decoding network Residual Decoder Net to obtain a residual reconstruction, which is added to the predicted frame to obtain the final reconstructed frame;
S107, the compressed video only needs to store the motion vector encoded in step S103 and the residual encoded in step S106.
Preferably, step S2 comprises the following steps:
S201, pruning is performed first: the trained weights of each layer are visualized, and all values whose absolute value is less than 0.5 are pruned away to obtain a sparse matrix, which is then stored.
Instead of storing the absolute position of each retained value, the relative offset diff from the previous position is stored. The maximum offset is set to 8, so each offset takes 3 bits; when a gap exceeds 8, a padding 0 is inserted, e.g. at position 12, so that when idx is 15 the stored offset is 3;
S202, after pruning is finished, the pruned data are quantized.
Preferably, in step S201, the CSR (Compressed Sparse Row) format is used to store the matrix.
Preferably, in step S202, quantization of the matrix is performed using the conventional K-means algorithm.
Preferably, in step S202, the K-means algorithm proceeds as follows:
first, the initial values for K-means are selected and sampling is performed; K = 11 is used, i.e. 11 points are selected;
then the K-means algorithm is trained to obtain the final 11 center points, and the data are clustered into the corresponding clusters. Illustrating with K = 4: K-means clustering assigns each value to a cluster and yields 4 cluster centers; only these 4 center values need to be stored, plus one index per value;
after quantization, the model needs to be fine-tuned: the derivative with respect to each parameter is computed, the derivatives within each cluster are summed, and the summed gradient is used to update the quantized parameter by gradient descent, param ← param - lr × gradient.
Preferably, the initial values for K-means are selected using a data-density-based method, i.e. the frequency of occurrence of the data serves as the selection probability, and sampling is then performed.
Compared with the prior art, the method for realizing a depth video compression framework based on a mobile phone platform has the following beneficial effects:
With little loss of precision, the deep video compression model is compressed by pruning, quantization and Huffman coding to about 1/100 of its original size, so that a video compression framework based on deep learning can be conveniently deployed on mobile phone devices.
Drawings
To more clearly describe the working principle of the present invention, simplified diagrams are attached for further explanation.
FIG. 1 is a schematic diagram of a deep learning video compression framework used by the present invention;
FIG. 2 is a schematic illustration of the index number storage of the present invention;
FIG. 3 is a schematic diagram of matrix quantization using the K-means algorithm of the present invention;
FIG. 4 is a memory map of the CSR sparse matrix of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-4, the method for realizing a depth video compression framework based on a mobile phone platform according to the present invention includes the following steps:
s1, building a whole video compression network by using a tensoflow framework, wherein the whole video compression network comprises 6 networks of optical flow, MV Encoder Net, MV Decoder Net, Motion Compensation Net, Residual Encoder Net and Residual Decoder Net shown in figure 1, training a model by using 5000 videos of different scenes, and iterating for 100 ten thousand times to obtain a large trained network. The graphical model and parameter information of the network is then stored.
The video compression network in FIG. 1 works as follows:
S101, splitting the video into individual frames, and inputting the current frame and the previous reconstructed frame into the optical flow network Optical Flow Net to obtain the motion vector of the current frame;
S102, encoding the motion vector with the motion vector encoding network MV Encoder Net to obtain an encoded result;
S103, quantizing the encoded result to obtain a quantized result, which is one of the two items that must be stored for the current frame;
S104, passing the quantized result through the motion vector decoding network MV Decoder Net to obtain the reconstructed motion vector of the current frame, and inputting it together with the previous reconstructed frame into the motion compensation network Motion Compensation Net to obtain the predicted frame of the current frame;
S105, subtracting the predicted frame from the real frame to obtain the residual information r_t that the predicted frame cannot capture;
S106, encoding the residual information with the residual encoding network Residual Encoder Net, quantizing (Q), entropy-coding and storing it; then decoding it with the residual decoding network Residual Decoder Net to obtain a residual reconstruction, which is added to the predicted frame to obtain the final reconstructed frame;
S107, the compressed video only needs to store the motion vector encoded in step S103 and the residual encoded in step S106.
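To make this dataflow concrete, the following is a minimal Python sketch of steps S101 to S107. It is only an illustration: the six trained networks of FIG. 1 are replaced by trivial stand-in functions, and all names (optical_flow_net, compress_frame, the quantization step size, etc.) are assumptions of this sketch, not identifiers from the patent.

```python
import numpy as np

# Stand-ins for the six trained networks of FIG. 1 (illustrative only).
def optical_flow_net(cur, prev_rec):    return cur - prev_rec   # motion vector stand-in
def mv_encoder_net(mv):                 return mv * 0.5
def mv_decoder_net(code):               return code * 2.0
def motion_compensation_net(mv, prev):  return prev + mv        # "warp" previous frame
def residual_encoder_net(res):          return res * 0.5
def residual_decoder_net(code):         return code * 2.0

def quantize(x, step=0.1):
    """The Q step of S103/S106: round onto a fixed grid."""
    return np.round(x / step) * step

def compress_frame(cur, prev_rec):
    """One pass of S101-S107 for a single frame."""
    mv = optical_flow_net(cur, prev_rec)                 # S101: motion vector
    mv_code = quantize(mv_encoder_net(mv))               # S102-S103: stored item #1
    mv_rec = mv_decoder_net(mv_code)                     # S104: reconstructed motion vector
    pred = motion_compensation_net(mv_rec, prev_rec)     #       predicted frame
    residual = cur - pred                                # S105: r_t
    res_code = quantize(residual_encoder_net(residual))  # S106: stored item #2
    rec = pred + residual_decoder_net(res_code)          #       final reconstructed frame
    return mv_code, res_code, rec                        # S107: store mv_code and res_code

prev_rec = np.zeros((4, 4))
cur = np.random.default_rng(0).random((4, 4))
mv_code, res_code, rec = compress_frame(cur, prev_rec)
```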
S2, pruning and quantizing the trained model;
The trained model is then pruned and quantized, layer by layer, with the following specific steps:
S201, pruning is performed first. Visualizing the trained weights of each layer shows that most values have very small absolute values, close to 0; therefore all values whose absolute value is smaller than 0.5 are pruned away, yielding a sparse matrix. The common CSR (Compressed Sparse Row) method is used to store the sparse matrix, as shown in FIG. 4.
As shown in FIG. 4, for a 3 × 3 sparse matrix the information to be stored is the retained values [1, 2, 3, 4, 5, 6] and the column index [0, 2, 2, 0, 1, 2] of each value. Since storage proceeds row by row, a third list [0, 2, 3, 6] records where each row begins in the value list: values 1 and 2 belong to the first row, 3 to the second, and 4, 5, 6 to the third. Storing the sparse matrix therefore takes 2a + n + 1 numbers in total (a retained values, a column indices, and n + 1 row boundaries), which is much smaller than the n × n numbers of the dense matrix.
To compress the stored indices further, the absolute positions are not stored as values; a relative value diff is used instead, which reduces the number of bits needed per index. As shown in FIG. 2, storing the absolute idx takes 4 bits per number, while storing diff takes only 3 bits. diff denotes the offset of the current position from the previous one; to keep every offset within 3 bits, the maximum offset is set to 8, and when a gap exceeds 8 a padding 0 is inserted, e.g. at position 12, so that when idx is 15 the stored offset is 3.
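Both storage tricks can be sketched in a few lines of Python. The CSR part uses scipy's csr_matrix on the FIG. 4 example; the 3-bit relative-offset encoding (encode_offsets) is a hand-written illustration of the padding rule described above, not code from the patent.

```python
import numpy as np
from scipy.sparse import csr_matrix

# CSR storage of the pruned layer (the FIG. 4 example).
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])
m = csr_matrix(dense)
print(m.data)     # [1 2 3 4 5 6] -> the a retained values
print(m.indices)  # [0 2 2 0 1 2] -> column index of each value
print(m.indptr)   # [0 2 3 6]     -> n + 1 row boundaries
# total: 2a + n + 1 numbers instead of n * n for the dense matrix

def encode_offsets(positions, max_gap=8):
    """Store sparse index positions as 3-bit gaps to the previous position.

    When a gap exceeds max_gap, a padding zero value is inserted at
    prev + max_gap and the remaining gap is encoded from there.
    """
    out, prev = [], 0
    for p in sorted(positions):
        gap = p - prev
        while gap > max_gap:
            out.append((max_gap, 0.0))  # padding zero, e.g. at position 12
            prev += max_gap
            gap -= max_gap
        out.append((gap, "value"))      # a real retained value goes here
        prev = p
    return out

# idx 15 after idx 4: pad at position 12, then store offset 3
print(encode_offsets([1, 4, 15]))  # [(1,'value'), (3,'value'), (8,0.0), (3,'value')]
```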
Experiments show that pruning alone reduces the storage to 1/13 with almost no loss of precision. This further demonstrates that the weights of deep learning models contain a large amount of redundant information.
S202, after pruning is finished, the pruned data are quantized; quantization of the matrix is performed using the conventional K-means algorithm.
As shown in FIG. 3, the initial values of K-means are selected first, here using a data-density-based method: the frequency of occurrence of the data serves as the selection probability, and sampling is then performed. K = 11 is used, i.e. 11 points are selected.
Then the K-means algorithm is trained to obtain the final 11 center points, and the data are clustered into the corresponding clusters. FIG. 3 illustrates this with K = 4: K-means clustering assigns each value to a cluster, with different colors in the figure representing different clusters. For example, the blue values 2.09, 2.12, 1.92 and 1.87 form one cluster and are all represented by 2.00, the average of the four values. The other 3 clusters are treated in the same way, giving the 4 cluster centers 2.00, 1.50, 0.00 and -1.00.
Then only the 4 center values need to be stored, at 32 bits each, plus a 2-bit index for each value. Compared with storing 16 values at 32 bits each (512 bits), the quantized model needs 4 × 32 + 16 × 2 = 160 bits, only 5/16 of the original storage.
After quantization, the model needs to be fine-tuned: the derivative with respect to each parameter is computed, the derivatives within each cluster are summed, and this summed gradient is used to update the quantized parameter by gradient descent, param ← param - lr × gradient, as shown in FIG. 3.
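The clustering and the fine-tuning step can be sketched in pure numpy as follows. The 4 × 4 weight matrix, the learning rate and the random stand-in gradient are invented for illustration, and the density-based initialization is approximated by sampling centroids directly from the weight values:

```python
import numpy as np

def kmeans_quantize(weights, k=4, iters=50, seed=0):
    """Cluster the weights into k shared values (weight sharing)."""
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    # density-based init approximation: sample centroids from the weights
    # themselves, so values that occur often are more likely to be picked
    centroids = rng.choice(flat, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = flat[labels == j].mean()
    return centroids, labels.reshape(weights.shape)

# toy 4x4 layer (values invented for illustration)
w = np.array([[ 2.09,  2.12,  1.92,  1.87],
              [ 0.05, -0.98,  1.48,  0.09],
              [-1.08,  2.12, -0.91,  1.92],
              [ 0.00,  1.53,  1.49, -1.03]])
centroids, labels = kmeans_quantize(w, k=4)
# storage: 4 centroids * 32 bit + 16 indices * 2 bit = 160 bit,
# versus 16 weights * 32 bit = 512 bit -> 160/512 = 5/16

# fine-tuning: sum the gradients of all weights sharing a centroid and
# take one gradient-descent step, param <- param - lr * gradient
lr = 0.01
grad = np.random.default_rng(1).normal(size=w.shape)  # stand-in for backprop
for j in range(len(centroids)):
    if np.any(labels == j):
        centroids[j] -= lr * grad[labels == j].sum()
w_quantized = centroids[labels]  # rebuild the layer from shared values
print(np.round(w_quantized, 2))
```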
S3, pruning and quantization are performed for each layer separately; to further reduce storage, the weights in the entire network are Huffman-coded and then stored.
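For the final Huffman step, a standard-library Python sketch like the one below builds a prefix code over the quantized cluster indices, so frequent indices get short codewords; the index sequence is invented for illustration:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code table for a sequence of quantized weight indices."""
    freq = Counter(symbols)
    # heap entries: (frequency, tiebreak, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

indices = [0, 0, 0, 1, 0, 2, 0, 3, 0, 0]   # cluster indices, skewed distribution
table = huffman_code(indices)
bitstream = "".join(table[s] for s in indices)
print(table, len(bitstream), "bits vs", 2 * len(indices), "fixed-width bits")
```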
Experiments show that the compressed model occupies only 2/205 of the original. A model of more than 1 GB is compressed to tens of megabytes and can be conveniently deployed on mobile phone devices.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Technical features not described in detail in this specification are known to those skilled in the art.

Claims (8)

CN202010104794.1A, priority and filing date 2020-02-20: Method for realizing depth video compression framework based on mobile phone platform. Pending. Published as CN111263163A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010104794.1A | 2020-02-20 | 2020-02-20 | Method for realizing depth video compression framework based on mobile phone platform

Publications (1)

Publication Number | Publication Date
CN111263163A | 2020-06-09

Family

ID=70952978

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010104794.1A (Pending) | Method for realizing depth video compression framework based on mobile phone platform | 2020-02-20 | 2020-02-20

Country Status (1)

CN: CN111263163A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113255576A (en)* | 2021-06-18 | 2021-08-13 | 第六镜科技(北京)有限公司 | Face recognition method and device
CN114898446A (en)* | 2022-06-16 | 2022-08-12 | 平安科技(深圳)有限公司 | Artificial intelligence-based face recognition method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20090074073A1 (en)* | 2003-07-18 | 2009-03-19 | Microsoft Corporation | Coding of motion vector information
CN108304928A (en)* | 2018-01-26 | 2018-07-20 | 西安理工大学 | Compression method based on the deep neural network for improving cluster
CN110009565A (en)* | 2019-04-04 | 2019-07-12 | 武汉大学 | A lightweight network-based super-resolution image reconstruction method
CN110166779A (en)* | 2019-05-23 | 2019-08-23 | 西安电子科技大学 | Video compression method based on super-resolution reconstruction
CN110753225A (en)* | 2019-11-01 | 2020-02-04 | 合肥图鸭信息科技有限公司 | Video compression method and device and terminal equipment


Similar Documents

Publication | Title
US10939123B2 (en) | Multi-angle adaptive intra-frame prediction-based point cloud attribute compression method
CN109996071B (en) | Variable bit rate image encoding and decoding system and method based on deep learning
CN110677644B (en) | Video coding and decoding method and video coding intra-frame predictor
CN108322742A (en) | A point cloud attribute compression method based on intra prediction
CN107396124A (en) | Video compression method based on deep neural network
TWI521890B (en) | Image coding apparatus, method and program, and image decoding apparatus, method and program
CN110248190B (en) | Multilayer residual coefficient image coding method based on compressed sensing
CN116489369B (en) | Driving digital video compression processing method
CN111432211B (en) | Residual error information compression method for video coding
CN108028945A (en) | Apparatus and method for performing transforms by using singleton coefficient update
CN115361559B (en) | Image encoding method, image decoding method, device and storage medium
Gu et al. | Compression of human motion capture data using motion pattern indexing
CN116910285B (en) | Intelligent traffic data optimized storage method based on Internet of things
CN111263163A (en) | Method for realizing depth video compression framework based on mobile phone platform
CN118075472A (en) | Spectrum compression method based on LOCO-I algorithm and Huffman coding
CN118632027B (en) | A point cloud compression method based on graph convolutional network
KR20210152992A (en) | Method, apparatus and recording medium for encoding/decoding image using binary mask
CN113784147A (en) | A high-efficiency video coding method and system based on convolutional neural network
JP2017158183A (en) | Image processing device
CN111652789A (en) | Embedding method and extraction method of a color image
CN115913248A (en) | Live broadcast software development data intelligent management system
CN105791863A (en) | Layer-based 3D-HEVC depth map intra-frame prediction coding method
CN115170683A (en) | Image processing method, image processing device, electronic equipment and storage medium
KR20130022541A (en) | Method and apparatus for encoding image, and method and apparatus for decoding image
CN114222124A (en) | Encoding and decoding method and device

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
WD01 | Invention patent application deemed withdrawn after publication (application publication date: 2020-06-09)
