Method for realizing depth video compression framework based on mobile phone platform

Info

Publication number
CN111263163A
CN111263163A
Authority
CN
China
Prior art keywords
video compression
network
net
frame
mobile phone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010104794.1A
Other languages
Chinese (zh)
Inventor
冯落落
李锐
乔廷慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Hi Tech Investment and Development Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN202010104794.1A
Publication of CN111263163A
Legal status: Pending

Abstract

The invention provides a method for realizing a depth video compression framework based on a mobile phone platform, which belongs to the fields of image classification, target detection, face recognition and the like, and comprises the following steps: S1, building the whole video compression network, training the model with videos of many different scenes to obtain a trained large network, and storing the graph model and parameter information of the network; S2, pruning and quantizing the trained model; S3, after pruning and quantization are performed for each layer, Huffman-coding the weights in the entire network and storing them. With little loss of precision, the deep video compression model is compressed by pruning, quantization and Huffman coding to about 1/100 of its original size, so that a video compression framework based on deep learning can be conveniently deployed on mobile phone devices.

Description

Method for realizing depth video compression framework based on mobile phone platform
Technical Field
The invention relates to the fields of image classification, target detection, face recognition and the like, in particular to a method for realizing a depth video compression framework based on a mobile phone platform.
Background
Video has become the primary medium for mass information dissemination. Especially with the rise of self-media, video data is growing explosively. Video compression methods based on deep learning have become the mainstream direction of recent research, and strong competitors to H.264 and H.265, the current mainstream methods.
However, video compression methods based on deep learning usually have a very large number of parameters, while mobile phone devices have limited memory and computing power, so such methods cannot be deployed on mobile phones directly. How to compress a deep-learning video compression algorithm so that it can be deployed on a mobile phone therefore becomes a key problem.
Disclosure of Invention
The technical task of the invention is to overcome the defect that existing deep-learning video compression frameworks are very large and difficult to deploy on embedded devices such as mobile phones, and to provide a method for realizing a depth video compression framework based on a mobile phone platform. With little loss of precision, the deep video compression model is compressed by pruning, quantization and Huffman coding, so that a video compression framework based on deep learning can be deployed on a mobile phone.
The technical scheme adopted by the invention for solving the technical problems is as follows:
The patent mainly proposes deploying a high-performance deep-learning video compression framework on a mobile phone platform by means of pruning, quantization and Huffman coding.
1. A method for realizing a depth video compression framework based on a mobile phone platform comprises the following steps:
S1, building the whole video compression network, training the model with 5000 videos of different scenes, iterating 1,000,000 times in total, to obtain a trained large network, and then storing the graph model and parameter information of the network;
S2, pruning and quantizing the trained model;
S3, pruning and quantization are performed for each layer separately; to further reduce storage, the weights in the entire network are Huffman-coded and then stored.
Preferably, the video compression network built with the TensorFlow framework in step S1 comprises six networks: Optical Flow Net, MV Encoder Net, MV Decoder Net, Motion Compensation Net, Residual Encoder Net and Residual Decoder Net, and its working process is as follows:
S101, splitting the video into individual frames, and inputting the current frame and the previous reconstructed frame into the optical flow network Optical Flow Net to obtain the motion vector of the current frame;
S102, encoding the motion vector with the motion vector encoding network MV Encoder Net to obtain an encoded result;
S103, quantizing the encoded result to obtain a quantized result, which is one of the two items that must be stored for the current frame;
S104, passing the quantized result through the motion vector decoding network MV Decoder Net to obtain the reconstructed motion vector of the current frame, and inputting it together with the previous reconstructed frame into the motion compensation network Motion Compensation Net to obtain the predicted frame of the current frame;
S105, subtracting the predicted frame from the real frame to obtain the residual information r_t that the predicted frame cannot capture;
S106, encoding the residual information with the residual encoding network Residual Encoder Net, quantizing (Q), entropy-coding and storing it; then decoding it with the residual decoding network Residual Decoder Net to obtain a residual reconstruction, which is added to the predicted frame to obtain the final reconstructed frame;
S107, the compressed video only needs to store the motion vector encoded in step S103 and the residual encoded in step S106.
Preferably, step S2 comprises the following steps:
S201, pruning is performed first: the trained weights of each layer are visualized, and all values whose absolute value is less than 0.5 are pruned away to obtain a sparse matrix, which is then stored.
Instead of storing the absolute position of each retained value, the relative offset diff from the previous position is stored. The maximum offset is set to 8, so each offset takes 3 bits; when a gap exceeds 8, a padding 0 is inserted, e.g. at position 12, so that when idx is 15 the stored offset is 3;
S202, after pruning is finished, the pruned data are quantized.
Preferably, in step S201, the CSR (Compressed Sparse Row) format is used to store the matrix.
Preferably, in step S202, quantization of the matrix is performed using the conventional K-means algorithm.
Preferably, in step S202, the K-means algorithm proceeds as follows:
first, the initial values for K-means are selected and sampling is performed; K = 11 is used, i.e. 11 points are selected;
then the K-means algorithm is trained to obtain the final 11 center points, and the data are clustered into the corresponding clusters. Illustrating with K = 4: K-means clustering assigns each value to a cluster and yields 4 cluster centers; only these 4 center values need to be stored, plus one index per value;
after quantization, the model needs to be fine-tuned: the derivative with respect to each parameter is computed, the derivatives within each cluster are summed, and the summed gradient is used to update the quantized parameter by gradient descent, param ← param - lr × gradient.
Preferably, the initial values for K-means are selected using a data-density-based method, i.e. the frequency of occurrence of the data serves as the selection probability, and sampling is then performed.
Compared with the prior art, the method for realizing a depth video compression framework based on a mobile phone platform has the following beneficial effects:
With little loss of precision, the deep video compression model is compressed by pruning, quantization and Huffman coding to about 1/100 of its original size, so that a video compression framework based on deep learning can be conveniently deployed on mobile phone devices.
Drawings
To more clearly describe the working principle of the present invention, simplified diagrams are attached for further explanation.
FIG. 1 is a schematic diagram of a deep learning video compression framework used by the present invention;
FIG. 2 is a schematic illustration of the index number storage of the present invention;
FIG. 3 is a schematic diagram of matrix quantization using the K-means algorithm of the present invention;
FIG. 4 is a memory map of the CSR sparse matrix of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-4, the method for realizing a depth video compression framework based on a mobile phone platform according to the present invention includes the following steps:
s1, building a whole video compression network by using a tensoflow framework, wherein the whole video compression network comprises 6 networks of optical flow, MV Encoder Net, MV Decoder Net, Motion Compensation Net, Residual Encoder Net and Residual Decoder Net shown in figure 1, training a model by using 5000 videos of different scenes, and iterating for 100 ten thousand times to obtain a large trained network. The graphical model and parameter information of the network is then stored.
The video compression network in FIG. 1 works as follows:
S101, splitting the video into individual frames, and inputting the current frame and the previous reconstructed frame into the optical flow network Optical Flow Net to obtain the motion vector of the current frame;
S102, encoding the motion vector with the motion vector encoding network MV Encoder Net to obtain an encoded result;
S103, quantizing the encoded result to obtain a quantized result, which is one of the two items that must be stored for the current frame;
S104, passing the quantized result through the motion vector decoding network MV Decoder Net to obtain the reconstructed motion vector of the current frame, and inputting it together with the previous reconstructed frame into the motion compensation network Motion Compensation Net to obtain the predicted frame of the current frame;
S105, subtracting the predicted frame from the real frame to obtain the residual information r_t that the predicted frame cannot capture;
S106, encoding the residual information with the residual encoding network Residual Encoder Net, quantizing (Q), entropy-coding and storing it; then decoding it with the residual decoding network Residual Decoder Net to obtain a residual reconstruction, which is added to the predicted frame to obtain the final reconstructed frame;
S107, the compressed video only needs to store the motion vector encoded in step S103 and the residual encoded in step S106.
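To make this dataflow concrete, the following is a minimal Python sketch of steps S101 to S107. It is only an illustration: the six trained networks of FIG. 1 are replaced by trivial stand-in functions, and all names (optical_flow_net, compress_frame, the quantization step size, etc.) are assumptions of this sketch, not identifiers from the patent.

```python
import numpy as np

# Stand-ins for the six trained networks of FIG. 1 (illustrative only).
def optical_flow_net(cur, prev_rec):    return cur - prev_rec   # motion vector stand-in
def mv_encoder_net(mv):                 return mv * 0.5
def mv_decoder_net(code):               return code * 2.0
def motion_compensation_net(mv, prev):  return prev + mv        # "warp" previous frame
def residual_encoder_net(res):          return res * 0.5
def residual_decoder_net(code):         return code * 2.0

def quantize(x, step=0.1):
    """The Q step of S103/S106: round onto a fixed grid."""
    return np.round(x / step) * step

def compress_frame(cur, prev_rec):
    """One pass of S101-S107 for a single frame."""
    mv = optical_flow_net(cur, prev_rec)                 # S101: motion vector
    mv_code = quantize(mv_encoder_net(mv))               # S102-S103: stored item #1
    mv_rec = mv_decoder_net(mv_code)                     # S104: reconstructed motion vector
    pred = motion_compensation_net(mv_rec, prev_rec)     #       predicted frame
    residual = cur - pred                                # S105: r_t
    res_code = quantize(residual_encoder_net(residual))  # S106: stored item #2
    rec = pred + residual_decoder_net(res_code)          #       final reconstructed frame
    return mv_code, res_code, rec                        # S107: store mv_code and res_code

prev_rec = np.zeros((4, 4))
cur = np.random.default_rng(0).random((4, 4))
mv_code, res_code, rec = compress_frame(cur, prev_rec)
```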
S2, pruning and quantizing the trained model;
The trained model is then pruned and quantized, layer by layer, with the following specific steps:
S201, pruning is performed first. Visualizing the trained weights of each layer shows that most values have very small absolute values, close to 0; therefore all values whose absolute value is smaller than 0.5 are pruned away, yielding a sparse matrix. The common CSR (Compressed Sparse Row) method is used to store the sparse matrix, as shown in FIG. 4.
As shown in FIG. 4, for a 3 × 3 sparse matrix the information to be stored is the retained values [1, 2, 3, 4, 5, 6] and the column index [0, 2, 2, 0, 1, 2] of each value. Since storage proceeds row by row, a third list [0, 2, 3, 6] records where each row begins in the value list: values 1 and 2 belong to the first row, 3 to the second, and 4, 5, 6 to the third. Storing the sparse matrix therefore takes 2a + n + 1 numbers in total (a retained values, a column indices, and n + 1 row boundaries), which is much smaller than the n × n numbers of the dense matrix.
To compress the stored indices further, the absolute positions are not stored as values; a relative value diff is used instead, which reduces the number of bits needed per index. As shown in FIG. 2, storing the absolute idx takes 4 bits per number, while storing diff takes only 3 bits. diff denotes the offset of the current position from the previous one; to keep every offset within 3 bits, the maximum offset is set to 8, and when a gap exceeds 8 a padding 0 is inserted, e.g. at position 12, so that when idx is 15 the stored offset is 3.
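Both storage tricks can be sketched in a few lines of Python. The CSR part uses scipy's csr_matrix on the FIG. 4 example; the 3-bit relative-offset encoding (encode_offsets) is a hand-written illustration of the padding rule described above, not code from the patent.

```python
import numpy as np
from scipy.sparse import csr_matrix

# CSR storage of the pruned layer (the FIG. 4 example).
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])
m = csr_matrix(dense)
print(m.data)     # [1 2 3 4 5 6] -> the a retained values
print(m.indices)  # [0 2 2 0 1 2] -> column index of each value
print(m.indptr)   # [0 2 3 6]     -> n + 1 row boundaries
# total: 2a + n + 1 numbers instead of n * n for the dense matrix

def encode_offsets(positions, max_gap=8):
    """Store sparse index positions as 3-bit gaps to the previous position.

    When a gap exceeds max_gap, a padding zero value is inserted at
    prev + max_gap and the remaining gap is encoded from there.
    """
    out, prev = [], 0
    for p in sorted(positions):
        gap = p - prev
        while gap > max_gap:
            out.append((max_gap, 0.0))  # padding zero, e.g. at position 12
            prev += max_gap
            gap -= max_gap
        out.append((gap, "value"))      # a real retained value goes here
        prev = p
    return out

# idx 15 after idx 4: pad at position 12, then store offset 3
print(encode_offsets([1, 4, 15]))  # [(1,'value'), (3,'value'), (8,0.0), (3,'value')]
```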
Experiments show that pruning alone reduces the storage to 1/13 with almost no loss of precision. This further demonstrates that the weights of deep learning models contain a large amount of redundant information.
S202, after pruning is finished, the pruned data are quantized; quantization of the matrix is performed using the conventional K-means algorithm.
As shown in FIG. 3, the initial values of K-means are selected first, here using a data-density-based method: the frequency of occurrence of the data serves as the selection probability, and sampling is then performed. K = 11 is used, i.e. 11 points are selected.
Then the K-means algorithm is trained to obtain the final 11 center points, and the data are clustered into the corresponding clusters. FIG. 3 illustrates this with K = 4: K-means clustering assigns each value to a cluster, with different colors in the figure representing different clusters. For example, the blue values 2.09, 2.12, 1.92 and 1.87 form one cluster and are all represented by 2.00, the average of the four values. The other 3 clusters are treated in the same way, giving the 4 cluster centers 2.00, 1.50, 0.00 and -1.00.
Then only the 4 center values need to be stored, at 32 bits each, plus a 2-bit index for each value. Compared with storing 16 values at 32 bits each (512 bits), the quantized model needs 4 × 32 + 16 × 2 = 160 bits, only 5/16 of the original storage.
After quantization, the model needs to be fine-tuned: the derivative with respect to each parameter is computed, the derivatives within each cluster are summed, and this summed gradient is used to update the quantized parameter by gradient descent, param ← param - lr × gradient, as shown in FIG. 3.
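The clustering and the fine-tuning step can be sketched in pure numpy as follows. The 4 × 4 weight matrix, the learning rate and the random stand-in gradient are invented for illustration, and the density-based initialization is approximated by sampling centroids directly from the weight values:

```python
import numpy as np

def kmeans_quantize(weights, k=4, iters=50, seed=0):
    """Cluster the weights into k shared values (weight sharing)."""
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    # density-based init approximation: sample centroids from the weights
    # themselves, so values that occur often are more likely to be picked
    centroids = rng.choice(flat, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = flat[labels == j].mean()
    return centroids, labels.reshape(weights.shape)

# toy 4x4 layer (values invented for illustration)
w = np.array([[ 2.09,  2.12,  1.92,  1.87],
              [ 0.05, -0.98,  1.48,  0.09],
              [-1.08,  2.12, -0.91,  1.92],
              [ 0.00,  1.53,  1.49, -1.03]])
centroids, labels = kmeans_quantize(w, k=4)
# storage: 4 centroids * 32 bit + 16 indices * 2 bit = 160 bit,
# versus 16 weights * 32 bit = 512 bit -> 160/512 = 5/16

# fine-tuning: sum the gradients of all weights sharing a centroid and
# take one gradient-descent step, param <- param - lr * gradient
lr = 0.01
grad = np.random.default_rng(1).normal(size=w.shape)  # stand-in for backprop
for j in range(len(centroids)):
    if np.any(labels == j):
        centroids[j] -= lr * grad[labels == j].sum()
w_quantized = centroids[labels]  # rebuild the layer from shared values
print(np.round(w_quantized, 2))
```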
S3, pruning and quantization are performed for each layer separately; to further reduce storage, the weights in the entire network are Huffman-coded and then stored.
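For the final Huffman step, a standard-library Python sketch like the one below builds a prefix code over the quantized cluster indices, so frequent indices get short codewords; the index sequence is invented for illustration:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code table for a sequence of quantized weight indices."""
    freq = Counter(symbols)
    # heap entries: (frequency, tiebreak, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

indices = [0, 0, 0, 1, 0, 2, 0, 3, 0, 0]   # cluster indices, skewed distribution
table = huffman_code(indices)
bitstream = "".join(table[s] for s in indices)
print(table, len(bitstream), "bits vs", 2 * len(indices), "fixed-width bits")
```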
Experiments show that the compressed model occupies only 2/205 of the original. A model of more than 1 GB is compressed to tens of megabytes and can be conveniently deployed on mobile phone devices.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Technical features not described in detail in this specification are known to those skilled in the art.

Claims (8)

CN202010104794.1A, priority and filing date 2020-02-20: Method for realizing depth video compression framework based on mobile phone platform. Pending. Published as CN111263163A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010104794.1A | 2020-02-20 | 2020-02-20 | Method for realizing depth video compression framework based on mobile phone platform

Publications (1)

Publication Number | Publication Date
CN111263163A | 2020-06-09

Family

ID=70952978

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010104794.1A (Pending) | Method for realizing depth video compression framework based on mobile phone platform | 2020-02-20 | 2020-02-20

Country Status (1)

CN: CN111263163A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113255576A (en)* | 2021-06-18 | 2021-08-13 | 第六镜科技(北京)有限公司 | Face recognition method and device
CN114898446A (en)* | 2022-06-16 | 2022-08-12 | 平安科技(深圳)有限公司 | Artificial intelligence-based face recognition method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20090074073A1 (en)* | 2003-07-18 | 2009-03-19 | Microsoft Corporation | Coding of motion vector information
CN108304928A (en)* | 2018-01-26 | 2018-07-20 | 西安理工大学 | Compression method based on the deep neural network for improving cluster
CN110009565A (en)* | 2019-04-04 | 2019-07-12 | 武汉大学 | A lightweight network-based super-resolution image reconstruction method
CN110166779A (en)* | 2019-05-23 | 2019-08-23 | 西安电子科技大学 | Video compression method based on super-resolution reconstruction
CN110753225A (en)* | 2019-11-01 | 2020-02-04 | 合肥图鸭信息科技有限公司 | Video compression method and device and terminal equipment


Similar Documents

Publication | Title
US10939123B2 (en) | Multi-angle adaptive intra-frame prediction-based point cloud attribute compression method
CN109996071B (en) | Variable bit rate image encoding and decoding system and method based on deep learning
CN110677644B (en) | Video coding and decoding method and video coding intra-frame predictor
CN108322742A (en) | A point cloud attribute compression method based on intra prediction
CN107396124A (en) | Video compression method based on deep neural network
TWI521890B (en) | Image coding apparatus, method and program, and image decoding apparatus, method and program
CN110248190B (en) | Multilayer residual coefficient image coding method based on compressed sensing
CN116489369B (en) | Driving digital video compression processing method
CN111432211B (en) | Residual error information compression method for video coding
CN108028945A (en) | Apparatus and method for performing transforms by using singleton coefficient update
CN115361559B (en) | Image encoding method, image decoding method, device and storage medium
Gu et al. | Compression of human motion capture data using motion pattern indexing
CN116910285B (en) | Intelligent traffic data optimized storage method based on Internet of things
CN111263163A (en) | Method for realizing depth video compression framework based on mobile phone platform
CN118075472A (en) | Spectrum compression method based on LOCO-I algorithm and Huffman coding
CN118632027B (en) | A point cloud compression method based on graph convolutional network
KR20210152992A (en) | Method, apparatus and recording medium for encoding/decoding image using binary mask
CN113784147A (en) | A high-efficiency video coding method and system based on convolutional neural network
JP2017158183A (en) | Image processing device
CN111652789A (en) | Embedding method and extraction method of a color image
CN115913248A (en) | Live broadcast software development data intelligent management system
CN105791863A (en) | Layer-based 3D-HEVC depth map intra-frame prediction coding method
CN115170683A (en) | Image processing method, image processing device, electronic equipment and storage medium
KR20130022541A (en) | Method and apparatus for encoding image, and method and apparatus for decoding image
CN114222124A (en) | Encoding and decoding method and device

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
WD01 | Invention patent application deemed withdrawn after publication (application publication date: 2020-06-09)
