Background technology
Often normal form Huffman tree is used carrying out data compression, as: GZIB, ZLIB, PNG, JPEG, MPEG.In actual compression, except preserving compressed encoding, also needing to preserve normal form Huffman tree and initial data total amount etc., can decompress like this.The present invention program for be how to preserve normal form Huffman tree.
First introduce the principle of normal form Huffman encoding, normal form Huffman encoding process is probably divided into 4 steps:
1) compression unit is counted or probability statistics, and according to sorting from big to small, assuming that to byte data statistics as described in Table 1:
| Symbol | A | B | C | D | E |
| Counting | 15 | 7 | 6 | 6 | 5 |
| Probability | 0.38461538 | 0.17948718 | 0.15384615 | 0.15384615 | 0.12820513 |
Table 1 pair compression unit counts or sequence after probability statistics
Wherein, symbol is used for identifying compression unit, and counting represents the number of respective compression units, and probability represents that respective compression units accounts for the ratio of total compression unit.
2) normal form Huffman tree is created, particularly: the merging binary tree always finding two probabilities of occurrence or least number of times, until be merged into a root node according to statistics.Assuming that divide a according to the above results, b ..e be totally 5 steps (e is result), and visioning procedure as shown in Figure 1.
3) be encoded to 0 according to left sibling, right node is that 1 pair of data cell is encoded, and produces normal form Huffman code tables, as shown in table 2.
| Character | A | B | C | D | E |
| Code | 010 | 0 | 101 | 110 | 111 |
Table 2 normal form Huffman code tables
4) carry out data replacement according to coding namely to compress, namely originally need 8 data A stored, B ..E, finally can with 1,3,3,3,3 bit data replace, and sequential storage, finally achieves the object of compression.Also needing record coding table to decompress, is also { A, 0}, { B, 100}, { C, 101}, { D, 110}, { E, 111}.Owing to mainly discussing how memory encoding table, the detailed process of compression coding and decode procedure just no longer describe in detail.
The following scheme of the many employings of current storage normal form Huffman code tables:
1) the rectangular formula of record code:
| Character | A | B | C | D | E |
| Code length | 1 | 3 | 3 | 3 | 3 |
Table 3 records the code length form of Huffman encoding
As shown above, what code table needed preservation is 1,3,3,3,3, because the feature of normal form Huffman tree, just can restore original tree by these numerals.Replace the benefit of coding to be the width of storage bit number by code length to be more limited, just to take whole byte code tables, the longest also can not more than 255, namely the longest is 1 byte.
2) mode of code length basis being carried out second-compressed to code table is recorded:
By the data 1,3,3 of above-mentioned code table, 3,3 compress again, as used RLE compression, then become after compression: 1,0,3,3 (representing 0+1=1 individual 1 and 3+1=4 individual 3), also the compression of other mode is had, when there being a large amount of repeating data, such as, when length is 8 entirely, what may record is exactly 8, and 255.
There is following defect in the scheme of existing preservation normal form Huffman tree:
In not compression situation: when maximum code length is longer, each code length preserves figure place needs to strengthen, and character (8bit) compression under extreme case, code table length will reach 256 bytes.
In compression situation: most compression algorithm, all can improve in the face of repetition byte compression ratio, but in the face of almost each code length seldom time when less (also repeated) just be comparatively difficult to play a role.
To sum up, all there is the excessive problem of code table in the scheme of existing preservation normal form Huffman tree, and extreme case is issued to even to exceed directly deposits code table.
Summary of the invention
The invention provides a kind of method of preserving normal form Huffman tree, the method can adopt as far as possible few data to preserve normal form Huffman tree, improves storage efficiency.
The invention provides a kind of device preserving normal form Huffman tree, this device can adopt as far as possible few data to preserve normal form Huffman tree, improves storage efficiency.
The invention provides another kind of method of preserving normal form Huffman tree, the method can adopt as far as possible few data to preserve normal form Huffman tree, improves storage efficiency.
The invention provides the another kind of device preserving normal form Huffman tree, this device can adopt as far as possible few data to preserve normal form Huffman tree, improves storage efficiency.
Preserve a method for normal form Huffman tree, the method comprises:
The node of normal form Huffman tree is marked, has subtree by M flag node, by N flag node without subtree;
Carry out record successively to the mark of the every node layer of normal form Huffman tree from top to bottom, particularly: adopt order from left to right, record from first node, being only recorded to first is not the node of N;
Using the vertex ticks of record as final entry result, preserve final entry result.
Preserve a device for normal form Huffman tree, this device comprises vertex ticks logging modle and preserves module;
Described vertex ticks logging modle, marks the node of normal form Huffman tree, has subtree by M flag node, by N flag node without subtree; Carry out record successively to the mark of the every node layer of normal form Huffman tree from top to bottom, particularly: adopt order from left to right, record from first node, being only recorded to first is not the node of N; The vertex ticks of record is sent to described preservation module;
Described preservation module, using the vertex ticks of record as final entry result, preserves final entry result.
Preserve a method for normal form Huffman tree, the method comprises:
The node of normal form Huffman tree is marked, has subtree by M flag node, by N flag node without subtree;
Successively record is carried out to the mark of the every node layer of normal form Huffman tree from top to bottom, particularly: adopt dextrosinistral order, record from first node, being only recorded to first is not the node of M;
Using the vertex ticks of record as final entry result, preserve final entry result.
Preserve a device for normal form Huffman tree, this device comprises vertex ticks logging modle and preserves module;
Described vertex ticks logging modle, marks the node of normal form Huffman tree, has subtree by M flag node, by N flag node without subtree; Successively record is carried out to the mark of the every node layer of normal form Huffman tree from top to bottom, particularly: adopt dextrosinistral order, record from first node, being only recorded to first is not the node of M; The vertex ticks of record is sent to described preservation module;
Described preservation module, using the vertex ticks of record as final entry result, preserves final entry result.
As can be seen from such scheme, in the present invention, the node of normal form Huffman tree is marked, then record is carried out to the mark of every node layer, then, using the vertex ticks of record as final entry result, preserve final entry result.The present invention adopts from left to right or dextrosinistral order carries out record to the mark of every node layer, record from first node, is recorded to first for N or be not the node of M.Particularly, when employing is recorded the mark of every node layer from left to right, record from first node, being only recorded to first is not the node of N, and subsequent node no longer carries out record; Similarly, when employing is recorded the mark of every node layer from right to left, record from first node, being only recorded to first is not the node of M, subsequent node no longer record; In addition, record is not carried out to all nodes of every layer of rightmost side node and last one deck on this basis.The application changes the rectangular formula of record code and second-compressed mode being used to have, provides easier recording mode, decreases storage data, thus improve storage efficiency.
Embodiment
For making the object, technical solutions and advantages of the present invention clearly understand, below in conjunction with embodiment and accompanying drawing, the present invention is described in more detail.
See Fig. 2, for the present invention preserves the method indicative flowchart of normal form Huffman tree, it comprises the following steps:
Step 201, marks the node of normal form Huffman tree, has subtree by M flag node, by N flag node without subtree; Carry out record successively to the mark of the every node layer of normal form Huffman tree, particularly: adopt order from left to right, record from first node, being only recorded to first is not the node of N.
Wherein, M and N can choose as required, can be 1, also can be 2 or more positions.Here for 1, M value is 1, N value is 0.Need the normal form Huffman tree preserved still to be described with the example shown in Fig. 1, mark is carried out afterwards for shown in Fig. 3 to each node: ground floor: 1, the second layer: 01, third layer 11, the 4th layer: 0000.
In order to reduce storage data, in the application, when recording the mark of every node layer, record from first node, being only recorded to first is not the node of N, follow-up no longer record.
For the example of Fig. 3, record is carried out to each layer:
Ground floor: 1;
The second layer: 01;
Third layer: 1 (because every layer is only recorded to the node that first is not 0);
4th layer: 0000;
Final entry result: 10110000.
Step 202, using the vertex ticks of record as final entry result, preserves final entry result.
Alternatively, the leaf node number in normal form Huffman tree can also be added up, obtain leaf node sum, leaf node sum is added in final entry result.
In the example of Fig. 3, leaf node adds up to 5.The present invention adopts from left to right or dextrosinistral order carries out record to the mark of every node layer, record from first node, is recorded to first for N or be not the node of M.Particularly, when employing is recorded the mark of every node layer from left to right, record from first node, being only recorded to first is not the node of N, and subsequent node no longer carries out record; Similarly, when employing is recorded the mark of every node layer from right to left, record from first node, being only recorded to first is not the node of M, and subsequent node is record no longer.The application changes the rectangular formula of record code and second-compressed mode being used to have, provides easier recording mode, decreases storage data, thus improve storage efficiency.
After preserving final entry result, when needing, can decode to it, to know the structure of normal form Huffman tree.Decoding detailed process comprises:
From top to bottom each layer is decoded successively:
Be labeled as the nodes of M in the last layer of statistics current layer, the nodes counted is multiplied by 2, using the product value that the obtains nodes P as current layer; If when current layer is ground floor, P is 1;
Marker bit is read successively from final entry result, stop until running into when the marker bit number being labeled as M or reading reaches P, for the mark read fills succeeding marker M, until marker bit sum reaches P position, using the mark result of the mark after filling as current layer.Example below in conjunction with Fig. 3 is described, and its decode procedure (5 leaf nodes, are recorded as 10110000) is:
Foregoing schemes is adopted to decode to each layer, particularly:
Ground floor: 1; Decoding: 1.
Now, P is 1; Marker bit is read successively from final entry result, stop until running into when the marker bit number being labeled as M or reading reaches P, for the mark read fills succeeding marker M, until marker bit sum reaches P position, using the mark result of the mark after filling as current layer, then the mark result of the ground floor obtained is 1.
The second layer: 01, decoding: 01;
Now, P is 2; Marker bit is read successively from final entry result, stop until running into when the marker bit number being labeled as M or reading reaches P, for the mark read fills succeeding marker M, until marker bit sum reaches P position, using the mark result of the mark after filling as current layer, then the mark result of the second layer obtained is 01.
Third layer: 1, decoding: 11;
Now, P is 2; From final entry result, read marker bit successively (read the 3rd in 10110000 above, here read from the 4th), stop (here until run into when the marker bit number being labeled as M or reading reaches P, read one, namely 1) be, that the mark read fills succeeding marker M, until marker bit sum reaches P position, using the mark result of the mark after filling as current layer, then the mark result of the third layer obtained is 11.
4th layer: 0000, be decoded as: 0000.
Now, P is 4; From final entry result, read marker bit successively (read the 4th in 10110000 above, here read from the 5th), stop (here until run into when the marker bit number being labeled as M or reading reaches P, read 4, namely 0000), then the mark result of the 4th layer that obtains is 0000.
Decoding terminates.
Now, P=0; Decoding terminates.
In order to adopt as far as possible few data to preserve normal form Huffman tree, when encoding preservation, because each node of last one deck is all labeled as 0, record can not be carried out to last one deck.Further, when encoding preservation, record can not also be carried out to last node of every layer.
Accordingly, decode and to adjust corresponding.Illustrate, if when coding is preserved, adopt order from left to right, record from first node, being only recorded to first is not the node of N, and, record is not carried out to last one deck, also record is not carried out to last node of every layer; Decoding is specially:
The mark of ground floor node is set to M;
From top to bottom subsequent layers is decoded successively:
Be labeled as the nodes of M in the last layer of statistics current layer, the nodes counted is multiplied by 2, using the product value that the obtains nodes P as current layer;
From final entry result, read leaf node sum, the leaf segment that before counting current layer, all layers are labeled as N is counted, and calculates residue leaf segment and to count Q;
Judge whether P equals Q, if so, then current layer is last one deck, and P node is all leaf node, is labeled as N; Otherwise, from final entry result, read marker bit successively, stop until running into when the marker bit number being labeled as M or reading reaches P-1, for the mark read fills succeeding marker M, until mark figure place reaches P position, using the mark result of the mark after filling as current layer.
Example below for Fig. 3 is described in detail.Record is carried out to each layer:
Ground floor: do not record (because last node of every layer does not record);
The second layer: 0 (because last node of every layer does not record);
Third layer: 1 (do not record because of last node of every layer and be only recorded to the node that first is not 0);
4th layer: do not record (because last one deck does not record);
Final entry result: 01.
Like this, once sharing the structures that 2 can store the normal form Huffmans tree in previous example, its efficiency is apparent, improves a lot than existing record code length mode and the efficiency of carrying out second-compressed mode.
In the example of Fig. 3, leaf node adds up to 5.
Example below in conjunction with Fig. 3 is described, and its decode procedure (5 leaf nodes, are recorded as 01) is:
Ground floor: not record, decoding: 1 (because of last node not record of every layer, one is decided to be 1).
The each layer later to the second layer is decoded in the following way: current layer nodes (P)=last layer is the nodes * 2 of 1; When remain leaf segment count=Q time, the whole node of current layer is all leaf node, is also namely all 0; Compressed-bit is read in step-by-step, until running into compressed-bit is that 1 (assuming that T position, reading to T position) or number reach P-1 stopping; When the compressed-bit read is 0, leaf node counting+1; From T+1 to P, position is all filled to 1.
In this example, be followed successively by after subsequent layers decoding:
The second layer: 0, decoding: 01 (totally 2 nodes, last does not record);
Third layer: 1, decoding: 11 (totally 2 nodes, do not record last node, are recorded to the node that first is not 0);
4th layer: not record, is decoded as: 0000 (totally 4 leaf nodes, before there is 1 leaf node, also surplus 4 leaf nodes, therefore determine that this layer is last one deck).In the present invention, in order to adopt as far as possible few data to preserve normal form Huffman tree, first the feature of normal form Huffman tree being understood deeply and being analyzed:
Each node or only have Liang Ge branch, or be leaf node, such binary tree is also referred to as the standard B-tree;
All nonleaf nodes all lean on one side (being generally right side), and the right side in one deck from certain node has child node until this layer terminates;
As normal form Huffman tree leaf node number >=2, see from top to bottom, root node always has two nodes, and last node layer also must be leaf node.Based on above analysis, The present invention gives the following scheme storing solution normal form Huffman tree:
Store the value of the sum of leaf node, tree structure and leaf nodes;
Tree structure travels through each node 1 bit representation with or without subtree (1: have, 0: nothing) by sequence;
Last node (comprising root node) of every layer does not record (because except last one deck, this node necessarily has subtree);
Every layer is only recorded to the node (because rear be 1 entirely) that first is not 0;
Last one deck does not record (have 2 times of children tree nodes number because last one deck must be last layer and be 0 entirely).
The present invention is according to the feature of normal form Huffman tree (also claiming the standard B-tree), step-by-step stores the mode whether a part of node has subtree, reach relatively stable and compress object efficiently, when the most extreme, the normal form Huffman tree that whole 256 characters (8bit) are formed can be saved to log2 (N)-1 in N-2 position.
Because all identical according to the data total bit after normal form Huffman encoding compress, the difference recording normal form Huffman tree therefore only need be considered.Even if do not consider to record the excessive datas such as total leaf segment is counted, leaf node code maximum number of digits, total bytes, store for example above, rough calculation:
Direct record coding table method: 5x3=15 position;
Traditional approach stores the code table of normal form Huffman encoding length: 5x2=10 position;
Traditional approach carries out second-compressed after storing the code table of normal form Huffman encoding length: 4x2=8 position or other unknown situation;
The method of storage is simplified in step-by-step of the present invention: 2.
Extreme example one: every layer has 2 child nodes except root node and last one deck, and leaf node adds up to 256.
Direct record coding table method: 255*256=65280 position;
Traditional approach stores the code table of normal form Huffman encoding length: 8*256=2048 position;
Traditional approach carries out second-compressed after storing the code table of normal form Huffman encoding length: 8*256+256*2 or other
The method of storage is simplified in step-by-step of the present invention: 254.
Extreme example two: leaf node adds up to the full binary tree of 256.
Direct record coding table method: 8*256=2048 position;
Traditional approach stores the code table of normal form Huffman encoding length: 8*256=2048 position;
Traditional approach carries out second-compressed after storing the code table of normal form Huffman encoding length: 8*2 or other
The method of storage is simplified in step-by-step of the present invention: 7.
As can be seen from data above: be no matter that under extreme case or general case, storage means of the present invention has high efficiency.
Technical solution of the present invention can with fixing 1 store a node in tree structure whether have subtree, even if increase redundancy to 2 even more multidigit also may have superiority than alternate manner; Different search orders can also be adopted, or represent with or without subtree with differentiable different value.Further, normal form Huffman tree is comprised to the situation of left subtree and right subtree simultaneously, left and right subtree can also separate step-by-step preservation, namely, from top to bottom normal form Huffman tree is divided into left subtree and right subtree, then preserve respectively.The variant (such as: all keep left) of normal form Huffman tree also can be preserved in this way.The technical program, except being applied to packed data, also can do other purposes.Can according to the logic of chained list, just left and right node step-by-step stores and carrys out alternative the present invention program without node.
Aforementioned is adopt order from left to right to carry out record to each node layer, also passable, order is from right to left adopted to carry out record to each node layer, unlike, be that to be recorded to first be not the node of N when recording every node layer from left to right, and recording every node layer from right to left, to be recorded to first be not constantly the node of M; The coding preserving type of two kinds of orders and decoding process are all similar, and it is no longer repeated here, only do brief description below.
For dextrosinistral mode:
Coding is preserved and is specifically comprised:
The node of normal form Huffman tree is marked, has subtree by M flag node, by N flag node without subtree;
Successively record is carried out to the mark of the every node layer of normal form Huffman tree from top to bottom, particularly: adopt dextrosinistral order, record from first node, being only recorded to first is not the node of M;
Using the vertex ticks of record as final entry result, preserve final entry result.
Correspondingly, decode procedure comprises:
From top to bottom each layer is decoded successively:
Be labeled as the nodes of M in the last layer of statistics current layer, the nodes counted is multiplied by 2, using the product value that the obtains nodes P as current layer; If when current layer is ground floor, P is 1;
Marker bit is read successively from final entry result, stop until running into when the marker bit number being labeled as N or reading reaches P, for the mark read fills succeeding marker N, until mark total bit reaches P position, using the mark result of the mark after filling as current layer.
Natch, in order to adopt as far as possible few data to preserve normal form Huffman tree, when encoding preservation, similarly, also can not carry out record to last one deck, record can also do not carried out to a node of every layer of rightmost side further.
See Fig. 4, for the present invention preserves the apparatus structure schematic diagram of normal form Huffman tree, this device comprises vertex ticks logging modle and preserves module;
Described vertex ticks logging modle, marks the node of normal form Huffman tree, has subtree by M flag node, by N flag node without subtree; Carry out record successively to the mark of the every node layer of normal form Huffman tree from top to bottom, particularly: adopt order from left to right, record from first node, being only recorded to first is not the node of N; The vertex ticks of record is sent to described preservation module;
Described preservation module, using the vertex ticks of record as final entry result, preserves final entry result.
Further, this device also comprises leaf node sum statistical module, adds up, obtains leaf node sum, send to described preservation module to the leaf node number in normal form Huffman tree;
Described preservation module, adds to leaf node sum in final entry result, preserves.
Preferably, this device also comprises the first decoder module, to the decoding of normal form Huffman tree, particularly: from top to bottom each layer is decoded successively: the nodes being labeled as M in the last layer of statistics current layer, the nodes counted is multiplied by 2, using the product value that the obtains nodes P as current layer; If when current layer is ground floor, P is 1; Marker bit is read successively from final entry result, stop until running into when the marker bit number being labeled as M or reading reaches P, for the mark read fills succeeding marker M, until marker bit sum reaches P position, using the mark result of the mark after filling as current layer.
Further, present invention also provides the another kind of device preserving normal form Huffman tree, this device comprises vertex ticks logging modle and preserves module;
Described vertex ticks logging modle, marks the node of normal form Huffman tree, has subtree by M flag node, by N flag node without subtree; Successively record is carried out to the mark of the every node layer of normal form Huffman tree from top to bottom, particularly: adopt dextrosinistral order, record from first node, being only recorded to first is not the node of M; The vertex ticks of record is sent to described preservation module;
Described preservation module, using the vertex ticks of record as final entry result, preserves final entry result.
Further, this device also comprises leaf node sum statistical module, adds up, obtains leaf node sum, send to described preservation module to the leaf node number in normal form Huffman tree;
Described preservation module, adds to leaf node sum in final entry result, preserves.
Preferably, this device also comprises the second decoder module, to the decoding of normal form Huffman tree, particularly: from top to bottom each layer is decoded successively:
Be labeled as the nodes of M in the last layer of statistics current layer, the nodes counted is multiplied by 2, using the product value that the obtains nodes P as current layer; If when current layer is ground floor, P is 1;
Marker bit is read successively from final entry result, stop until running into when the marker bit number being labeled as N or reading reaches P, for the mark read fills succeeding marker N, until mark total bit reaches P position, using the mark result of the mark after filling as current layer.
The foregoing is only preferred embodiment of the present invention, not in order to limit the present invention, within the spirit and principles in the present invention all, any amendment made, equivalent replacement, improvement etc., all should be included within the scope of protection of the invention.