Background
In the cloud computing era, massive amounts of data must be processed and transmitted. Such data often occupies excessive network bandwidth during transmission, and any problem in data transmission seriously degrades the overall performance of a cloud computing center. To improve transmission performance, the data is usually compressed before transmission, and the total amount of data to be transmitted is reduced by a lossy or lossless compression algorithm. In general, data such as numbers, text, synthetic images, and medical images are compressed losslessly, while data such as natural images, audio, and video are compressed lossily.
As the most commonly used family of lossless compression algorithms, LZ algorithms compress data by representing the current data in terms of previously processed (history) data. Specifically, each time a segment of input data is processed, the LZ algorithm uses dictionary information to find a previously processed data segment that matches the head of the current input, calculates the length of the match between the history segment and the current segment, and, when the matching length exceeds a threshold, replaces the current input with a pair (distance to the match, matching length), thereby compression-coding the input data. For example, in the input "abcabcabcd", once the literal "abc" has been emitted, the following six bytes can be replaced by the pair (3, 6).
Depending on differences in the actual encoding/decoding process and in the dictionary implementation, LZ coding has branched into variants such as LZ77 and LZ78, and has given rise to lossless compression software such as LZO, LZMA, and LZ4. All of this software follows the basic idea of LZ encoding but differs in implementation: LZMA focuses on improving the compression ratio and has the highest compression ratio, but also the longest encoding and decoding time; LZ4 makes full use of the cache, storing its dictionary in a 16 KB hash table that fits entirely in the L1 cache and simplifying retrieval, and has the fastest encoding and decoding speed, although this speed is obtained at the cost of some compression ratio; LZO aims at a balance between compression speed and compression ratio, compressing and decompressing as fast as possible while keeping the compression ratio acceptable.
In practice, some applications urgently need faster encoding and decoding during data compression. For example, during Linux startup, the initrd image of the temporary root file system is loaded and decompressed, and the decompression speed of the initrd image has a significant influence on the boot speed of the operating system. In the SPICE cloud desktop protocol in the virtualization field, SPICE must send the desktop of a virtual machine to the client in real time for display; the desktop image data is compressed with an LZ algorithm to reduce bandwidth pressure, and the encoding and decoding speed of the compression algorithm directly affects the user experience at the client. If the LZ4 algorithm is used, the encoding and decoding speed is basically adequate, but the lower compression ratio brings other problems. The encoding and decoding speed of LZ algorithms, including LZ4, therefore needs to be further optimized without affecting the compression ratio.
Disadvantages of the prior art
(1) Chinese invention patent "An FPGA implementation system based on a low-delay LZ lossless compression algorithm" (patent No. CN106385260A). The patent discloses an FPGA implementation system based on a low-delay LZ lossless compression algorithm, comprising an input cache module, an output cache module, a shift register, a read-back control module, a matching search module, a character length calculation module, a matching length calculation module, and an output control module. Aiming at the defects of existing lossless compression hardware, the patent accelerates the encoding of the LZ (Lempel-Ziv) lossless compression algorithm by means of hardware rather than software. However, a dedicated FPGA chip must be configured, the application range of the FPGA chip is limited, and its integration with existing systems and software remains to be verified.
(2) Chinese invention patent "An LZ-evolution lossless compression method for secondary online monitoring of a transformer substation" (patent No. CN105281874A). The patent provides an LZ-evolution lossless compression method for secondary online monitoring of a transformer substation, in which a real-time compression dictionary is established during transmission and mapped to the mapping dictionary of the secondary online monitoring master station, so that dictionary sequence numbers are transmitted directly. When communication is established between the master station and a substation, the real-time compression dictionary is synchronized first and information is then transmitted; most of the transmitted information consists of sequence numbers in the real-time compression dictionary, which reduces the required network bandwidth, improves data transmission efficiency, and guarantees reliable, real-time transmission. However, the patent requires the real-time compression dictionary to be established and synchronized in advance, which suits special scenarios but not the general scenarios of cloud computing.
(3) Chinese invention patent "A hardware implementation system of an improved LZ4 compression algorithm" (patent No. CN105207678A). The patent implements an improved LZ4 compression algorithm in a hardware circuit, which can extract the maximum performance of the compression algorithm, but it requires the support of a dedicated hardware circuit, and the compression ratio of the LZ4 algorithm is worse than that of other LZ-variant compression algorithms.
(4) Chinese invention patent "Sharing an initial dictionary and Huffman tree among a plurality of compressed blocks in an LZ-based compression algorithm" (patent No. CN105207678A). The patent divides a data string into a set of blocks and compresses each block based on an initial dictionary set and a Huffman tree set, so that compression can be performed in parallel and the compression speed is improved, but the compression ratio can be worse than that of other LZ-variant compression algorithms. In addition, the initial dictionary and Huffman tree must be generated in advance before compression, which may introduce extra computational overhead.
(5) Chinese invention patent "LZ coding compression method, device, equipment and storage medium" (patent No. CN107565972A). The patent aims to improve the compression ratio of LZ encoding on a heterogeneous parallel acceleration platform, but the higher compression ratio comes at the cost of extra computational overhead.
(6) U.S. patent "Literal handling in LZ compression employing MRU/LRU encoding" (patent No. US6218970B1). This patent provides a method and system for handling literals in a Lempel-Ziv data compression system that optimizes LZ encoding by rearranging the MRU/LRU encoding of the characters. The method is suitable for compressing text data but not other data types such as images and programs.
(7) International patent application "LEMPEL-ZIV (LZ)-BASED DATA COMPRESSION EMPLOYING IMPLICIT VARIABLE-LENGTH DISTANCE CODING" (publication No. WO2015171263A1). This patent employs Lempel-Ziv (LZ) based data compression with implicit variable-length distance coding, so that the distance bit length in the compressed output can be reduced and the data size further decreased. However, this does not reduce the compression time.
(8) U.S. patent application "Method, apparatus and system for data block retrieval for LZ data compression" (publication No. US20060018556A1). This patent provides a technique for rearranging the input data stream of an LZ data compression system to achieve higher compression. Each received data block is compared with each of a predetermined number of previously processed data blocks to form an affinity array, in which each element contains an affinity number based on one or more match locations and their associated match lengths; the affinity array is then used to rearrange the order of data blocks in the input stream to form a new data stream, which is finally encoded to achieve higher compression. This patent likewise focuses on achieving a higher compression ratio, but the comparing, rearranging, and re-encoding operations introduce additional computational overhead and lengthen the compression time.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an encoding and decoding speed optimization method for LZ-series compression algorithms that can effectively increase the encoding and decoding speed without relying on extra hardware and without affecting the original compression ratio, thereby reducing the system overhead that practical applications such as Linux system startup and the SPICE cloud desktop spend on data encoding and decoding, and improving their performance and execution efficiency.
In order to solve the above problems, the technical solution adopted by the invention is as follows:
An encoding and decoding speed optimization method for LZ-series compression algorithms comprises optimization of two processes, data encoding and data decoding. The encoding and decoding speed is improved by optimizing the general operation steps of the LZ-series algorithms, while the compression ratio of the algorithms is not affected.
With respect to this general method, the data encoding and decoding steps are optimized in the following aspects.
In the data encoding process, the data matching stage and the unmatched-data output stage are optimized, specifically:
When searching for repeated data, each match reads a data slice one computer word in length at a time; if the word-level match fails, the failed slice is matched byte by byte.
Further, the computer word size is 4 bytes on a 32-bit computer system and 8 bytes on a 64-bit computer system. Reading a word-length data slice at a time makes maximum use of the CPU cache and reduces the number of CPU cycles required to read the data.
When outputting unmatched data, the unmatched data is accumulated until the next repeated data segment is found, and the accumulated unmatched data is then copied to the encoding output buffer in one batch with the memcpy function.
In the data encoding process, the repeated-data matching stage of existing LZ-series encoders can complete only 1 byte of data matching per CPU cycle. The optimization method makes full use of the processor word length: each match reads a data slice one computer word in length at a time, and only when such a word-level match fails is the failing slice further matched byte by byte.
In the unmatched-data output stage, existing LZ-series encoders copy a specified number of bytes to the encoding output buffer each time. The optimization method instead accumulates the unmatched data and, when the next matched repeated data segment is found, copies the accumulated unmatched data to the encoding output buffer in one call to the memcpy function.
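The following is a minimal C sketch of this literal-accumulation idea; the names flush_literals, literal_start, literal_end and out are illustrative assumptions and are not taken from any particular LZ implementation:

    #include <string.h>
    #include <stddef.h>

    /* Instead of emitting each unmatched byte immediately, the encoder only remembers
     * where the current run of unmatched (literal) bytes starts; when the next matched
     * segment is found, the whole run is flushed with a single memcpy call. */
    static unsigned char *flush_literals(const unsigned char *literal_start,
                                         const unsigned char *literal_end,
                                         unsigned char *out)
    {
        size_t run = (size_t)(literal_end - literal_start);
        if (run > 0) {
            memcpy(out, literal_start, run);   /* one batched copy instead of many small ones */
            out += run;
        }
        return out;                            /* new write position in the encoding output buffer */
    }

In the encoding loop, literal_start would be set when a byte fails to match and cleared again right after flush_literals() is called, just before the (distance, length) pair is written.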
In the decoding process of LZ-series algorithms, decompressing a compressed data segment can only copy history data byte by byte, which causes extra overhead. The optimization method improves the decompression of compressed data segments as follows:
First, during data decompression, when the data read is a pair (distance, length), the current data is obtained by copying history data according to the information in the pair, where distance denotes the relative distance between the history data position and the current position, and length denotes the length of the data to be copied starting from the history data position;
then, it is judged whether the distance between the current position and the history data position is greater than or equal to length;
if distance is greater than or equal to length, the copy is completed directly with the memcpy function;
if distance is less than length, the history data to be copied overlaps the data being written, and copying it in one batch with memcpy would produce data errors. To avoid such errors while still exploiting the efficiency of memcpy, a cyclic batch copy is used: during the cyclic batch copy, the history data position is kept unchanged while the current position moves backward as copying proceeds. Let d be the distance between the current position and the history data position, recalculated in real time during copying, with initial value distance. In each cycle, d bytes are copied with memcpy, the current position is updated to the end of the copied segment, and d is recalculated from the new current position; the cycle ends when d is greater than the length of the data remaining to be copied, and the remaining data is then copied in one final memcpy call.
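A minimal C sketch of this direct/cyclic batch copy is given below; dst is the current position in the decoding output buffer, the history position is dst - distance, and the function name copy_match is an illustrative assumption (it presumes distance >= 1 and enough space in the output buffer):

    #include <string.h>
    #include <stddef.h>

    static unsigned char *copy_match(unsigned char *dst, size_t distance, size_t length)
    {
        const unsigned char *src = dst - distance;   /* history position, never moves */
        size_t remaining = length;
        size_t d = distance;                         /* gap between current and history position */

        if (d >= remaining) {                        /* no overlap: one direct batch copy */
            memcpy(dst, src, remaining);
            return dst + remaining;
        }
        while (d <= remaining) {                     /* cyclic batch copy for the overlapping case */
            memcpy(dst, src, d);                     /* safe: the d bytes before dst are already written */
            dst += d;
            remaining -= d;
            d = (size_t)(dst - src);                 /* the gap doubles in every cycle */
        }
        if (remaining > 0)
            memcpy(dst, src, remaining);             /* final batch for the rest */
        return dst + remaining;
    }

For example, with distance = 3 and length = 10 the function copies 3 bytes, then 6 bytes, and finally the remaining 1 byte, using three memcpy calls instead of ten byte-by-byte copies.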
Because the optimization method does not change the encoding and decoding principle of the algorithm, it does not reduce the compression ratio; at the same time, by optimizing the execution steps of encoding and decoding, it effectively reduces the CPU cycles required and therefore the encoding and decoding time.
Due to the adoption of the above technical solution, the invention has the following beneficial effects:
(1) The method increases the encoding and decoding speed of LZ-series compression algorithms purely in software while keeping the compression ratio of the algorithms unchanged.
(2) The optimization method is effective for LZ-series compression algorithms such as LZ77, LZMA, LZO, and LZ4, not just for one specific compression algorithm, which ensures its algorithmic generality.
(3) The optimization method is effective for LZ-series compression algorithms on CPU platform architectures such as x86_64 and arm64, which ensures its generality across platform architectures.
(4) Because the optimization method of the invention does not change the encoding and decoding principle of the algorithm, it does not reduce the compression ratio; at the same time, because the execution steps of encoding and decoding are optimized, the required CPU cycles and hence the encoding and decoding time are effectively reduced, so that both compression ratio and encoding/decoding speed are achieved.
(5) The design and implementation of the optimization algorithm are developed independently, giving the method strong independent controllability.
(6) The optimization algorithm has an obvious effect: with the scheme of the invention, the encoding and decoding speed of LZ-series compression algorithms can be effectively increased without auxiliary hardware equipment and without reducing the compression ratio. According to experimental statistics, for LZ-series compression algorithms other than LZ4, such as LZMA and LZO, the method shortens the encoding time by 30%-50% and the decoding time by 50%-60%; for the LZ4 algorithm, the invention still shortens the encoding time by 10% and the decoding time by 15%.
Detailed Description
The present invention will be described in further detail with reference to examples.
The invention discloses an encoding and decoding speed optimization method for LZ-series compression algorithms. Before the scheme of the invention is explained in detail, the abbreviations and key terms used in the field are defined:
data compression: data compression is a technical method for reducing the data volume to reduce the storage space and improve the transmission, storage and processing efficiency of the data on the premise of not losing useful information, or for reorganizing the data according to a certain algorithm and reducing the redundancy and storage space of the data. Data compression can be divided into lossless compression and lossy compression according to whether information is lost in the compression process.
Lossless compression: lossless compression exploits the statistical redundancy of data and can completely recover the original data without any distortion, but the compression ratio is limited by the theoretical amount of statistical redundancy, generally 2:1 to 5:1. Such methods are widely used for compressing text, programs, and image data in special applications (such as fingerprint images and medical images). Common lossless compression methods include run-length coding, Huffman coding, arithmetic coding, and dictionary coding.
Dictionary coding: the principle of dictionary coding is to build a dictionary and replace repeated characters or strings with indices, thereby compressing the data. For a long string, a dictionary built over the entire string becomes large, and the matching speed drops quickly as the dictionary grows.
LZ encoding and decoding: Lempel-Ziv (LZ) codecs are among the most popular lossless compression algorithms. They are based on dictionary compression and exploit the contextual correlation of strings, using a sliding window (search buffer) as the dictionary and a look-ahead buffer for the string still to be compressed. The compressed string is represented by a triple: starting from the current position, the longest string in the sliding window that matches the look-ahead data is found (if several strings of the same length match, the most recent one is taken), and the triple formed by the displacement of the matching string from the current position, its length, and the next character is written out. If no matching string can be found in the sliding window, displacement = length = 0 is written together with the character that cannot be compressed. The LZ algorithm was originally proposed in 1977 by the Israeli researchers Jacob Ziv and Abraham Lempel to solve the problem of oversized dictionaries in dictionary matching, and has since produced numerous variants such as LZ77, LZ78, LZW, LZO, LZMA, LZSS, LZR, LZB, and LZ4. Some of these variants aim at improving the compression ratio, such as LZMA; some focus on fast compression and decompression, such as LZ4; others try to balance compression ratio and codec speed, such as LZO.
memcpy: memcpy is the memory copy function of C and C++, with the prototype void *memcpy(void *destin, void *source, unsigned n); its function is to copy n bytes from the starting location of the source memory address source into the target memory address destin. If the memory regions pointed to by source and destin overlap, the function does not guarantee that the overlapping part of the source is not overwritten before it is copied. The memcpy in the current glibc library is optimized for various platform architectures and provides fast memory copies on architectures such as x86 and arm.
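A trivial usage example (the buffer names are illustrative):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char source[] = "abcdefgh";
        char destin[sizeof source];
        memcpy(destin, source, sizeof source);   /* copies 9 bytes, including the terminating '\0' */
        printf("%s\n", destin);                  /* prints: abcdefgh */
        return 0;
    }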
With these abbreviations and key terms defined, and aiming at the technical problem to be solved, the invention mainly optimizes the general operation flow of LZ-series algorithms: the encoding and decoding speed is improved by optimizing their general operation steps, without affecting the compression ratio of the algorithms.
Referring to fig. 1, the general operation flow of the encoding phase of an LZ-series compression algorithm is as follows (a simplified code sketch is given after the list):
1) A dictionary (each algorithm implements it differently) caches an identification of the input data (usually a hash of its first few bytes) and its index. Through the identification, it can be determined whether the history data contains data whose starting bytes equal those of the current input; through the index, the position of that history data in memory can be found.
2) Encoding starts, and the input data is read byte by byte.
3) A piece of data (one or more bytes) is read.
4) If no history data with the same hash identification as the currently read data is found in the dictionary, the currently read data is copied directly to the encoding output buffer (usually 1, 4, or 8 bytes at a time), and the read pointer jumps to the next byte of the current data segment.
5) If history data with the same hash identification as the currently read data is found in the dictionary, the number of matching (identical) bytes between the history data and the currently read data is calculated (usually byte by byte).
6) If the number of matching bytes is greater than or equal to the threshold, a pair (distance, length) is output to the encoding output buffer as a compressed data segment replacing the matched data, where distance denotes the relative distance between the history data and the current data and length denotes the matching length, that is, the length of the data segment being compressed; the read pointer then jumps to the byte following the matched data segment.
7) If the matching length is less than the threshold, the current data segment is copied to the encoding output buffer (usually 1, 4, or 8 bytes at a time), and the read pointer jumps to the next byte of the current data segment.
8) The identification and index information of the currently read data are updated in the dictionary.
9) Steps 3) to 8) are repeated until all input data has been read; the data in the encoding output buffer is then output and the encoding operation is complete.
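The following compact C sketch illustrates this general flow under simplifying assumptions: the dictionary is a small hash table of positions, the hash covers the first 4 bytes, literals are emitted one byte at a time, and the output format of literals and (distance, length) pairs is abstracted behind two callbacks. The names hash4, emit_literal_fn, emit_match_fn, MIN_MATCH and DICT_BITS are illustrative and are not taken from any particular LZ implementation:

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #define DICT_BITS 12
    #define DICT_SIZE (1u << DICT_BITS)
    #define MIN_MATCH 4                               /* threshold below which a match is not encoded */

    typedef void (*emit_literal_fn)(const unsigned char *p, size_t n);
    typedef void (*emit_match_fn)(size_t distance, size_t length);

    static uint32_t hash4(const unsigned char *p)     /* hash identification of the next 4 bytes */
    {
        uint32_t v;
        memcpy(&v, p, 4);
        return (v * 2654435761u) >> (32 - DICT_BITS);
    }

    static void lz_encode_generic(const unsigned char *in, size_t in_len,
                                  emit_literal_fn emit_literal, emit_match_fn emit_match)
    {
        size_t dict[DICT_SIZE] = {0};                 /* stores position + 1; 0 means "empty" */
        size_t pos = 0;

        while (pos + MIN_MATCH <= in_len) {
            uint32_t h = hash4(in + pos);
            size_t cand = dict[h];
            dict[h] = pos + 1;                        /* step 8): update the dictionary */

            size_t len = 0;
            if (cand != 0) {                          /* step 5): extend the match byte by byte */
                const unsigned char *hist = in + (cand - 1);
                while (pos + len < in_len && hist[len] == in[pos + len])
                    len++;
            }
            if (len >= MIN_MATCH) {                   /* step 6): emit the (distance, length) pair */
                emit_match(pos - (cand - 1), len);
                pos += len;
            } else {                                  /* steps 4)/7): emit a literal byte */
                emit_literal(in + pos, 1);
                pos++;
            }
        }
        if (pos < in_len)                             /* trailing bytes are always literals */
            emit_literal(in + pos, in_len - pos);
    }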
Referring to fig. 3, the general operation flow of the decoding phase of an LZ-series compression algorithm is as follows (the byte-by-byte match copy of step 4) is sketched after the list):
1) Decoding starts, and the data to be decoded is read byte by byte.
2) A segment of data is read, and its identification bit is used to judge whether it is a data segment to be copied directly or a substituting pair.
3) If the read data segment is one to be copied directly, it is copied directly into the decoding output buffer.
4) If the read data segment is a compressed data segment represented by a pair (distance, length), the history data position located distance bytes back is found in the decoding output buffer, and length bytes of data are copied from that history position to the current position of the decoding output buffer to decompress the compressed data segment (this copy is usually performed byte by byte). Byte-by-byte copying is used because, when length bytes are to be repeated from the history position, the distance between the history position and the current position may be smaller than length; in that case, copying with memcpy or in multi-byte units would copy bytes whose values have not yet been determined to the new position and corrupt the decompressed data.
5) Steps 2) to 4) are repeated until all the data to be decoded has been read; the data in the decoding output buffer is then output and the decoding operation is complete.
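The byte-by-byte copy of step 4) can be sketched in C as follows (dst is the current position in the decoding output buffer; the name copy_match_bytewise is an illustrative assumption):

    #include <stddef.h>

    /* Copy `length` history bytes starting `distance` bytes back. Copying one byte at a
     * time still works when distance < length, because each byte being read has already
     * been written, so the intended repeating pattern is reproduced correctly. */
    static unsigned char *copy_match_bytewise(unsigned char *dst, size_t distance, size_t length)
    {
        const unsigned char *src = dst - distance;
        while (length--)
            *dst++ = *src++;
        return dst;
    }

This per-byte copy is exactly the overhead that the optimization described below removes for the distance >= length case and reduces for the overlapping case.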
For this general encoding and decoding flow of LZ-series compression algorithms, the invention optimizes the encoding and decoding steps mainly in the following aspects.
Specifically, in the data encoding process, when searching for a repeated data segment, the matching speed is increased by reading a multi-byte data slice at a time for matching; when outputting unmatched data, the output speed is increased by accumulating the unmatched data and outputting it in one batch. In the data decoding process, the speed of obtaining the current data by copying from history data is increased by combining direct batch copying and cyclic batch copying.
First, in step 5) of the encoding process of LZ-series algorithms, searching for repeated data segments can complete only 1 byte of data matching per CPU cycle. The optimization method makes full use of the processor word length: instead of matching byte by byte, each step reads a data slice one computer word in length, that is, 4 bytes on a 32-bit computer or 8 bytes on a 64-bit computer, and only when such a slice fails to match is the failing slice further matched byte by byte. In this way, each CPU cycle can complete 4 bytes (32-bit computer) or 8 bytes (64-bit computer) of data matching, which greatly accelerates the matching and reduces the number of CPU cycles required to find repeated data segments.
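A minimal C sketch of this word-at-a-time comparison is given below; hist, cur and limit are illustrative names for the history position, the current position and the end of the input, and the word-sized loads go through memcpy to stay portable with respect to alignment:

    #include <stddef.h>
    #include <string.h>

    static size_t match_length(const unsigned char *hist, const unsigned char *cur,
                               const unsigned char *limit)
    {
        const size_t word = sizeof(size_t);          /* 8 bytes on 64-bit, 4 bytes on 32-bit systems */
        size_t len = 0;

        while (cur + len + word <= limit) {
            size_t a, b;
            memcpy(&a, hist + len, word);            /* one word-sized load per comparison */
            memcpy(&b, cur + len, word);
            if (a != b)
                break;                               /* this slice failed; compare it byte by byte below */
            len += word;
        }
        while (cur + len < limit && hist[len] == cur[len])
            len++;                                   /* byte-by-byte match of the failed slice / tail */
        return len;
    }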
Second, in steps 4) and 7) of the encoding process of LZ-series algorithms, unmatched data is output by copying a specified number of bytes to the encoding output buffer each time. The optimization method changes this uniformly: unmatched data is no longer output immediately when it is found but is accumulated, and when the next matched repeated data segment is found, the accumulated unmatched data is copied to the encoding output buffer in one call to the memcpy function.
Third, in step 4) of the decoding process of LZ-series algorithms, decompressing a compressed data segment can only copy history data byte by byte, which causes extra overhead. In the data decoding process, the optimization method increases the speed of obtaining the current data by copying from history data by combining direct batch copying and cyclic batch copying, as follows:
when current data are obtained by repeated copying according to historical data, the current data are not copied one by one, but the distance between the position of the historical data and the current position and the length of the data to be copied are comprehensively considered, and if the length of the data to be copied is smaller than the distance between the current position and the position of the historical data, the data to be copied are directly copied in batches; if the length of the data to be copied is larger than the distance between the current position and the historical data position, a circulating batch copying mode is adopted, the historical data position is kept unchanged, the current position is updated along with each circulation, and the length of the data copied in batches every circulation is equal to or smaller than the distance between the current position and the historical data position, so that the number of CPU cycles and extra expenses required during data copying are reduced.
During decoding, the current data is obtained by copying history data in the following specific manner:
In the data decompression process, when the data read is a pair (distance, length), the current data is obtained by copying history data according to the information in the pair, where distance denotes the relative distance between the history data position and the current position and length denotes the length of the data to be copied starting from the history data position;
first, it is judged whether the distance in the pair (distance, length) is greater than or equal to the length value;
if distance is greater than or equal to length, the batch copy of the data is completed directly with the memcpy function;
if the distance in the pair (distance, length) is less than length, the history data to be copied overlaps the data being written, and copying it in one batch with memcpy would produce data errors. To avoid such errors while still exploiting the efficiency of memcpy, a cyclic batch copy is used: during the cyclic batch copy, the history data position is kept unchanged while the current position moves backward as copying proceeds. Let d be the distance between the current position and the history data position, recalculated in real time during copying, with initial value distance. In each cycle, d bytes are copied with memcpy, the current position is updated to the end of the copied segment, and d is recalculated from the new current position; the cycle ends when d is greater than the length of the data remaining to be copied, and the remaining data is then copied in one final memcpy call.
It can be seen that distance and d do not have the same meaning: in the invention, distance is a fixed value, whereas d is the new distance between the current position and the history data position, recalculated as the current position keeps moving backward during copying, and is therefore a variable. In other words, during the cyclic batch copy, distance bytes are first copied from the history data position, then 2 × distance bytes, and so on with gradually increasing lengths until all length bytes have been copied. For example, with distance = 3 and length = 10, the loop copies 3 bytes, then 6 bytes, and the final remaining byte is copied by a single memcpy call. With this modification, the decoding process of LZ-series algorithms is accelerated.
Because the optimization method does not change the encoding and decoding principle of the algorithm, the compression ratio is not reduced; at the same time, by optimizing the execution steps of encoding and decoding, the CPU cycles required are effectively reduced, and therefore the encoding and decoding time is effectively reduced.
Therefore, the method specifically comprises the following steps:
S1: Data encoding
1) The operation of the data matching stage is improved. The processor word length is fully utilized: each match reads a data slice of 4 bytes (32-bit computer) or 8 bytes (64-bit computer) at a time, and when such a match fails, the failing data slice is further matched byte by byte;
2) The operation of the unmatched-data output stage is improved. Taking full advantage of the performance optimization of the memcpy function for memory copies, unmatched data is accumulated, and when the next matched data segment is found, the accumulated unmatched data is copied to the encoding output buffer in one memcpy call.
S2: data decoding
The operation of the compressed-data-segment decompression stage is improved. If the distance between the current position and the history data position is greater than or equal to the length of the compressed data segment (the length being the matching length between the history data and the current data), the data is copied directly; otherwise, a cyclic memcpy copy is used in which the length copied by each memcpy call is gradually doubled, without ever copying from a memory area whose data has not yet been written, until the whole compressed data segment has been decompressed.
Embodiments of the present invention will be described in detail below with reference to figs. 1-4; details and functions that are not essential to the invention are omitted so as not to obscure the description.
Figs. 1 to 4 are flow charts of LZ-series compression algorithms before and after the optimization method of the invention is applied: figs. 1 and 3 show the flow before optimization, and figs. 2 and 4 show the flow after optimization.
As shown in the optimized flow of fig. 2, the encoding in this embodiment proceeds as follows (a consolidated code sketch follows the list):
S201: Data encoding
1) A dictionary is created and initialized, and the starting point of the accumulated unmatched data segment is initialized;
2) Encoding starts; the input data is read byte by byte, and history data with the same hash identification as the read data segment is searched for in the dictionary;
3) If no history data with the same hash identification as the currently read data is found in the dictionary and the starting point of the accumulated unmatched data segment is empty, the starting point of the accumulated unmatched data segment is set to the position of the currently read data;
4) If history data with the same hash identification as the currently read data is found in the dictionary, calculation of the number of matching (identical) bytes between the history data and the current data, that is, the matching length, begins;
5) One long (64-bit) or int (32-bit) value is read from the current position and from the history position and compared; if the two values are equal, the matching length is increased by 8 (64-bit) or 4 (32-bit) and the next long/int value is read;
6) Step 5) is repeated until the long/int values read are no longer equal; the memory areas represented by the two unequal values are then compared byte by byte to obtain the number of equal leading bytes, this number is added to the matching length, and data matching is finished;
7) It is judged whether the matching length is greater than or equal to the threshold. If so, it is first judged whether the starting point of the accumulated unmatched data segment is empty; if it is not empty, the memory data between that starting point and the current position is first copied to the encoding output buffer with the memcpy function and the starting point is cleared; then the pair (distance, length) is output to the encoding output buffer as a compressed data segment replacing the matched data;
8) If the matching length is smaller than the threshold and the starting point of the accumulated unmatched data segment is empty, the starting point of the accumulated unmatched data segment is set to the current read position;
9) The identification and index information of the currently read data are updated in the dictionary;
10) Steps 2) to 9) are repeated until all input data has been read; the data in the encoding output buffer is then output and the encoding operation is complete.
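Putting steps 1) to 10) together, a self-contained C sketch of the optimized encoding loop might look as follows. It combines the word-at-a-time match extension with the accumulated-literal output; the dictionary, hash, threshold and emit callbacks are simplifying assumptions and do not reproduce the exact token format of any particular LZ variant:

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #define DICT_BITS 12
    #define DICT_SIZE (1u << DICT_BITS)
    #define MIN_MATCH 4

    typedef void (*emit_literal_fn)(const unsigned char *p, size_t n);
    typedef void (*emit_match_fn)(size_t distance, size_t length);

    static uint32_t hash4(const unsigned char *p)
    {
        uint32_t v;
        memcpy(&v, p, 4);
        return (v * 2654435761u) >> (32 - DICT_BITS);
    }

    /* Word-at-a-time match extension with a byte-by-byte tail (steps 5) and 6)). */
    static size_t extend_match(const unsigned char *hist, const unsigned char *cur,
                               const unsigned char *limit)
    {
        const size_t word = sizeof(size_t);
        size_t len = 0;
        while (cur + len + word <= limit) {
            size_t a, b;
            memcpy(&a, hist + len, word);
            memcpy(&b, cur + len, word);
            if (a != b)
                break;
            len += word;
        }
        while (cur + len < limit && hist[len] == cur[len])
            len++;
        return len;
    }

    static void lz_encode_optimized(const unsigned char *in, size_t in_len,
                                    emit_literal_fn emit_literal, emit_match_fn emit_match)
    {
        size_t dict[DICT_SIZE] = {0};          /* stores position + 1; 0 means "empty" */
        size_t pos = 0;
        size_t lit_start = 0;                  /* start of the accumulated unmatched run (steps 1), 3), 8)) */

        while (pos + MIN_MATCH <= in_len) {
            uint32_t h = hash4(in + pos);      /* step 2): look up the hash identification */
            size_t cand = dict[h];
            dict[h] = pos + 1;                 /* step 9): update the dictionary */

            size_t len = 0;
            if (cand != 0)                     /* step 4): history candidate found */
                len = extend_match(in + (cand - 1), in + pos, in + in_len);

            if (len >= MIN_MATCH) {            /* step 7): flush accumulated literals, then the pair */
                if (lit_start < pos)
                    emit_literal(in + lit_start, pos - lit_start);
                emit_match(pos - (cand - 1), len);
                pos += len;
                lit_start = pos;
            } else {
                pos++;                         /* keep accumulating unmatched bytes */
            }
        }
        if (lit_start < in_len)                /* flush the trailing literal run */
            emit_literal(in + lit_start, in_len - lit_start);
    }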
As shown in the optimized flow of fig. 4, the decoding in this embodiment proceeds as follows (a code sketch of the copy logic follows the list):
S202: Data decoding
1) Decoding starts, and the data to be decoded is read byte by byte;
2) A segment of data is read, and its identification bit is used to judge whether it is a data segment to be copied directly or a substituting pair;
3) If the read data segment is one to be copied directly, it is copied into the decoding output buffer in one batch with the memcpy function;
4) If the read data segment is a compressed data segment represented by a pair (distance, length), the history data position located distance bytes back is found in the decoding output buffer, and the copy mode is selected according to the relation between distance and length; in the pair (distance, length), distance denotes the relative distance between the history data and the current data, and length denotes the matching length between the history data and the current data;
5) When distance >= length, the length bytes of memory data starting from the history data position are copied directly to the decoding output buffer with the memcpy function, and the flow then jumps to step 7);
6) When distance < length, distance bytes of memory data starting from the history data position are first copied in one batch to the decoding output buffer with the memcpy function; then, with length -= distance and distance *= 2, the flow returns to step 5) to copy the remaining memory data;
7) Steps 2) to 6) are repeated until all the data to be decoded has been read; the data in the decoding output buffer is then output and the decoding operation is complete.
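Leaving out the reading of the identification bit (whose exact layout differs between LZ variants), the copy logic of steps 3) to 6) can be sketched in C as follows; decode_copy, lit, lit_len, distance and match_len are illustrative names, and distance is presumed to be at least 1 whenever match_len > 0:

    #include <string.h>
    #include <stddef.h>

    /* Copy one decoded token into the output buffer: first the literal run (step 3)),
     * then the match referenced by (distance, match_len) (steps 4) to 6)). */
    static unsigned char *decode_copy(unsigned char *dst,
                                      const unsigned char *lit, size_t lit_len,
                                      size_t distance, size_t match_len)
    {
        if (lit_len > 0) {
            memcpy(dst, lit, lit_len);               /* literal run copied in one batch */
            dst += lit_len;
        }
        if (match_len > 0) {
            const unsigned char *src = dst - distance;
            size_t remaining = match_len;
            size_t d = distance;

            while (d < remaining) {                  /* step 6): overlapping case, batch size doubles */
                memcpy(dst, src, d);
                dst += d;
                remaining -= d;
                d = (size_t)(dst - src);
            }
            memcpy(dst, src, remaining);             /* step 5): direct batch copy of the rest */
            dst += remaining;
        }
        return dst;
    }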
The method can effectively increase the encoding and decoding speed of LZ-series compression algorithms, including LZ4, has a wide application range, and does not affect the compression accuracy. Moreover, because the design and implementation of the optimization algorithm were developed independently, the method has strong independent controllability.
With the scheme of the invention, the encoding and decoding speed of LZ-series compression algorithms can be effectively increased without auxiliary hardware equipment and without reducing the compression ratio. According to experimental statistics, for LZ-series compression algorithms other than LZ4, such as LZMA and LZO, the method shortens the encoding time by 30%-50% and the decoding time by 50%-60%; for the LZ4 algorithm, the invention still shortens the encoding time by 10% and the decoding time by 15%.
The method can effectively increase the encoding and decoding speed of LZ-series compression algorithms without relying on extra hardware and without affecting the original compression ratio, thereby reducing the system overhead that practical applications such as Linux system startup and the SPICE cloud desktop spend on data encoding and decoding, and improving their performance and execution efficiency.
The above description is only a preferred embodiment of the present invention and is intended only to help understand the method and core idea of the invention. The protection scope of the invention is not limited to the above embodiments; all technical solutions falling within the idea of the invention belong to its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention are also considered to be within the protection scope of the invention.