CN119376617A - Data compression method, device and equipment - Google Patents

Data compression method, device and equipment

Info

Publication number: CN119376617A
Authority: CN (China)
Prior art keywords: data, compression, stored, size, compressed
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202411309469.3A
Other languages: Chinese (zh)
Inventors: 王晓琦, 赵梓伸, 钟戟, 王陈园, 付凤之, 孙宇伯
Current Assignee: Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202411309469.3A
Publication of CN119376617A


Abstract


The present invention provides a data compression method, device and equipment. The method includes: obtaining data to be stored; determining a corresponding target compression dictionary according to the classification result of the data; and compressing the data according to the target compression dictionary to obtain compressed data. The embodiments of the present application effectively improve the service life and access performance of the storage device.

Description

Data compression method, device and equipment
Technical Field
The present invention relates to the field of data compression technologies, and in particular, to a data compression method, apparatus, and device.
Background
The storage medium in a solid state disk (Solid State Disk, SSD) wears out with use. In the nonvolatile memory, multiple bits of information are stored in one cell, and data are read and written in units of pages and erased in units of blocks.
In the related art, the storage device can only receive storage-related information in input/output commands, and the amount of data written and read is large. When the erase count of a block reaches its limit, the error rate rises sharply, seriously affecting the service life and access performance of the solid state disk.
Disclosure of Invention
The invention provides a data compression method, device, and equipment. By classifying the data to be stored, data of the same category can be compressed with the same dictionary file, which effectively reduces dictionary file entries and achieves low-cost data compression. Because the data are compressed and stored with a compression dictionary determined by the classification result, the same data occupy less storage space, more data can be stored on a limited storage medium, the amount of data written and read is reduced, and the service life and access performance of the storage device are improved.
The invention provides a data compression method, which comprises the following steps:
Acquiring data to be stored;
determining a corresponding target compression dictionary according to the classification result of the data;
and compressing the data according to the target compression dictionary to obtain compressed data.
According to the data compression method provided by the invention, the method further comprises the following steps:
And determining a classification result of the data according to the data with the preset size in the data to be stored and a preset data classification model, wherein the data classification model is used for identifying the type of the data to be stored.
According to the data compression method provided by the invention, the method further comprises the following steps:
Storing compression type information in a first flash translation layer FTL mapping table;
the storage location of the compressed data and the size of the compressed data are stored in a second FTL mapping table.
According to the data compression method provided by the invention, the method further comprises the following steps:
and, when the size of the data to be stored is larger than a first value and smaller than a second value, if during compression of the data to be stored the sum of the compressed data and the remaining uncompressed data is determined to be smaller than the first value, stopping compression of the uncompressed data and storing both the compressed data and the uncompressed data, wherein the first value is related to the size of a physical page of the solid state disk.
According to the data compression method provided by the invention, the method further comprises the following steps:
and under the condition that the size of the data to be stored is larger than a second value, determining a compression stopping threshold value corresponding to the data to be stored according to the size of the sequential write data quantity and the size of the random write data quantity in the data to be stored.
According to the data compression method provided by the invention, determining the compression stopping threshold corresponding to the data to be stored according to the amount of sequential write data and the amount of random write data in the data to be stored comprises:
determining the compression stopping threshold corresponding to the data to be stored as a function of the write pattern and the compression dictionary size, where the threshold depends on the amount of sequential write data in the data to be stored, the amount r of random write data in the data to be stored, the size p of a physical page in the solid state disk, and a preset multiple n of the physical page size.
The invention also provides a data compression device, which comprises the following modules:
The acquisition module is used for acquiring data to be stored;
the determining module is used for determining a corresponding target compression dictionary according to the classification result of the data;
and the compression module is used for compressing the data according to the target compression dictionary to obtain compressed data.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a data compression method as described in any of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data compression method as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a data compression method as described in any one of the above.
The data compression method, device, and equipment provided by the invention compress data of the same category with the same dictionary file, which effectively reduces dictionary file entries and achieves low-cost data compression. Because the data are compressed and stored with a compression dictionary determined by the data classification result, the same data occupy less storage space, more data can be stored on a limited storage medium, the amount of data written and read is reduced, and the service life and access performance of the storage device are improved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or of the prior art, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. It is apparent that the drawings described below show some embodiments of the present invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of a data compression method according to the present invention.
Fig. 2 is a schematic diagram of a mapping table provided in the present invention.
Fig. 3 is a schematic diagram of a compression module provided by the present invention.
Fig. 4 is a second flow chart of the data compression method according to the present invention.
Fig. 5 is a schematic diagram of a data compression device provided by the present invention.
Fig. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The data compression method, apparatus and device of the present invention are described below in conjunction with fig. 1-6.
In order to facilitate a clearer understanding of the technical solutions of the embodiments of the present application, some technical contents related to the embodiments of the present application will be first described.
The flash storage medium in a solid state disk (SSD) wears out with use. In NAND memory, multiple bits of information are stored in one cell, and data are read and written in units of pages and erased in units of blocks. For a TLC memory cell, wear begins after 3000 write-erase operations. When the erase count of a block reaches its limit, the error rate rises sharply and the service life of the solid state disk is seriously affected. Therefore, the number of writes and erases should be reduced as much as possible at the system design level, so that the service life of the solid state disk is extended as far as possible without increasing the cost of the drive.
Data compression is one means of extending the service life of a solid state disk: under the same conditions, writing less data produces fewer erase operations. Data compression can therefore be deployed in the SSD, so that received user data are compressed before being stored in NAND, effectively reducing the amount of data and the number of erases. In addition, after compression the amount of data actually written and read becomes smaller, which can improve the access performance of the SSD. A data compression algorithm reduces the data volume by eliminating redundant information in the data. Dictionary compression is one of the common data compression algorithms. Before compression, a dictionary containing the common patterns of the data to be compressed is built. When data are compressed, the data to be compressed are matched against the dictionary, and each matched pattern is replaced with its corresponding index value. Finally, the compressed data and the dictionary are stored for later decompression.
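As a rough illustration of the dictionary substitution just described, the following Python sketch builds a toy dictionary of frequent 2-byte patterns and performs the match-and-replace step and its inverse. The pattern length, entry limit, and token encoding are assumptions made for illustration, not the patent's format.

```python
from collections import Counter

def build_dictionary(sample: bytes, max_entries: int = 256) -> dict[bytes, int]:
    """Build a toy dictionary of the most frequent 2-byte patterns in the sample.

    This mirrors the idea of treating input as 16-bit values and counting
    occurrences; the entry limit and pattern length are illustrative assumptions.
    """
    counts = Counter(sample[i:i + 2] for i in range(0, len(sample) - 1, 2))
    return {pattern: idx for idx, (pattern, _) in
            enumerate(counts.most_common(max_entries))}

def compress(data: bytes, dictionary: dict[bytes, int]) -> list:
    """Replace each 2-byte pattern found in the dictionary with its index value."""
    out = []
    for i in range(0, len(data), 2):
        pattern = data[i:i + 2]
        if pattern in dictionary:
            out.append(dictionary[pattern])   # matched pattern -> index value
        else:
            out.append(pattern)               # literal pass-through
    return out

def decompress(tokens: list, dictionary: dict[bytes, int]) -> bytes:
    """Reverse the substitution using the stored dictionary."""
    reverse = {idx: pattern for pattern, idx in dictionary.items()}
    return b"".join(reverse[t] if isinstance(t, int) else t for t in tokens)

if __name__ == "__main__":
    sample = b"ABABABCDCDCDABAB" * 8
    dictionary = build_dictionary(sample)
    tokens = compress(sample, dictionary)
    assert decompress(tokens, dictionary) == sample
```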
Many file systems allow users to enable file compression to save space on the storage device, but these file-system-level compression mechanisms consume valuable CPU resources. Meanwhile, the storage device can only receive storage-related information in I/O commands; higher-level context information (e.g., the file type) cannot be used to avoid compressing files that are already in a compressed format, and optimized algorithms cannot be selected for different file types. In addition, existing methods build dictionaries independently for different data blocks, which inevitably leads to duplicated dictionary entries.
Fig. 1 is a schematic flow chart of a data compression method according to the present invention. As shown in Fig. 1, the method includes the following steps:
step 101, acquiring data to be stored.
Specifically, in the prior art, the storage device can only receive storage-related information in input/output commands, and the amount of data written and read is large, so the access performance of the solid state disk is poor. To solve this problem, in the embodiment of the present application, the data to be stored are first acquired. Optionally, in the embodiment of the present application, the storage device may be connected to a host, and the host sends a storage instruction to the storage device instructing it to store data.
And 102, determining a corresponding target compression dictionary according to the classification result of the data.
Specifically, after the storage device acquires the data to be stored, it may classify the data and determine the corresponding target compression dictionary according to the classification result. In other words, in the embodiment of the application, the same dictionary file is used for data belonging to the same category, which effectively reduces dictionary file entries and achieves low-cost data compression. Optionally, the compression dictionary may be constructed in an existing manner, which is not described in detail in the embodiments of the present application.
And 103, compressing the data according to the target compression dictionary to obtain compressed data.
Specifically, after the corresponding target compression dictionary is determined according to the data classification result, the data can be compressed and stored based on the determined compression dictionary. As a result, the same data occupy less storage space, more data can be stored on a limited storage medium, the amount of data written and read is reduced, and the service life and access performance of the storage device are improved.
By classifying the data to be stored, the method of this embodiment compresses data of the same category with the same dictionary file, which effectively reduces dictionary file entries and achieves low-cost data compression. Because the data are compressed and stored with a compression dictionary determined by the classification result, the same data occupy less storage space, more data can be stored on a limited storage medium, the amount of data written and read is reduced, and the service life and access performance of the storage device are improved.
In an embodiment, the data compression method further comprises:
And determining a classification result of the data according to the data with the preset size in the data to be stored and a preset data classification model, wherein the data classification model is used for identifying the type of the data to be stored.
Specifically, when classifying the data to be stored, the application determines whether to compress and which compression class to use by scanning only a small amount of the written data, without scanning all of the data written into the buffer, thereby effectively improving the response speed.
Optionally, in the implementation of the application, the data to be stored are classified with a preset data classification model, so that the advantages of machine learning can be used to classify and compress the stored data, reduce duplicated compression dictionary entries and the space the data occupy on the storage medium, and improve storage capacity.
Exemplary, the classification model in the embodiment of the application is specifically as follows:
(1) Highly similar sample data are clustered using a density-based clustering method (DBSCAN) to determine the optimal number of categories for the sample data.
(2) The corresponding sample data are labeled with the class numbers generated by clustering, and the labeled samples are fed into a KNN classification model for training. Sample features include the data command request pattern (random or sequential), the number of LBAs, the LBA format (lbaf), the proportion of mixed reads and writes in the data I/O, and so on. Based on a distance metric, the K training samples closest to a test sample are found in the training set, and the category of the test sample is decided by voting among these K training samples.
Optionally, the compression program uses a dictionary-based algorithm that converts the input data into 16-bit unsigned integers and counts the occurrences of each integer to generate the best dictionary (a cached copy of the dictionary is maintained in the DRAM of the SSD).
Optionally, the density-based clustering method (DBSCAN) is selected from among many clustering algorithms because it can determine the number of cluster centers automatically, without manual setting. In addition, some file types, such as bmp files, do not support file compression, and jpg (jpeg) files can only be compressed lossily, so the clustering algorithm can group data types unsuitable for compression into the same category, and the resulting model can also predict whether input data are worth compressing. The classification model uses the k-nearest neighbor algorithm (KNN), which has high accuracy, is insensitive to outliers, makes no assumptions about the input data, and supports multi-class classification. KNN is a supervised learning algorithm: the category of a new sample is decided from the k training samples closest to it according to a classification decision rule. The distance metric, the choice of k, and the classification decision rule are the three basic elements of the k-nearest neighbor method. Since modern file systems tend to assign consecutive logical block addresses to the same file, the classification prediction is made by reading part of the data (512 bytes) from a write request.
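As a rough sketch of the clustering-then-classification pipeline described above, the following Python code uses scikit-learn's DBSCAN and KNeighborsClassifier. The feature vectors, the eps and min_samples settings, and the synthetic sample values are illustrative assumptions; the text only specifies that features include the request pattern, the number of LBAs, the LBA format, and the read/write mix, and that 512 bytes of a write request are sampled.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import KNeighborsClassifier

# Each row describes one write request by the features named in the text:
# [request pattern (0 = sequential, 1 = random), number of LBAs,
#  LBA format code (lbaf), proportion of mixed read/write I/O].
# The numbers below are synthetic placeholders, not measured data.
samples = np.array([
    [0, 128, 0, 0.10],
    [0, 120, 0, 0.12],
    [1,   8, 2, 0.80],
    [1,  16, 2, 0.75],
    [0, 256, 1, 0.05],
    [0, 250, 1, 0.06],
])

# Step 1: cluster highly similar samples with DBSCAN to obtain class labels
# automatically (no manual labeling); eps and min_samples are illustrative.
labels = DBSCAN(eps=10.0, min_samples=2).fit_predict(samples)

# Step 2: train a KNN classifier on the cluster-labeled samples,
# dropping any DBSCAN noise points (label -1).
knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(samples[labels != -1], labels[labels != -1])

def predict_compression_class(first_512_bytes_features: np.ndarray) -> int:
    """Predict the compression class from features of the first 512 bytes of a
    write request (the feature extraction itself is outside this sketch)."""
    return int(knn.predict(first_512_bytes_features.reshape(1, -1))[0])

print(predict_compression_class(np.array([1, 12, 2, 0.7])))
```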
Compared with generating training samples by manually labeling a large amount of data, clustering the sample data with a clustering algorithm takes less time and avoids errors in sample generation. The sample data used for training include text files, program binaries, pictures, video, program source code, and the like. The clustering algorithm avoids manual sample labeling, which is error-prone and time-consuming for large amounts of data. In addition, since the compression module performs prediction and compression on idle cores, the classification model should make its prediction within the latency of writing a flash page, and the firmware uses pipeline parallelism to hide the prediction latency in the worst case. The minimum page write latency is typically about 200 us, so the training goal of the classification model is to complete classification within 200 us.
The method of this embodiment effectively exploits the advantages of machine learning to classify and compress the stored data, reducing duplicated compression dictionary entries and the space the data occupy on the storage medium, and improving storage capacity. When classifying the data to be stored, only a small amount of the written data needs to be scanned to determine whether to compress and which compression class to use, without scanning all the data written into the buffer, which effectively improves the response speed.
In an embodiment, the data compression method further comprises:
Storing compression type information in a first flash translation layer FTL mapping table;
the storage location of the compressed data and the size of the compressed data are stored in a second FTL mapping table.
Specifically, in a conventional flash memory storage system, each physical flash page corresponds to the same number of logical data blocks. After data compression, however, each flash page of the SSD may correspond to a different number of logical blocks, so a new mapping table is needed to record the compressed data information, and a field is added to the original mapping table to distinguish compressed data from uncompressed data. Optionally, in the embodiment of the application, the compression class information is stored in the first flash translation layer (FTL) mapping table, and the storage location and size of the compressed data are stored in the second FTL mapping table, so that the compressed data information is recorded accurately and the stored compressed data can be located accurately and quickly when the data in the storage device are read.
For example, Fig. 2 shows the FTL mapping tables, comprising a first FTL mapping table and a second FTL mapping table.
Optionally, the first FTL mapping table adds a compression class field to the original mapping table: it stores the mapping between a Logical Page Number (LPN) and a Physical Page Number (PPN) while recording a compression class, where a compression class of 0 means the data are not compressed. In the first FTL mapping table, LPN(0) and LPN(1) are mapped to PPN(26) and PPN(44) and are associated with uncompressed data; LPN(16), LPN(17), LPN(18), and LPN(19) are mapped to PPN(89), and LPN(42), LPN(43), and LPN(44) are mapped to PPN(90), and these are associated with compressed data.
Optionally, the second FTL mapping table records information about the compressed data stored on a physical page, including an offset (OFF) and a length (LEN). The offset field stores the starting position within the corresponding physical page, and the length field stores the length of the compressed data. Storing each piece of compressed data in its own physical page would waste part of each page, so when compressed data are written consecutively, writing can start from the sub-page immediately after the previous compressed data, avoiding wasted space. Combining the offset field and the length field makes this storage mode possible.
For example, suppose the physical page size is 8 KB, divided into 16 sub-pages (sub-page size 512 B). Compressed data of length 10000 B written starting from sub-page 0 of PPN(89) extend into the 4th sub-page of PPN(90); the waste of 240 bytes in that sub-page of PPN(90) cannot be avoided, but this is much less than storing each piece of compressed data in its own page. The next compressed data can then be written starting from the 5th sub-page of PPN(90), so its offset is 5; that is, the offset value identifies the first physical sub-page used to store the compressed data.
Optionally, when the SSD receives a read command, the first mapping table is queried first to obtain the requested physical page number in the flash memory array, and the compression class field determines whether the data to be read are compressed. For compressed data, the second mapping table is queried, and the compressed data are fetched according to the physical page number, offset, and length. The dictionary file corresponding to the compression class is then used to reconstruct the original data content, decompressing it into the SSD DRAM buffer. After decompression is complete, the I/O interface initiates a DMA data transfer and completes the request.
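To make the two-table lookup concrete, here is a small Python sketch of the mapping structures of Fig. 2, assuming an 8 KB page with 512 B sub-pages and reusing the LPN/PPN values from the example above; the compression class values, the 3000 B length of the second chunk, and the data-structure choices are illustrative assumptions rather than the patent's layout.

```python
from dataclasses import dataclass

SUBPAGE = 512          # sub-page size in bytes (8 KB page = 16 sub-pages)

@dataclass
class L2PEntry:        # first FTL table entry: LPN -> PPN plus compression class
    ppn: int
    comp_class: int    # 0 means the data are not compressed

@dataclass
class CompEntry:       # second FTL table entry: where the compressed data live
    off: int           # starting sub-page within the physical page
    length: int        # compressed length in bytes

# First table, following the Fig. 2 example (class values other than 0 are illustrative).
first_table = {
    0: L2PEntry(ppn=26, comp_class=0),
    1: L2PEntry(ppn=44, comp_class=0),
    16: L2PEntry(ppn=89, comp_class=1), 17: L2PEntry(ppn=89, comp_class=1),
    18: L2PEntry(ppn=89, comp_class=1), 19: L2PEntry(ppn=89, comp_class=1),
    42: L2PEntry(ppn=90, comp_class=1), 43: L2PEntry(ppn=90, comp_class=1),
    44: L2PEntry(ppn=90, comp_class=1),
}

# Second table: 10000 B of compressed data starting at sub-page 0 of PPN 89,
# and the next compressed chunk starting at sub-page 5 of PPN 90 (per the example;
# the 3000 B length is an illustrative value).
second_table = {
    89: CompEntry(off=0, length=10000),
    90: CompEntry(off=5, length=3000),
}

def locate(lpn: int):
    """Read path: query the first table, then the second one for compressed data."""
    l2p = first_table[lpn]
    if l2p.comp_class == 0:
        return ("uncompressed", l2p.ppn)
    comp = second_table[l2p.ppn]
    # The decompressor would now fetch comp.length bytes starting at byte offset
    # comp.off * SUBPAGE of PPN l2p.ppn and decode them with the dictionary of
    # class l2p.comp_class.
    return ("compressed", l2p.ppn, comp.off * SUBPAGE, comp.length)

print(locate(0))    # ('uncompressed', 26)
print(locate(43))   # ('compressed', 90, 2560, 3000)
```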
In the method of the above embodiment, the compression class information is stored in the first flash translation layer (FTL) mapping table, and the storage location and size of the compressed data are stored in the second FTL mapping table, so that the compressed data information is recorded accurately and the stored compressed data can be located accurately and quickly when the data in the storage device are read.
In an embodiment, the data compression method further comprises:
And, when the size of the data to be stored is larger than a first value and smaller than a second value, if during compression of the data to be stored the sum of the compressed data and the remaining uncompressed data is determined to be smaller than the first value, compression of the uncompressed data is stopped and both the compressed data and the uncompressed data are stored, wherein the first value is related to the size of a physical page of the solid state disk.
Specifically, in the embodiment of the application, when the size of the data to be stored is larger than the first value and smaller than the second value, and the sum of the compressed data and the remaining uncompressed data is determined during compression to be smaller than the first value, compression of the uncompressed data is stopped and the compressed data and the uncompressed data are stored. This effectively improves the response speed to compression instructions while still reducing storage space. Optionally, the first value may be the size of a physical page of the storage device, such as 4K.
For example, taking a storage device with a 4K physical page, when the data to be stored are larger than 4K and smaller than 8K, the amount of compressed data is monitored during pattern matching against the dictionary file. Once the total amount of data (compressed plus remaining uncompressed) falls below 4K, the compression algorithm is stopped, the length of the compressed part is recorded, and the data are stored in a single page. When the data are read, the compressed part is decompressed and combined with the uncompressed part to reconstruct the original data. In this way, the response speed to compression and store instructions is effectively improved while still reducing storage space.
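A minimal sketch of this early-stop rule follows, assuming a 4K physical page; compress_chunk is a placeholder for the dictionary matcher (not specified here), and the chunk size is an arbitrary choice for illustration.

```python
PAGE = 4096  # first value: physical page size of the storage device (4K assumed)

def compress_with_early_stop(data: bytes, compress_chunk, chunk: int = 512):
    """Compress data (4K < len(data) < 8K) chunk by chunk and stop as soon as the
    compressed part plus the remaining uncompressed part fits in one page.

    compress_chunk is a placeholder for the dictionary matcher; for this sketch's
    accounting it is assumed to return bytes no longer than its input.
    """
    compressed = bytearray()
    pos = 0
    while pos < len(data):
        remaining = len(data) - pos
        if len(compressed) + remaining < PAGE:
            break                                  # total already fits in one page
        compressed += compress_chunk(data[pos:pos + chunk])
        pos += chunk
    # Store compressed part + untouched tail in a single page; record the split
    # point (compressed length) so reads can decompress and re-join the two parts.
    return bytes(compressed), data[pos:], len(compressed)
```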
In the method of the above embodiment, when the size of the data to be stored is larger than the first value and smaller than the second value, compression of the uncompressed data is stopped and the compressed and uncompressed data are stored as soon as their sum is determined to be smaller than the first value during compression. This effectively improves the response speed to compression and store instructions while still reducing storage space.
In an embodiment, the data compression method further comprises:
And under the condition that the size of the data to be stored is larger than the second value, determining a compression stopping threshold corresponding to the data to be stored according to the size of the sequential write data amount and the size of the random write data amount in the data to be stored.
Specifically, when the size of the data to be stored is larger than the second value, the embodiment of the application determines the compression stopping threshold corresponding to the data to be stored according to the amount of sequential write data and the amount of random write data in the data to be stored. Compression can then be stopped when the size of the compressed data reaches the compression stopping threshold, which effectively improves the response speed to compression and store instructions while still reducing storage space.
For example, when the data to be stored are larger than n×4K (n ≥ 2), a decision mechanism is applied: a threshold thr for stopping compression is set, and compression stops when the data amount is smaller than the threshold. The threshold is related to the write pattern and to the dictionary file sizes {a1, a2, ..., aM}, where thr denotes the compression stopping threshold corresponding to the data to be stored and depends on the amount of sequential write data in the data to be stored, the amount r of random write data in the data to be stored, and the size p of a physical page in the solid state disk. That is, when the amount of sequential write data is larger than the amount of random write data, the compression threshold is relatively small, so a larger compression ratio can be obtained; when the proportion of random writes is high, the compression threshold is relatively large, so a faster response is obtained.
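The patent's formula for thr is not reproduced in this text, so the Python sketch below only illustrates the qualitative behaviour described above: the threshold shrinks when sequential writes dominate and grows when random writes dominate. The interpolation between p and n×p and the use of the largest dictionary size are assumptions made purely for illustration, not the actual expression.

```python
def stop_threshold(seq_bytes: int, rand_bytes: int, page: int, n: int,
                   dict_sizes: list[int]) -> float:
    """Illustrative stand-in for the thr formula (not the patent's expression).

    seq_bytes / rand_bytes: amounts of sequential and random write data,
    page: physical page size p, n: preset multiple of the page size,
    dict_sizes: dictionary file sizes {a1, ..., aM}.
    The threshold shrinks when sequential writes dominate (favouring compression
    ratio) and grows when random writes dominate (favouring fast response).
    """
    random_share = rand_bytes / max(seq_bytes + rand_bytes, 1)
    base = page + random_share * (n - 1) * page   # between p and n*p (assumed form)
    return base + max(dict_sizes)                 # dictionary size contribution (assumed)

# Sequential-heavy workload -> smaller threshold; random-heavy -> larger.
print(stop_threshold(seq_bytes=900_000, rand_bytes=100_000, page=4096, n=4,
                     dict_sizes=[2048, 4096]))
print(stop_threshold(seq_bytes=100_000, rand_bytes=900_000, page=4096, n=4,
                     dict_sizes=[2048, 4096]))
```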
In the method of the above embodiment, the compression stopping threshold corresponding to the data to be stored is determined according to the amount of sequential write data and the amount of random write data in the data to be stored, so that compression can be stopped once the data amount during compression reaches the compression stopping threshold. This effectively improves the response speed to compression instructions while reducing storage space: when the amount of sequential write data is larger than the amount of random write data, the compression threshold is relatively small and a larger compression ratio can be obtained, while when the proportion of random writes is high, the threshold is relatively large and a faster response is obtained.
Illustratively, Fig. 3 shows an SSD system architecture that includes a machine-learning-based data compression module (MLDC).
In the SSD system architecture of the present application, in addition to the flash memory array used for data storage, the SSD contains a system on a chip (SoC) acting as the controller, which includes an I/O interface, microprocessor cores, a hardware accelerator, a flash interface, and a DRAM interface. To communicate with a host, the I/O interface is connected to a PCIe or SATA socket, and data are exchanged with the host using the SATA or NVMe standard. The flash memory array of an SSD is typically multi-channel, allowing parallel access and providing sufficient internal bandwidth between the data array and the SoC-based controller. The embedded cores inside the SoC execute the firmware program to parse commands from the I/O interface and perform the FTL functions. The FTL layer maps host logical addresses (LBAs) to the flash physical address space (PPNs). Most SSDs have on-board DRAM used as a write buffer or data cache, and most of the mapping tables can be placed in DRAM to improve access performance. The flash memory controller is responsible for managing the reading and writing of data between the cache and the flash memory.
The machine-learning-based data compression module (MLDC) interacts with the host through the I/O interface, classifies the input data, and combines this with a dictionary compression algorithm to achieve lossless compression. The core of a dictionary compression algorithm is to create a dictionary that translates repeated patterns in the original data into shorter representations. Optionally, the MLDC module specifically includes a classifier, a dictionary file unit, a data compressor, and a data decompressor. The classifier contains a trained classification model that assigns input data with the same characteristics to the same compression class, and each class corresponds to a specific dictionary file generated during training. The compressor compresses the data using the designated dictionary file. When data are read, the decompressor decompresses them according to their compression class, using the corresponding dictionary file.
Fig. 4 shows the flow of compressing input data in the solid state disk, specifically:
1. When the host interface receives a write command, it forwards the command to the MLDC module.
2. The classifier uses the DMA hardware unit to inspect a small portion of the written data in the data buffer; the prediction model determines the compression class from this portion of the data and decides whether the data are compressible. The machine learning model in the embodiment of the application, which combines a clustering algorithm with a classification algorithm, can classify the input data and determine the compression class, or identify data that do not need to be compressed, by scanning only a small amount of the written data, thereby reducing the computational overhead of the solid state disk.
3. The data compressor starts performing a compression operation:
if the data belong to a category that is incompressible or that benefits little from compression (such as files already in a compressed format), the compression operation is not performed, saving computing resources;
if the data belong to a compressible category, the compressor uses the shared dictionary file of the class designated by the classifier, matches the data to be compressed against the dictionary, and replaces each matched pattern with the corresponding index value to compress the contents of the entire buffer. The classifier uses the same dictionary file for data of the same class, avoiding duplicated dictionary entries. Optionally, for compressed data, a read request may require the data decompressor to process multiple flash pages; for example, reading LPN(43) requires decompressing the whole of PPN(90), which may add latency to the read command. To mitigate read latency to some extent, the read request can be answered as soon as decompression reaches the requested offset. In addition, the firmware program of the MLDC module keeps decompressed data in the SSD DRAM buffer until the buffer management policy (e.g., first in, first out) requires the buffer space to be reclaimed. If the workload exhibits reasonable spatial locality, this has the effect of pre-reading data, because the workload tends to request the rest of the compressed data within a short time.
4. After the data compression is completed, the mapping information of the FTL is updated.
Since the compression operation changes the data size, the MLDC module needs to pass the compressed data size and compression class to the FTL. The FTL saves the above information in the mapping table in order to correctly locate the stored data. The flash memory controller is responsible for writing compressed data blocks into successive physical addresses.
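The write path of steps 1-4 can be summarised in a short Python sketch. The classifier, dictionaries, and ftl objects and their methods are placeholders standing in for the MLDC components and FTL interfaces described above, not a real firmware API.

```python
def handle_write(lpn: int, data: bytes, classifier, dictionaries, ftl) -> None:
    """MLDC write-path sketch: classify a 512 B sample, compress if worthwhile,
    then update the FTL mapping tables (all interfaces here are placeholders)."""
    # Step 2: the classifier inspects only a small portion of the buffered data.
    comp_class = classifier.predict(data[:512])

    if comp_class == 0:                       # Step 3a: incompressible category
        ppn = ftl.write_uncompressed(data)
        ftl.update_first_table(lpn, ppn, comp_class=0)
        return

    # Step 3b: compress the whole buffer with the shared dictionary of this class.
    dictionary = dictionaries[comp_class]
    compressed = dictionary.compress(data)

    # Step 4: the FTL records the compressed size and class so that reads can
    # locate and decompress the data later.
    ppn, off = ftl.write_compressed(compressed)
    ftl.update_first_table(lpn, ppn, comp_class=comp_class)
    ftl.update_second_table(ppn, off=off, length=len(compressed))
```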
With the method of the above embodiment, page data are classified into different compression categories based on machine learning, without any hints from the software system, and data of the same category are compressed and stored with a shared dictionary. This avoids duplicated compression, achieves transparent, low-cost data compression, and extends the service life of the solid state disk without increasing device cost or software operations. In addition, although the application adds a data compression function, the MLDC module does not require the host system to provide additional information, so the application is compatible with existing operating systems and I/O interfaces without changing the host software stack, providing a plug-and-play upgrade for existing systems.
The data compression device provided by the application is described below; the data compression device described below and the data compression method described above may be referred to in correspondence with each other. As shown in Fig. 5, the data compression device according to the embodiment of the present application includes:
An obtaining module 510, configured to obtain data to be stored;
a determining module 520, configured to determine a corresponding target compression dictionary according to the classification result of the data;
And the compression module 530 is configured to compress the data according to the target compression dictionary, to obtain compressed data.
Optionally, the determining module 520 is further configured to:
And determining a classification result of the data according to the data with the preset size in the data to be stored and a preset data classification model, wherein the data classification model is used for identifying the type of the data to be stored.
Optionally, the compression module 530 is further configured to:
Storing compression type information in a first flash translation layer FTL mapping table;
the storage location of the compressed data and the size of the compressed data are stored in a second FTL mapping table.
Optionally, the compression module 530 is further configured to:
And, when the size of the data to be stored is larger than a first value and smaller than a second value, if during compression of the data to be stored the sum of the compressed data and the remaining uncompressed data is determined to be smaller than the first value, stop compression of the uncompressed data and store both the compressed data and the uncompressed data, wherein the first value is related to the size of a physical page of the solid state disk.
Optionally, the compression module 530 is further configured to:
And under the condition that the size of the data to be stored is larger than the second value, determining a compression stopping threshold corresponding to the data to be stored according to the size of the sequential write data amount and the size of the random write data amount in the data to be stored.
Optionally, the compression module 530 is further configured to:
determine the compression stopping threshold corresponding to the data to be stored as a function of the write pattern and the compression dictionary size, where the threshold depends on the amount of sequential write data in the data to be stored, the amount r of random write data in the data to be stored, the size p of a physical page in the solid state disk, and a preset multiple n of the physical page size.
Fig. 6 illustrates a physical schematic diagram of an electronic device that may include a processor 610, a communication interface (Communications Interface) 620, a memory 630, and a communication bus 640, where the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may call logic instructions in the memory 630 to perform a data compression method including obtaining data to be stored, determining a corresponding target compression dictionary according to a classification result of the data, and compressing the data according to the target compression dictionary to obtain compressed data.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention further provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer is capable of executing a data compression method provided by the above methods, where the method includes obtaining data to be stored, determining a corresponding target compression dictionary according to a classification result of the data, and compressing the data according to the target compression dictionary to obtain compressed data.
In yet another aspect, the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, is implemented to perform the data compression method provided by the above methods, where the method includes obtaining data to be stored, determining a corresponding target compression dictionary according to a classification result of the data, and compressing the data according to the target compression dictionary to obtain compressed data.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without inventive effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus the necessary general hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or in parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims (10)

Priority application: CN202411309469.3A, filed 2024-09-19, "Data compression method, device and equipment", pending.
Publication: CN119376617A, published 2025-01-28.
Family ID: 94326055.
Country: CN (China).



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
