A data organization method for a hybrid key-value storage system based on SSD and SMR disks
Technical field
The present invention relates to the field of hybrid computer storage systems, and more particularly to a data organization method for a hybrid key-value storage system based on SSD and SMR disks.
Background technique
With the rapid development of emerging technologies such as the mobile Internet, cloud computing, and the Internet of Things, the information society has entered a networked big data era. The rapid spread of smart mobile terminals (computers, mobile phones, tablets, etc.) has driven a sharp increase in global mobile data traffic. IDC's "Digital Universe" report shows that from 2005 to 2020 the amount of data created, replicated, and used worldwide each year will grow 300-fold, from 130 EB to 40 ZB. Baidu's current total data volume exceeds 1000 PB, and it processes 10 PB to 100 PB of web data every day; Taobao's cumulative transaction data amounts to 100 PB. This includes massive amounts of unstructured data such as pictures, video, text, and voice. Traditional relational databases, however, can only store structured data, and as data volume grows exponentially they are no longer suited to the big data era. An efficient solution to this problem is to use non-relational databases. Key-value (KV) storage, one kind of non-relational database, plays a vital role in the big data era and is widely used in search engines, e-commerce, social networks, and other fields. Relatively mature products include Google's BigTable and LevelDB, HBase, Facebook's Cassandra and RocksDB, Yahoo's PNUTS, and Baidu's Atlas.
Today's mainstream storage media differ greatly in capacity, cost, performance, and other respects, so a key-value storage system built on a single storage medium can cover only a limited range of requirements. Integrating heterogeneous storage media into a hybrid storage system lets each medium play to its strengths while offsetting the weaknesses of the others, which greatly expands the range of requirements that can be covered. Combining a high-performance new storage medium with a large-capacity disk is a common approach to hybrid storage. Because shingled magnetic recording (SMR) disks offer large storage capacity and NAND-flash-based solid-state disks (SSDs) offer fast reads and writes, integrating SSDs with SMR disks meets the demand for high capacity, high performance, and low cost.
However, simply integrating an SSD and an SMR disk in a hierarchical architecture to build an SSD-SMR hybrid key-value storage system cannot fully exploit the high performance of the SSD and the large capacity of the SMR disk. The reason is that, to remain compatible with legacy software, disk manufacturers add a translation layer that hides internal structural differences, such as the flash translation layer (FTL) of an SSD and the shingled translation layer (STL) of an SMR disk. The presence of the translation layer prevents upper-layer applications from perceiving the internal structure of the device, so they cannot optimize deeply for that structure.
Summary of the invention
The purpose of the present invention is to provide a data organization method for a hybrid key-value storage system based on SSD and SMR disks, so as to overcome the deficiencies of the prior art.
To achieve the above purpose, the present invention adopts the following technical solution:
A data organization method for a hybrid key-value storage system based on SSD and SMR disks, in which, during data storage, the key-value pairs produced when data is stored and indexed are decoupled and stored in a hybrid storage system composed of an SSD and an SMR disk. Data is distributed to the SSD and the SMR disk according to its characteristics: the key of each key-value pair is stored on the SSD, and the value is stored on the SMR disk. After a request from an upper-layer application is received, the target key is looked up in the LSM trees in memory and on the SSD, and the corresponding operation is performed according to the lookup result.
Further, the index of the key-value storage system is built on an LSM tree, and the key-value index LSM tree is stored on the SSD.
Further, each node of the LSM tree is an address tuple, expressed as the four-tuple <key, value-address, value-offset, value-size>. The key is globally unique and serves as the identifier of the value; value-address and value-offset point to the address and the offset of the log data, respectively.
Further, values are stored on the SMR disk as log data.
Further, during data storage on the SMR disk, garbage collection is performed when the space occupied by a value exceeds its available-capacity threshold and the storage system is lightly loaded at that time. Garbage collection operates at the granularity of individual key-value pairs, and the garbage collection of each key-value pair is transactional: the data is first collected on the SMR disk in units of zones, the pointer tuple is then updated on the SSD, and only then is garbage collection for that key-value pair complete.
Further, when the storage system receives a read request from an upper-layer application, it first traverses the LSM tree in memory to look up the target key. If that lookup fails, it looks up the target key in the LSM tree on the SSD. If that lookup also fails, an error message is returned to the user indicating that the key is not stored in the storage system. If the target key is found, the pointer node information for that key is read, the value is read from the SMR disk according to the pointer node information, and the data is returned to the user.
Further, when the storage system receives a write request from an upper-layer application, it first traverses the LSM trees in memory and on the SSD to look up the target key. If the lookup fails, the request is a new write: new space is allocated on the SMR disk, the value is written at the first address of the free space, the address, offset, and data size are combined with the key into a four-tuple, and the tuple is inserted into the LSM tree. If the lookup succeeds, the request is an update: the new data is first appended to the space that holds the original value on the SMR disk, and the node information in the LSM tree is then updated.
Compared with the prior art, the present invention has the following beneficial technical effects:
In the data organization method of the present invention, the key-value pairs produced when data is stored and indexed are decoupled and stored in a hybrid storage system composed of an SSD and an SMR disk. Data is distributed to the SSD and the SMR disk according to its characteristics: the key of each key-value pair is stored on the SSD, and the value is stored on the SMR disk. After a request from an upper-layer application is received, the target key is looked up in the LSM trees in memory and on the SSD, and the corresponding operation is performed according to the lookup result. Decoupling the key-value data and distributing keys and values across the two media makes the fullest use of the hybrid storage. In a hybrid key-value store composed of an SSD and an SMR disk, the data layout largely determines system performance. With the decoupled data organization of the present invention, key reads and writes in most cases hit the LSM tree cached in memory, and values are stored by appending writes, which extracts the maximum performance from the hybrid storage system and reduces storage cost. By exploiting the high performance of the SSD and the large capacity of the SMR disk to optimize the data access speed of applications, the present invention realizes a low-cost, high-performance, large-capacity hybrid key-value storage system.
Further, pointer tuples are stored on the SSD. A pointer tuple consists of a key and the address space of its corresponding value on the SMR disk, so the key-value storage system can read and write the value on the SMR disk by accessing the pointer tuple through the key. The SMR disk stores log data blocks: values are stored as a log, that is, every write or update is performed as an appending write to improve write performance. Because appending values as a log incurs redundant storage overhead, the storage system periodically cleans and compacts the log, deleting stale data to save storage resources.
Brief description of the drawings
Fig. 1 is an architecture diagram of the hybrid key-value storage system.
Fig. 2 is a schematic diagram of separated key-value storage according to the present invention.
Fig. 3 is a schematic diagram of the read process of the key-value storage system of the present invention.
Fig. 4 is a schematic diagram of the write process of the key-value storage system of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings:
How data is organized across heterogeneous storage media has long been a research hotspot. Common approaches are hierarchical storage and horizontally scaled storage, that is, using the SSD as a cache layer for the HDD, or combining the SSD and the HDD in a unified logical address space. Considering the characteristics of data organization in key-value storage, the present invention selects the horizontally scaled organization of the storage media. HM-SSDs and HM-SMR disks are host-managed; they differ from conventional storage media in that their internal physical structure is exposed to upper-layer applications, which can then adopt optimization strategies suited to that structure. These two media therefore provide the hardware support for the present invention. The data organization method designed by the present invention makes full use of the high performance of the SSD and the large capacity of the SMR disk to optimize the data access speed of applications, and designs and implements a low-cost, high-performance, large-capacity hybrid key-value storage system.
A data organization method for a hybrid key-value storage system based on SSD and SMR disks is built on a horizontally scaled storage system composed of an HM-SSD and an HM-SMR disk. In key-value storage, keys and values occur in pairs: a key is small and is used for indexing, while a value is larger and holds the data. The key-value pairs produced when data is stored and indexed are decoupled and stored in the hybrid storage system composed of the SSD and the SMR disk, and data is distributed to the two media according to its characteristics. Small, frequently accessed items, namely the keys of the key-value pairs, are stored on the SSD; large, less frequently accessed items, namely the values, are stored on the SMR disk. This achieves a trade-off between performance and cost.
As shown in Fig. 1, the underlying storage media consist of an HM-SSD and an HM-SMR disk interconnected by a physical bus. The host-side translation layer (TL) consists of several software modules, including data management, garbage collection, cache management, and address mapping modules. These modules are implemented in the operating system software layer, and the upper-layer application provides a key-value pair storage service externally.
First, the data handling process of the SSD-SMR hybrid key-value storage system is as follows. As shown in Fig. 2, the key-value index LSM tree is stored on the SSD. Each node of the LSM tree is an address tuple, expressed as the four-tuple <key, value-address, value-offset, value-size>. The key is globally unique and serves as the identifier of the value. The value is stored on the SMR disk as log data, and value-address and value-offset point to the address and the offset of the log data, respectively.
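The address tuple and the log-structured value store described above can be illustrated with a minimal Python sketch. The names `AddressTuple` and `ValueLog` are hypothetical, the SMR disk is stood in for by an in-memory `bytearray`, and the reading of value-address as the base address of a log region and value-offset as the position inside that region is an assumption, since the text does not define the two fields more precisely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressTuple:
    """Hypothetical LSM-tree node: <key, value-address, value-offset, value-size>."""
    key: bytes
    value_address: int  # assumed: base address of the log region holding the value
    value_offset: int   # assumed: byte offset of the value inside that region
    value_size: int     # length of the value in bytes


class ValueLog:
    """In-memory stand-in for one append-only log region on the SMR disk."""

    def __init__(self, base_address: int = 0) -> None:
        self.base_address = base_address
        self.buf = bytearray()

    def append(self, key: bytes, value: bytes) -> AddressTuple:
        # Values are only ever appended, matching the sequential-write
        # behaviour that SMR zones require.
        offset = len(self.buf)
        self.buf.extend(value)
        return AddressTuple(key, self.base_address, offset, len(value))

    def read(self, entry: AddressTuple) -> bytes:
        start = entry.value_offset
        return bytes(self.buf[start:start + entry.value_size])


log = ValueLog()
first = log.append(b"k1", b"hello")
second = log.append(b"k2", b"world!")
```

In this sketch only the small `AddressTuple` records would live in the index on the SSD, while the bytes of each value stay in the log.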
Next, consider updates on the SMR disk itself. Because values are appended to the SMR disk as a log, garbage collection must be performed periodically. The specific garbage collection procedure is as follows: when the space occupied by a value exceeds its available-capacity threshold and the storage system is lightly loaded at that time, garbage collection is performed. Garbage collection operates at the granularity of individual key-value pairs, and the garbage collection of each key-value pair is transactional: the data is first collected on the SMR disk in units of zones, the pointer tuple is then updated on the SSD, and only then is garbage collection for that key-value pair complete.
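The garbage collection step can be sketched as follows, under stated assumptions. The function names, the `light_load` cutoff of 0.3, the dictionary standing in for the index on the SSD, and the per-zone `bytearray` buffers standing in for the SMR disk are all hypothetical. The sketch only illustrates the ordering the text describes: relocate the live data first, switch the pointer second.

```python
def should_collect(occupied_bytes: int, threshold_bytes: int,
                   load: float, light_load: float = 0.3) -> bool:
    """Trigger rule: space used exceeds the threshold AND the system is lightly loaded.

    `light_load` is an assumed cutoff; the text does not specify one.
    """
    return occupied_bytes > threshold_bytes and load < light_load


def collect_key(key: bytes, index: dict, zones: dict, target_zone: int) -> tuple:
    """Relocate the live value of one key, then update its pointer.

    index: key -> (zone_id, offset, size), a stand-in for the LSM tree on the SSD.
    zones: zone_id -> bytearray, a stand-in for zones on the SMR disk.
    """
    zone_id, offset, size = index[key]
    live = bytes(zones[zone_id][offset:offset + size])

    # Step 1: copy the live value to the tail of the target zone.
    target = zones.setdefault(target_zone, bytearray())
    new_offset = len(target)
    target.extend(live)

    # Step 2: only after the copy succeeds, switch the pointer tuple.
    # If step 1 fails, the old pointer still refers to valid data.
    index[key] = (target_zone, new_offset, size)
    return index[key]


zones = {0: bytearray(b"old1old2NEW!")}  # two stale versions, then the live one
index = {b"k": (0, 8, 4)}
moved = collect_key(b"k", index, zones, target_zone=1)
```

Once every live value of a source zone has been relocated this way, that zone could be reset and reused as a whole, which is the usual reason for collecting data zone by zone on SMR media.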
Fig. 3 is a schematic diagram of the read process of the key-value storage system. When the storage system receives a read request from an upper-layer application, it first traverses the LSM tree in memory to look up the target key. If that lookup fails, it looks up the key in the LSM tree on the SSD. If that lookup also fails, an error message is returned to the user indicating that the key is not stored in the storage system. If the target key is found, the pointer node information for that key is read, the value is read from the SMR disk according to the pointer node information, and the data is returned to the user. The read process then ends.
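The read path can be summarized in a short Python sketch. It assumes, for illustration only, that the two LSM trees are modelled as plain dictionaries mapping a key to an (address, offset, size) triple and that the SMR disk is a byte string; `read_value` and `KeyNotFound` are hypothetical names.

```python
class KeyNotFound(KeyError):
    """Raised when the key is in neither the in-memory nor the on-SSD index."""


def read_value(key: bytes, mem_index: dict, ssd_index: dict, smr_log: bytes) -> bytes:
    # 1. Look in the LSM tree cached in memory first.
    entry = mem_index.get(key)
    # 2. On a miss, fall back to the LSM tree on the SSD.
    if entry is None:
        entry = ssd_index.get(key)
    # 3. A miss in both means the key is not stored in the system.
    if entry is None:
        raise KeyNotFound(key)
    # 4. Follow the pointer information to the value on the SMR disk.
    address, offset, size = entry
    start = address + offset
    return bytes(smr_log[start:start + size])


smr_log = b"abcdXYZ"
mem_index = {b"a": (0, 0, 3)}
ssd_index = {b"b": (4, 1, 2)}
```

Only step 4 touches the SMR disk, and it does so with a single positioned read, so the slower medium is accessed once per successful lookup.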
Fig. 4 is a schematic diagram of the write process of the key-value storage system. When the storage system receives a write request from an upper-layer application, it first traverses the LSM trees in memory and on the SSD to look up the target key. If the lookup fails, the request is a new write: new space is allocated on the SMR disk, the value is written at the first address of the free space, the address, offset, and data size are combined with the key into a four-tuple, and the tuple is inserted into the LSM tree. If the lookup succeeds, the request is an update: the new data is first appended to the space that holds the original value on the SMR disk, and the node information in the LSM tree is then updated. The write process then ends.
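A minimal sketch of the write path follows. It makes several simplifying assumptions that are not in the text: the LSM trees are plain dictionaries, the SMR disk is a single `bytearray` log with base address 0, new and updated tuples are placed in the in-memory index, and an update appends the new version at the tail of the log rather than next to the original value. The name `write_value` is hypothetical.

```python
def write_value(key: bytes, value: bytes,
                mem_index: dict, ssd_index: dict, smr_log: bytearray) -> str:
    """Append the value to the log and record its four-tuple in the index.

    Returns "insert" for a new key and "update" for an existing one.
    """
    # Look the key up in both indexes to classify the request.
    exists = key in mem_index or key in ssd_index

    # Write the value at the first free address of the log (append-only).
    offset = len(smr_log)
    smr_log.extend(value)

    # Build <key, value-address, value-offset, value-size> and store it.
    # Assumption: fresh tuples go to the in-memory tree and would reach
    # the SSD later, as is usual for LSM trees.
    mem_index[key] = (key, 0, offset, len(value))
    return "update" if exists else "insert"


mem_index = {}
ssd_index = {b"old": (b"old", 0, 0, 3)}
smr_log = bytearray(b"abc")
first = write_value(b"new", b"hello", mem_index, ssd_index, smr_log)
second = write_value(b"old", b"xy", mem_index, ssd_index, smr_log)
```

After an update, the bytes of the previous version remain in the log until garbage collection reclaims them, which is the redundant storage overhead that the periodic cleanup addresses.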
In summary, during data storage, the key-value pairs produced when data is stored and indexed are decoupled and stored in the hybrid storage system composed of the SSD and the SMR disk. Data is distributed to the SSD and the SMR disk according to its characteristics: the key of each key-value pair is stored on the SSD, and the value is stored on the SMR disk. After a request from an upper-layer application is received, the target key is looked up in the LSM trees in memory and on the SSD, and the corresponding operation is performed according to the lookup result. Decoupling the key-value data and distributing keys and values across the two media makes the fullest use of the hybrid storage. In a hybrid key-value store composed of an SSD and an SMR disk, the data layout largely determines system performance. The decoupled data organization of the present invention adds one extra lookup to each read and write, but because key reads and writes in most cases hit the LSM tree cached in memory, and values are stored by appending writes, the overall performance of the storage system is essentially unaffected. The present invention thus extracts the maximum performance from the hybrid storage system and reduces storage cost. By exploiting the high performance of the SSD and the large capacity of the SMR disk to optimize the data access speed of applications, it realizes a low-cost, high-performance, large-capacity hybrid key-value storage system.