CN102968279B - A method for automatic thin provisioning of a storage system - Google Patents

A method for automatic thin provisioning of a storage system

Info

Publication number
CN102968279B
Authority
CN
China
Prior art keywords
metadata
space
module
management
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210453207.5A
Other languages
Chinese (zh)
Other versions
CN102968279A (en)
Inventor
王恩东
张宇
文中领
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IEIT Systems Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201210453207.5A
Publication of CN102968279A
Application granted
Publication of CN102968279B
Active
Anticipated expiration

Abstract

The present invention provides a method for automatic thin provisioning of a storage system. Thin provisioning is generally used in storage systems to present large-capacity virtual drives to the operating system. It can present more volume space to the upper layer than physically exists, so that the operating system believes it has more usable capacity than is actually provided to it. This is worthwhile because enterprise data usually grows gradually rather than being present in full from the start, so thin provisioning improves storage capacity utilization. Because the operating system believes the full disk space exists when it allocates space, an enterprise can add storage devices step by step as its data grows, without reconfiguring the operating system. This is equivalent to hot-plugging storage devices, and it improves the storage efficiency and flexibility of disk devices.

Description

A method for automatic thin provisioning of a storage system

Technical field

The invention relates to server storage systems, and in particular to a method for automatic thin provisioning of a storage system.

Background

With thin provisioning, the physical storage actually present may be far smaller than the capacity the operating system believes it has. Thin provisioning must therefore support dynamic expansion: before the physical storage space is exhausted, the actual storage space must be expanded using the thin-provisioning expansion method so that the user's system keeps working normally. Otherwise, because the operating system does not know the actual physical capacity, it will try to use storage space that does not exist, with severe and unpredictable consequences.

Thin provisioning is already widely used in the storage systems of foreign vendors such as EMC, IBM, HDS and HP 3PAR, whereas domestic vendors remain comparatively weak in this field.

Overall, HP 3PAR's thin provisioning has many advantages. For example, its allocation unit is 16 KB, which is very fine-grained; compared with coarser implementations, this maximizes capacity savings and gives more noticeable performance benefits. Its thin provisioning is also fully automated. In traditional storage environments without thin provisioning, applications are often given large amounts of spare capacity (often three times or more what they actually need) precisely to avoid the complexity and disruption of repeated manual provisioning. If thin provisioning still depends on manual configuration, much of that original complexity remains.

However, HP 3PAR's thin provisioning implementation, especially space reclamation, relies on its proprietary ASIC.

The thin provisioning implementations of EMC and IBM owe much of their strength in allocation and early warning to the two companies' strong software capabilities. Both handle pools and LUNs (or volumes) with virtualization technology and do well at balancing read/write performance, but both have considerable difficulty with space reclamation, which they implement in separate add-on software rather than as a general reclamation mechanism.

HDS Dynamic Provisioning is not fully automatic: array groups and LDEVs must still be created manually and assigned to HDP pools. In a fully automated implementation, the storage system would create array groups or HDP pool volumes with an appropriate RAID protection level or performance service level by itself, add them to the HDP pool in real time, and complete the whole process without human intervention. In addition, HDS Dynamic Provisioning (HDP) volumes use 42 MB as the storage unit, which is very coarse compared with the 16 KB units of the 3PAR storage pool.

On the other hand, a larger allocation unit means that less metadata is needed to describe a virtual volume, which helps thin provisioning perform well. Such coarse-grained thin provisioning can nevertheless carry risks in practice. When allocating storage units, thin provisioning does not consider whether the space in those units has already been formatted or is still in its initial state, so when an application needs to format some of the allocated space, oversized units create potential operational risk. If these unformatted, initial-state regions are scattered throughout the volume, they consume more space inside the storage units.
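The metadata side of this trade-off is simple arithmetic. The sketch below assumes a hypothetical 1 TiB thin volume, treats the vendors' 16 KB and 42 MB units as 16 KiB and 42 MiB, and assumes aligned allocate-on-write of whole units. It counts the mapping entries needed once every unit is written, and the physical space pinned by one isolated 4 KiB write.

```python
KiB = 1024
MiB = 1024 * KiB
TiB = 1024 ** 4


def units_needed(nbytes: int, unit_bytes: int) -> int:
    """Storage units (one mapping entry each) needed to cover nbytes; integer ceiling division."""
    return -(-nbytes // unit_bytes)


def pinned_by_write(write_bytes: int, unit_bytes: int) -> int:
    """Physical bytes allocated for one aligned write when whole units are allocated on write."""
    return units_needed(write_bytes, unit_bytes) * unit_bytes


for label, unit in (("16 KiB", 16 * KiB), ("42 MiB", 42 * MiB)):
    entries = units_needed(TiB, unit)
    pinned = pinned_by_write(4 * KiB, unit)
    print(f"{label}: {entries} mapping entries per TiB, {pinned} bytes pinned by a 4 KiB write")
```

Under these assumptions the fine-grained layout needs 67,108,864 entries per TiB against 24,967 for the coarse one, while a single 4 KiB write pins 16 KiB in the first case and a full 42 MiB in the second.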

Among these foreign vendors, 3PAR differs most in that its thin provisioning is built entirely into the underlying architecture rather than offered to the storage system as an add-on software feature, which makes it more transparent and more automated. Other vendors, such as NetApp, HDS and HP, mostly implement thin provisioning by creating storage resource pools from RAID groups; users allocate LUNs and volumes as before, but in fact the system is merely deceiving the hardware. As actual disk usage grows, storage administrators must adjust their volume allocations, so this kind of thin provisioning can require more manual operations and more cumbersome metadata handling.

In the face of the technology monopoly and blockade by foreign vendors, the present invention provides an automatic thin provisioning system architecture. Unlike HP 3PAR, it does not rely on a dedicated chip for processing. Instead it is implemented with virtualization technology, uses dual B+ trees for metadata management, modifies the storage protocol to provide dynamic capacity reclamation, adapts the protocol used for SSDs, and provides an early warning mechanism at the operating system level. The result is fully automated thin provisioning that offers both high performance and space reclamation, with a storage-unit granularity that can be tuned to the actual application.

Summary of the invention

The object of the present invention is to provide a method for automatic thin provisioning of a storage system.

The object of the present invention is achieved as follows. The system comprises five functional modules: 1) a pool organization module; 2) a thin allocation module; 3) a thin reclamation module; 4) a dynamic expansion module; and 5) an early warning module. Each module is described below:

1) Pool organization module:

Pool organization records and manages the metadata of every allocated storage unit. In this system, two devices are managed logically: a data device and a metadata device;

Metadata management in the pool organization module is fairly complex: thin provisioning must manage every data block on the data device, and further structures are needed to describe the storage pool and the thin volumes, so a large amount of metadata is involved. A separate device, called the metadata device, is therefore allocated to store this information. The metadata device stores the allocation information for the storage units of the data device, together with the metadata generated by storage pool operations. The data device is the actual storage device, divided into storage units of the configured granularity, and holds no metadata;

Metadata management in the pool organization module therefore involves two devices, the metadata device and the data device, which must be managed separately. This requires a space map (space-map) management mechanism that records the mapping state of each storage unit and supports metadata management. Besides supporting metadata management, the same mechanism also serves as the allocator for new storage units. The storage-unit granularity is likewise defined in the pool organization module;

The space map mechanism used for metadata management has two implementations:

managing the actual device (the data device implementation), and managing the metadata space (the metadata device implementation). In the pool organization module, metadata management uses both implementations, which work together to provide complete management of the pool organization metadata;
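The dual role of the space map (recording per-unit state and acting as the allocator) can be illustrated with a minimal in-memory sketch. The class name `SpaceMap`, the use of reference counts rather than a single free bit, and the round-robin search hint are all assumptions for illustration; the patent does not specify the on-disk format.

```python
class SpaceMap:
    """Hypothetical in-memory space map: one reference count per storage unit.

    A count of 0 means the unit is free. This stands in for the on-disk
    structure, whose real format is not specified in the text.
    """

    def __init__(self, nr_units: int):
        self.ref_counts = [0] * nr_units
        self.hint = 0  # where the next free-unit search starts

    def alloc(self) -> int:
        """Allocator role: find a free unit, mark it used, return its index."""
        n = len(self.ref_counts)
        for step in range(n):
            unit = (self.hint + step) % n
            if self.ref_counts[unit] == 0:
                self.ref_counts[unit] = 1
                self.hint = (unit + 1) % n
                return unit
        raise RuntimeError("space map full: no free storage units")

    def dec(self, unit: int) -> None:
        """Drop one reference; the unit is free again when its count reaches 0."""
        if self.ref_counts[unit] == 0:
            raise ValueError(f"unit {unit} is already free")
        self.ref_counts[unit] -= 1

    def nr_free(self) -> int:
        return self.ref_counts.count(0)


# One space map per device, mirroring the two implementations described above.
data_sm = SpaceMap(nr_units=8)      # tracks data-device storage units
metadata_sm = SpaceMap(nr_units=4)  # tracks metadata-device blocks
```

Reference counts cost a little more space than a bitmap but would let one data unit be shared by several mappings; if sharing is never needed, a bitmap with the same `alloc`/`dec` interface would do.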

2) Thin allocation module:

The thin allocation module uses allocate-on-write: physical disk capacity is allocated only when an application writes real data to the logical volume. Put simply, a new block is allocated on demand from the free resource pool only when a write request arrives. When a write request arrives, the module handles the storage unit the request targets as follows. First, it looks up that unit in the stored metadata; there are two cases:

a) Found: the storage unit is already mapped, so the request is mapped directly to the logical volume;

b) Not found: the storage unit is not yet mapped, so the request is deferred and later completed in the thin allocation main thread;

In the not-found case, the write request is processed later: a new storage unit is allocated and its mapping is completed, so that when the request is processed the next time, the mapping exists and the write can be completed;
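The two cases and the deferral can be sketched in a single-threaded form. The names below are hypothetical: a dict stands in for the metadata lookup, a deque for the pool's free list, and a second deque for the deferred-request queue that the "main thread" (here just a method) drains.

```python
from collections import deque


class ThinVolume:
    """Allocate-on-write sketch; all structures are hypothetical in-memory stand-ins."""

    def __init__(self, nr_data_units: int):
        self.mapping = {}                             # virtual unit -> data unit
        self.free_units = deque(range(nr_data_units))
        self.deferred = deque()                       # writes waiting for allocation
        self.data_device = {}                         # data unit -> payload

    def submit_write(self, vunit: int, payload: bytes) -> bool:
        """Fast path. Returns True if the write completed, False if it was deferred."""
        dunit = self.mapping.get(vunit)
        if dunit is not None:                         # case a): already mapped
            self.data_device[dunit] = payload
            return True
        self.deferred.append((vunit, payload))        # case b): defer to the worker
        return False

    def run_allocation_worker(self) -> None:
        """The 'thin allocation main thread': map new units, then complete the writes."""
        while self.deferred:
            vunit, payload = self.deferred.popleft()
            if vunit not in self.mapping:             # several deferred writes may share a unit
                if not self.free_units:
                    self.deferred.appendleft((vunit, payload))  # keep the request queued
                    raise RuntimeError("pool exhausted; expansion needed")
                self.mapping[vunit] = self.free_units.popleft()
            self.submit_write(vunit, payload)         # now takes the mapped fast path
```

Checking `vunit not in self.mapping` inside the worker means that two deferred writes to the same unmapped unit trigger only one allocation, and they are applied in arrival order.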

3) Thin reclamation module:

When the file system layer learns that space has been freed, it notifies the underlying storage system, which receives the release notification and discards the corresponding storage units;

Thin reclamation spans two layers: the file system layer, where applications run, and the storage system layer, where the actual data resides. A key issue is the communication mechanism between the two. The file system has no awareness of thin provisioning: when a region of space is no longer used, there is no ready-made mechanism to report it, and the underlying storage layer will not reclaim space on its own unless it is told to. A communication mechanism between the file system and the storage system therefore has to be designed;

To solve this, the file system and the underlying block driver communicate by supporting discard. Discard support was originally added to the Linux kernel to improve its support for SSDs, and is implemented as a mechanism in the ext4 and xfs file systems;

Discard is a Linux term for notifying a storage device that certain sectors no longer hold valid data; in essence, the file system uses discard to tell the underlying SSD which sectors can be erased. Modifying the SCSI protocol with a similar communication mechanism better supports the reclamation function of thin provisioning without requiring third-party tools;

A discard-like communication mechanism is therefore designed for the thin reclamation module: when the file system deletes a file, it uses discard to notify the underlying storage device that the region corresponding to the file no longer holds valid data, and that region is reclaimed at the storage layer;
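On Linux, file systems such as ext4 and XFS issue discards when mounted with the `discard` option or when `fstrim` is run. The sketch below models only the storage side, assuming a discard arrives as a byte range and that the mapping and free list are a dict and a list (hypothetical stand-ins). Only storage units lying entirely inside the range are unmapped.

```python
def fully_covered_units(offset: int, length: int, unit_size: int) -> range:
    """Indexes of storage units lying entirely inside [offset, offset + length)."""
    first = -(-offset // unit_size)            # round the start up to a unit boundary
    end = (offset + length) // unit_size       # round the end down (exclusive)
    return range(first, max(first, end))


def handle_discard(mapping: dict, free_units: list, offset: int, length: int,
                   unit_size: int) -> list:
    """Unmap fully discarded units and return their data units to the free list."""
    reclaimed = []
    for vunit in fully_covered_units(offset, length, unit_size):
        dunit = mapping.pop(vunit, None)
        if dunit is not None:                  # unprovisioned units need no work
            free_units.append(dunit)
            reclaimed.append(vunit)
    return reclaimed
```

Partially covered units at either end are deliberately kept because they may still hold valid data; reclaiming them would require tracking validity below the unit granularity, which is one reason coarse units reclaim less space.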

4) Dynamic expansion module:

This module expands the capacity of the storage pool. The main logic of dynamic expansion is:

a) Calculate the number of storage units after expansion from the expansion parameters passed in by the upper layer or from the post-expansion mapping table information;

b) Use the space map mechanism to carry out the expansion, i.e., expand the storage pool, which is the data device in pool organization module 1);

c) Calculate the number of storage units by which the underlying storage pool must be expanded;

Process the initialization information of each new storage unit in a loop, mainly allocating new storage-unit records in the data device metadata to record the expansion of the data block mapping, and then save the expansion information;

After the expansion loop ends, update the capacity of the data device;

Commit the storage pool metadata and flush it to disk;
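The expansion steps can be sketched as follows. `PoolState` and `expand_pool` are hypothetical names; the per-unit records list plays the role of the data-device space map, and `commit` merely snapshots state in place of flushing metadata to disk. The sketch computes the delta of step c) before running the loop, since the loop needs it.

```python
from dataclasses import dataclass, field


@dataclass
class PoolState:
    """Hypothetical in-memory view of the pool's data device and its metadata."""
    unit_size: int
    capacity_units: int = 0                          # advertised data-device capacity
    records: list = field(default_factory=list)      # one record per unit, 0 = free
    on_disk: dict = field(default_factory=dict)      # stand-in for committed metadata

    def commit(self) -> None:
        """Flush the pool metadata (here: take a snapshot)."""
        self.on_disk = {"capacity_units": self.capacity_units,
                        "records": list(self.records)}


def expand_pool(pool: PoolState, new_size_bytes: int) -> int:
    target_units = new_size_bytes // pool.unit_size  # a) units after expansion
    extra = target_units - len(pool.records)         # c) units to add
    if extra <= 0:
        raise ValueError("new size must exceed the current pool size")
    for _ in range(extra):                           # b) extend the space map unit by unit
        pool.records.append(0)                       # initialize each new record as free
    pool.capacity_units = target_units               # after the loop: update capacity
    pool.commit()                                    # finally: persist the metadata
    return extra
```

Updating the capacity only after the loop, and committing last, means that a failure part-way through leaves the previously committed metadata describing the old, consistent pool size.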

5) Space early warning module:

The kernel (lower) layer detects early warning conditions and, when one occurs, notifies the user layer, which listens for early warning information;

The thin provisioning space early warning module raises an alarm from the kernel layer to the user layer when space usage reaches the warning threshold;

The module consists of two parts, a kernel-space part and a user-space part, and requires communication between them. Its main logic is:

1) In kernel space, check whether space usage has reached the warning threshold;

2) When the threshold is reached, send warning information to user space;

3) User space monitors for and receives the warning information and notifies the administrator;

The most fundamental technical problem of the space early warning module is how to capture information in kernel space and deliver it to user space.
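On Linux, this kernel-to-user channel is typically a netlink socket or a device event. For example, the device-mapper thin-pool target raises a dm event when free data space drops below its `low_water_mark`, which a user-space daemon such as `dmeventd` can catch. The sketch below models only the three steps, assuming a thread-safe queue as the channel and a hypothetical `notify_admin` callback.

```python
import queue
import threading


class KernelSideMonitor:
    """Models the kernel-side check (steps 1 and 2); names are hypothetical."""

    def __init__(self, total_units: int, threshold_pct: float, channel: queue.Queue):
        self.total_units = total_units
        self.used_units = 0
        self.threshold_pct = threshold_pct
        self.channel = channel
        self.warned = False  # warn once per threshold crossing, not on every allocation

    def on_allocate(self, n: int = 1) -> None:
        self.used_units += n
        used_pct = 100 * self.used_units / self.total_units
        if used_pct >= self.threshold_pct and not self.warned:   # step 1
            self.warned = True
            self.channel.put({"used_pct": used_pct})             # step 2

    def on_free(self, n: int = 1) -> None:
        self.used_units -= n
        if 100 * self.used_units / self.total_units < self.threshold_pct:
            self.warned = False  # re-arm once usage falls back below the threshold


def user_space_listener(channel: queue.Queue, notify_admin) -> None:
    """Step 3: block on the channel and forward each warning; None stops the loop."""
    while True:
        msg = channel.get()
        if msg is None:
            break
        notify_admin(msg)
```

In use, the listener runs in its own thread (`threading.Thread(target=user_space_listener, args=(ch, print))`) while allocations call `on_allocate`. The `warned` flag keeps a pool that stays above the threshold from flooding the administrator with one message per allocation.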

Thin provisioning extends storage management: although the physical capacity actually allocated is small, it can dynamically present a very large virtual storage space to the operating system. As applications write more data, the actual storage space can be expanded in time without manual intervention. In other words, thin provisioning supplies capacity "at run time" and can significantly reduce storage space that is allocated but unused.

The beneficial effects of the present invention are as follows. Unlike HP 3PAR, the architecture needs no dedicated chip to handle allocation and reclamation. It is implemented with virtualization technology, uses dual B+ trees for metadata management, modifies the storage protocol to provide dynamic capacity reclamation, adapts the protocol used for SSDs, and provides an early warning mechanism at the operating system level. It achieves fully automated thin provisioning that offers both high performance and space reclamation, and the storage-unit granularity can be tuned to the actual application.

A thin-provisioned storage scheme can present large-capacity virtual drives to the operating system. Because the operating system believes that much disk space exists when it allocates space, an enterprise can add storage devices step by step as its data grows, without adjusting the operating system. This is equivalent to hot-plugging storage devices, and it improves the storage efficiency and flexibility of disk devices.

Description of drawings

Figure 1 is a schematic diagram of thin provisioning;

Figure 2 is a schematic diagram of the main modules of the architecture;

Figure 3 is an overall flowchart of the system architecture;

Figure 4 illustrates the separate handling of space map information;

Figure 5 is a schematic diagram of the discard communication mechanism;

Figure 6 compares the expansion process of an ordinary logical volume with that of the present invention;

Figure 7 is a schematic diagram of the main logic of the space early warning module.

Detailed description

The method of the present invention is described in detail below with reference to the accompanying drawings.

The main technical problems addressed by the present invention are:

1) Virtual address mapping

A mechanism that maps the virtual address space to application hosts, so that the space used by the storage system can be increased dynamically without interrupting data access;

2) Capacity reclamation

Thin provisioning should reclaim allocated space dynamically, but reclaiming physical space that has already been allocated and reallocating it for reuse is a difficult problem in thin provisioning;

3) Virtual storage space management

A dynamically expanding virtual storage pool must keep storage access efficient. This calls for performance-optimized data distribution algorithms, lightweight data migration algorithms and efficient data balancing algorithms that minimize the impact of pool expansion on parallel access, reduce the number of data moves and achieve a high degree of parallelism in system access;

4) Dynamic alarm mechanism for space allocation

The thin provisioning alarm mechanism should adapt by combining several factors, including application behavior, storage space consumption, the organization and initialization time of the storage system, and the cost of formatting the file system. This avoids the lack of flexibility and safety of fixed-allocation alarm mechanisms or allocation-limit mechanisms.
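The text does not say how these factors are combined. One simple interpretation, sketched here purely as an assumption, is to compare the projected time until the pool is full (from a smoothed allocation rate) with the time it would take to add capacity, instead of using a fixed percentage. The lead time and safety factor are hypothetical parameters into which initialization and formatting costs could be folded.

```python
def ewma(previous_rate: float, sample_rate: float, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of the allocation rate (units per second)."""
    return alpha * sample_rate + (1 - alpha) * previous_rate


def should_warn(free_units: int, alloc_rate: float, lead_time_s: float,
                safety_factor: float = 1.5) -> bool:
    """Warn when the pool would fill before new capacity can be brought online.

    lead_time_s: hypothetical estimate of how long expansion takes (adding
    devices plus initialization/formatting); safety_factor adds margin.
    """
    if alloc_rate <= 0:
        return False                      # usage is not growing: nothing to predict
    time_to_full_s = free_units / alloc_rate
    return time_to_full_s < lead_time_s * safety_factor
```

With this rule, a pool consuming one unit per second and needing 600 s to expand warns once fewer than 900 units remain, whether that is 10% or 90% of the pool. That is the flexibility a fixed threshold lacks.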

The architecture implements thin provisioning entirely in software without special dedicated chips. It supports dynamic reclamation and dynamically adjustable storage-unit granularity, and it provides space early warning.

As shown in the figure, the architecture consists of five functional modules: the pool organization module, the thin allocation module, the thin reclamation module, the dynamic expansion module and the early warning module.

Only the expansion module, the thin allocation module and the thin reclamation module interact directly with users, who can issue commands related to these three modules to control thin provisioning. Users only receive early warning information passively: they can query whether capacity has reached the warning level, but they cannot control how the early warning module works. The pool organization module is the core module of the architecture and is transparent to users.

As shown in Figure 3, for a request from a user, the architecture first determines whether the request involves reclamation. If it does, the architecture queries the pool organization module, invokes the processing flow of the thin reclamation module, reclaims the capacity and updates the pool organization metadata. Otherwise the request is an ordinary read or write. For a read request, the thin allocation module looks up the specific storage unit in the pool organization and processes the request. For a write request, the architecture checks whether the early warning setting has been exceeded: if it has, the early warning module sends an alarm directly to the user; if the warning threshold has not been exceeded, the request is submitted for thin allocation, and the thin allocation module allocates specific storage units from the pool organization module to process it.
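The request flow can be sketched as a single dispatch function. `ThinPool` and `handle_request` are hypothetical names, and the structures are in-memory stand-ins. The text does not say whether a write still proceeds after the alarm; this sketch follows the branch literally and alerts instead of allocating. It also assumes that a write to an already mapped unit needs no new space and therefore skips the threshold check.

```python
from collections import deque


class ThinPool:
    """Hypothetical in-memory pool; `mapping` stands in for pool organization metadata."""

    def __init__(self, nr_units: int, warn_pct: float):
        self.nr_units = nr_units
        self.warn_pct = warn_pct
        self.mapping = {}                       # virtual unit -> data unit
        self.free = deque(range(nr_units))
        self.alerts = []                        # messages "sent" to the user

    def used_pct(self) -> float:
        return 100 * (self.nr_units - len(self.free)) / self.nr_units


def handle_request(pool: ThinPool, op: str, vunit: int):
    if op == "discard":                         # reclamation path
        dunit = pool.mapping.pop(vunit, None)
        if dunit is not None:
            pool.free.append(dunit)             # update pool metadata
        return "reclaimed"
    if op == "read":                            # look up only, never allocate
        return pool.mapping.get(vunit)          # None means unprovisioned
    if op == "write":
        if vunit in pool.mapping:               # assumption: rewrites need no new space
            return pool.mapping[vunit]
        if pool.used_pct() >= pool.warn_pct:    # threshold reached: alert the user
            pool.alerts.append(f"pool {pool.used_pct():.0f}% used")
            return None
        dunit = pool.free.popleft()             # thin allocation from the pool
        pool.mapping[vunit] = dunit
        return dunit
    raise ValueError(f"unknown operation {op!r}")
```

A production design would more likely alert and keep allocating until the pool is truly full. The literal reading is kept here only so that the sketch mirrors the flow described above.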

Each functional module is described as follows:

1) Pool organization module:

Pool organization records and manages the metadata of every allocated storage unit (also called a block). In this architecture, two devices are managed logically: a data device and a metadata device;

Metadata management in the pool organization module is fairly complex: thin provisioning must manage every data block on the data device, and further structures are needed to describe the storage pool and the thin volumes, so a large amount of metadata is involved. The present invention therefore allocates a separate device, called the metadata device, to store this information. The metadata device stores the allocation information for the storage units of the data device, together with the metadata generated by storage pool operations. The data device is the actual storage device, divided into storage units of the configured granularity, and holds no metadata;

Metadata management in the pool organization module therefore involves two devices (the metadata device and the data device), which must be managed separately. This requires a space map (space-map) management mechanism that records the mapping state of each storage unit and supports metadata management. Besides supporting metadata management, the same mechanism also serves as the allocator for new storage units. The storage-unit granularity is likewise defined in the pool organization module;

The space map mechanism used for metadata management has two implementations:

managing the actual device (the data device implementation); managing the metadata space (the metadata device implementation);

In the pool organization module, metadata management uses both implementations, which work together to provide complete management of the pool organization metadata. Figure 4 briefly illustrates the logical layout on disk;
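The text names "dual B+ trees" for metadata management without giving their layout. The sketch below assumes one hypothetical reading: a two-level mapping whose top level is keyed by thin volume id and whose bottom level maps virtual blocks to data blocks. Linux dm-thin organizes its mapping metadata in a similar two-level tree. Sorted lists searched with `bisect` stand in for on-disk B+ trees: they give the same ordered lookups and range scans, but none of the paging.

```python
import bisect


class SortedMap:
    """Ordered key -> value map; a stand-in for one on-disk B+ tree."""

    def __init__(self):
        self.keys = []
        self.values = []

    def _index(self, key):
        return bisect.bisect_left(self.keys, key)

    def insert(self, key, value) -> None:
        i = self._index(key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value                 # overwrite an existing mapping
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def lookup(self, key):
        i = self._index(key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, lo, hi) -> list:
        """All (key, value) pairs with lo <= key < hi, in key order (useful for discard)."""
        i, j = self._index(lo), self._index(hi)
        return list(zip(self.keys[i:j], self.values[i:j]))


class PoolMetadata:
    """Two-level mapping: thin volume id -> (virtual block -> data block)."""

    def __init__(self):
        self.volumes = SortedMap()                 # top level, keyed by volume id

    def create_volume(self, vol_id: int) -> None:
        self.volumes.insert(vol_id, SortedMap())   # bottom level, keyed by virtual block

    def map_block(self, vol_id: int, vblock: int, dblock: int) -> None:
        volume = self.volumes.lookup(vol_id)
        if volume is None:
            raise KeyError(f"no thin volume {vol_id}")
        volume.insert(vblock, dblock)

    def lookup(self, vol_id: int, vblock: int):
        volume = self.volumes.lookup(vol_id)
        return None if volume is None else volume.lookup(vblock)
```

An ordered structure is chosen over a hash map because discard and expansion operate on ranges of blocks, which an ordered tree can scan directly.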

2) Thin allocation module:

In the thin allocation module, the present invention uses allocate-on-write: physical disk capacity is allocated only when an application writes real data to the logical volume. Put simply, a new block is allocated on demand from the free resource pool only when a write request arrives;

When a write request arrives, the thin allocation module handles the storage unit the request targets as follows:

First, it looks up that unit in the stored metadata; there are two cases:

a) Found: the storage unit is already mapped, so the request is mapped directly to the logical volume;

b) Not found: the storage unit is not yet mapped, so the request is deferred and later completed in the thin allocation main thread;

In the not-found case, the write request is processed later: a new storage unit is allocated and its mapping is completed, so that when the request is processed the next time, the mapping exists and the write can be completed;
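The lookup in case a) amounts to translating a virtual address into a data-device address, the virtual address mapping listed among the problems above. A minimal sketch, assuming fixed-size units and a dict as the mapping table (both hypothetical):

```python
def translate(vaddr: int, unit_size: int, mapping: dict):
    """Translate a virtual byte address into a data-device byte address.

    Returns None when the containing unit is not yet mapped (case b).
    """
    vunit, offset = divmod(vaddr, unit_size)   # which unit, and where inside it
    dunit = mapping.get(vunit)
    if dunit is None:
        return None
    return dunit * unit_size + offset          # same offset inside the data unit
```

For example, with 4096-byte units and virtual unit 2 mapped to data unit 5, virtual byte 8292 (unit 2, offset 100) translates to data-device byte 20580. Only the unit index changes; the offset within the unit is preserved, which is why the mapping table needs one entry per unit rather than per byte.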

3) Thin reclamation module:

When the file system layer learns that space has been freed, it notifies the underlying storage system. The underlying storage receives the release notification and discards the corresponding storage units;

Thin reclamation spans two layers: the file system layer, where applications run, and the storage system layer, where the actual data resides. A key issue is the communication mechanism between the two. The file system has no awareness of thin provisioning, and when a region of space is no longer used there is no ready-made mechanism to report it. The underlying storage layer will not reclaim space on its own unless it is told to. A communication mechanism between the file system and the storage system therefore has to be designed;

针对通信机制的问题,本发明架构中,设计文件系统与底层块驱动通过支持discard来实现文件系统与存储系统的通信。目前,discard支持是为优化linux内核对SSD的支持,而在ext4和xfs文件系统添加的一套机制;Aiming at the problem of the communication mechanism, in the framework of the present invention, the file system and the underlying block driver are designed to realize the communication between the file system and the storage system by supporting discard. Currently, discard support is a set of mechanisms added to the ext4 and xfs file systems to optimize the Linux kernel's support for SSDs;

discard是linux的术语,是为通知存储设备这些扇区不再存储有效数据。其本质就是文件系统通过discard告诉底层的SSD盘哪些扇区可以被删除,使用类似的这种通信机制去修改SCSI协议,可以更好的支持自动精简配置的精简回收功能,并且不需要采取第三方工具;Discard is a term in Linux, which is to notify the storage device that these sectors no longer store valid data. Its essence is that the file system tells the underlying SSD disk which sectors can be deleted through discard. Using a similar communication mechanism to modify the SCSI protocol can better support the thin recovery function of thin provisioning, and does not need to use a third-party tool;

因此,本发明设计类似discard支持的通信机制,在精简回收模块,当文件系统删除某个文件后,通过discard通知底层存储设备该文件对应的一块区域不再存储有效数据,可以在底层进行回收;Therefore, the present invention designs a communication mechanism similar to the support of discard. In the streamlined recovery module, when the file system deletes a certain file, the underlying storage device is notified through discard that the area corresponding to the file no longer stores valid data, and can be recycled at the bottom layer;
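On Linux, ext4 or xfs mounted with the `discard` option, or the `fstrim` utility (which asks the file system to discard its free space), sends discard requests for freed sector ranges down to the block layer. The sketch below models only the storage side, under stated assumptions: a hypothetical `DiscardHandler` turns a discarded sector range into reclaimed storage units, assuming a fixed number of sectors per unit and releasing only units that the range covers completely.

```python
class DiscardHandler:
    """Toy model of thin reclamation driven by discard requests (hypothetical)."""

    def __init__(self, unit_sectors, mapping, free_units):
        self.unit_sectors = unit_sectors  # sectors per storage unit (assumed fixed)
        self.mapping = mapping            # metadata: logical unit -> physical unit
        self.free_units = free_units      # physical units available for reuse

    def discard(self, start_sector, num_sectors):
        """Handle a discard of sectors [start_sector, start_sector + num_sectors).

        Only storage units lying entirely inside the range are released;
        partially covered units may still hold valid data and are kept.
        Returns the logical units that were unmapped.
        """
        end_sector = start_sector + num_sectors
        first_unit = -(-start_sector // self.unit_sectors)  # round up
        last_unit = end_sector // self.unit_sectors          # round down, exclusive
        released = []
        for logical_unit in range(first_unit, last_unit):
            physical = self.mapping.pop(logical_unit, None)
            if physical is not None:
                # Return the physical unit to the pool for future allocation.
                self.free_units.append(physical)
                released.append(logical_unit)
        return released
```

Releasing only fully covered units is a conservative design choice: a partial discard says nothing about the rest of the unit, so freeing it could lose live data.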

4) Dynamic expansion module:

Expands the capacity of the storage pool;

Main logic of dynamic expansion:

a) From the expansion parameters passed in by the upper layer (the mapping table information after expansion), calculate the number of storage units after expansion;

b) Use the space mapping mechanism to carry out the expansion, that is, to expand the storage pool, which is the data device in module 1;

c) Calculate the number of storage units by which the underlying storage pool must be expanded;

Process the initialization information of each new storage unit one at a time in a loop. This mainly consists of allocating, in the data device metadata, new storage unit records that hold the expansion information for the data block mapping, and then saving that expansion information;

After the expansion loop ends, update the capacity of the data device;

Commit the storage pool metadata and flush it to disk;
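The expansion steps above can be sketched as follows. This is an illustration under assumptions, not the invention's code: `DataDevice`, the fixed `unit_size`, the list-based space map, and the tuple snapshot standing in for a metadata flush are all hypothetical.

```python
class DataDevice:
    """Toy model of a thin pool's data device and its space map (hypothetical)."""

    def __init__(self, unit_size, capacity):
        self.unit_size = unit_size  # bytes per storage unit (assumed fixed)
        # Space map: one metadata record per storage unit on the data device.
        self.space_map = [{"unit": i, "state": "free"}
                          for i in range(capacity // unit_size)]
        self.capacity = len(self.space_map) * unit_size
        self.committed = None  # last metadata snapshot "flushed to disk"

    def expand(self, new_capacity):
        """Grow the data device to new_capacity bytes; return units added."""
        # Step a): storage units the device should have after expansion
        # (a trailing partial unit cannot be used, so round down).
        target_units = new_capacity // self.unit_size
        # Step c): how many units must be added to the underlying pool.
        to_add = target_units - len(self.space_map)
        if to_add <= 0:
            raise ValueError("new capacity must add at least one storage unit")
        # Step b) and the loop: extend the space map one new unit at a time,
        # recording an initialized metadata entry for each.
        for unit in range(len(self.space_map), target_units):
            self.space_map.append({"unit": unit, "state": "free"})
        # After the loop, update the data device capacity.
        self.capacity = target_units * self.unit_size
        # Commit: flush pool metadata, modeled here as an immutable snapshot.
        self.committed = tuple((r["unit"], r["state"]) for r in self.space_map)
        return to_add
```

For instance, a device with `unit_size=4` and 16 bytes has 4 units; `expand(30)` adds 3 units and sets the capacity to 28, since the last 2 bytes do not fill a whole unit.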

Fig. 6 compares the expansion of an ordinary logical volume with the expansion flow of the present invention.

5) Space early-warning module:

The bottom layer (kernel state) detects warning conditions and, if any occur, notifies the user layer (user state), which listens for warning information;

The thin-provisioning space warning module raises an alarm from the bottom layer (kernel state) to the user layer (user state) when space usage reaches the warning threshold.

The space warning module consists of two parts, one in kernel state and one in user state, so communication between kernel state and user state must be implemented. Its main logic is:

a) In kernel state, check whether space usage has reached the warning threshold;

b) When the threshold is reached, send warning information to user state;

c) In user state, monitor for and receive the warning information, then notify the administrator;

The main logic is illustrated in Fig. 7.

As Fig. 7 shows, the most basic technical point of the space warning module is how to capture information in kernel state and pass it to user state.
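The source leaves the kernel-to-user channel unspecified. On Linux, real candidates include netlink sockets or, for device-mapper thin pools, the dm event raised when free space drops below the pool's `low_water_mark`. The sketch below uses neither; it models the channel with a `queue.Queue` and a listener thread so the three steps can run in user space. `KernelSide`, `user_monitor`, and the percentage threshold are hypothetical.

```python
import queue
import threading


class KernelSide:
    """Stand-in for the kernel-state half of the warning module (hypothetical)."""

    def __init__(self, capacity_units, threshold_percent, channel):
        self.capacity_units = capacity_units
        self.threshold_percent = threshold_percent  # e.g. 80 = warn at 80% used
        self.channel = channel  # models the kernel-to-user notification channel
        self.used_units = 0
        self.warned = False     # this sketch sends only one warning

    def allocate(self, n=1):
        self.used_units += n
        # Step a): check whether usage has reached the warning threshold
        # (integer arithmetic avoids floating-point rounding at the boundary).
        reached = self.used_units * 100 >= self.threshold_percent * self.capacity_units
        if reached and not self.warned:
            # Step b): send the warning to user state.
            self.channel.put({"used": self.used_units,
                              "capacity": self.capacity_units})
            self.warned = True


def user_monitor(channel, notify_admin, stop):
    """Step c): user-state listener that forwards warnings to the administrator."""
    while not stop.is_set():
        try:
            event = channel.get(timeout=0.1)
        except queue.Empty:
            continue
        notify_admin(event)
        channel.task_done()
```

In use, `user_monitor` runs in its own `threading.Thread` with a `threading.Event` as the stop flag; with a capacity of 10 units and a threshold of 80, the single warning is sent when the eighth unit is allocated.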

Technical features other than those described in this specification are known to those skilled in the art.

Claims (1)


Priority Applications (1)

CN201210453207.5A (CN102968279B): priority date 2012-11-13, filing date 2012-11-13, "A kind of store the method that system simplifies configuration automatically"


Publications (2)

CN102968279A, published 2013-03-13
CN102968279B, published 2016-06-08

Family ID: 47798444


Country Status (1)

CN (1): CN102968279B







Legal Events

C06: Publication
PB01: Publication
C10: Entry into substantive examination
SE01: Entry into force of request for substantive examination
C14: Grant of patent or utility model
GR01: Patent grant

