CN104123171A

Info

Publication number: CN104123171A
Application number: CN201410256198.XA
Authority: CN
Inventors: 左起同; 王备; 陈建海; 何钦铭; 黄步添
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2014-06-10
Filing date: 2014-06-10
Publication date: 2014-10-29
Anticipated expiration: 2034-06-10
Also published as: CN104123171B

Abstract

The invention discloses a virtual machine migration method and system based on the NUMA architecture. The method comprises: obtaining the configuration information of the virtual machine to be migrated, and generating a temporary file from the memory pages the virtual machine has used, the contents of its registers, and the state of its I/O devices; shutting down the virtual machine to be migrated; and obtaining the node information of all physical nodes on the target host and determining, from the node information and the migration instruction, whether target nodes exist on the target host. If they exist, the virtual machine is migrated to the target nodes according to the temporary file and the configuration information; otherwise, an error report is sent. During migration, the invention keeps the virtual nodes of the virtual machine's NUMA topology in one-to-one correspondence with the NUMA nodes of the physical machine, so that the virtual machine loses little performance, and it changes the virtual machine's NUMA topology according to the hardware of the target host, so that the virtual machine can adapt to more diverse hardware environments.

Description

Translated from Chinese

Virtual machine migration method and system based on NUMA architecture

Technical field

The invention relates to the field of computer technology, and in particular to a virtual machine migration method and system based on the NUMA architecture.

Background

In computing, virtualization abstracts physical computing resources such as CPU, memory, network, and storage so that users can use them more efficiently. It is now widely used in scenarios such as Internet services and cloud computing.

Early computers used the SMP (Symmetric Multi-Processing) structure, in which all memory is shared by all CPUs and every CPU can access any memory address. As hardware developed, and especially as the number of CPU cores grew, the SMP structure became prone to memory access conflicts: different CPUs access the same memory at the same time, causing bus contention and delaying access requests, and the effect has become increasingly pronounced. High-performance multi-core machines therefore often adopt the NUMA (Non-Uniform Memory Access) architecture instead: the CPUs and memory of a machine are divided into several nodes, each consisting of several CPUs and one block of memory. A CPU's access to the memory of its own node is called a local access, and its access to the memory of another node is called a remote access; remote accesses travel over the direct interconnect between CPUs. This avoids the performance cost of bus contention, but it makes memory access speed uneven: a remote access costs far more than a local one. This imbalance directly causes performance problems for conventional virtualization on NUMA machines.

Because the virtual machine monitor hides the underlying hardware from the virtual machine, the memory a virtual machine receives is usually just a contiguous linear address range, and the virtual machine does not know which node that memory resides on. Nor do its VCPUs (virtual CPUs) indicate which physical node they are currently running on. The physical machine therefore presents the virtual machine with a uniform SMP architecture. Although the operating system running inside the virtual machine includes optimizations for NUMA, they are of no use because the virtual machine has no knowledge of the actual physical topology.

VNUMA (virtual non-uniform memory access) was developed to solve this problem. When a virtual machine is created, the SRAT and SLIT tables among its ACPI tables are modified so that the virtual machine presents a NUMA architecture. Such a virtual machine consists of several virtual nodes, each strictly bound to one physical node: the VCPUs of a virtual node may run only on the CPUs of the designated physical node, and the memory allocated to a virtual node may be requested only from that fixed physical node. The virtual NUMA topology of the virtual machine is thus strictly consistent with the physical NUMA structure, and the operating system running in the virtual machine can optimize for remote accesses using only the virtual NUMA topology presented to it.
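The strict binding described above can be illustrated with a small check. This is only a sketch under stated assumptions: the function name `vnode_binding` and its three dictionary arguments are hypothetical and do not correspond to any hypervisor API; they simply model which virtual node a VCPU belongs to, which physical CPU it runs on, and which physical node that CPU sits on.

```python
from typing import Dict, Optional


def vnode_binding(
    vcpu_vnode: Dict[int, int],   # hypothetical: VCPU id -> virtual node id
    vcpu_pcpu: Dict[int, int],    # hypothetical: VCPU id -> physical CPU id
    pcpu_pnode: Dict[int, int],   # hypothetical: physical CPU id -> physical node id
) -> Optional[Dict[int, int]]:
    """Return {virtual node: physical node} if the binding is one-to-one, else None."""
    binding: Dict[int, int] = {}
    for vcpu, vnode in vcpu_vnode.items():
        pnode = pcpu_pnode[vcpu_pcpu[vcpu]]
        # All VCPUs of one virtual node must sit on the same physical node.
        if binding.setdefault(vnode, pnode) != pnode:
            return None
    # No two virtual nodes may share a physical node.
    if len(set(binding.values())) != len(binding):
        return None
    return binding
```

A placement that spreads the VCPUs of one virtual node over two physical nodes returns `None`, which is the situation the background section describes as breaking the VNUMA topology.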

The technique also introduces new problems. Because the virtual nodes are strictly bound to physical nodes, changing the virtual topology of a virtual machine is difficult. In some scenarios, such as energy saving, some physical CPUs or even whole physical machines must be powered off. If the virtual machine has more virtual nodes than the number of physical nodes the energy-saving policy plans to use, the policy cannot be carried out well, because forcing the VCPUs of the surplus virtual nodes onto other physical nodes necessarily breaks the virtual machine's original VNUMA topology. The VNUMA topology must then be updated accordingly. Existing methods for updating the VNUMA topology do this by dynamically changing the virtual machine's SRAT and SLIT tables and dynamically migrating its memory. They have an obvious drawback: the virtual machine must stay on the same physical machine before and after the change, so they cannot serve a user policy that moves the virtual machine to another physical machine. In addition, existing migration techniques migrate a virtual machine node by node and cannot place its memory page by page according to the current load, which limits the effectiveness of load balancing.

Summary of the invention

To address the deficiencies of the prior art, the invention provides a virtual machine migration method and system based on the NUMA architecture.

A virtual machine migration method based on the NUMA architecture comprises:

(1) Obtain the configuration information of the virtual machine to be migrated, and generate a temporary file from the memory pages the virtual machine has used, the contents of its registers, and the state of its I/O devices;

(2) Shut down the virtual machine to be migrated;

(3) Obtain the node information of all physical nodes on the target host, and determine from the node information and the migration instruction whether target nodes exist on the target host:

If they exist, migrate the virtual machine to the target nodes according to the temporary file and the configuration information;

Otherwise, send an error report.
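The three steps above can be sketched as a small orchestration function. This is a minimal illustration rather than the patent's implementation: `migrate`, `MigrationResult`, and the five callables passed in are hypothetical names that stand in for the hypervisor operations the text describes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class MigrationResult:
    ok: bool
    target_nodes: Sequence[int] = ()
    error: Optional[str] = None


def migrate(
    read_config: Callable[[], dict],
    dump_state: Callable[[], bytes],
    shut_down: Callable[[], None],
    find_target_nodes: Callable[[dict], Optional[Sequence[int]]],
    restore: Callable[[dict, bytes, Sequence[int]], None],
) -> MigrationResult:
    """Run steps (1) to (3); every callable is a hypothetical stand-in."""
    config = read_config()             # step (1): configuration information
    temp_file = dump_state()           # step (1): memory pages, registers, I/O state
    shut_down()                        # step (2): shut down the source virtual machine
    nodes = find_target_nodes(config)  # step (3): match against the target host
    if nodes is None:
        # No suitable target nodes: report the error instead of restoring.
        return MigrationResult(ok=False, error="no target nodes on target host")
    restore(config, temp_file, nodes)  # step (3): rebuild the VM on the target nodes
    return MigrationResult(ok=True, target_nodes=tuple(nodes))
```

Passing the operations in as callables keeps the sketch independent of any particular hypervisor and makes the control flow easy to exercise with stubs.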

In the invention, the configuration information and the temporary file of the virtual machine are stored, in a centralized or distributed manner, on one or more nodes of the network, usually on temporary or permanent storage devices outside the source host (the host on which the virtual machine resides before migration) and the target host.

The migration instruction should include information on the virtual machine to be migrated, the target host, and the target node count. After the instruction is received, the relevant information of the virtual machine (its configuration information and temporary file) is saved, and the virtual machine is restored on the target host directly from the saved information, which completes the migration. The method selects the target nodes according to the migration instruction and the node information of each physical node on the target host and rebuilds the VNUMA topology. Because it takes into account both the state of the target host and how well that state matches the migration instruction, the migrated virtual machine can adapt to more diverse hardware environments, and the one-to-one correspondence between the virtual NUMA nodes of the virtual machine and the NUMA nodes of the physical machine (physical host) is not violated during migration.

The configuration information includes the total memory of the virtual machine to be migrated, the number of VCPUs, and the I/O device information. The configuration information is the basis of a virtual machine: a virtual machine can be initialized from it.

In step (1), the temporary file is generated through the following steps:

(1-1) Dump the memory pages used by the virtual machine to be migrated into the temporary file, marking the dirty pages newly produced during the dump;

(1-2) Count the dirty pages and compare the count with a set threshold:

If the number of dirty pages is below the threshold, pause the virtual machine and dump the contents of its registers, the state of its I/O devices, and all dirty pages into the temporary file;

Otherwise, return to step (1-1) and dump the pages marked as dirty.

The virtual machine keeps working during the dump, so memory pages are still being read and written. Because writes keep updating the contents of the virtual machine's memory pages, the dump must be repeated in a loop and stops once a certain condition is met. Since dumping and writing proceed in parallel, a page may be written and dumped at the same time; in that case, to generate the temporary file more efficiently, the dump of that page can be held back until the write completes.

A dirty page is a memory page modified during the dump. In each pass, a page that is written to after it has been dumped is treated as a dirty page, and the next pass dumps the dirty pages of the current pass.

The threshold is 1% to 10% of the total number of memory pages of the virtual machine to be migrated.

The choice of threshold is closely tied to the performance of the source host (the physical machine on which the virtual machine currently resides). It should take into account the source host's disk read/write rate and memory access rate, so that dirty pages are not dumped over too many iterations during migration, which would reduce overall system efficiency. The total number of memory pages is the number of pages corresponding to the total memory; a threshold of 1% to 10% of this number suits the hardware of most physical machines.
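The iterative dump of steps (1-1) and (1-2) can be sketched as a simulation. Everything here is an assumption made for illustration: memory is modeled as a dictionary from page number to contents, `guest_writes` is a hypothetical callback that returns the pages the guest wrote during a given pass, any page written during a pass is treated as dirty (a simplification of "written after being dumped"), and `max_rounds` is a safety cap that the source does not mention.

```python
from typing import Callable, Dict, Tuple


def dump_memory(
    memory: Dict[int, bytes],
    total_pages: int,
    guest_writes: Callable[[int], Dict[int, bytes]],
    threshold_ratio: float = 0.03,   # within the 1% to 10% range given in the text
    max_rounds: int = 30,            # assumption: cap so the loop always terminates
) -> Tuple[Dict[int, bytes], int]:
    """Return (temp_file, passes) once the dirty set is small enough."""
    threshold = threshold_ratio * total_pages
    temp_file: Dict[int, bytes] = {}
    to_dump = set(memory)                 # first pass: every used page
    passes = 0
    while True:
        passes += 1
        for page in to_dump:              # step (1-1): dump the pages
            temp_file[page] = memory[page]
        dirty = guest_writes(passes)      # pages written while this pass ran
        memory.update(dirty)
        if len(dirty) < threshold or passes >= max_rounds:
            # Step (1-2), first branch: the VM would be paused here, then the
            # remaining dirty pages (plus registers and I/O state) are dumped.
            for page in dirty:
                temp_file[page] = memory[page]
            return temp_file, passes
        to_dump = set(dirty)              # otherwise repeat on the dirty pages
```

A larger `threshold_ratio` ends the loop sooner at the cost of a longer pause, which is the trade-off the threshold discussion above describes.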

The node information includes the total memory size and used memory size of a physical node, the number of physical CPUs on the node, and the running time of each physical CPU within a set time period.

The time period may be chosen arbitrarily; its length is 50 ms to 200 ms.

Step (3) determines whether target nodes exist on the target host through the following steps:

(3-1) Calculate the load balancing degree L_d of each physical node on the target host according to the following formula:

$L_d = \frac{\sum_{i=1}^{n} t_{cpu}^{i}}{t_{total} \cdot n} \cdot \frac{M_{used}}{M_{total}},$

where $t_{cpu}^{i}$ is the running time of the i-th CPU on the physical node within the set time period, t_total is the length of the set time period, n is the number of physical CPUs on the node, M_used is the memory already used on the node, and M_total is the total memory of the node;

(3-2) In ascending order of load balancing degree, check whether the memory of each physical node is greater than or equal to the memory required by a virtual node:

If it is, take the physical node as a candidate node and go on to the next physical node;

Otherwise, go directly to the next physical node;

(3-3) Stop when the number of candidate nodes equals the target node count or when all physical nodes have been checked:

(a) If the number of candidate nodes after stopping equals the target node count, compare the total number of physical CPUs on all candidate nodes with the total number of VCPUs of the virtual machine to be migrated:

If the total number of physical CPUs on all candidate nodes is greater than or equal to the total number of VCPUs of the virtual machine, target nodes are considered to exist, and the candidate nodes are used as the target nodes;

Otherwise, target nodes are considered not to exist;

(b) If the number of candidate nodes after stopping is less than the target node count, target nodes are considered not to exist.

The target node count equals the number of target nodes set by the migration instruction.

The lower the load balancing degree, the more resources the physical node has left. The method uses the load balancing degree as one criterion for choosing target nodes, preferring physical nodes with a lower degree, and it also weighs the memory of the virtual machine against the target node count and the total number of VCPUs against the number of physical CPUs, which protects the overall performance of the NUMA system as far as possible.
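Steps (3-1) to (3-3) can be sketched as follows. This is a sketch under assumptions, not the patent's code: the `PhysicalNode` type and both function names are hypothetical, "the memory of a physical node" is interpreted as its free memory (total minus used), and the memory required by a virtual node is taken as the virtual machine's total memory divided by the target node count.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PhysicalNode:
    node_id: int
    cpu_run_times_ms: List[float]   # per-CPU running time within the sampling window
    mem_used: float
    mem_total: float


def load_balancing_degree(node: PhysicalNode, window_ms: float) -> float:
    """L_d = (sum of CPU run times / (window * n)) * (used memory / total memory)."""
    n = len(node.cpu_run_times_ms)
    cpu_share = sum(node.cpu_run_times_ms) / (window_ms * n)
    return cpu_share * (node.mem_used / node.mem_total)


def select_target_nodes(
    nodes: List[PhysicalNode],
    vm_memory: float,
    vm_vcpus: int,
    target_count: int,
    window_ms: float = 100.0,
) -> Optional[List[PhysicalNode]]:
    """Return the target nodes, or None if the target host cannot provide them."""
    needed = vm_memory / target_count   # memory required by one virtual node
    candidates: List[PhysicalNode] = []
    # Step (3-2): visit nodes from the lowest load balancing degree upwards.
    for node in sorted(nodes, key=lambda nd: load_balancing_degree(nd, window_ms)):
        if len(candidates) == target_count:   # step (3-3): enough candidates
            break
        # Assumption: compare the node's free memory with the requirement.
        if node.mem_total - node.mem_used >= needed:
            candidates.append(node)
    if len(candidates) < target_count:
        return None                           # case (b)
    total_cpus = sum(len(nd.cpu_run_times_ms) for nd in candidates)
    return candidates if total_cpus >= vm_vcpus else None   # case (a)
```

Sorting by the degree first means that the least loaded nodes with enough memory are the ones chosen, which matches the stated preference for nodes with a lower load balancing degree.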

Preferably, step (3) determines whether target nodes exist on the target host through the following steps:

(S3-1) Calculate the load balancing degree L_d of each physical node on the target host, using the same formula as in step (3-1);

(S3-2) In ascending order of load balancing degree, check whether the memory of each physical node is greater than or equal to the memory required by a virtual node, collecting candidate nodes as in step (3-2);

(S3-3) Stop when the number of candidate nodes equals the target node count or when all physical nodes have been checked:

(a) If the number of candidate nodes after stopping equals the target node count and the total number of physical CPUs on all candidate nodes is greater than or equal to the total number of VCPUs of the virtual machine to be migrated, target nodes are determined to exist, and the candidate nodes are used as the target nodes;

Otherwise, increase the target node count by 1 and return to step (3-1), until the target node count exceeds the total number of physical nodes on the target host, at which point stop and determine that target nodes do not exist;

(b) If the number of candidate nodes after stopping is less than the target node count, increase the target node count by 1 and return to step (3-1), until the target node count exceeds the total number of physical nodes on the target host, at which point stop and determine that target nodes do not exist.

The initial value of the target node count is the number of target nodes set by the migration instruction.

Adjusting the target node count dynamically lets the method try more physical nodes as target nodes, which better serves users whose requirements on the target node count are not strict. In the invention, the count is increased by 1 in each loop; in practice, other step values may be used, and the step may also vary and be adjusted dynamically as the situation requires.

The memory required by a virtual node is the total memory of the virtual machine to be migrated divided by the target node count.
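The preferred variant wraps the fixed-count matching in a loop that grows the target node count. The sketch below assumes a hypothetical callable `match_fixed(count)` that performs the fixed-count matching for a given target node count and returns the chosen nodes or `None`; the function name and the `step` parameter (default 1, as in the text) are illustrative only.

```python
from typing import Callable, List, Optional, Tuple


def find_with_growing_count(
    match_fixed: Callable[[int], Optional[List[int]]],
    initial_count: int,
    physical_node_count: int,
    step: int = 1,
) -> Optional[Tuple[int, List[int]]]:
    """Return (target node count, target nodes), or None if no count works."""
    if step < 1:
        raise ValueError("step must be at least 1")
    count = initial_count                 # initial value from the migration instruction
    while count <= physical_node_count:   # stop once the count exceeds the host's nodes
        nodes = match_fixed(count)
        if nodes is not None:
            return count, nodes
        count += step                     # matching failed: try more target nodes
    return None
```

Because the memory required per virtual node shrinks as the count grows, a larger count can succeed where a smaller one failed for lack of free memory on individual nodes.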

In step (3), the virtual machine to be migrated is migrated to the target nodes through the following steps:

(S1) Build a virtual machine on the target nodes according to the configuration information, restore its register contents, I/O device state, and page table, and mark the page table entries corresponding to the used memory pages as missing pages;

(S2) Start the newly built virtual machine, restore its memory from the used memory pages in the temporary file, and clear the missing-page mark of each page table entry as its page is restored.

Migration thus amounts to rebuilding the virtual machine on the chosen target nodes of the target host. Rebuilding requires writing the memory of the virtual machine to be migrated into the newly built virtual machine. In the invention, the virtual machine is started as soon as it is built, and its memory is restored while it runs: on the one hand, the memory pages in the temporary file are restored in a fixed order; on the other hand, restoration follows the access order, so that if the page table entry of an accessed page is marked as missing, that page is restored first.
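The restore-while-running behavior described above can be modeled with a small class. This is a simplified sketch, not the patent's mechanism: `LazyRestorer` and its methods are hypothetical, the temporary file is modeled as a dictionary from page number to contents, and the "fixed order" of background restoration is assumed to be ascending page number.

```python
from typing import Dict, Optional


class LazyRestorer:
    """Restores pages in the background and on demand when a missing page is touched."""

    def __init__(self, temp_file: Dict[int, bytes]) -> None:
        self.temp_file = dict(temp_file)
        self.memory: Dict[int, bytes] = {}
        self.missing = set(temp_file)          # (S1): entries marked as missing pages
        self._order = iter(sorted(temp_file))  # assumption: ascending page order

    def _restore(self, page: int) -> None:
        self.memory[page] = self.temp_file[page]
        self.missing.discard(page)             # (S2): clear the missing-page mark

    def access(self, page: int) -> Optional[bytes]:
        """Guest access: a page still marked missing is restored first."""
        if page in self.missing:
            self._restore(page)
        return self.memory.get(page)           # None for a page the VM never used

    def background_step(self) -> Optional[int]:
        """Restore the next missing page in order; return it, or None when done."""
        for page in self._order:
            if page in self.missing:
                self._restore(page)
                return page
        return None
```

Pages restored on demand are skipped by the background pass, so each page is copied from the temporary file exactly once.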

The invention also provides a virtual machine migration system based on the NUMA architecture, comprising:

A virtual machine monitor, which receives the migration instruction and obtains the configuration information of the virtual machine to be migrated, the configuration information including the virtual machine's memory information, number of VCPUs, and I/O device information;

A temporary file generator, which generates the temporary file from the configuration information and used memory pages of the virtual machine to be migrated, the contents of its registers, and the state of its I/O devices, and shuts down the virtual machine after the temporary file has been generated;

A node information collector, which obtains the node information of all physical nodes on the target host;

A target node matcher, which determines from the node information and the migration instruction whether target nodes exist on the target host;

A virtual machine restorer, which acts on the result of the target node matcher as follows:

If target nodes exist, it migrates the virtual machine to the target nodes according to the temporary file and the configuration information of the virtual machine to be migrated;

If target nodes do not exist, it sends an error report.

Compared with the prior art, the invention has the following advantages:

(a) During migration, the strict binding between the virtual NUMA nodes of the virtual machine and the NUMA nodes of the physical machine (the physical nodes) is preserved, so the virtual machine loses little performance;

(b) The number of virtual NUMA nodes of the virtual machine is changed according to the hardware of the target host, so the virtual machine can adapt to more diverse hardware environments;

(c) The state of the virtual machine is recorded in a temporary file, so the migration can be deferred and the state can be transferred to a remote location, making migration more flexible in time and space.

Description of drawings

Figure 1 is a flowchart of the virtual machine migration method based on the NUMA architecture in this embodiment;

Figure 2 is a schematic structural diagram of the virtual machine migration system based on the NUMA architecture in this embodiment.

Detailed description

The invention is described in detail below with reference to the drawings and a specific embodiment.

The virtual machine migration system of this embodiment is based on the NUMA architecture shown in Figure 1. The architecture includes m hosts (physical hosts: host 1, ..., host m), each with several physical nodes; the first host (host 1) has p physical nodes (physical node 1, physical node 2, ..., physical node p). As shown in Figure 1, the virtual machine migration system includes:

A virtual machine monitor, which receives the migration instruction and obtains the configuration information of the virtual machine to be migrated, the configuration information including the virtual machine's memory information, number of VCPUs, and I/O device information;

A temporary file generator, which generates the temporary file from the configuration information and used memory pages of the virtual machine to be migrated, the contents of its registers, and the state of its I/O devices, and shuts down the virtual machine after the temporary file has been generated;

A node information collector, which obtains the node information of all physical nodes on the target host;

A target node matcher, which determines from the node information and the migration instruction whether target nodes exist on the target host;

The migration instruction in this embodiment includes information on the virtual machine to be migrated, the target host, and the target node count.

As shown in Figure 2, the virtual machine migration method of this embodiment includes:

(1) Obtain the configuration information of the virtual machine to be migrated, and generate a temporary file from the memory pages the virtual machine has used, the contents of its registers, and the state of its I/O devices. The temporary file is generated through steps (1-1) and (1-2) as described above, with the following threshold in step (1-2):

(1-2) Count the dirty pages and compare the count with the set threshold, which in this embodiment is 3% of the total number of memory pages.

In each pass, a page that is written to after it has been dumped is treated as a dirty page, and the next pass dumps the dirty pages of the current pass.

The configuration information is the basis of a virtual machine: a virtual machine can be initialized from it. In this embodiment, the configuration information includes the total memory of the virtual machine to be migrated, the number of VCPUs, and the I/O device information.

(2) Shut down the virtual machine to be migrated, so that its register contents, I/O state, and memory no longer change and the temporary file stays consistent with the state of the virtual machine.

(3) Obtain the node information of all physical nodes on the target host, and determine from the node information and the migration instruction whether target nodes exist on the target host:

If they exist, migrate the virtual machine to the target nodes according to the temporary file and the configuration information;

Otherwise, send an error report.

The node information includes the total memory size and used memory size of a physical node, the number of physical CPUs on the node, and the running time of each physical CPU within a set time period.

In this embodiment, whether target nodes exist on the target host is determined by the preferred procedure described above, in which the target node count is increased whenever matching fails.

In the formula for the load balancing degree, $t_{cpu}^{i}$ is the running time of the i-th CPU on the physical node within the set time period, t_total is the length of the set time period (t_total = 100 ms in this embodiment), n is the number of physical CPUs on the node, M_used is the memory already used on the node, and M_total is the total memory of the node.

(b) If the number of candidate nodes after stopping is less than the target node count, increase the target node count by 1 and return to step (3-1), until the target node count exceeds the total number of physical nodes on the target host, at which point stop and determine that target nodes do not exist.

During the loop, the initial value of the target node count is the number of target nodes set by the migration instruction. The memory required by a virtual node is the total memory of the virtual machine to be migrated divided by the target node count of the current loop.
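As a worked illustration of the load balancing degree with the 100 ms period used in this embodiment, assume (purely hypothetically) a physical node with n = 4 CPUs that ran for 40, 60, 20, and 80 ms within the period and that has 8 GB of its 32 GB of memory in use:

$L_d = \frac{40 + 60 + 20 + 80}{100 \cdot 4} \cdot \frac{8}{32} = 0.5 \cdot 0.25 = 0.125$

A node with the same CPU activity but 24 GB in use would score $0.5 \cdot 0.75 = 0.375$ and would therefore be examined later in the ascending order. If the migration instruction asked for 2 target nodes for a virtual machine with 16 GB of memory, each virtual node would require 16 / 2 = 8 GB; if the count were later raised to 3, the requirement would drop to 16 / 3 ≈ 5.33 GB per virtual node.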

In this embodiment, the virtual machine to be migrated is migrated to the target node through the following steps:

After the virtual machine is built, it is started directly, and its memory is restored while it runs. On the one hand, the memory pages in the temporary file are restored to the virtual machine's memory in a certain order; on the other hand, restoration follows the access order: if the page table entry of an accessed memory page is marked as not present, that memory page is restored first.
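The two restoration paths described above (background restoration in a fixed order, and priority restoration on a page fault) can be illustrated with a small Python simulation. It is only a sketch under assumptions: the temporary file is modelled as an in-memory dictionary of page number to page contents, the background order is assumed to be ascending page number, and the class name `LazyRestorer` and its methods are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, List


class LazyRestorer:
    def __init__(self, saved_pages: Dict[int, bytes]):
        # Pages still waiting in the (simulated) temporary file,
        # kept in the assumed background order: ascending page number.
        self.pending = OrderedDict(sorted(saved_pages.items()))
        self.memory: Dict[int, bytes] = {}   # pages already restored
        self.restore_log: List[int] = []     # order in which pages came back

    def _restore(self, page_no: int) -> None:
        # Raises KeyError if the page was never saved in the temporary file.
        self.memory[page_no] = self.pending.pop(page_no)
        self.restore_log.append(page_no)

    def background_step(self) -> bool:
        """Restore the next pending page; return False when none remain."""
        if not self.pending:
            return False
        self._restore(next(iter(self.pending)))
        return True

    def access(self, page_no: int) -> bytes:
        """Simulate a guest access; a missing page is restored first."""
        if page_no not in self.memory:  # page fault on a not-yet-restored page
            self._restore(page_no)
        return self.memory[page_no]
```

Interleaving `background_step()` calls with `access()` calls shows the intended effect: a faulting page jumps ahead of the background queue, and the background pass then continues with whatever is still pending.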

The specific embodiments described above explain the technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only the most preferred embodiments of the present invention and are not intended to limit it; any modification, supplement, or equivalent replacement made within the scope of the principles of the present invention shall fall within its scope of protection.

Claims

1. A virtual machine migration method based on NUMA architecture, characterized by comprising:

(2) Shut down the virtual machine to be migrated;

Otherwise, send an error report.

2. The virtual machine migration method based on NUMA architecture according to claim 1, wherein the configuration information includes the total memory of the virtual machine to be migrated, the number of VCPUs, and I/O device information.

3. The virtual machine migration method based on NUMA architecture according to claim 2, wherein in step (1) the temporary file is generated through the following steps:

4. The virtual machine migration method based on NUMA architecture according to claim 3, wherein the threshold is 1% to 10% of the number of pages in the total memory of the virtual machine to be migrated.

5. The virtual machine migration method based on NUMA architecture according to claim 4, wherein the node information includes the total memory size and the used memory size of each physical node, as well as the number of physical CPUs on that node and the running time of each physical CPU within a set time period.

6. The virtual machine migration method based on NUMA architecture according to claim 5, wherein step (3) determines whether a target node exists on the target host through the following steps:

7. The virtual machine migration method based on NUMA architecture according to claim 5, wherein step (3) determines whether a target node exists on the target host through the following steps:

(b) If the number of candidate nodes obtained after stopping is less than the number of target nodes, increase the number of target nodes by 1 and return to step (3-1); stop when the number of target nodes exceeds the total number of physical nodes on the target host, and conclude that no target node exists;

8. The virtual machine migration method based on NUMA architecture according to claim 6 or 7, wherein the memory required by each virtual node is the total memory of the virtual machine to be migrated divided by the number of target nodes.

9. The virtual machine migration method based on NUMA architecture according to claim 1, wherein in step (3) the virtual machine to be migrated is migrated to the target node through the following steps:

10. A virtual machine migration system based on NUMA architecture, characterized by comprising:

A temporary file generator, configured to generate a temporary file from the configuration information and used memory pages of the virtual machine to be migrated, together with the contents of its registers and the state of its I/O devices, and to shut down the virtual machine to be migrated after the temporary file has been generated;

If it is determined that a target node exists, the virtual machine is migrated to the target node according to the temporary file and the configuration information of the virtual machine to be migrated;