Technical Field
The present invention relates to a method for improving the huge page usage rate of an operating system, and belongs to the technical field of operating system memory management.
Technical Background
Paging is a feature common to modern CPUs. It establishes a mapping from a program's virtual address space to the machine's physical memory, and memory is managed in units of pages. Many CPU architectures, including Intel x86, implement a huge page mechanism that supports physical pages of several different sizes, such as 4K, 2M and 4M.
For simplicity, current operating systems (for example, Linux) generally use 4K pages, so each mapping covers a virtual address range of only 4K. To access a 2M region of memory, the operating system has to handle 512 page faults, each of which allocates 4K of memory.
Taking the Linux operating system as an example, Linux has supported transparent huge pages since version 2.6.38. The operating system can establish virtual-to-physical mappings of size 2M (4M on a 32-bit operating system), called huge page mappings (huge pages for short). Using huge pages reduces TLB misses and the computation and cache overhead incurred during address translation. The mechanism includes several functions: defragmentation of physical pages, which guarantees that enough huge-page-aligned physical memory is available; a check made when a page fault occurs, so that a huge page mapping is established if the conditions allow; and a set of memory operations for huge pages that provide basic primitives such as copy-on-write.
The huge page module mainly optimizes a program's private anonymous memory (the vast majority of the memory a program uses is private and anonymous, and huge pages are not well suited to other cases: using huge pages for shared memory increases copy-on-write overhead, and using huge pages for file-backed memory increases the burden on the page cache). The huge page optimization is not mandatory; the logic is "if the conditions are met, use a huge page". The degree to which the Linux huge page module is used is measured by the "huge page usage rate": the amount of a process's private anonymous memory that is allocated as huge pages, divided by the program's total private anonymous memory usage.
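Restated as a formula, the metric defined above is:

\[ \text{huge page usage rate} = \frac{\text{private anonymous memory of the process mapped with huge pages}}{\text{total private anonymous memory of the process}} \]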
Memory allocation in the operating system is driven by page faults. Each process has its own address space (the mm_struct structure in Linux), which maintains the process's virtual addresses and the mappings from virtual to physical addresses. When a process has just been created or requests memory, it calls a function to request some virtual memory. At that point the operating system does not allocate physical memory immediately; instead it creates a virtual memory area (hereafter vma) to record that this range of virtual addresses is in use. Each mm_struct contains several vmas, and each vma belongs to exactly one mm_struct. When the process actually reads or writes this virtual address range, a page fault is guaranteed to occur because no memory has been allocated yet. The operating system uses the faulting address to locate the corresponding vma through the mm_struct, and decides how to handle the fault based on its cause and the attributes of the vma. For example, if the accessed virtual address lies in no vma, or the accessed region is not readable, an illegal address has been accessed; a segmentation fault is raised and the process is killed. If a write hits an address in a vma marked writable but marked read-only in the page table, a copy-on-write page has been written; the operating system copies the page and marks the copy writable. If the accessed location lies within a vma but no virtual-to-physical mapping has been established, memory has not yet been allocated; the operating system then allocates physical memory according to the type of the vma and establishes the mapping.
When a memory mapping is established, the kernel decides whether to use a huge page based on the state of the vma. In general, if the 2M virtual region obtained by aligning the faulting address down and up to 2M boundaries is entirely contained in the vma, and no mapping has yet been established in that 2M region (its second-level page table is empty), the Linux kernel allocates a huge page for that virtual address. If the upper layer does not consider huge page alignment when requesting memory, the resulting vma is often not 2M aligned, and huge pages cannot be used.
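As a self-contained illustration of the alignment condition just described (this is not the kernel's actual fault-path code; HPAGE_SIZE here stands for the 2M huge page size, and the additional kernel-side requirement that the covering second-level page table entry be empty is only noted in the comment):

    #include <stdbool.h>

    #define HPAGE_SIZE (2UL << 20)               /* 2 MiB */
    #define HPAGE_MASK (~(HPAGE_SIZE - 1))

    /* Can a fault at addr be served by a 2M page inside the vma [vm_start, vm_end)?
     * The kernel additionally requires that the second-level page table (PMD entry)
     * covering this 2M window is still empty, i.e. holds no 4K mappings yet. */
    static bool fault_window_fits_huge_page(unsigned long vm_start, unsigned long vm_end,
                                            unsigned long addr)
    {
        unsigned long haddr = addr & HPAGE_MASK;                    /* round down to a 2M boundary */
        return haddr >= vm_start && haddr + HPAGE_SIZE <= vm_end;   /* whole 2M window inside vma  */
    }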
The huge page paging mechanism has an inherent restriction: the physical and virtual addresses must both be aligned. If a 64-bit operating system maps with 2M pages, the start and end of the virtual range must be 2M aligned, and the start and end of the physical range must be 2M aligned as well. On a 32-bit operating system, huge pages can only be used when the physical and virtual addresses are 4M aligned. Physical pages are managed by the operating system itself, and the transparent huge page implementation already includes defragmentation, which guarantees that enough huge-page-aligned physical pages are available for huge page mappings. Virtual addresses, however, are generally requested by the user without regard to whether the underlying system supports transparent huge pages, so the requested virtual addresses are often not huge-page aligned. Linux transparent huge pages suffer from exactly this problem: in many cases an application's virtual address requests do not satisfy the huge page alignment requirement, so the Linux kernel does not allocate huge pages and falls back to small pages. As a result, in many cases the Linux kernel allocates no huge pages to a program, and the huge page usage rate is low.
Summary of the Invention
In view of the technical problems in the prior art, the object of the present invention is to provide a method for improving the huge page usage rate of an operating system, which can effectively raise the huge page usage rate and thereby improve program performance.
The technical solution of the present invention is as follows:
A method for improving the huge page usage rate of an operating system, comprising the following steps:
1) The system adds a variable a to the virtual address space data structure of each process, used to record the heap top position up to which virtual addresses have already been allocated for the process; the heap start setting function is also modified so that the heap start address it returns is huge-page aligned.
2) When the process starts, the system initializes the variable a to 0; when the process calls the heap top setting function to request memory, it passes a heap top parameter b to the system, where b is the new heap top position being requested.
3) From the heap top position the process requested last time and the currently requested heap top position b, the system computes c, the heap top position that satisfies the process's current memory requirement aligned up to a huge page boundary.
4) The system assigns the current value of the process's variable a to the variable recording the highest heap address for which memory has been allocated, and compares it with c: if it is less than c, the system enlarges the process's heap space by the difference between the two; if it is greater than c, memory is released according to the difference and the process's heap space is shrunk; if the two are equal, no memory operation is performed.
Further, if the heap top position the process requested last time is greater than the currently requested heap top position b, the smaller of the previously requested heap top position and c is taken, and the system checks whether the virtual addresses between b and that smaller value have been allocated physical memory; if they have, that physical memory is cleared to 0.
Further, if the value of the variable a is 0, i.e. the system has not yet allocated memory for the process's heap, the system takes the heap top position requested last time by the process to be the start address of the process's heap.
Further, the heap start position returned by the heap start setting function is obtained by aligning up to a huge page boundary and then subtracting a set value.
Further, the variable a is a long integer address variable.
The core idea of the present invention for solving the alignment problem is: when the operating system receives a memory request system call, the logic for handling the request is modified so that, while still satisfying the user's request, some additional memory is allocated in order to create an aligned vma. When allocating the additional memory, one must consider how to use it effectively and reduce waste; for waste that cannot be avoided, a trade-off between system performance and memory overhead determines the optimal scheme. The invention modifies functions of the operating system kernel so that even when the upper layer's memory request does not satisfy the alignment requirement, the kernel can create huge-page-aligned vmas, satisfy the upper layer's request, and use huge pages. The modifications must remain compatible with the existing mechanisms, and the possible extra overhead must be taken into account.
FIG. 1 is a schematic diagram of how a process uses memory under the Linux operating system. Arrows indicate calls, and the text indicates the parameters that must be supplied with each call. A C program uses memory mainly in the following places: the code and data segments, the global variable segment, the heap, and the stack. When the process starts, the code segment, the data segment (initialized global variables), the bss segment (where uninitialized global variables are stored) and the stack have already been set up. Each has a function responsible for creating it, and some of these are marked in the figure.
If memory needs to be requested dynamically while the program is running, two methods can be used.
One method is to call the sys_brk function. Every process has a heap, used to request new memory quickly. In the operating system the heap is a vma managed by a dedicated set of functions; the low address of this vma is called the heap bottom and the high address the heap top. The memory in this vma is private and anonymous, and the user may access any location in this region. The user has only one way to operate on the heap: setting the heap top (the position of the heap bottom is initialized when the process is created and cannot be modified while the process runs). To request K bytes of memory, the new heap top is set to the old heap top plus K. Conversely, memory can only be released from the top of the heap: by setting the new heap top lower than the old one, the operating system automatically releases the memory above the new heap top. The function the operating system provides for setting the heap top is sys_brk.
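A minimal user-space illustration of this interface, using the glibc wrappers sbrk and brk rather than the raw sys_brk system call (the 1 MiB size is arbitrary):

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        void *old_top = sbrk(0);               /* read the current heap top         */
        if (sbrk(1 << 20) == (void *)-1)       /* grow the heap by 1 MiB            */
            return 1;
        void *new_top = sbrk(0);
        if (brk(old_top) != 0)                 /* shrink back, releasing the 1 MiB  */
            return 1;
        printf("heap top moved from %p to %p and back\n", old_top, new_top);
        return 0;
    }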
All of the private anonymous memory a program uses is allocated through these two functions (sys_brk and the anonymous memory mapping interface). This patent modifies the memory requested through the brk function; the purpose is to make it produce huge-page-aligned vmas, so that when a page fault occurs the operating system allocates huge pages within the vma, thereby raising the huge page usage rate. The difficulty is that upper-level allocators such as malloc do not consider huge page alignment when requesting memory, so their arguments are not necessarily huge-page aligned. We therefore need to modify the lower layer so that it satisfies the upper layer's memory requests on the one hand and maintains vmas that satisfy the huge page alignment constraint on the other.
From the operating system's point of view, the sys_brk function modifies one specific vma; as long as we keep this vma huge-page aligned at all times, all memory requested through sys_brk will use huge pages. When the vma is modified through sys_brk, its low address never changes; only its high address moves back and forth. We design the optimization around this property. When the heap is initialized, the low address of the heap's vma is set to be huge-page aligned, which guarantees alignment on one side. On the other side, the heap top pointer set by the user is not necessarily huge-page aligned. The approach taken is to extend the high address of the vma upward to the next huge page boundary, and to record the heap top position the user actually uses. Huge page alignment of the high address is thus achieved by pre-allocating memory. Because the position actually used is recorded, the user sees only the position it actually uses and is unaware that more memory may have been allocated. When the user requests memory later, the request starts from the previous heap top; if part of that memory has already been allocated, it can be used directly and is not wasted. The mechanism is transparent to the upper layer and can be implemented without modifying upper-layer applications.
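A small numeric illustration of this bookkeeping (the addresses are hypothetical and the macros are defined here only for the example):

    #include <stdio.h>

    #define HPAGE_SIZE (2UL << 20)                                    /* 2 MiB */
    #define HPAGE_ALIGN_UP(x) (((x) + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1))

    int main(void)
    {
        unsigned long heap_bottom   = 0x2000000UL;                    /* hypothetical 2M-aligned heap bottom   */
        unsigned long user_brk      = heap_bottom + 5 * (1UL << 20);  /* user asks for a 5 MiB heap            */
        unsigned long allocated_brk = HPAGE_ALIGN_UP(user_brk);       /* vma is extended to the 6 MiB boundary */

        /* The user still sees user_brk as its heap top; the extra 1 MiB up to
         * allocated_brk is already mapped, so a later request for up to 1 MiB
         * more is satisfied without extending the vma again. */
        printf("user-visible top: +%lu KiB, allocated top: +%lu KiB\n",
               (user_brk - heap_bottom) >> 10, (allocated_brk - heap_bottom) >> 10);
        return 0;
    }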
In summary, by controlling the heap start address and the length of each heap extension, the vma corresponding to the heap is guaranteed to use huge pages throughout.
Compared with the prior art, the positive effects of the present invention are as follows:
The invention improves the performance of the computer system at a small cost. The cost of the optimization is mainly the extra memory overhead incurred when the virtual address range is extended for huge page alignment. The optimization effectively improves program performance. The limitations of the original Linux module keep the huge page usage rate low, and in some cases huge pages are not used at all. With this optimization the huge page usage rate is greatly improved, and program performance improves in turn. Because the optimization targets kernel memory management, it affects a very wide range of programs; almost all programs can benefit from it. The advantages of the invention are illustrated below with experiments.
Using SPEC CPU2006 as the benchmark suite, the Linux huge page module and the optimization were tested; the results are shown in Table 1:
Table 1 is a comparison of the test results.
In Table 1, perlbench through xalancbmk are the 12 groups of SPEC CPU2006 test programs, with the corresponding running times on the right. We estimate huge page coverage by examining how page faults are handled in the Linux kernel. Native is the run with Linux's built-in Transparent Hugepage Support enabled, optimized is the result of modifying only the Linux kernel, and None, the run with Transparent Hugepage Support disabled, is included as a reference.
The data show that the optimization further improves program performance over the existing mechanism. For bzip2 and gcc, the original mechanism barely improves performance, but after the optimization is added, performance improves. For omnetpp, astar and xalancbmk, the original huge page module already brings some improvement, and with this optimization added the improvement is larger, exceeding the original margin. In the remaining cases the huge page alignment optimization does not improve performance, but it does not degrade it either, so it can be concluded that the huge page alignment optimization introduces no additional performance overhead.
The number of page faults is an important indicator of program performance. Handling page faults incurs extra overhead, and using huge pages effectively reduces the number of page faults. The number of page faults generated by each individual test program was measured here.
In FIG. 2 the vertical axis is the number of page faults; because different benchmarks have very different memory access characteristics, the values differ greatly, so the vertical axis uses a logarithmic scale. The horizontal axis lists the individual benchmarks. Three quantities are reported: the number of page faults without the Linux kernel's Transparent Huge Page module, the number with the module enabled, and the number after the optimization. The data show that after the optimization the number of page faults drops markedly, basically to within 10,000, so the optimization effectively reduces the number of page faults a program generates.
The above analysis shows that the optimization of the invention is very effective: it reduces the number of page faults, raises the huge page usage rate, and thereby improves program performance.
Description of Drawings
FIG. 1 is a schematic diagram of memory request function calls.
FIG. 2 compares the number of page faults before and after the huge page optimization.
Detailed Description
The following describes how to implement the huge page optimization on the Linux operating system. The method is based on Linux kernel version 3.6.3; the modification to glibc is based on glibc-2.17, and it targets machines with the 64-bit x86 architecture. This method is one concrete embodiment of the huge page optimization.
Modify the data structure mm_struct, which records a process's virtual address space, by adding a member variable long allocated_brk that records the heap top position up to which virtual addresses have been allocated for the process. When the process's virtual address space is initialized, set this value to 0.
In the Linux system, the file include/linux/mm_types.h defines the memory-related data structures, including mm_struct. The file kernel/fork.c implements the functions for creating a new process; the mm_init function there initializes the mm_struct structure, and adding the statement mm->allocated_brk=0 to it initializes the newly added heap top variable to 0.
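The two modifications might look as follows, written as excerpts rather than a complete patch (the placement of the new member and the surrounding code are illustrative; the type is shown as unsigned long to match the existing brk fields, while the patent text describes it as a long integer variable):

    /* include/linux/mm_types.h -- excerpt */
    struct mm_struct {
            /* ... existing members ... */
            unsigned long start_brk, brk;     /* existing: heap start and user-visible heap top */
            unsigned long allocated_brk;      /* new: heap top up to which memory is allocated  */
            /* ... existing members ... */
    };

    /* kernel/fork.c, inside mm_init() -- excerpt */
            /* ... existing initialization of the mm_struct ... */
            mm->allocated_brk = 0;            /* the new heap has no allocated memory yet */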
Make the start address of the heap's vma huge-page aligned: find the function that initializes the start address of the brk segment. Different architectures generally have different initialization functions. For the x86 architecture, the file arch/x86/kernel/process.c, which implements the architecture-specific functionality, contains the function arch_randomize_brk that initializes the brk segment. In this function, aligning the return value obtained from randomize_range up to a huge page boundary makes the start address of the heap's vma huge-page aligned.
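A sketch of the modified function (for reference, the unmodified kernel 3.6 function returns the randomized address directly; the alignment macro and the 16K offset described in the run flow further below are shown here for completeness and are illustrative only):

    /* arch/x86/kernel/process.c -- illustrative sketch of the modified function */
    #define HPAGE_ALIGN_UP(x) (((x) + HPAGE_PMD_SIZE - 1) & HPAGE_PMD_MASK)

    unsigned long arch_randomize_brk(struct mm_struct *mm)
    {
            unsigned long range_end = mm->brk + 0x02000000;
            unsigned long new_brk   = randomize_range(mm->brk, range_end, 0) ? : mm->brk;

            /* Align the randomized heap start up to a 2M boundary, then back off
             * 16K so that a very small heap stays on 4K pages (see the run flow below). */
            return HPAGE_ALIGN_UP(new_brk) - 16 * 1024;
    }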
Modify the logic of the sys_brk function: the file mm/mmap.c defines the memory management operations; the function defined there as SYSCALL_DEFINE1(brk, unsigned long, brk) implements the sys_brk system call.
This function computes the current heap top position and the heap top position the user asks for in the call. With these two heap top positions, the operating system decides whether to extend the heap upward or to shrink the heap and release memory.
In the statements

newbrk = PAGE_ALIGN(brk);
oldbrk = PAGE_ALIGN(mm->brk);

the new heap top position and the original heap top position are computed. After the huge page alignment constraint is added, the heap top positions must be recomputed. Add the following statements after the lines above:
newbrk = (brk + SUPERPAGE_MASK) & PMD_MASK;                 /* align the requested heap top up to the 2M boundary */
if (likely(mm->allocated_brk)) oldbrk = mm->allocated_brk;  /* start from the heap top already allocated, if any  */
The new heap top position is the heap top position the user requested, aligned up to a huge page boundary; the original heap top position is the heap top up to which memory was already allocated by the previous sys_brk call.
If the sys_brk call modifies the heap successfully, the function reaches the statements under the label set_brk, which set the relevant member variables of mm_struct; here, add the statement mm->allocated_brk=newbrk to update the allocated heap top pointer.
Additional logic is also needed under the label set_brk: determine whether this sys_brk call is asking to shrink the heap top; if so, check whether the virtual memory that would normally be released by shrinking the heap top has physical memory allocated, and if it does, clear that memory to 0. This step is added because upper-layer applications assume that memory newly requested through sys_brk is entirely initialized to 0. In the unmodified version all operations are in units of pages: each do_munmap releases every virtual-to-physical mapping in the region, any access to a region newly allocated by do_brk is guaranteed to page fault, and when the Linux kernel's memory allocation path assigns physical memory it zeroes the pages being allocated. In the modified version, part of the memory is not released because of the huge page alignment, so it must be checked whether that part has been allocated physical memory; if it has, it must be cleared to 0.
An implementation is provided for reference; the code added under the label set_brk behaves as follows.
The code first checks whether the heap top is being shrunk. If the condition holds, it enters an if statement in which find_vma locates the vma region whose memory would, under the original semantics, have been released; then, page by page, it uses the follow_page function to check whether each virtual address has physical memory allocated, and if memory is allocated, clears it to 0.
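A sketch of what such code might look like, reconstructed from the description above (this is not the patent's original listing; it assumes the kernel 3.6 interfaces find_vma, follow_page, clear_user_highpage and put_page, and works at page granularity to match the original unmapping behaviour):

            /* Under label set_brk: when the user-visible heap top shrinks, zero any
             * physical pages backing the range kept only for huge page alignment. */
            if (brk < mm->brk) {
                    unsigned long addr;
                    unsigned long end = min(PAGE_ALIGN(mm->brk), newbrk);
                    struct vm_area_struct *vma = find_vma(mm, brk);

                    if (vma) {
                            for (addr = PAGE_ALIGN(brk); addr < end; addr += PAGE_SIZE) {
                                    struct page *page = follow_page(vma, addr, FOLL_GET);

                                    if (IS_ERR_OR_NULL(page))
                                            continue;               /* no physical page mapped here */
                                    clear_user_highpage(page, addr);
                                    put_page(page);
                            }
                    }
            }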
After the huge page optimization is added, the run flow is as follows:
When a process is initialized, the mm->allocated_brk variable is additionally cleared to 0. The arch_randomize_brk function is then called to initialize the start address of the heap. The heap start setting function arch_randomize_brk is modified so that, on return, the original return address is aligned up to a huge page boundary, 16K is then subtracted, and the resulting address is used as the return value. Later, when the program calls the heap top setting function sys_brk, it passes in the heap top parameter brk, the heap top position the process asks to set. With this adjustment, if a program uses very little heap memory, that portion lies below the huge page alignment boundary and uses small pages; this effectively avoids wasting memory by over-allocating for programs that use little memory. For programs with larger memory usage, the remainder is allocated as huge pages, which on the whole guarantees a good huge page usage rate.
Let mm->brk denote the heap top position the process requested the previous time.
Let newbrk denote the heap top position the process asks to set, aligned up to a huge page boundary.
The variable oldbrk records the highest heap address for which memory has already been allocated. If the value of mm->allocated_brk is 0, the heap has not yet had memory allocated, and oldbrk is set to the start address of the heap; otherwise oldbrk = mm->allocated_brk.
Compare the values of oldbrk and newbrk:
If oldbrk < newbrk, the vma needs to be extended; call the do_brk function to enlarge the vma.
If oldbrk > newbrk, memory needs to be released; call the do_munmap function to release the memory between newbrk and oldbrk.
Compare the values of brk and mm->brk; if brk < mm->brk, check the virtual addresses between brk and min(mm->brk, newbrk), and if physical memory has been allocated there, clear that memory to 0.
If the system call succeeds, update the data structure: set mm->brk = brk and mm->allocated_brk = newbrk, and the function ends.
Through the above steps, the huge page optimization of the heap is realized.
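Putting the pieces together, a condensed and purely illustrative sketch of the modified sys_brk flow described above (the locking, resource-limit checks and error paths of the real SYSCALL_DEFINE1(brk, ...) are omitted, and zero_kept_range is a hypothetical helper standing for the zeroing loop sketched earlier):

        unsigned long newbrk, oldbrk;

        newbrk = (brk + SUPERPAGE_MASK) & PMD_MASK;            /* requested top, aligned up to 2M  */
        oldbrk = mm->allocated_brk ? mm->allocated_brk         /* top of what is already allocated */
                                   : mm->start_brk;            /* first call: start of the heap    */

        if (oldbrk < newbrk)
                do_brk(oldbrk, newbrk - oldbrk);                /* extend the heap vma              */
        else if (oldbrk > newbrk)
                do_munmap(mm, newbrk, oldbrk - newbrk);         /* shrink it and release memory     */

        if (brk < mm->brk)                                      /* user-visible shrink:             */
                zero_kept_range(mm, brk, min(mm->brk, newbrk)); /* zero pages kept for alignment    */

        mm->brk = brk;                                          /* user-visible heap top            */
        mm->allocated_brk = newbrk;                             /* heap top actually allocated      */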
In other cases, as long as the architecture supports page mappings of different sizes and the system implements dynamic memory allocation, the method proposed by the present invention can be used for optimization. Any system that allocates memory by setting a heap top pointer can apply the method of the present invention: set an auxiliary variable that records the position up to which memory has been allocated; make the start position of the memory segment huge-page aligned; and recompute the aligned address when growing and shrinking the memory. This realizes the optimization and achieves the goal of improving the huge page usage rate.
The main technical feature of our scheme for improving the operating system's huge page usage rate is that it guarantees huge page usage by generating huge-page-aligned vmas inside the operating system. The technical solution can be applied to any operating system that supports a paging mechanism. The main method is to constrain the granularity at which the heap grows and is released, so that the operating system generates huge-page-aligned vmas and system performance improves. The method extends memory selectively, so that it improves system performance while effectively avoiding the waste that extending memory could otherwise cause. Any method that reasonably extends the memory allocation granularity so as to satisfy the operating system's requirements for using huge pages, and thereby improves the huge page usage rate, falls within the scope of protection of this patent.