Technical Field
The present invention discloses a dynamic shared-memory multiplexing method and device for GPGPUs, and relates to the technical field of memory resource allocation.
Background Art
As an efficient parallel computing platform, general-purpose graphics processing units (GPGPUs) are widely used in scientific computing, graphics processing, big-data analysis, and other fields. However, shared memory in GPGPU architectures is typically managed with a static allocation strategy, which is inefficient under concurrent multi-threaded access and limits GPGPU performance. In existing GPGPU designs, shared-memory allocation is bound to the life cycle of a thread block: once allocated, the memory is occupied for the entire execution period even when the actual usage time is far shorter than the allocation time. This leads to low utilization of shared-memory resources and makes it difficult to run more thread blocks simultaneously, reducing the thread-level parallelism and overall computing throughput of the GPGPU.
Summary of the Invention
To address the problems of the prior art, the present invention provides a dynamic shared-memory multiplexing method and device for GPGPUs that significantly improve shared-memory utilization and GPGPU computing throughput without adding hardware resources. Through the present invention, shared-memory resources can be managed and scheduled more flexibly, allowing multiple thread blocks to efficiently share and reuse the limited shared-memory space and thus break through the performance bottleneck.
The specific scheme proposed by the present invention is as follows:
The present invention provides a dynamic shared-memory multiplexing method for GPGPUs, comprising:
Step 1: based on the GPGPU framework, analyze the memory requirements of the thread blocks according to the memory task requests issued by each thread block:
Step 11: allow thread blocks with the same data access requirements to share a reusable shared-memory partition in the shared memory pool;
Step 12: for thread blocks without the same data access requirements, allocate read-write locks that lock access to non-reused shared-memory partitions in the shared memory pool at a given time. Read-write locks are allocated to thread blocks through a read-write lock index table: when the signal for a shared-memory partition in the index table is pulled high, a thread block has already been allocated the read-write lock for that partition and later requesters must wait for the lock to be released; when the signal for the partition is pulled low, the read-write lock for that partition has been released.
Step 2: when a thread block completes its memory task request, release the shared memory and make it available to other thread blocks.
Furthermore, step 1 of the dynamic shared-memory multiplexing method for GPGPUs further includes step 10: according to the memory requirements of the thread block, determine whether the address space corresponding to the memory task request holds valid data and whether the shared memory pool has sufficient space to allocate; if so, proceed to steps 11 and 12.
Furthermore, in step 10 of the method, a shared-memory scoreboard records a valid signal for each address space in shared memory: when data is written to an address space, its valid signal is pulled high and the address space is regarded as holding valid data; otherwise, it is regarded as holding no valid data.
Meanwhile, a register records the remaining size of shared memory, which is used to determine whether the remaining space is sufficient for allocation.
Furthermore, in step 11 of the method, whether thread blocks have the same data access requirements is determined from the shared-memory addresses accessed in their memory task requests: thread blocks that access the same shared-memory address have the same data access requirements; otherwise, they do not.
Furthermore, in step 2 of the method, the scattered address spaces of the released shared memory are remapped so that the address space is concentrated in a contiguous shared-memory partition, enabling the shared memory pool to subsequently allocate contiguous address spaces and to read contiguous address spaces.
The present invention also provides a dynamic shared-memory multiplexing device for GPGPUs, comprising a demand allocation module and a resource adjustment module.
Based on the GPGPU framework, the demand allocation module analyzes the memory requirements of the thread blocks according to the memory task requests issued by each thread block:
Step 11: allow thread blocks with the same data access requirements to share a reusable shared-memory partition in the shared memory pool;
Step 12: for thread blocks without the same data access requirements, allocate read-write locks that lock access to non-reused shared-memory partitions in the shared memory pool at a given time. Read-write locks are allocated to thread blocks through a read-write lock index table: when the signal for a shared-memory partition in the index table is pulled high, a thread block has already been allocated the read-write lock for that partition and later requesters must wait for the lock to be released; when the signal for the partition is pulled low, the read-write lock for that partition has been released.
When a thread block completes its memory task request, the resource adjustment module releases the shared memory and makes it available to other thread blocks.
Furthermore, the demand allocation module of the dynamic shared-memory multiplexing device for GPGPUs also executes step 10: according to the memory requirements of the thread block, determine whether the address space corresponding to the memory task request holds valid data and whether the shared memory pool has sufficient space to allocate; if so, execute steps 11 and 12.
Furthermore, when the demand allocation module executes step 10, a shared-memory scoreboard records a valid signal for each address space in shared memory: when data is written to an address space, its valid signal is pulled high and the address space is regarded as holding valid data; otherwise, it is regarded as holding no valid data.
Meanwhile, a register records the remaining size of shared memory, which is used to determine whether the remaining space is sufficient for allocation.
Furthermore, when the demand allocation module executes step 11, whether thread blocks have the same data access requirements is determined from the shared-memory addresses accessed in their memory task requests: thread blocks that access the same shared-memory address have the same data access requirements; otherwise, they do not.
Furthermore, the resource adjustment module of the device remaps the scattered address spaces of the released shared memory so that the address space is concentrated in a contiguous shared-memory partition, enabling the shared memory pool to subsequently allocate contiguous address spaces and to read contiguous address spaces.
The advantages of the present invention are as follows:
The present invention reuses and dynamically allocates memory resources, ensuring that memory is allocated only when actually needed, avoiding unnecessary occupation and waste, and achieving efficient circulation of memory resources. In addition, the fast release mechanism reclaims a thread block's memory immediately after its computing task completes, significantly reducing delays caused by waiting for memory resources and improving the responsiveness of thread scheduling.
The multiplexing strategy allows multiple thread blocks to share the same memory resource, reducing the idle time spent waiting for memory allocation and thus speeding up thread block execution. This overlapping use in time reduces the idleness of memory resources, allows more computing tasks to run simultaneously, and improves overall computing throughput.
Combined with the adaptive memory adjustment capability, the system can adapt its memory allocation strategy to the specific characteristics and runtime behavior of the application to achieve optimal resource utilization and performance. This flexibility and scalability enables the system to accommodate GPGPU applications of different scales and characteristics.
Overall, the present invention effectively improves GPGPU computing efficiency and memory utilization, significantly increases system throughput, reduces thread block waiting time, improves concurrent memory access, and provides users with a more efficient, stable, and reliable computing environment.
Brief Description of the Drawings
Fig. 1 is a schematic flow chart of the method of the present invention.
Fig. 2 is a schematic flow chart of thread blocks with the same data access requirements sharing a reusable shared-memory partition according to the present invention.
Fig. 3 is a schematic flow chart of dynamic memory allocation according to the present invention.
Detailed Description
The present invention is further described below with reference to the drawings and specific embodiments, so that those skilled in the art can better understand and implement it; the embodiments are not intended to limit the present invention.
Embodiment 1: the present invention provides a dynamic shared-memory multiplexing method for GPGPUs, comprising:
Step 1: based on the GPGPU framework, analyze the memory requirements of the thread blocks according to the memory task requests issued by each thread block:
Step 11: allow thread blocks with the same data access requirements to share a reusable shared-memory partition in the shared memory pool.
Whether thread blocks have the same data access requirements is determined from the shared-memory addresses accessed in their memory task requests: thread blocks that access the same shared-memory address have the same data access requirements; otherwise, they do not.
When one thread block in a group with the same data access requirement has read the data in the address space of the reusable shared-memory partition, that data can be returned directly to the other thread blocks in the group, achieving the multiplexing effect.
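The grouping and data-return behavior above can be sketched in a few lines of Python. This is a minimal illustration, not the patented hardware: the names `MemRequest`, `group_by_address`, and `serve_group` are illustrative, and a real implementation would operate on hardware request queues rather than Python objects.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class MemRequest:
    block_id: int   # requesting thread block
    address: int    # shared-memory address it wants to access

def group_by_address(requests):
    """Blocks whose requests target the same address have the same data
    access requirement and can share one reusable partition."""
    groups = defaultdict(list)
    for r in requests:
        groups[r.address].append(r.block_id)
    return dict(groups)

def serve_group(block_ids, read_partition):
    """One block performs the physical read; the result is returned
    directly to every block in the group (the multiplexing effect)."""
    data = read_partition()          # single read of the partition
    return {b: data for b in block_ids}
```

For instance, requests from blocks 1 and 3 to the same address form one group and trigger only one physical read.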
Step 12: for thread blocks without the same data access requirements, allocate read-write locks that lock access to non-reused shared-memory partitions in the shared memory pool at a given time. Read-write locks are allocated to thread blocks through a read-write lock index table: when the signal for a shared-memory partition in the index table is pulled high, a thread block has already been allocated the read-write lock for that partition and later requesters must wait for the lock to be released; when the signal for the partition is pulled low, the read-write lock for that partition has been released.
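The read-write lock index table can be modeled as one signal bit per partition, as in the following sketch. The class and method names are illustrative assumptions; the real table is a hardware structure, and this model shows only the high/low signal semantics described above.

```python
class RWLockIndexTable:
    """One entry per shared-memory partition: signal high (1) means the
    partition's read-write lock is held; low (0) means it is free."""
    def __init__(self, num_partitions):
        self.signal = [0] * num_partitions
        self.owner = [None] * num_partitions

    def try_acquire(self, partition, block_id):
        """Allocate the partition's lock to block_id if the signal is low;
        if it is high, the caller must wait for the release."""
        if self.signal[partition]:
            return False                 # pulled high: lock held, wait
        self.signal[partition] = 1       # pull high on allocation
        self.owner[partition] = block_id
        return True

    def release(self, partition, block_id):
        """Pull the signal low when the owning block finishes."""
        if self.owner[partition] == block_id:
            self.signal[partition] = 0
            self.owner[partition] = None
```

A second block attempting to acquire a held partition simply fails and retries, matching the wait-for-release behavior.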
Step 2: when a thread block completes its memory task request, release the shared memory and make it available to other thread blocks.
For example, with reference to Fig. 2, thread blocks 1 to n request memory, and a requirement analysis is performed on them. Suppose thread block 1 and thread block n have the same data access requirements and share a reusable shared-memory partition in the shared memory pool. When both complete their memory tasks, the corresponding partition is released, and the system checks whether any other thread block needs it. If thread block 2 does, the partition is allocated to thread block 2 for its memory task; once thread block 2 finishes, the partition is released and returned to the shared memory pool.
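The Fig. 2 hand-over sequence can be modeled as a pool that tracks, per partition, the set of sharing blocks and a queue of waiting blocks. This is a hypothetical walk-through under the assumptions of the example (blocks 1 and n share, block 2 waits); `SharedMemoryPool` is an illustrative name.

```python
class SharedMemoryPool:
    def __init__(self):
        self.users = {}      # partition -> set of block ids sharing it
        self.waiting = {}    # partition -> list of blocks queued for it

    def assign(self, partition, block_id):
        self.users.setdefault(partition, set()).add(block_id)

    def release(self, partition, block_id):
        self.users[partition].discard(block_id)
        if not self.users[partition]:        # last sharer released it
            queue = self.waiting.get(partition, [])
            if queue:                        # hand over to a waiting block
                self.users[partition].add(queue.pop(0))
            else:
                del self.users[partition]    # partition returns to the pool
```

Running the Fig. 2 scenario: blocks 1 and 3 (standing in for block n) share partition 0 while block 2 waits; after both release, the partition passes to block 2, and after block 2 releases, it returns to the pool.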
Embodiment 2: based on Embodiment 1, step 1 of the method further includes step 10: according to the memory requirements of the thread block, determine whether the address space corresponding to the memory task request holds valid data and whether the shared memory pool has sufficient space to allocate; if so, proceed to steps 11 and 12.
A shared-memory scoreboard may further record a valid signal for each address space in shared memory: when data is written to an address space, its valid signal is pulled high and the address space is regarded as holding valid data; otherwise, it is regarded as holding no valid data. If the valid bit of the corresponding address is not set to one, the address is regarded as holding no data, the allocation fails, an error is reported, the memory task request is returned, and the system waits for the next request.
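The scoreboard's valid-bit behavior can be sketched as follows. The `Scoreboard` class is an illustrative model of the per-address valid bits, not the actual hardware scoreboard.

```python
class Scoreboard:
    """One valid bit per shared-memory address space."""
    def __init__(self, num_addresses):
        self.valid = [0] * num_addresses

    def write(self, addr):
        self.valid[addr] = 1     # data written: pull valid signal high

    def check(self, addr):
        """True if the address holds valid data; False means the
        allocation fails and the request is bounced back."""
        return bool(self.valid[addr])
```

A request to a not-yet-written address fails the check, matching the report-error-and-return behavior above.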
Meanwhile, a register records the remaining size of shared memory, used to determine whether the remaining space is sufficient for allocation. If the remaining size is smaller than the requested size, the space is regarded as insufficient and the allocation fails; the memory task request is suspended until the shared memory pool has enough memory to satisfy it, at which point the request is answered and memory is allocated to the requesting thread block.
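The free-space register and the suspend-until-available policy can be sketched together. `FreeSpaceRegister` is an illustrative name; the sketch assumes a simple FIFO of suspended requests, which the patent does not specify.

```python
class FreeSpaceRegister:
    """Tracks remaining shared-memory size; oversized requests are
    suspended until releases return enough space to the pool."""
    def __init__(self, total):
        self.remaining = total
        self.pending = []            # suspended (block_id, size) requests

    def request(self, block_id, size):
        if size <= self.remaining:
            self.remaining -= size
            return True              # allocated immediately
        self.pending.append((block_id, size))
        return False                 # suspended: wait for space

    def free(self, size):
        """Return released space and answer any now-satisfiable requests."""
        self.remaining += size
        granted, still_pending = [], []
        for block_id, req in self.pending:
            if req <= self.remaining:
                self.remaining -= req
                granted.append(block_id)
            else:
                still_pending.append((block_id, req))
        self.pending = still_pending
        return granted
```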
For example, with reference to Fig. 3, when a thread block issues a memory access request, the system first determines whether the address space of the shared memory to be accessed holds valid data. If so, it checks the allocatable space in the shared memory pool: if the space is insufficient, the request waits; if the space meets the allocation requirement, memory is allocated for the thread block to execute its task. When the task completes, the memory is released quickly to reclaim the resource, and the system checks whether other memory access requests exist. If none, the task ends; if some exist, the memory is adaptively adjusted. For example, in step 2, the scattered address spaces of the released shared memory are remapped so that the address space is concentrated in a contiguous shared-memory partition, enabling the shared memory pool to subsequently allocate contiguous address spaces and to read contiguous address spaces.
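The remapping step at the end of the Fig. 3 flow amounts to compacting the live segments so that the freed, scattered space becomes one contiguous region. The following sketch assumes a simple pack-to-front policy; the patent does not prescribe a particular compaction order, so this is one possible realization.

```python
def compact(segments, total_size):
    """segments: list of (block_id, start, size) still in use.
    Returns (new_mapping, free_start): live segments packed from
    address 0 in address order, so [free_start, total_size) becomes
    one contiguous free region for later allocations and reads."""
    cursor = 0
    new_mapping = []
    for block_id, _start, size in sorted(segments, key=lambda s: s[1]):
        new_mapping.append((block_id, cursor, size))  # remap downwards
        cursor += size
    return new_mapping, cursor
```

For example, two 16-unit live segments at addresses 0 and 48 of a 128-unit pool are remapped to addresses 0 and 16, leaving a contiguous free region starting at 32.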
Embodiment 3: the present invention also provides a dynamic shared-memory multiplexing device for GPGPUs, comprising a demand allocation module and a resource adjustment module.
Based on the GPGPU framework, the demand allocation module analyzes the memory requirements of the thread blocks according to the memory task requests issued by each thread block:
Step 11: allow thread blocks with the same data access requirements to share a reusable shared-memory partition in the shared memory pool;
Step 12: for thread blocks without the same data access requirements, allocate read-write locks that lock access to non-reused shared-memory partitions in the shared memory pool at a given time. Read-write locks are allocated to thread blocks through a read-write lock index table: when the signal for a shared-memory partition in the index table is pulled high, a thread block has already been allocated the read-write lock for that partition and later requesters must wait for the lock to be released; when the signal for the partition is pulled low, the read-write lock for that partition has been released.
When a thread block completes its memory task request, the resource adjustment module releases the shared memory and makes it available to other thread blocks.
Since the information exchange between the modules of the above device and their execution processes are based on the same concept as the method embodiments of the present invention, the details can be found in the description of the method embodiments and are not repeated here.
Similarly, the device of the present invention reuses and dynamically allocates memory resources, ensuring that memory is allocated only when actually needed, avoiding unnecessary occupation and waste, and achieving efficient circulation of memory resources. In addition, the fast release mechanism reclaims a thread block's memory immediately after its computing task completes, significantly reducing delays caused by waiting for memory resources and improving the responsiveness of thread scheduling.
The multiplexing strategy allows multiple thread blocks to share the same memory resource, reducing the idle time spent waiting for memory allocation and thus speeding up thread block execution. This overlapping use in time reduces the idleness of memory resources, allows more computing tasks to run simultaneously, and improves overall computing throughput.
Combined with the adaptive memory adjustment capability, the system can adapt its memory allocation strategy to the specific characteristics and runtime behavior of the application to achieve optimal resource utilization and performance. This flexibility and scalability enables the system to accommodate GPGPU applications of different scales and characteristics.
Overall, the device of the present invention effectively improves GPGPU computing efficiency and memory utilization, significantly increases system throughput, reduces thread block waiting time, improves concurrent memory access, and provides users with a more efficient, stable, and reliable computing environment.
It should be noted that not all steps and modules in the above processes and device structures are necessary; some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and may be adjusted as needed. The system structures described in the above embodiments may be physical or logical structures; that is, some modules may be implemented by the same physical entity, some modules may be implemented by several physical entities, or some may be implemented jointly by components of several independent devices.
The embodiments described above are merely preferred embodiments given to fully illustrate the present invention, and the protection scope of the present invention is not limited thereto. Equivalent substitutions or modifications made by those skilled in the art on the basis of the present invention all fall within the protection scope of the present invention, which is defined by the claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411000879.XA | 2024-07-25 | 2024-07-25 | Dynamic shared memory multiplexing method and device applicable to GPGPU |
| Publication Number | Publication Date |
|---|---|
| CN118535356A | 2024-08-23 |
| CN118535356B | 2024-10-29 |