CN114995882B - Heterogeneous structure system systematic processing method - Google Patents

Heterogeneous structure system systematic processing method

Info

Publication number
CN114995882B
Authority
CN
China
Prior art keywords
packet
process object
command
dispatch
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210852711.6A
Other languages
Chinese (zh)
Other versions
CN114995882A (en)
Inventor
曾臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muxi Integrated Circuit Shanghai Co ltd
Original Assignee
Muxi Integrated Circuit Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muxi Integrated Circuit Shanghai Co ltd
Priority to CN202210852711.6A
Publication of CN114995882A
Application granted
Publication of CN114995882B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to the technical field of packet processing, and in particular to a packet processing method for a heterogeneous structure system. The method comprises the following steps: after a read command packet is returned, the returned command packet is added to a command queue of a reordering cache unit; the reordering cache unit parses the command packet and, when the command packet is a dispatch packet, sends the virtual address of the process object carried in the dispatch packet to a prefetch scheduling module; the prefetch scheduling module acquires the process object; the reordering cache unit adds the acquired process object to its process object queue and notifies the microprocessor that the command packet is ready; the microprocessor parses the command packet and, when the command packet is a dispatch packet, parses the corresponding process object and then completes the dispatch and execution of the corresponding task. This solves the problem that the microprocessor, when parsing a dispatch packet, otherwise has to wait a long time for the physical address of the process object to be translated and for the process object to be read, and it improves the operating efficiency of the microprocessor.

Description

Heterogeneous structure system systematic processing method
Technical Field
The invention relates to the technical field of packet processing, and in particular to a packet processing method for a heterogeneous structure system.
Background
The HSA heterogeneous system protocol defines AQL (Architected Queuing Language) packets. The CPU places AQL packets in a ring command packet queue (ring buffer), which is a command packet queue actually stored in the system main memory or the device memory. AQL packets come in several packet formats. Among them, the dispatch packet (kernel dispatch) is the command packet with which the CPU notifies a device to execute a process task. Each dispatch packet carries the virtual address of a process object, while the process object itself is stored in the system main memory or the device memory. The process object includes the start address of the process instructions, the offset address at which execution starts, and the configuration parameters necessary for executing the process, such as the hardware resource configuration.
In an HSA heterogeneous system, the central processing unit (CPU) is responsible for control, including task assignment and complex control processing, and a plurality of asynchronous devices perform computation according to CPU commands. In each device, the command control unit is responsible for receiving and splitting CPU commands and issuing them to the execution units. Specifically, a hardware processing module in the command control unit notifies a microprocessor after acquiring a command packet. When the microprocessor parses the command packet and finds it to be a dispatch packet, it acquires the process object according to the virtual address of the process object carried by the dispatch packet. After acquiring the process object, the microprocessor configures registers for the work group dispatch unit; the work group dispatch unit divides the process into a plurality of work groups and sends them to idle computation execution units; and each computation execution unit reads the corresponding process instructions and data according to the registers and then executes the process instructions.
In the conventional packet processing method, the microprocessor in the command control unit must first acquire and parse a command packet. If the command packet is a dispatch packet, the process object must be read, which requires both virtual address translation and acquisition of the process object. Virtual address translation passes through the caches of a multi-level virtual address translation unit, and acquiring the process object requires reading the device memory, or even the system main memory on the CPU side, through the multi-level cache. Both steps are time-consuming. In particular, if the virtual-to-physical translation misses in every cache level, the request must go to the physical address management unit on the CPU side, which takes even longer, so address translation occupies a considerable amount of microprocessor time. After the physical address of the process object is obtained, the process object must still be fetched from the device memory or the system main memory, which also takes a long time. Because the microprocessor must finish processing the current packet before processing the next one, most of the time between acquiring a command packet and dispatching it is spent translating the address of the process object and reading the process object. The microprocessor can only wait until the process object is obtained, so it accumulates a large amount of idle waiting time and its processing efficiency is low.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a packet processing method for a heterogeneous structure system, which adopts the following technical solution:
A packet processing method for a heterogeneous structure system is applied to a command control unit of an asynchronous device. The command control unit comprises a microprocessor and a hardware processing module, and the hardware processing module comprises a prefetch scheduling module, an address translation unit, a reordering cache unit, and a work group dispatch unit. The method comprises the following steps: after a read command packet is returned, the returned command packet is added to a command queue of the reordering cache unit; the reordering cache unit parses the command packet and, when the command packet is a dispatch packet, sends the virtual address of the process object carried in the dispatch packet to the prefetch scheduling module; the prefetch scheduling module sends the virtual address of the process object to the address translation unit to obtain the physical address of the process object, and initiates a process object read command to the multi-level cache unit of the device; the reordering cache unit adds the process object returned by the multi-level cache to its process object queue and notifies the microprocessor that the command packet is ready; the microprocessor parses the command packet and, when the command packet is a dispatch packet, parses the corresponding process object, wherein each process comprises a plurality of work groups; the microprocessor configures registers for the work group dispatch unit, the work group dispatch unit divides the process into a plurality of work groups and sends them to idle computation execution units, and each computation execution unit reads the corresponding process instructions and data according to the registers and then executes the process instructions.
The embodiment of the invention has the following beneficial effects:
According to the embodiment of the invention, the command packet and the process object corresponding to a dispatch packet are prefetched before the microprocessor parses the packet. The command packet is added to the command queue and the process object is added to the corresponding process object queue, so the microprocessor can obtain the corresponding process object at the same time as it parses the dispatch packet and then complete the dispatch and execution of the corresponding task. This solves the prior-art problem that the microprocessor, when parsing a dispatch packet, has to wait a long time for the physical address of the process object to be translated and for the process object to be read at that physical address, and it improves the operating efficiency of the microprocessor.
Drawings
In order to illustrate the embodiments of the present invention, or the technical solutions and advantages of the prior art, more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of a heterogeneous structure system;
FIG. 2 is a flowchart illustrating the steps of a packet processing method based on a heterogeneous structure system according to an embodiment of the present invention;
FIG. 3 is a flowchart of a packet processing method based on a heterogeneous structure system according to an embodiment of the present invention;
FIG. 4 is a diagram of a conventional packet processing procedure;
FIG. 5 is a diagram illustrating a packet processing procedure according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating the process of reading a packet and a process object according to an embodiment of the present invention.
Detailed Description
To further explain the technical means adopted by the present invention to achieve its intended objects, and their effects, the specific implementation, structure, features, and effects of the packet processing method based on a heterogeneous structure system are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the packet processing method based on the heterogeneous structure system provided by the present invention in detail with reference to the accompanying drawings.
Referring to fig. 1, a structure of a heterogeneous structure system applied in an embodiment of the present invention is shown, where the heterogeneous structure system includes two major parts: a central processing unit and a plurality of asynchronous devices. The Central Processing Unit (CPU) is responsible for control and is used for dispatching tasks and performing complex control processing; the asynchronous device performs calculations according to the commands of the CPU.
The central processing unit comprises a CPU system main memory and an address management unit. A CPU-initiated program may be broken down into tens of thousands of command queues, several of which are stored in the CPU main memory or the device memory. Each command queue is a packet queue made up of several packets and is mapped onto a command queue in a queue group in the command control units of the asynchronous devices. The address management unit is responsible for managing and resolving the final physical address.
Each asynchronous device includes a command control unit, a plurality of parallel computation execution units, a virtual address translation secondary cache, a multi-level cache, a PCIE interface, an on-chip high-bandwidth memory, and a resource allocation module (not shown in the figure). The command control unit is responsible for receiving and splitting commands and issuing them to the parallel computation execution units. The parallel computation execution units execute the corresponding tasks. The virtual address translation secondary cache temporarily holds data exchanged between the virtual address translation primary cache and the address management unit on the CPU side. The multi-level cache is the cache of the data read channel. The on-chip high-bandwidth memory is the memory on the asynchronous device side and is also called the device memory. PCIE is a high-speed serial computer expansion bus standard, and the central processing unit and the asynchronous devices communicate through the PCIE interface.
The command control unit comprises a plurality of microprocessors and hardware processing modules. The microprocessors are functional modules realized by software programming, and the hardware processing modules are formed by hardware circuits. The hardware processing module comprises a prefetch scheduling module, a virtual address translation primary cache, a reading module, a plurality of work group dispatch units, and a reordering cache unit that stores a command queue group. Because one microprocessor can serve several processes at the same time, and each process is broken down into several command queues, a prefetch scheduling module is needed to schedule the corresponding command packets. In addition, to solve the problem that the microprocessor is occupied for a long time, the embodiment of the invention also prefetches the process object, so the prefetch scheduling module also handles the prefetching and scheduling of process objects. The virtual address translation primary cache, the virtual address translation secondary cache, and the address management unit together form an address translation system that translates virtual addresses into physical addresses. The reading module reads a packet or a process object through the multi-level cache according to the obtained physical address. The reordering cache unit temporarily stores the command queues and the process object queues. The work group dispatch unit configures the corresponding registers and dispatches tasks in groups.
Specifically, referring to FIG. 2, after the CPU notifies the prefetch scheduling module of the corresponding packet queue information, the prefetch scheduling module generates the virtual address for the packet read, and the address translation system translates the packet virtual address into a physical address. The reading module initiates a packet read command to the multi-level cache unit; the multi-level cache reads the packet from the system main memory on the CPU side and returns it to the reordering cache unit, and the command packet that was read is added to the corresponding command queue. After the read command packet is returned, if the reordering cache unit determines that the returned command packet is a dispatch packet, it parses the dispatch packet to obtain the virtual address of the process object and sends that virtual address to the prefetch scheduling module, and the address translation system translates the virtual address of the process object into its physical address. The reading module initiates a read command for the process object to the multi-level cache, and the multi-level cache reads the process object from the system main memory or the device memory and returns it to the reordering cache unit. The reordering cache unit adds the process object to the process object queue and notifies the microprocessor that the packet is ready. The microprocessor parses the command packet and, when the identifier carried in the command packet indicates a dispatch packet, parses the process object of that dispatch packet. Because the dispatch packet and its process object have already been read before the microprocessor parses them, the microprocessor, after acquiring the dispatch packet and its process object, configures the dispatch registers for the work group dispatch unit; the work group dispatch unit divides the process into a plurality of work groups and issues them to idle computation execution units; and each computation execution unit reads the process instructions and data from the system main memory and the device memory according to the dispatch registers and executes the corresponding instructions. The resource allocation module selects the idle computation execution unit to which a work group is issued.
Referring to fig. 3, a flowchart of a packet processing method based on a heterogeneous structure system according to an embodiment of the present invention is shown, where the packet processing method is applied to a command control unit of an asynchronous device, where the command control unit includes a microprocessor and a hardware processing module, and the method includes the following steps:
Step S001: after a read command packet is returned, the returned command packet is added to a command queue of the reordering cache unit; the reordering cache unit parses the command packet and, when the command packet is a dispatch packet, sends the virtual address of the process object carried in the dispatch packet to the prefetch scheduling module; the prefetch scheduling module sends the virtual address of the process object to the address translation unit to obtain the physical address of the process object, and the process object is then acquired; the reordering cache unit adds the acquired process object to its process object queue and notifies the microprocessor that the command packet is ready.
AQL packets in an HSA heterogeneous system take various forms, such as dispatch packets, custom packets, and isochronous packets. The tasks issued by the CPU are carried out mainly through dispatch packets, which account for the largest share of the command packets in a command queue. Among the command packets, only dispatch packets contain the virtual address of a process object; the process object itself is stored in the device memory or the system main memory.
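The check that the reordering cache unit performs on each returned packet can be sketched in software. The sketch below assumes the 64-byte kernel dispatch layout and the packet type codes published in the HSA runtime specification (type held in the low 8 bits of the header, kernel dispatch encoded as 2); the patent itself does not state a byte layout, so the field positions and the helper names `packet_type` and `process_object_address` are illustrative assumptions only.

```python
import struct

# Assumption: packet type code for a kernel dispatch packet, as listed in the
# HSA runtime specification. The patent does not give numeric encodings.
PACKET_TYPE_KERNEL_DISPATCH = 2

# Assumed little-endian 64-byte dispatch packet layout:
#   6 x uint16: header, setup, workgroup_size_x/y/z, reserved
#   5 x uint32: grid_size_x/y/z, private_segment_size, group_segment_size
#   4 x uint64: kernel_object, kernarg_address, reserved, completion_signal
DISPATCH_LAYOUT = struct.Struct("<6H5I4Q")
KERNEL_OBJECT_INDEX = 11  # position of kernel_object in the unpacked tuple


def packet_type(raw: bytes) -> int:
    """Return the packet type stored in the low 8 bits of the 16-bit header."""
    (header,) = struct.unpack_from("<H", raw, 0)
    return header & 0xFF


def process_object_address(raw: bytes):
    """Return the process object virtual address, or None for other packet types."""
    if packet_type(raw) != PACKET_TYPE_KERNEL_DISPATCH:
        return None
    return DISPATCH_LAYOUT.unpack(raw)[KERNEL_OBJECT_INDEX]
```

Only a packet whose header marks it as a dispatch packet yields an address to hand to the prefetch scheduling module; every other packet type is queued without a process object fetch.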
In the conventional packet processing method, the microprocessor has to wait for the request that reads the process object to return, and reading the process object involves translating a virtual address into a physical address and then a lengthy read from the device memory through the multi-level cache. In the embodiment of the present invention, the parsing of the dispatch packet and the acquisition of the process object are instead handled by the hardware processing module. While the microprocessor is dispatching the previous command packet, the hardware processing module can therefore parse the next dispatch packet and acquire its process object in advance. Once the microprocessor has finished dispatching the previous dispatch packet, it can directly take the next dispatch packet, which the hardware processing module has already parsed and whose process object it has already obtained, and start a new dispatch. Because the process object and the packet are prefetched by the hardware processing module, the microprocessor does not need to wait for the process object and can process the dispatch packet directly. This can greatly improve the efficiency with which the computation execution units complete small-task dispatch packets, makes full use of the hardware processing module's efficiency in parsing and processing packets, and greatly shortens the time required for packet parsing and packet processing.
Referring to FIG. 6, the process of reading a command packet is almost the same as the process of reading a process object: the virtual address of the packet or process object is first translated into a physical address, and the packet or process object is then read at that physical address. The translation from virtual address to physical address proceeds as follows: the generated virtual address is looked up in the virtual address translation primary cache, and if it matches, the physical address is returned; if the match fails, the virtual address translation secondary cache is queried; and if the secondary cache also misses, the request is sent to the address management unit on the CPU side, which yields the final physical address. Reading a packet or process object then proceeds as follows: once the physical address is obtained, the device memory or the system main memory is accessed at that address through the multi-level cache, and the packet or process object is read and returned.
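The three-level lookup described above can be illustrated with a small software model. This is a sketch under stated assumptions, not the hardware design: the 4 KiB page size, the use of dictionaries for the two caches and for the address management unit's page table, and the policy of filling both caches after a miss are all assumptions that the patent does not specify.

```python
PAGE_SIZE = 4096  # assumption: the patent does not state a page size


def translate(vaddr, l1_cache, l2_cache, page_table):
    """Translate a virtual address; return (physical_address, level_that_hit).

    l1_cache / l2_cache model the primary and secondary translation caches,
    page_table models the address management unit on the CPU side.
    All three map virtual page number -> physical frame number.
    """
    page, offset = divmod(vaddr, PAGE_SIZE)

    frame = l1_cache.get(page)
    level = "primary"
    if frame is None:
        frame = l2_cache.get(page)
        level = "secondary"
        if frame is None:
            # Slowest path: ask the address management unit on the CPU side.
            frame = page_table[page]
            level = "address_management_unit"
            l2_cache[page] = frame  # assumed fill policy
        l1_cache[page] = frame      # assumed fill policy

    return frame * PAGE_SIZE + offset, level
```

A first access to a page walks all the way to the address management unit, while a repeated access to the same page is served by the primary cache, which is why translation latency differs from packet to packet.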
The reordering cache unit comprises a plurality of command queues and process object queues; each command queue holds a plurality of command packets, and each process object queue holds a plurality of process objects. Every queue follows the first-in first-out principle: the microprocessor reads the command packets in a command queue in natural order and, when a command packet is a dispatch packet, reads the process objects in the process object queue in natural order. For example, packets are added to the command queue in the order in which their virtual addresses were generated. If the virtual addresses are generated in the order packet 1 virtual address, packet 2 virtual address, packet 3 virtual address, ..., packet n virtual address, then the command queue is: packet 1, packet 2, packet 3, ..., packet n. Correspondingly, if packet 1, packet 3, ..., packet m are dispatch packets, where m is less than n, then the process object queue holds the process object of packet 1, the process object of packet 3, ..., the process object of packet m, so the process objects appear in the process object queue in the same order as their dispatch packets appear in the command queue. Even if packets and process objects are returned out of order, the reordering cache unit adds them to the corresponding queues in natural order.
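One common way to obtain this behaviour is to reserve a sequence number when a read is issued and release entries strictly in sequence order. The sketch below models that idea; the class name `ReorderQueue` and its methods are hypothetical, and the patent does not say how the hardware tracks ordering.

```python
class ReorderQueue:
    """FIFO that accepts completions in any order and releases them in issue order."""

    def __init__(self):
        self._next_issue = 0    # sequence number given to the next read request
        self._next_release = 0  # sequence number the consumer must see next
        self._arrived = {}      # completed entries that are waiting for their turn

    def reserve(self):
        """Called when a read is issued; returns the slot's sequence number."""
        seq = self._next_issue
        self._next_issue += 1
        return seq

    def complete(self, seq, item):
        """Called when the read for slot `seq` returns, possibly out of order."""
        self._arrived[seq] = item

    def pop_ready(self):
        """Return the oldest entry if it has arrived, otherwise None."""
        if self._next_release not in self._arrived:
            return None
        item = self._arrived.pop(self._next_release)
        self._next_release += 1
        return item
```

If packet 2 returns before packet 1, `pop_ready` yields nothing until packet 1 arrives, after which packet 1 and then packet 2 are released. The same structure can serve as a command queue or as a process object queue.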
After the dispatch packet and its process object have been placed in their respective queues, the microprocessor is notified that the packet is ready, which reduces the time for which the microprocessor is occupied.
Step S002, the microprocessor analyzes the command packet, and analyzes the corresponding process object when the command packet is a dispatch packet, wherein each process comprises a plurality of working groups; the microprocessor configures a register for the work group dispatching unit, the work group dispatching unit divides the process into a plurality of work groups and then sends the work groups to the idle calculation execution unit, and the calculation execution unit reads corresponding process instructions and data according to the register and then executes the process instructions.
Specifically, the command packets sent by the CPU come in multiple formats, and the header of each command packet identifies its format. Only when a command packet is found to be a dispatch packet does a process object need to be taken from the process object queue; otherwise no process object is taken. Each time the microprocessor reads a command packet, it therefore determines whether the packet is a dispatch packet and, if so, takes the corresponding process object from the process object queue; entries are put into and taken out of both the process object queue and the command queue in order. The microprocessor thus obtains the process object at the same time as the corresponding dispatch packet, so the address translation and reading of the process object no longer occupy the microprocessor, and the idle waiting caused by that occupation is eliminated. Moreover, the microprocessor is realized by software programming: although the microprocessor itself runs fast, its read and write interactions with external hardware have high latency. The hardware processing module is hardware, whose processing speed is far higher than that of software. Because the hardware completes the prefetching of packets and of the process objects of dispatch packets in advance, the microprocessor software does not need to wait for process objects and can process dispatch packets directly, which can greatly improve the efficiency with which the computation execution units complete small-task dispatch packets.
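The pairing rule (take a process object only when the packet is a dispatch packet, and keep both queues in FIFO order) can be sketched as a small consumer loop. The tuple representation of packets and the function name `consume` are assumptions made for illustration.

```python
from collections import deque


def consume(command_queue, process_object_queue):
    """Drain the command queue and pair each dispatch packet with its process object.

    Packets are modelled as (kind, name) tuples; only kind == "dispatch"
    consumes an entry from the process object queue.
    """
    dispatched = []
    while command_queue:
        kind, name = command_queue.popleft()
        if kind == "dispatch":
            # The process object was prefetched, so it is available immediately.
            process_object = process_object_queue.popleft()
            dispatched.append((name, process_object))
        # Other packet types are handled without touching the object queue.
    return dispatched


commands = deque([("dispatch", "packet 1"), ("other", "packet 2"), ("dispatch", "packet 3")])
objects = deque(["object of packet 1", "object of packet 3"])
pairs = consume(commands, objects)
```

Because both queues are filled in the same natural order, the n-th dispatch packet always meets the n-th process object without any lookup by address.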
Each dispatch packet corresponds to one process object. Each process comprises a plurality of work groups, each work group comprises a plurality of waves, and each wave comprises a plurality of threads; the thread is the minimum unit of task execution, so each work group comprises a plurality of tasks. In addition to the virtual address of the process object, the dispatch packet includes information such as the standard task amount of a work group and the task amount of the current dispatch packet. The process object includes the start address of the process instructions, the offset address at which execution starts, and the configuration parameters necessary for executing the process, such as the hardware resource configuration.
The step of dividing the process into a plurality of work groups is specifically as follows: the standard task amount of a work group and the task amount of the current dispatch packet are obtained from the dispatch packet, and the task amount of the current dispatch packet is grouped according to the standard task amount of a work group. For example, when the standard task amount of a work group is A and the current task amount is B, the tasks are divided into B/A groups.
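The B/A grouping can be written out as a short function. The text does not say what happens when B is not a multiple of A; the sketch below assumes that the remainder forms one smaller final work group, and the function name `split_into_workgroups` is hypothetical.

```python
def split_into_workgroups(total_tasks, workgroup_size):
    """Split task indices [0, total_tasks) into consecutive (start, end) ranges.

    workgroup_size plays the role of A and total_tasks the role of B.
    Assumption: a remainder, if any, becomes one final partial work group.
    """
    if workgroup_size <= 0:
        raise ValueError("workgroup_size must be positive")
    groups = []
    start = 0
    while start < total_tasks:
        end = min(start + workgroup_size, total_tasks)
        groups.append((start, end))
        start = end
    return groups
```

With A = 256 and B = 1024 this yields exactly B/A = 4 work groups, each of which can be issued to an idle computation execution unit.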
Referring to FIG. 4 and FIG. 5, the prior art is compared with the present solution to further illustrate the beneficial effects of the present invention. FIG. 4 shows the conventional processing procedure. The prior art differs from the present solution in that it prefetches a packet before the microprocessor parses it but, when the command packet is a dispatch packet, still requires the microprocessor to obtain the corresponding process object according to the virtual address of the process object carried by the dispatch packet. In FIG. 4, the first row shows the processing of the packet prefetch scheduling module, the second row shows the processing of the reordering cache unit, and the third row shows the processing of the microprocessor. Specifically, the processing of the packet prefetch scheduling module comprises: generating the virtual address of packet 1, translating the virtual address of packet 1, and reading packet 1; generating the virtual address of packet 2, translating the virtual address of packet 2, and reading packet 2; and so on, in sequence, until the virtual address of packet n has been translated and packet n has been read.
For the processing of the reordering cache unit: address translation latency may cause the physical address of packet 2 to be returned earlier than that of packet 1. For example, if the page table entry for packet 2 is present in the virtual address translation primary cache but the entry for packet 1 is not, the physical address of packet 2 returns quickly, whereas packet 1 has to spend a long time walking the multi-level page table and so returns later than packet 2. The return order of virtual address translations is therefore out of order. Packet read latency may likewise cause packet 2 to be returned earlier than packet 1, because a packet may reside in the device memory or in the system main memory: if packet 2 is in the device memory and packet 1 is in the system main memory, packet 2 returns quickly, whereas packet 1 requires an access to the system main memory on the CPU side and returns later than packet 2. The return order of the packets is therefore also out of order, but the reordering cache unit reorders the returned packets and adds them to the corresponding command queue in natural order.
For the processing of the microprocessor: the microprocessor also reads the command packets in natural order and executes them in sequence, and only when it reads a dispatch packet does it use the virtual address of the process object in that dispatch packet for address translation and reading. Suppose, for example, that task 2 of packet 1 is the reading of the process object: only after the process object of packet 1 has been returned to the reordering cache unit and acquired by the microprocessor can the microprocessor proceed to the next piece of work.
FIG. 5 shows the improved packet processing procedure provided by the embodiment of the present invention. In FIG. 5, the first row shows the processing of the prefetch scheduling module, the second row shows the processing of the reordering cache unit, the third row shows the processing of the microprocessor, and the fourth row shows the processing of the execution unit. For the processing of the prefetch scheduling module: unlike FIG. 4, the improved method also reads the process object of each dispatch packet in advance. That is, after packet 1 and packet 2 are returned to the reordering cache unit, the reordering cache unit parses packet 1 and packet 2 and determines whether each is a dispatch packet; if so, the prefetch scheduling module initiates a further request to read the process object according to the virtual address of the process object carried by the dispatch packet.
For the processing of the reordering buffer unit: because reading a packet and reading a process object each take a certain amount of time, the results return to the reordering buffer unit out of order. The reordering buffer unit nevertheless places the packets and process objects that returned out of order into the corresponding queues in natural order for the microprocessor to read. For the processing of the microprocessor: because both the packet and the process object are ready, the microprocessor can obtain the corresponding process object while it parses the dispatch packet, and the task corresponding to the dispatch packet can be distributed to the corresponding parallel computing execution unit for execution without waiting.
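The benefit of overlapping the process object fetch with packet parsing can be made concrete with a small timing model. This is only an illustrative sketch: the function `finish_time`, the cost values and the assumption that prefetch requests for different packets overlap completely are hypothetical and are not figures from the embodiment.

```python
def finish_time(packets, parse_cost, object_latency, prefetch):
    """Return the time at which the microprocessor finishes the last packet.

    packets: list of (ready_time, is_dispatch) tuples in natural order.
    """
    clock = 0
    for ready_time, is_dispatch in packets:
        start = max(clock, ready_time)
        if is_dispatch and prefetch:
            # The object fetch began when the packet reached the reordering
            # buffer unit, so it overlaps with earlier microprocessor work.
            start = max(start, ready_time + object_latency)
            clock = start + parse_cost
        elif is_dispatch:
            # Translation and read begin only once the packet is parsed.
            clock = start + parse_cost + object_latency
        else:
            clock = start + parse_cost
    return clock


# Three dispatch packets, all available at time 0 (hypothetical workload).
packets = [(0, True), (0, True), (0, True)]
baseline = finish_time(packets, parse_cost=1, object_latency=10, prefetch=False)
improved = finish_time(packets, parse_cost=1, object_latency=10, prefetch=True)
```

Under these assumed numbers the baseline pays the object latency once per dispatch packet (33 time units), while the prefetching variant pays it only once in total (13 time units), because the later fetches complete while the microprocessor is still busy with earlier packets.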
In summary, the embodiment of the present invention provides a packet processing method for a heterogeneous structure system. Before the microprocessor parses a packet, the command packet and the process object corresponding to any dispatch packet are prefetched; the command packet is added to the command queue and the process object is added to the corresponding process object queue. When the microprocessor parses the dispatch packet, it can obtain the corresponding process object and then dispatch the corresponding task. This solves the problem in the prior art that, when parsing a dispatch packet, the microprocessor must wait a long time for the physical address of the process object to be translated and for the process object to be read according to that physical address, thereby improving the operating efficiency of the microprocessor.
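The overall flow summarized above can be sketched end to end as follows. The sketch is a simplified software model under stated assumptions: the page size, the dictionary-based page table and memory, the `write_register` command type and all function names are hypothetical, and the process object fields follow the start address, execution offset and hardware resource configuration parameters named in the claims.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ProcessObject:
    start_address: int      # start address needed to execute the process
    entry_offset: int       # offset at which execution begins
    resource_config: dict   # hardware resource configuration parameters


@dataclass
class CommandPacket:
    kind: str                    # "dispatch" or another command type
    process_object_va: int = 0   # only meaningful for dispatch packets


PAGE_SIZE = 4096
PAGE_TABLE = {0x10: 0x80}        # hypothetical virtual page -> physical page
MEMORY = {0x80 * PAGE_SIZE + 0x40: ProcessObject(0x1000, 0x20, {"threads": 64})}

command_queue, process_object_queue = deque(), deque()


def translate(va):
    """Address translation unit: map a virtual address to a physical one."""
    return PAGE_TABLE[va // PAGE_SIZE] * PAGE_SIZE + va % PAGE_SIZE


def prefetch_process_object(va):
    """Pre-fetch scheduling module: translate, then read the process object."""
    return MEMORY[translate(va)]


def reorder_buffer_accept(packet):
    """Queue the packet; for a dispatch packet, also queue its process object."""
    command_queue.append(packet)
    if packet.kind == "dispatch":
        process_object_queue.append(prefetch_process_object(packet.process_object_va))


def microprocessor_step():
    """Parse the next packet; a dispatch packet finds its object already queued."""
    packet = command_queue.popleft()
    if packet.kind == "dispatch":
        return packet.kind, process_object_queue.popleft()
    return packet.kind, None


reorder_buffer_accept(CommandPacket("write_register"))
reorder_buffer_accept(CommandPacket("dispatch", 0x10 * PAGE_SIZE + 0x40))
first = microprocessor_step()
second = microprocessor_step()
```

Here the translation and the read both happen inside `reorder_buffer_accept`, before the microprocessor touches the packet, so `microprocessor_step` never performs address translation itself.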
It should be noted that the order of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. Specific embodiments have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

after the read command packet is returned, adding the returned command packet into a command queue of a reordering buffer unit, analyzing the command packet by the reordering buffer unit, and sending a virtual address of a process object carried in a dispatch packet to a pre-fetching scheduling module when the command packet is the dispatch packet, wherein the process object comprises a start address, an offset address for starting execution and a hardware resource configuration parameter which are necessary when the process is executed; the pre-fetching scheduling module sends the virtual address of the process object to an address translation unit to obtain the physical address of the process object, and initiates a read command for obtaining the process object to a multi-level cache unit of the device; the reordering buffer unit adds the process object returned by the multi-level cache unit into the process object queue of the reordering buffer unit and informs the microprocessor that the command packet is ready;
CN202210852711.6A | 2022-07-19 | 2022-07-19 | Heterogeneous structure system systematic processing method | Active | CN114995882B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210852711.6A / CN114995882B (en) | 2022-07-19 | 2022-07-19 | Heterogeneous structure system systematic processing method

Publications (2)

Publication Number | Publication Date
CN114995882A (en) | 2022-09-02
CN114995882B (en) | 2022-11-04

Family

ID=83020949

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202210852711.6A | Active | CN114995882B (en) | 2022-07-19 | 2022-07-19 | Heterogeneous structure system systematic processing method

Country Status (1)

Country | Link
CN (1) | CN114995882B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115374388B (en)* | 2022-10-24 | 2023-02-28 | Muxi Integrated Circuit Shanghai Co ltd | Multidimensional array compression and decompression method and device

Also Published As

Publication number | Publication date
CN114995882A (en) | 2022-09-02

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
