CN114995882A - Heterogeneous structure system systematic processing method - Google Patents

Heterogeneous structure system systematic processing method
Download PDF

Info

Publication number
CN114995882A
Authority
CN
China
Prior art keywords
packet
process object
command
dispatch
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210852711.6A
Other languages
Chinese (zh)
Other versions
CN114995882B (en)
Inventor
曾臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muxi Integrated Circuit Shanghai Co ltd
Original Assignee
Muxi Integrated Circuit Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muxi Integrated Circuit Shanghai Co ltd
Priority to CN202210852711.6A
Publication of CN114995882A
Application granted
Publication of CN114995882B
Active
Anticipated expiration

Abstract

The invention relates to the technical field of packet processing, and in particular to a packet processing method for a heterogeneous structure system, comprising the following steps: after a read command packet is returned, the returned command packet is added to a command queue of a reordering cache unit; the reordering cache unit parses the command packet and, when the command packet is a dispatch packet, sends the virtual address of the process object carried in the dispatch packet to a prefetch scheduling module; the prefetch scheduling module acquires the process object; the reordering cache unit adds the acquired process object to its process object queue and notifies the microprocessor that the command packet is ready; the microprocessor parses the command packet and, when it is a dispatch packet, parses the corresponding process object and then completes the dispatch and execution of the corresponding task. This removes the long wait, when the microprocessor parses a dispatch packet, for translating the physical address of the process object and reading the process object, and improves the operating efficiency of the microprocessor.

Description

Heterogeneous structure system systematic processing method
Technical Field
The invention relates to the technical field of packet processing, and in particular to a packet processing method for a heterogeneous structure system.
Background
The HSA heterogeneous system protocol defines the AQL (Architected Queuing Language) packet format. AQL packets are placed by the CPU into a ring command packet queue (RING BUFFER); the command packet queue itself is stored in the system main memory or the device memory. AQL packets come in several formats, among which the dispatch packet (kernel dispatch) is the command packet by which the CPU notifies a device to execute a process task. Each dispatch packet carries the virtual address of a process object; the actual process object is stored in the system main memory or the device memory and contains the start address of process command execution, the offset address at which execution begins, and the configuration parameters necessary for executing the process, such as the hardware resource configuration.
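For orientation, the sketch below lays out the two data structures this background describes: the dispatch packet that carries the virtual address of a process object, and the process object itself. The field names and widths are illustrative assumptions for the example, not the AQL layout defined by the HSA specification.

```cpp
#include <cstdint>

// Illustrative sketch only; field names and widths are assumptions, not the HSA AQL layout.
struct DispatchPacket {
    uint16_t header;              // packet format identifier (dispatch, custom, ...)
    uint16_t workgroup_size;      // standard task amount of one workgroup
    uint32_t grid_size;           // total task amount of this dispatch
    uint64_t process_object_va;   // virtual address of the process object in device/system memory
};

struct ProcessObject {
    uint64_t code_start_addr;     // start address of process command execution
    uint64_t entry_offset;        // offset address at which execution begins
    uint32_t resource_config;     // hardware resource configuration needed to execute the process
};
```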
In an HSA heterogeneous system, the central processing unit (CPU) is responsible for control, including task assignment and complex control processing; several asynchronous devices perform computation according to CPU commands; and a command control unit is responsible for receiving and splitting CPU commands and issuing them to the execution units. Specifically, a hardware processing module in the command control unit notifies a microprocessor after acquiring a command packet. When the microprocessor parses the command packet and finds that it is a dispatch packet, it acquires the process object according to the virtual address of the process object carried by the dispatch packet and then configures registers for the workgroup dispatch unit; the workgroup dispatch unit divides the process into several workgroups and issues them to idle compute execution units, and each compute execution unit reads the corresponding process instructions and data according to the registers and then executes the process instructions.
In the traditional packet processing method, the microprocessor in the command control unit must first acquire and parse a command packet; if it is a dispatch packet, the process object must be read. Reading the process object requires virtual address translation followed by the actual fetch of the process object. Address translation passes through the caches of a multi-level virtual address translation unit, and fetching the process object requires reading the device memory, or even the CPU-side system main memory, through the multi-level cache, which takes a very long time; in particular, if no cache level hits during virtual-to-physical translation, the request must go all the way to the CPU-side physical address management unit, which takes even longer, so address translation occupies a significant amount of the microprocessor's time. After the physical address of the process object is obtained, the process object must still be fetched from the device memory or the system main memory, which also consumes a large amount of time. Since the microprocessor cannot process the next packet until the current one has been handled, it is largely occupied, from acquiring the command packet onward, by translating the physical address of the process object and reading the process object, and it can only wait until the process object arrives. The microprocessor therefore has a large amount of idle waiting time, and its processing efficiency is low.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a packet processing method for a heterogeneous structure system, which adopts the following technical solution:
A heterogeneous structure system packet processing method is applied to a command control unit of an asynchronous device, where the command control unit comprises a microprocessor and a hardware processing module, and the hardware processing module comprises a prefetch scheduling module, an address translation unit, a reordering cache unit, and a workgroup dispatch unit. The method comprises the following steps: after a read command packet is returned, the returned command packet is added to a command queue of the reordering cache unit; the reordering cache unit parses the command packet and, when the command packet is a dispatch packet, sends the virtual address of the process object carried in the dispatch packet to the prefetch scheduling module; the prefetch scheduling module sends the virtual address of the process object to the address translation unit to obtain the physical address of the process object and initiates a process object read command to the device's multi-level cache unit; the reordering cache unit adds the process object returned by the multi-level cache to its process object queue and notifies the microprocessor that the command packet is ready; the microprocessor parses the command packet and, when the command packet is a dispatch packet, parses the corresponding process object, where each process comprises several workgroups; the microprocessor configures registers for the workgroup dispatch unit, the workgroup dispatch unit divides the process into several workgroups and issues them to idle compute execution units, and each compute execution unit reads the corresponding process instructions and data according to the registers and then executes the process instructions.
The embodiment of the invention has the following beneficial effects:
According to the embodiment of the invention, the command packet and the process object corresponding to a dispatch packet are prefetched before the microprocessor parses the packet: the command packet is added to a command queue and the process object is added to the corresponding process object queue, so the corresponding process object is already available when the microprocessor parses the dispatch packet, and the dispatch and execution of the corresponding task can be completed immediately. This solves the problem in the prior art that the microprocessor, when parsing a dispatch packet, must wait for the physical address of the process object to be translated and for the process object to be read according to that physical address, and it improves the operating efficiency of the microprocessor.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of a heterogeneous architecture system;
FIG. 2 is a flowchart illustrating the steps of a packet processing method based on a heterogeneous structure system according to an embodiment of the present invention;
FIG. 3 is a flowchart of a packet processing method based on a heterogeneous structure system according to an embodiment of the present invention;
FIG. 4 is a diagram of a conventional packet processing process;
FIG. 5 is a diagram illustrating a packet processing process according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the read flow of a packet and a process object in an embodiment of the present invention.
Detailed Description
To further explain the technical means adopted by the present invention to achieve its intended objects and their effects, a packet processing method based on a heterogeneous structure system according to the present invention is described in detail below with reference to the accompanying drawings and preferred embodiments, together with its specific implementation, structure, features, and effects. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of a packet processing method based on a heterogeneous structure system in detail with reference to the accompanying drawings.
Referring to fig. 1, a structure of a heterogeneous structure system applied in an embodiment of the present invention is shown, where the heterogeneous structure system includes two major portions: a central processing unit and a plurality of asynchronous devices. The Central Processing Unit (CPU) is responsible for control and is used for task assignment and complex control processing; the asynchronous device performs calculations according to the commands of the CPU.
The central processing unit side comprises a CPU system main memory and an address management unit. A program initiated by the CPU may be broken down into tens of thousands of command queues, which are stored in the CPU system main memory or the device memory; each command queue is a packet queue made up of several packets and is mapped onto a command queue in a queue group in the command control unit of an asynchronous device. The address management unit is responsible for managing and resolving the final physical address.
Each asynchronous device includes a command control unit, several parallel compute execution units, a virtual address translation second-level cache, a multi-level cache, a PCIE interface, an on-chip high-bandwidth memory, and a resource allocation module (not shown in the figure). The command control unit is responsible for receiving and splitting commands and issuing them to the parallel compute execution units; the parallel compute execution units execute the corresponding tasks; the virtual address translation second-level cache serves as a cache for temporarily exchanging data between the virtual address translation first-level cache and the CPU-side address management unit; the multi-level cache is the cache of the data read path; the on-chip high-bandwidth memory is the memory on the asynchronous device side, also called the device memory; and PCIE is a high-speed serial computer expansion bus standard through which the central processing unit and the asynchronous devices communicate.
The command control unit comprises several microprocessors and a hardware processing module; the microprocessors are functional modules realized by software programming, and the hardware processing module is formed by hardware circuits. The hardware processing module comprises a prefetch scheduling module, a virtual address translation first-level cache, a read module, several workgroup dispatch units, and a reordering cache unit that stores the command queue group. Because one microprocessor can serve several processes at the same time, and each process is decomposed into several command queues, a prefetch scheduling module is needed to schedule the corresponding command packets; and, to avoid occupying the microprocessor for long periods, the embodiment of the invention also prefetches process objects in advance, so the prefetch scheduling module also handles the prefetching and scheduling of process objects. The virtual address translation first-level cache, the virtual address translation second-level cache, and the address management unit form an address translation system that translates virtual addresses into physical addresses. The read module mainly reads out packets or process objects through the multi-level cache according to the obtained physical address. The reordering cache unit mainly stores the command queues and the process object queues temporarily. The workgroup dispatch unit mainly configures the corresponding registers and dispatches tasks in groups.
Specifically, referring to fig. 2, the CPU writes a command packet queue into the system main memory or the device memory. After the CPU notifies the prefetch scheduling module of the corresponding packet queue information, the prefetch scheduling module generates a packet read virtual address; after the address translation system translates the packet virtual address into a physical address, the read module issues a packet read command to the multi-level cache unit, and the multi-level cache reads the packet from the CPU-side system main memory, returns it to the reordering cache unit, and the read command packet is added to the corresponding command queue. After the read command packet is returned, if the reordering cache unit determines that the returned command packet is a dispatch packet, it parses the dispatch packet to obtain the virtual address of the process object and sends it to the prefetch scheduling module; the address translation system translates the virtual address of the process object to obtain its physical address; the read module issues a process object read command to the multi-level cache, and the multi-level cache reads the process object from the system main memory or the device memory and returns it to the reordering cache unit. The reordering cache unit adds the process object to the process object queue and notifies the microprocessor that the packet is ready. The microprocessor parses the command packet and, when the identifier carried in the command packet indicates a dispatch packet, parses the process object of the dispatch packet. Because the dispatch packet and its process object were read before the microprocessor parses them, the microprocessor, after obtaining the dispatch packet and its corresponding process object, configures the dispatch registers for the workgroup dispatch unit; the workgroup dispatch unit divides the process into several workgroups and issues them to idle compute execution units, and the compute execution units read process instructions and data from the system main memory or the device memory according to the dispatch registers and execute the corresponding instructions. The resource allocation module selects an idle compute execution unit to which a workgroup is issued.
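The hardware-side half of this flow can be summarized in a short sketch. This is not the patented circuit itself, only an illustration of the decision made when a read command packet comes back; all type and function names below are invented for the example.

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-ins for the packet returned by the multi-level cache and
// for a prefetch request; the real hardware interfaces are not specified here.
struct ReturnedPacket {
    bool     is_dispatch;
    uint64_t process_object_va;   // only meaningful when is_dispatch is true
};
struct PrefetchRequest { uint64_t va; };

void on_packet_returned(const ReturnedPacket& pkt,
                        std::vector<ReturnedPacket>& command_queue,
                        std::vector<PrefetchRequest>& prefetch_requests) {
    command_queue.push_back(pkt);  // the reordering cache unit keeps packets in natural order
    if (pkt.is_dispatch) {
        // Kick off translation and the memory read of the process object now,
        // so it is already waiting when the microprocessor parses this packet.
        prefetch_requests.push_back({pkt.process_object_va});
    }
}
```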
Referring to fig. 3, a flowchart of a packet processing method based on a heterogeneous structure system according to an embodiment of the present invention is shown, where the packet processing method is applied to a command control unit of an asynchronous device, where the command control unit includes a microprocessor and a hardware processing module, and the method includes the following steps:
Step S001: after the read command packet is returned, the returned command packet is added to a command queue of the reordering cache unit; the reordering cache unit parses the command packet and, when the command packet is a dispatch packet, sends the virtual address of the process object carried in the dispatch packet to the prefetch scheduling module; the prefetch scheduling module sends the virtual address of the process object to the address translation unit to obtain the physical address of the process object and then acquires the process object; the reordering cache unit adds the acquired process object to its process object queue and notifies the microprocessor that the command packet is ready.
AQL packets in the HSA heterogeneous system come in several forms, such as dispatch packets, custom packets, and synchronization packets. The tasks issued by the CPU are mainly carried by dispatch packets, and dispatch packets account for the largest proportion of the command packets in a command queue. Among the command packets, only the dispatch packet carries the virtual address of a process object, and the actual process object is stored in the device memory or the system main memory.
In the conventional packet processing method, the microprocessor must wait for the process object read request to return, and reading the process object requires virtual-to-physical address translation followed by a long read from the device memory through the multi-level cache. In the embodiment of the invention, by contrast, the parsing of the dispatch packet and the acquisition of the process object are handled by the hardware processing module, so while the microprocessor is dispatching the previous command packet, the hardware processing module can already parse the next dispatch packet and acquire its process object in advance. Once the microprocessor finishes dispatching the previous command packet, it can directly parse the next dispatch packet, whose process object the hardware processing module has already obtained, and perform the new dispatch. Because the process object and the packet are prefetched by the hardware processing module, the microprocessor does not need to wait for the process object and can process the dispatch packet directly. This greatly improves the rate at which the compute execution units can complete small-task dispatch packets, makes full use of the hardware processing module's efficiency at parsing and processing packets, and greatly shortens the processing time required for packet parsing and packet handling.
Referring to fig. 6, the flow for reading a command packet is almost the same as the flow for reading a process object: the virtual address of the packet or the process object is translated into a physical address, and the corresponding packet or process object is then read according to that physical address. Translating a virtual address into a physical address proceeds as follows: the generated virtual address is looked up in the virtual address translation first-level cache, and if the match succeeds, the physical address is returned; if the match fails, the virtual address translation second-level cache must be queried; and if that also misses, the request is sent to the CPU-side address management unit, which produces the final physical address. Reading a packet or process object proceeds as follows: after the corresponding physical address is obtained, the process object is read and returned from the device memory through the multi-level cache, or the packet is read and returned from the system main memory through the multi-level cache.
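A minimal software model of this two-level lookup is sketched below. It assumes a simple page-granular mapping held in hash maps and a stubbed-out walk of the CPU-side address management unit; the page size, container choice, and function names are all assumptions made for the example, not part of the patent.

```cpp
#include <cstdint>
#include <unordered_map>

constexpr uint64_t kPageSize = 4096;                       // assumed page granularity
using TlbCache = std::unordered_map<uint64_t, uint64_t>;   // virtual page -> physical page

// Stand-in for the long round trip to the CPU-side address management unit.
uint64_t walk_cpu_address_management_unit(uint64_t virt_page) {
    return virt_page;  // identity mapping, purely for illustration
}

uint64_t translate(uint64_t va, TlbCache& l1, TlbCache& l2) {
    const uint64_t page = va / kPageSize, offset = va % kPageSize;
    if (auto hit = l1.find(page); hit != l1.end())           // first-level translation cache
        return hit->second * kPageSize + offset;
    if (auto hit = l2.find(page); hit != l2.end()) {          // second-level translation cache
        l1[page] = hit->second;                               // refill the first level
        return hit->second * kPageSize + offset;
    }
    const uint64_t phys = walk_cpu_address_management_unit(page);  // slowest path
    l2[page] = phys;
    l1[page] = phys;
    return phys * kPageSize + offset;
}
```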
The reordering cache unit contains several command queues and process object queues; each command queue contains several command packets, and each process object queue contains several process objects. Every queue follows the first-in first-out principle: the microprocessor reads the command packets in a command queue in their natural order and, when a command packet is a dispatch packet, reads the process objects in the process object queue in the same natural order. For example, the command queue is filled in the order in which the packets' virtual addresses are generated; if the virtual addresses are generated in the order packet 1, packet 2, packet 3, ..., packet n, then the command queue is packet 1, packet 2, packet 3, ..., packet n. Accordingly, if packet 1, packet 3, ..., packet m (with m less than n) are dispatch packets, the process object queue holds the process objects of packet 1, packet 3, ..., packet m, and the order of the process objects in the process object queue is the same as the order of the dispatch packets in the command queue. Even if packets and process objects return out of order, the reordering cache unit adds them to the corresponding queues in the natural order.
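One way to model this reordering behaviour in software is sketched below: completions are stored under their original sequence number, and consumers only ever see the head item once it has actually arrived. The sequence-number scheme and class name are assumptions made for the illustration.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

// Items (packets or process objects) may complete out of order, but they are
// handed out strictly in the natural, first-in first-out order.
template <typename T>
class ReorderQueue {
public:
    // Called when a read completes, possibly out of order.
    void insert(uint64_t seq, T item) { pending_[seq] = std::move(item); }

    // Returns the next item in natural order, or nothing if it has not arrived yet.
    std::optional<T> pop_in_order() {
        auto it = pending_.find(next_);
        if (it == pending_.end()) return std::nullopt;  // head-of-line item still outstanding
        T item = std::move(it->second);
        pending_.erase(it);
        ++next_;
        return item;
    }

private:
    std::map<uint64_t, T> pending_;  // keyed and iterated by sequence number
    uint64_t next_ = 0;              // sequence number of the next item to hand out
};
```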
After a dispatch packet and its corresponding process object have each been placed into their respective queues, the microprocessor is informed that the packet is ready, which reduces the time the microprocessor is occupied.
Step S002: the microprocessor parses the command packet and, when the command packet is a dispatch packet, parses the corresponding process object, where each process comprises several workgroups; the microprocessor configures registers for the workgroup dispatch unit, the workgroup dispatch unit divides the process into several workgroups and issues them to idle compute execution units, and each compute execution unit reads the corresponding process instructions and data according to the registers and then executes the process instructions.
Specifically, because the command packets sent by the CPU come in several formats, and the packet header in each command packet carries its format definition, the process object in the process object queue needs to be fetched only when the parsed command packet is a dispatch packet; otherwise no process object is taken out. Therefore, each time the microprocessor reads a command packet it judges whether the packet is a dispatch packet, and if so it takes the corresponding process object out of the process object queue; the process object queue and the command queue are filled and drained in order. At this point the microprocessor obtains the process object at the same time as the corresponding dispatch packet, so the address translation and read of the process object no longer occupy the microprocessor, which solves the problem of idle waiting caused by the microprocessor being occupied. Moreover, because the microprocessor is realized by software programming, even though its own running speed is high, the latency of read/write interaction with external hardware is large, whereas the hardware processing module is hardware and processes far faster than software. The prefetching of packets and of the process objects of dispatch packets is therefore completed in advance by hardware, the microprocessor software does not need to wait for the process object and can process the dispatch packet directly, and the rate at which the compute execution units can complete small-task dispatch packets is greatly improved.
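The per-packet decision the microprocessor makes here can be illustrated with a small loop. Everything below (the struct fields, the two handler functions, the use of std::queue) is a simplified stand-in chosen for the example, not the firmware described by the patent.

```cpp
#include <cstdint>
#include <queue>

struct Packet  { bool is_dispatch; uint16_t format; };  // simplified command packet
struct ProcObj { uint64_t code_start_addr; };           // simplified process object

void configure_workgroup_dispatch(const Packet&, const ProcObj&) { /* write dispatch registers */ }
void handle_other_packet(const Packet&) { /* custom / synchronization packets, etc. */ }

void microprocessor_loop(std::queue<Packet>& cmd_queue, std::queue<ProcObj>& obj_queue) {
    while (!cmd_queue.empty()) {
        Packet pkt = cmd_queue.front(); cmd_queue.pop();
        if (pkt.is_dispatch) {
            // The process object was prefetched by the hardware processing module,
            // so it is already waiting at the head of its queue: no address
            // translation or memory read stalls the microprocessor here.
            ProcObj obj = obj_queue.front(); obj_queue.pop();
            configure_workgroup_dispatch(pkt, obj);
        } else {
            handle_other_packet(pkt);
        }
    }
}
```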
Each dispatch packet corresponds to one process object; each process comprises several workgroups, each workgroup comprises several waves, and each wave comprises several threads, a thread being the smallest unit of task execution, so each workgroup contains several tasks. In addition to the virtual address of the process object, the dispatch packet carries information such as the standard task amount of a workgroup and the task amount of the current dispatch packet. The process object contains the start address of process command execution, the offset address at which execution begins, and the configuration parameters necessary for process execution, such as the hardware resource configuration.
Dividing the process into several workgroups specifically comprises: obtaining the standard task amount of a workgroup and the task amount of the current dispatch packet carried in the dispatch packet, and grouping the task amount of the current dispatch packet according to the standard task amount of a workgroup; for example, when the standard task amount of a workgroup is A and the current task amount is B, the tasks are divided into B/A groups.
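Read literally, B/A only comes out whole when B is a multiple of A; the sketch below rounds up so a partial final group is still dispatched, which is an assumption layered on top of the text rather than something the patent states.

```cpp
#include <cstdint>

// Number of workgroups for a dispatch with standard workgroup task amount A and
// total task amount B. The ceiling division is an assumption for the example.
uint32_t workgroup_count(uint32_t standard_task_amount_a, uint32_t current_task_amount_b) {
    return (current_task_amount_b + standard_task_amount_a - 1) / standard_task_amount_a;
}
```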
Referring to fig. 4 and fig. 5, to further illustrate the beneficial effects of the invention, the prior art is compared with the present solution. FIG. 4 shows the conventional processing procedure; the difference from the present solution is that, although packets are prefetched before the microprocessor parses them, when a command packet is a dispatch packet the microprocessor itself must still obtain the corresponding process object according to the virtual address of the process object carried by the dispatch packet. In fig. 4, the first line shows the processing of the packet prefetch scheduling module, the second line shows the processing of the reordering cache unit, and the third line shows the processing of the microprocessor. Specifically, the processing of the packet prefetch scheduling module comprises: generating the virtual address of packet 1, translating the virtual address of packet 1, and reading packet 1; generating the virtual address of packet 2, translating it, and reading packet 2; and so on, until the virtual address of packet n is translated and packet n is read, all executed in sequence. For the processing of the reordering cache unit: the latency of address translation may cause the physical address of packet 2 to return earlier than that of packet 1; for example, the page table entry (Entry) of packet 2 may be present in the virtual address first-level cache unit while that of packet 1 is not, in which case the physical address of packet 2 returns quickly but packet 1 must spend a long time walking the multi-level page table, so its return is later than packet 2's, and the return order of virtual address translation is therefore out of order. The latency of reading packets may likewise cause packet 2 to return earlier than packet 1, because a packet may reside in the device memory or in the system main memory: if packet 2 is in the device memory and packet 1 is in the system main memory, packet 2 returns quickly while packet 1 must go to the CPU-side system main memory and returns later, so the return order of packets is also out of order. The reordering cache unit, however, reorders the returned packets and adds them to the corresponding command queue in the natural order. For the processing of the microprocessor: the microprocessor also reads the command packets in the natural order and processes them in sequence, and only when it reads a dispatch packet does it use the virtual address of the process object in the dispatch packet for address translation and reading; assuming that task 2 of packet 1 is reading the process object, the process object of packet 1 is returned to the reordering cache unit, and the microprocessor can proceed to the next work only after it has obtained the process object of packet 1.

FIG. 5 shows the improved packet processing procedure provided by the embodiment of the invention. In fig. 5, the first line shows the processing of the prefetch scheduling module, the second line shows the processing of the reordering cache unit, the third line shows the processing of the microprocessor, and the fourth line shows the processing of the execution units. For the processing of the prefetch module: unlike fig. 4, the improved method also reads the process object of a dispatch packet in advance, that is, after packet 1 and packet 2 return to the reordering cache unit, the reordering cache unit parses packet 1 and packet 2 and judges whether each is a dispatch packet; if it is, the prefetch scheduling module then initiates a process object read request according to the virtual address of the process object carried by the dispatch packet. For the processing of the reordering cache unit: because reading a packet and reading a process object each take a certain time, the results return to the reordering cache unit out of order, but the reordering cache unit places the out-of-order packets and process objects into the corresponding queues in the natural order for the microprocessor to read. For the processing of the microprocessor: because both the packet and the process object are ready, the corresponding process object is obtained at the same time the dispatch packet is parsed, and the task corresponding to the dispatch packet can be distributed to the corresponding parallel compute execution unit for execution without waiting.
In summary, the embodiment of the present invention provides a packet processing method based on a heterogeneous structure system in which, before the microprocessor parses a packet, the command packet and the process object corresponding to a dispatch packet are prefetched: the command packet is added to a command queue and the process object is added to the corresponding process object queue, so the corresponding process object is obtained while the microprocessor parses the dispatch packet and the corresponding task can then be dispatched. This solves the problem in the prior art that, when parsing a dispatch packet, the microprocessor must wait for the lengthy translation of the physical address of the process object and the read of the process object according to that physical address, and it improves the operating efficiency of the microprocessor.
It should be noted that the order of the above embodiments of the present invention is for description only and does not indicate the relative merit of the embodiments. Specific embodiments have been described above; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible and may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

After the read command packet is returned, the returned command packet is added to a command queue of a reordering cache unit; the reordering cache unit parses the command packet and, when the command packet is a dispatch packet, sends the virtual address of the process object carried in the dispatch packet to a prefetch scheduling module; the prefetch scheduling module sends the virtual address of the process object to an address translation unit to obtain the physical address of the process object and initiates a read command for the process object to the multi-level cache unit of the device; the reordering cache unit adds the process object returned by the multi-level cache to its process object queue and notifies the microprocessor that the command packet is ready;
CN202210852711.6A | 2022-07-19 | 2022-07-19 | Heterogeneous structure system systematic processing method | Active | CN114995882B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210852711.6A, CN114995882B (en) | 2022-07-19 | 2022-07-19 | Heterogeneous structure system systematic processing method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210852711.6A, CN114995882B (en) | 2022-07-19 | 2022-07-19 | Heterogeneous structure system systematic processing method

Publications (2)

Publication Number | Publication Date
CN114995882A | 2022-09-02
CN114995882B | 2022-11-04

Family

ID=83020949

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210852711.6A | Active | CN114995882B (en) | 2022-07-19 | 2022-07-19 | Heterogeneous structure system systematic processing method

Country Status (1)

Country | Link
CN (1) | CN114995882B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102473087A (en) * | 2009-07-02 | 2012-05-23 | 三德动力有限公司 | Ordering a plurality of write commands associated with a storage device
US20160364171A1 (en) * | 2015-06-09 | 2016-12-15 | Ultrata Llc | Infinite memory fabric streams and apis
CN110678847A (en) * | 2017-05-30 | 2020-01-10 | 超威半导体公司 | Continuous analysis task for GPU task scheduling
CN113377524A (en) * | 2020-03-09 | 2021-09-10 | 辉达公司 | Cooperative parallel memory allocation
CN111949578A (en) * | 2020-08-04 | 2020-11-17 | 西安电子科技大学 | DDR3 controller based on DFI standard
CN112416817A (en) * | 2020-12-02 | 2021-02-26 | 海光信息技术股份有限公司 | Prefetching method, information processing apparatus, device, and storage medium
US20220197696A1 (en) * | 2020-12-23 | 2022-06-23 | Advanced Micro Devices, Inc. | Condensed command packet for high throughput and low overhead kernel launch
CN113722246A (en) * | 2021-11-02 | 2021-11-30 | 超验信息科技(长沙)有限公司 | Method and device for realizing physical memory protection mechanism in processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何娟: "Prefetching techniques for the disk subsystem of high-performance computers", Computer Engineering & Science (《计算机工程与科学》) *
蒋进松 et al.: "Design of an on-chip Flash acceleration controller based on prefetching and caching principles", Computer Engineering & Science (《计算机工程与科学》) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115374388A (en) * | 2022-10-24 | 2022-11-22 | 沐曦集成电路(上海)有限公司 | Multidimensional array compression and decompression method and device
CN115374388B (en) * | 2022-10-24 | 2023-02-28 | 沐曦集成电路(上海)有限公司 | Multidimensional array compression and decompression method and device

Also Published As

Publication number | Publication date
CN114995882B (en) | 2022-11-04

Similar Documents

Publication | Publication Date | Title
CN102609378B (en)A kind of message type internal storage access device and access method thereof
CN103019838B (en)Multi-DSP (Digital Signal Processor) platform based distributed type real-time multiple task operating system
US7664938B1 (en)Semantic processor systems and methods
US6868087B1 (en)Request queue manager in transfer controller with hub and ports
CN110109626B (en)NVMe SSD command processing method based on FPGA
US11941440B2 (en)System and method for queuing commands in a deep learning processor
CN114564234B (en) Processing device, method and system for performing data processing on multiple channels
JP2007079789A (en)Computer system and event processing method
CN114995882B (en)Heterogeneous structure system systematic processing method
US11442879B2 (en)Interrupt request processing device
CN116301627A (en) A kind of NVMe controller and its initialization, data reading and writing method
US20230153153A1 (en)Task processing method and apparatus
KR102326280B1 (en)Method, apparatus, device and medium for processing data
CN119311627B (en) Descriptor-based PCIe bus DMA high-speed data transmission method and device
CN112559403B (en)Processor and interrupt controller therein
CN113485643B (en)Method for data access and controller for data writing
WO2025082094A1 (en)Method and apparatus for marking dirty page during live migration of virtual machine, back-end device, and chip
CN115328832B (en)Data scheduling system and method based on PCIE DMA
CN118349339A (en)Data processing method, device, processor, electronic equipment and storage medium
CN107807888B (en)Data prefetching system and method for SOC architecture
CN117312202B (en) System on chip and data transmission method for system on chip
CN118796509B (en) Data transmission method, device, equipment and storage medium
CN112231250B (en)Performance isolation of storage devices
CN118194881A (en) Text generation system and method
CN120353743A (en)Data transmission method, DMA controller, system on chip and storage medium

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
