CN110825438B

Info

Publication number: CN110825438B
Application number: CN201810906709.6A
Authority: CN
Inventors: 柳嘉强
Original assignee: Kunlun Core Beijing Technology Co ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Kunlun Core Beijing Technology Co ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-08-10
Filing date: 2018-08-10
Publication date: 2022-07-19
Anticipated expiration: 2038-08-10
Also published as: CN110825438A

Abstract

The embodiments of the present application disclose a method and an apparatus for simulating data processing of an artificial intelligence chip. The artificial intelligence chip includes at least one module, and a specific implementation of the method includes: acquiring a bit group sequence to be processed and hardware specification information of the artificial intelligence chip, wherein the hardware specification information includes an instruction parsing rule, a supported instruction set, and module information of the modules involved in the instructions in the instruction set. At least one instruction is parsed from the bit group sequence according to the instruction parsing rule. For an instruction among the at least one instruction, the simulation end time of the instruction is predicted according to the module information of the modules involved in the instruction, and execution of the instruction is simulated in response to detecting that the current simulation time has reached the simulation end time of the instruction. This implementation can simulate the internal timing of the artificial intelligence chip and determine the order in which instructions are executed according to that timing, which increases the consistency between the simulator's result and the chip's result.

Description

Method and apparatus for simulating data processing of an artificial intelligence chip

Technical field

The embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for simulating data processing of an artificial intelligence chip.

Background

The role of a simulator is to stand in for the hardware chip before the chip is formally taped out, in order to verify the completeness of the processor design's functionality (whether it meets the needs of the expected applications) and the correctness of the implementation (whether the code matches the design intent). It is also used for developing the software stack and for optimizing the performance of upper-layer applications.

In summary, the simulator needs to simulate the chip's behavior in terms of both function and performance. Functionally, for a given input program, the simulator's result must match the chip's result. In terms of performance, the simulator needs to output the time the chip takes to run a given program (usually expressed as a number of cycles).

Existing simulators usually need to simulate the data transmitted on the pipeline in every cycle during the execution of each instruction, and they simply execute instructions one by one without considering instruction timing.

Summary of the invention

The embodiments of the present application propose a method and an apparatus for simulating data processing of an artificial intelligence chip.

In a first aspect, an embodiment of the present application provides a method for simulating data processing of an artificial intelligence chip, wherein the artificial intelligence chip includes at least one module, and the method includes: acquiring a bit group sequence to be processed and hardware specification information of the artificial intelligence chip, wherein the hardware specification information includes an instruction parsing rule, a supported instruction set, and module information of the modules involved in the instructions in the instruction set; parsing at least one instruction from the bit group sequence according to the instruction parsing rule; and, for an instruction among the at least one instruction, predicting the simulation end time of the instruction according to the module information of the modules involved in the instruction, and simulating execution of the instruction in response to detecting that the current simulation time has reached the simulation end time of the instruction.

In some embodiments, the module information includes at least one of the following: interconnection information between modules, hardware structure information of the modules, and protocol information for interaction between modules.

In some embodiments, simulating execution of the instruction includes: calling a preset function to simulate the function of the instruction.

In some embodiments, simulating execution of the instruction includes: writing the instruction into a shared queue; and fetching the instruction from the shared queue and calling a preset function to simulate the function of the instruction.

In some embodiments, predicting the simulation end time of the instruction according to the module information of the modules involved in the instruction includes: simulating the process of executing the instruction according to the module information of the modules involved in the instruction; determining the completion time of the instruction from the simulated execution process; and determining the simulation end time of the instruction from the current simulation time and the completion time.

In some embodiments, determining the completion time of the instruction from the simulated execution process includes: determining the internal processing time of the modules involved in the instruction according to the hardware structure information of those modules; decomposing the interaction between modules during execution of the instruction into multiple transactions according to the protocol information for interaction between the modules involved in the instruction, and determining the pipeline time and the queuing time involved in the transactions; and determining the completion time of the instruction from the internal processing time of the modules involved in the instruction and the pipeline time and queuing time of the transactions involved in executing the instruction.
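The relationship between these quantities can be sketched as follows. This is a minimal illustration only: the text says the completion time is determined "from" the internal processing time, pipeline time, and queuing time, but does not give a formula, so the purely additive model below is an assumption, and the names `Transaction`, `completion_time`, and `simulation_end_time` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """One inter-module transaction (hypothetical structure)."""
    pipeline_cycles: int  # cycles the transaction spends in the pipeline
    queue_cycles: int     # cycles the transaction spends waiting in a queue


def completion_time(internal_cycles, transactions):
    """Assumed additive model: internal processing plus every transaction's
    pipeline and queuing time. A real chip may overlap some of these."""
    return internal_cycles + sum(
        t.pipeline_cycles + t.queue_cycles for t in transactions
    )


def simulation_end_time(current_time, internal_cycles, transactions):
    """Simulation end time = current simulation time + completion time."""
    return current_time + completion_time(internal_cycles, transactions)


example = [Transaction(pipeline_cycles=5, queue_cycles=2),
           Transaction(pipeline_cycles=3, queue_cycles=0)]
end = simulation_end_time(100, 10, example)
```

With the hypothetical numbers above, the instruction takes 10 + (5 + 2) + (3 + 0) cycles, measured from simulation time 100.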

In a second aspect, an embodiment of the present application provides an apparatus for simulating data processing of an artificial intelligence chip, wherein the artificial intelligence chip includes at least one module, and the apparatus includes: an acquiring unit configured to acquire a bit group sequence to be processed and hardware specification information of the artificial intelligence chip, wherein the hardware specification information includes an instruction parsing rule, a supported instruction set, and module information of the modules involved in the instructions in the instruction set; a parsing unit configured to parse at least one instruction from the bit group sequence according to the instruction parsing rule; and a simulation unit configured to, for an instruction among the at least one instruction, predict the simulation end time of the instruction according to the module information of the modules involved in the instruction, and simulate execution of the instruction in response to detecting that the current simulation time has reached the simulation end time of the instruction.

In some embodiments, the simulation unit is further configured to: call a preset function to simulate the function of the instruction.

In some embodiments, the simulation unit is further configured to: write the instruction into a shared queue; and fetch the instruction from the shared queue and call a preset function to simulate the function of the instruction.

In some embodiments, the simulation unit is further configured to: simulate the process of executing the instruction according to the module information of the modules involved in the instruction; determine the completion time of the instruction from the simulated execution process; and determine the simulation end time of the instruction from the current simulation time and the completion time.

In some embodiments, the simulation unit is further configured to: determine the internal processing time of the modules involved in the instruction according to the hardware structure information of those modules; decompose the interaction between modules during execution of the instruction into multiple transactions according to the protocol information for interaction between the modules involved in the instruction, and determine the pipeline time and the queuing time involved in the transactions; and determine the completion time of the instruction from the internal processing time of the modules involved in the instruction and the pipeline time and queuing time of the transactions involved in executing the instruction.

In a third aspect, an embodiment of the present application provides a simulator, including: one or more processors; and a storage device on which one or more programs are stored, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any implementation of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any implementation of the first aspect.

The method and apparatus for simulating data processing of an artificial intelligence chip provided by the embodiments of the present application simulate the completion time of an instruction and the state of the system after the instruction has finished executing. When the internal timing of the artificial intelligence chip is simulated, no actual data copying or computation is performed, which improves the simulator's running speed. In addition, the order in which instructions are executed is determined from the result of the timing simulation, which increases the consistency between the simulation result and the chip's result.

Description of the drawings

Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:

FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application may be applied;

FIG. 2 is a flowchart of an embodiment of a method for simulating data processing of an artificial intelligence chip according to the present application;

FIG. 3 is a schematic diagram of an application scenario of the method for simulating data processing of an artificial intelligence chip according to the present application;

FIG. 4 is a flowchart of yet another embodiment of the method for simulating data processing of an artificial intelligence chip according to the present application;

FIG. 5 is a schematic diagram of yet another application scenario of the method for simulating data processing of an artificial intelligence chip according to the present application;

FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for simulating data processing of an artificial intelligence chip according to the present application;

FIG. 7 is a schematic structural diagram of a computer system suitable for implementing the simulator of the embodiments of the present application.

Detailed description

The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the relevant invention.

It should be noted that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method or the apparatus for simulating data processing of an artificial intelligence chip of the present application may be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and an AI (Artificial Intelligence) chip 105. An AI chip, also called an AI accelerator or compute card, is a module dedicated to handling the large volume of computing tasks in artificial intelligence applications (other, non-computing tasks remain the responsibility of the CPU). The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

A user may use the terminal devices 101, 102, and 103 to interact with the chip 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as chip simulators, web browsers, shopping applications, search applications, instant messaging tools, email clients, and social platform software.

The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be any of various electronic devices that have a display screen and support processor simulation, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The AI chip 105 may be a chip that provides various services, for example a speech recognition chip or an image recognition chip that supports the instructions simulated on the terminal devices 101, 102, and 103. The AI chip 105 can decode and execute received instructions and feed the results back to the terminal device. The terminal device can also obtain the instructions to be processed from the AI chip, simulate their execution in software, and compare the result of the software simulation with the chip's result to verify the simulation.

It should be noted that the method for simulating data processing of an artificial intelligence chip provided by the embodiments of the present application is generally executed by the terminal devices 101, 102, and 103; accordingly, the apparatus for simulating data processing of an artificial intelligence chip is generally provided in the terminal devices 101, 102, and 103.

It should be understood that the numbers of terminal devices, networks, and AI chips in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and AI chips, as the implementation requires.

Continuing to refer to FIG. 2, a flow 200 of an embodiment of the method for simulating data processing of an artificial intelligence chip according to the present application is shown. The artificial intelligence chip includes at least one module. The method for simulating data processing of an artificial intelligence chip includes the following steps:

Step 201: acquire the bit group sequence to be processed and the hardware specification information of the artificial intelligence chip.

In this embodiment, the execution body of the method for simulating data processing of an artificial intelligence chip (for example, the terminal device shown in FIG. 1) may obtain the bit group sequence to be processed and the hardware specification information of the artificial intelligence chip from a third-party server through a wired or wireless connection. Alternatively, the bit group sequence to be processed may be obtained from the AI chip, or from the terminal. The bit group sequence may be input as a file or as data stored in memory in advance. Each bit group in the bit group sequence corresponds to one instruction. The hardware specification information includes an instruction parsing rule, a supported instruction set, and module information of the modules involved in the instructions in the instruction set. The instruction parsing rule includes a decoding rule and a separation rule for the bit group sequence. The decoding rule splits and interprets a bit group according to a predetermined instruction format, so as to distinguish the different instruction categories and obtain the operands. With the decoding rule, the operation code and the address code of an instruction can be parsed from a bit group. For example, in a bit group 10 bits long, the first two bits represent the operation code (for example, 00 means addition and 01 means multiplication) and the last eight bits represent the address code. The separation rule for the bit group sequence is used to tell apart the bit groups that correspond to different instructions. For example, different bit groups may be separated by a delimiter, or they may be separated according to a fixed bit group length. In storage, different separation forms can likewise be used to separate different bit group sequences.
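The 10-bit format in the example above can be illustrated with a short Python sketch. Only the two opcode encodings named in the text (00 for addition, 01 for multiplication) are taken from the source; the names `Instruction`, `OPCODES`, and `decode` are hypothetical, and treating any other opcode as an error is an assumption.

```python
from typing import NamedTuple

# Opcode encodings given in the example; other values are not specified
# in the text and are rejected here by assumption.
OPCODES = {0b00: "add", 0b01: "mul"}


class Instruction(NamedTuple):
    op: str       # decoded operation name
    address: int  # 8-bit address code as an integer


def decode(bit_group):
    """Decode one 10-bit group: 2-bit opcode followed by 8-bit address."""
    if len(bit_group) != 10 or set(bit_group) - {"0", "1"}:
        raise ValueError("expected a string of exactly 10 binary digits")
    opcode = int(bit_group[:2], 2)
    if opcode not in OPCODES:
        raise ValueError("opcode not in the supported instruction set")
    return Instruction(OPCODES[opcode], int(bit_group[2:], 2))


first = decode("0101001100")   # opcode 01, address bits 01001100
second = decode("0010100110")  # opcode 00, address bits 10100110
```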

In some optional implementations of this embodiment, the module information includes at least one of the following: interconnection information between modules, hardware structure information of the modules, and protocol information for interaction between modules. The interconnection information indicates whether different modules are connected and how, for example through a bus. The hardware structure information of a module is used to determine when the module interacts with other modules. The protocol information for interaction between modules covers the parameters involved in that interaction that affect instruction execution time, for example the size of the data read from SRAM (Static Random-Access Memory) each time and the size of the data written to DRAM (Dynamic Random Access Memory) each time.
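One possible way to hold this module information in a simulator is sketched below. The text does not prescribe any data layout, so the `ModuleInfo` class, its field names, the module names, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleInfo:
    """Hypothetical container for the three kinds of module information."""
    name: str
    links: dict = field(default_factory=dict)      # peer module -> connection type
    structure: dict = field(default_factory=dict)  # hardware structure parameters
    protocol: dict = field(default_factory=dict)   # interaction parameters


def link_between(modules, a, b):
    """Return how module a is connected to module b, or None if it is not."""
    return modules[a].links.get(b)


# Illustrative values only; none of them come from the source.
modules = {
    "dram": ModuleInfo("dram", links={"sram": "bus"},
                       protocol={"write_bytes": 64}),
    "sram": ModuleInfo("sram", links={"dram": "bus", "alu": "direct"},
                       protocol={"read_bytes": 32}),
    "alu": ModuleInfo("alu", links={"sram": "direct"}),
}
```

A timing model could then query `link_between(modules, "dram", "sram")` to decide which path a data-movement instruction takes.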

Step 202: parse at least one instruction from the bit group sequence according to the instruction parsing rule.

In this embodiment, instructions are parsed from the input bit group sequence according to the instruction parsing rule. First, the bit group sequence is split into at least one bit group according to the separation rule for the bit group sequence. Then each bit group is decoded into an instruction according to the decoding rule. Two points should be noted. First, instructions in an AI chip may be executed in parallel. Second, how the bit group sequence is parsed depends on the AI chip's own rules for parsing instructions. For example, if the AI chip contains multiple units that can receive instructions, the input may consist of multiple files or multiple segments of data in memory. Alternatively, there may be only one file or one segment of data, which different units can nevertheless parse independently.
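The two separation rules mentioned for bit group sequences (fixed length and delimiter) can be sketched as follows. Representing bit groups as strings of '0' and '1' and using a comma as the delimiter are assumptions made for readability; the function names are hypothetical.

```python
def split_fixed(bits, width):
    """Split a bit string into groups of a fixed bit group length."""
    if width <= 0 or len(bits) % width != 0:
        raise ValueError("sequence length must be a multiple of the width")
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def split_delimited(text, delimiter=","):
    """Split a bit string on a delimiter, ignoring surrounding whitespace."""
    return [group.strip() for group in text.split(delimiter) if group.strip()]


groups_fixed = split_fixed("01010011000010100110", 10)
groups_delim = split_delimited("0101001100, 0010100110")
```

Each resulting bit group would then be passed to whatever decoder implements the decoding rule.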

Step 203: for an instruction among the at least one instruction, predict the simulation end time of the instruction according to the module information of the modules involved in the instruction, and simulate execution of the instruction in response to detecting that the current simulation time has reached the simulation end time of the instruction.

In this embodiment, once an instruction has been parsed, the timing of its actual execution is simulated according to the module information of the modules it involves, that is, according to the concrete hardware implementation, and the time at which the instruction finishes executing is predicted. The current simulation time is the internal time of the simulated chip. To predict accurately when each instruction finishes, the simulator usually needs to simulate the interaction between the modules inside the AI chip. The simulator can be regarded as maintaining the chip's state internally (for example, the data in memory), and this state is updated after the function of an instruction has been simulated.

It should be noted that, although the interactions and mutual influence of the modules must be considered in order to predict accurately when each instruction finishes, this stage does not need to carry out the function of the instruction by following the chip's internal flow. For example, there is no need to actually copy data from DRAM to SRAM in multiple passes.

After the time at which an instruction finishes executing has been predicted, a new timed task is started, and the function of the instruction is executed when the time inside the simulated chip reaches that predicted end time. The reason is that instructions inside the chip may execute in parallel, so another instruction may finish earlier than the current one. For example, at simulation time t0 the simulator predicts that instruction I1 will finish at t0+100 cycles; at a later simulation time t1 it predicts that instruction I2 will finish 30 cycles later, at t1+30. Although the simulator learns the end time of I2 later than that of I1 (t0 < t1), the function of I2 should be executed before that of I1 whenever t0+100 > t1+30. The timed task can be implemented in various ways, which are not described further here.
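The text leaves the implementation of the timed task open. One common choice, shown here purely as an assumption, is a min-heap of pending instructions keyed by predicted end time. The class `TimingSimulator` and its methods are hypothetical names, and t0 = 0 and t1 = 20 are arbitrary values used to replay the I1/I2 example.

```python
import heapq


class TimingSimulator:
    """Minimal event-queue sketch; not the patent's own implementation."""

    def __init__(self):
        self.now = 0        # current simulation time, in cycles
        self._pending = []  # heap of (end_time, sequence_number, name)
        self._seq = 0       # tie-breaker for equal end times
        self.executed = []  # (end_time, name) in functional-execution order

    def schedule(self, name, latency):
        """Register an instruction predicted to finish `latency` cycles from now."""
        end_time = self.now + latency
        heapq.heappush(self._pending, (end_time, self._seq, name))
        self._seq += 1
        return end_time

    def advance_to(self, t):
        """Advance simulation time, executing every instruction whose end time is reached."""
        while self._pending and self._pending[0][0] <= t:
            end_time, _, name = heapq.heappop(self._pending)
            self.now = end_time
            self.executed.append((end_time, name))  # the function would run here
        self.now = t


sim = TimingSimulator()
sim.schedule("I1", 100)  # predicted at t0 = 0, ends at cycle 100
sim.advance_to(20)       # t1 = 20
sim.schedule("I2", 30)   # predicted at t1, ends at cycle 50
sim.advance_to(200)
```

After the run, `sim.executed` lists I2 before I1, matching the ordering argued for in the example.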

In some optional implementations of this embodiment, simulating execution of the instruction includes: calling a preset function to simulate the function of the instruction. The preset function may be called by the thread that simulates timing. That thread can both simulate the timing of each instruction's actual execution and predict when each instruction finishes. A single preset function may simulate the functions of all instructions, or a function dedicated to one instruction's function may be called to simulate it. The function of an instruction may be, for example, data movement, addition, or multiplication.
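A dispatch table is one simple way to arrange dedicated preset functions per instruction. The sketch below assumes the chip state is a dictionary from address to value and that the three operations take the operand layout shown; both are illustrative assumptions, as are all the function names.

```python
def simulate_add(state, a, b, dst):
    """Addition: state[dst] = state[a] + state[b]."""
    state[dst] = state[a] + state[b]


def simulate_mul(state, a, b, dst):
    """Multiplication: state[dst] = state[a] * state[b]."""
    state[dst] = state[a] * state[b]


def simulate_move(state, src, dst):
    """Data movement: copy the value at src to dst."""
    state[dst] = state[src]


# Hypothetical mapping from instruction name to its preset function.
PRESET_FUNCTIONS = {
    "add": simulate_add,
    "mul": simulate_mul,
    "move": simulate_move,
}


def execute(state, op, *operands):
    """Look up and call the preset function for one instruction."""
    PRESET_FUNCTIONS[op](state, *operands)


state = {0: 3, 1: 4}
execute(state, "add", 0, 1, 2)
execute(state, "mul", 0, 1, 3)
execute(state, "move", 3, 4)
```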

In some optional implementations of this embodiment, simulating execution of the instruction includes: writing the instruction into a shared queue; and fetching the instruction from the shared queue and calling a preset function to simulate the function of the instruction. Inside the simulator, a dedicated thread simulates the functions of instructions, and the timing thread and the functional thread pass data through a shared queue. When the time inside the simulated chip reaches the end time of an instruction, the timing thread only needs to write that instruction into the queue. The functional thread only needs to fetch instructions from the shared queue and execute them. This decouples the functional-simulation code from the timing-simulation code, so the two can be developed independently and the simulator can be iterated on faster. Running timing simulation and functional simulation in different threads also speeds up the simulator.
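The two-thread arrangement can be sketched with Python's standard `queue.Queue` and `threading.Thread`. The sentinel value `None` used to stop the functional thread, the function names, and the precomputed list of end times are assumptions added for the sake of a runnable example.

```python
import queue
import threading


def timing_thread(predicted, shared):
    """Timing side: emit instructions in order of predicted end time.

    `predicted` is a list of (end_time, instruction_name) pairs that a real
    timing model would compute; here they are supplied directly."""
    for end_time, name in sorted(predicted):
        shared.put((end_time, name))
    shared.put(None)  # sentinel: no more instructions (an assumption)


def functional_thread(shared, log):
    """Functional side: fetch instructions and simulate their function."""
    while True:
        item = shared.get()
        if item is None:
            break
        _, name = item
        log.append(name)  # a preset function would be called here


shared = queue.Queue()
log = []
consumer = threading.Thread(target=functional_thread, args=(shared, log))
producer = threading.Thread(
    target=timing_thread, args=([(100, "I1"), (50, "I2")], shared)
)
consumer.start()
producer.start()
producer.join()
consumer.join()
```

Because the queue is first-in, first-out and there is a single producer and a single consumer, the functional thread sees the instructions in end-time order.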

继续参见图3，图3是根据本实施例的用于模拟人工智能芯片的数据处理的方法的应用场景的一个示意图。在图3的应用场景中，终端设备接收到待处理的比特组序列(0101001100，0010100110)之后，获取人工智能芯片的硬件规范信息。根据硬件规范信息从比特组序列中解析出至少一条指令(0101001100译码为第一指令，0010100110译码为第二指令)，然后根据硬件规范信息进行内部模块时序模拟，从而预测出每条指令的模拟结束时间(分别为t0+30cycle和t0+100cycle，其中t0是当前模拟时间)。按模拟结束时间先后放入完成指令队列。对于完成指令队列中的每条指令，响应于检测到当前模拟时间到达该指令的模拟结束时间，模拟执行该指令的功能，即当前模拟时间到达t0+30cycle时执行第一指令的功能，当前模拟时间到达t0+100cycle时执行第二指令的功能。可将模拟执行的结果与AI芯片实际运行的结果对比，从而验证模拟器的性能和功能。Continue to refer to FIG. 3 , which is a schematic diagram of an application scenario of the method for simulating data processing of an artificial intelligence chip according to this embodiment. In the application scenario of FIG. 3 , the terminal device obtains the hardware specification information of the artificial intelligence chip after receiving the sequence of bits to be processed (0101001100, 0010100110). According to the hardware specification information, at least one instruction is parsed from the bit group sequence (0101001100 is decoded as the first instruction, and 0010100110 is decoded as the second instruction). Simulation end time (t0+30cycle and t0+100cycle respectively, where t0 is the current simulation time). According to the simulation end time, they are placed in the completion command queue. For each instruction in the completed instruction queue, in response to detecting that the current simulation time reaches the simulation end time of the instruction, the function of executing the instruction is simulated, that is, the function of executing the first instruction when the current simulation time reaches t0+30cycle, the current simulation The function of executing the second instruction when the time reaches t0+100cycle. The results of the simulation execution can be compared with the results of the actual operation of the AI chip to verify the performance and functionality of the simulator.

The method provided by the above embodiment of the present application simulates the internal timing of the chip by predicting the end time of each instruction, and determines the order in which the instructions are executed according to the result of the timing simulation, thereby increasing the consistency between the simulated results and the results of running the chip.

With further reference to FIG. 4, a flow 400 of yet another embodiment of the method for simulating data processing of an artificial intelligence chip is shown. The flow 400 includes the following steps:

Step 401: acquire the bit group sequence to be processed and the hardware specification information of the artificial intelligence chip.

Step 401 is substantially the same as step 201 and is therefore not described again.

Step 402: parse at least one instruction from the bit group sequence according to the instruction parsing rules.

Step 402 is substantially the same as step 202 and is therefore not described again.

Step 403: for each instruction of the at least one instruction, simulate the process by which the instruction is executed according to the module information of the modules involved in the instruction.

In this embodiment, the hardware specification information may include module information of the modules involved in the instructions of the instruction set. That is, the hardware specification information obtained in step 401 indicates which modules are needed to complete each instruction. An instruction may be completed by a single module; for example, an addition instruction or a multiplication instruction may be completed by the arithmetic module alone. A data transfer instruction, by contrast, requires several modules to cooperate. The flow of instruction-related data between modules can be determined from the interconnection information between modules and the protocol information for interaction between modules contained in the module information. The function of the instruction does not need to be executed by actually following the chip's internal flow; it is only necessary, at the predicted end time, to update the state of each module to the state it would have after actual execution on the chip.

To accurately predict when the execution of each instruction ends, the simulator usually needs to simulate the interaction of the modules inside the AI chip. For example, suppose the internal structure of the AI chip is as shown in FIG. 5, where module 1 is responsible for executing a data transfer instruction whose function is to copy a block of data from the chip's memory (DRAM) to the internal SRAM. In a hardware implementation, the transfer is generally split into multiple bus transactions. The data address, length, and other properties of each transaction affect the time at which module 1 finally completes the instruction. The simulator therefore also needs to simulate the effect of this splitting. In addition, as shown in FIG. 5, besides module 1, module 2 and module 3 may also access the DRAM, and module 2 also accesses the SRAM. The simulator therefore also needs to simulate the effect of the arbitration or queuing that results when multiple modules share the same resource.
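To illustrate why the address and length of a transfer affect its timing, the sketch below splits a copy into bus transactions. The splitting rule used here (a transaction never crosses a max_burst-aligned boundary, with max_burst defaulting to 64 bytes) is an assumption chosen for the example; the present application does not specify how the hardware splits a transfer, and the function name split_transfer is hypothetical.

```python
def split_transfer(addr, length, max_burst=64):
    """Split the range [addr, addr + length) into (address, size) transactions.

    Assumed rule: no transaction crosses a max_burst-aligned boundary.
    """
    transactions = []
    end = addr + length
    while addr < end:
        # Next aligned boundary strictly above the current address.
        boundary = (addr // max_burst + 1) * max_burst
        chunk_end = min(boundary, end)
        transactions.append((addr, chunk_end - addr))
        addr = chunk_end
    return transactions
```

Under this assumed rule, a 128-byte copy starting at address 0 takes two transactions, while the same 128 bytes starting at address 48 take three, so two transfers of equal length can finish at different times.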

Step 404: determine the completion time of the instruction according to the simulated process of executing the instruction.

In this embodiment, for an instruction that does not involve interaction between modules, the internal processing time of the module involved in the instruction can be determined from the hardware structure information of that module and used as the completion time of the instruction. For an instruction that involves interaction between modules, the completion time must also account for the arbitration or queuing time caused by sharing the same resource, as well as the pipeline time incurred by data transfer between different modules. The interaction between the modules during execution of the instruction can be decomposed into multiple transactions according to the protocol information for interaction between the modules involved in the instruction, and the pipeline time and queuing time of each transaction can be determined. The sequence of operations from requesting the bus to finishing use of the bus is called a bus transaction; it is a series of activities that occur within one bus cycle. A typical bus transaction includes a request operation, an arbitration operation, address transfer, data transfer, and bus release. Other kinds of communication connections also incur queuing time. For an instruction involving interaction between modules, the completion time of the instruction is the sum of the internal processing time of the modules involved in the instruction, the pipeline time of the transactions during execution of the instruction, and the queuing time.
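The sum described above can be written down directly. The sketch below assumes that each transaction has already been assigned a pipeline time and a queuing time in cycles; the Transaction type and the function name completion_time are hypothetical, and the cycle counts in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    pipeline_cycles: int  # pipeline time of this transaction
    queue_cycles: int     # arbitration or queuing time of this transaction


def completion_time(internal_cycles, transactions=()):
    """Internal processing time plus pipeline and queuing time of all transactions.

    With no transactions (no interaction between modules), the result is
    just the internal processing time.
    """
    return internal_cycles + sum(
        t.pipeline_cycles + t.queue_cycles for t in transactions
    )
```

For instance, an instruction with 20 cycles of internal processing and two transactions of 30 pipeline cycles and 10 queuing cycles each would complete after 20 + 2 × (30 + 10) = 100 cycles.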

Step 405: determine the simulation end time of the instruction according to the current simulation time and the completion time.

In this embodiment, the current simulation time is the time inside the chip as simulated in software. The simulation end time is the sum of the current simulation time and the completion time. If the current simulation time is t0 and the completion time determined in step 404 is t1, the simulation end time of the instruction is t0+t1.

Step 406: in response to detecting that the current simulation time has reached the simulation end time of the instruction, simulate execution of the instruction.

In this embodiment, the current simulation time in step 406 differs from the current simulation time in step 405, because the current simulation time changes continuously. When the updated current simulation time is detected to equal the value t0+t1 obtained in step 405, execution of the instruction is simulated. The simulated execution is substantially the same as in step 203 and is therefore not described again.
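Steps 405 and 406 can be sketched as a loop over a simulated clock. This is an illustrative sketch only: the function name run_until_done, the dictionary layout, the one-cycle clock step, and the use of "less than or equal" rather than strict equality when checking the end time are assumptions, not details taken from the present application.

```python
def run_until_done(pending, execute, start=0):
    """Simulate steps 405 and 406 for a set of instructions.

    pending: dict mapping an instruction name to its completion time t1.
    execute: callable that simulates the function of one instruction.
    start:   the current simulation time t0 when the end times are computed.
    """
    # Step 405: simulation end time = current simulation time + completion time.
    end_times = {name: start + t1 for name, t1 in pending.items()}
    now = start
    log = []
    while end_times:
        # Step 406: execute every instruction whose end time has been reached.
        due = [name for name, t in end_times.items() if t <= now]
        for name in due:
            log.append((now, execute(name)))
            del end_times[name]
        now += 1  # the simulated clock advances by one cycle
    return log
```

The value of now inside the loop plays the role of the updated current simulation time of step 406, while start is the current simulation time of step 405.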

As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for simulating data processing of an artificial intelligence chip in this embodiment highlights the step of predicting the completion time of instructions that involve interaction between modules. The solution described in this embodiment can therefore simulate the timing inside the chip without performing actual data copying or computation, which speeds up the simulator.

With further reference to FIG. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for simulating data processing of an artificial intelligence chip. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in various electronic devices.

As shown in FIG. 6, the apparatus 600 for simulating data processing of an artificial intelligence chip in this embodiment includes an acquisition unit 601, a parsing unit 602, and a simulation unit 603. The acquisition unit 601 is configured to acquire the bit group sequence to be processed and the hardware specification information of the artificial intelligence chip, where the hardware specification information includes instruction parsing rules, a supported instruction set, and module information of the modules involved in the instructions of the instruction set. The parsing unit 602 is configured to parse at least one instruction from the bit group sequence according to the instruction parsing rules. The simulation unit 603 is configured to, for each instruction of the at least one instruction, predict the simulation end time of the instruction according to the module information of the modules involved in the instruction, and simulate execution of the instruction in response to detecting that the current simulation time has reached the simulation end time of the instruction.
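The way the three units feed one another can be sketched as three functions. Everything concrete in this sketch is hypothetical: the SPEC dictionary, the 4-bit opcode prefix used as the parsing rule, the comma-separated input string, and the per-instruction cycle counts are assumptions introduced so that the example reproduces the numbers of the FIG. 3 scenario. The simulate function here only predicts end times and returns the resulting execution order; it does not execute instruction functions.

```python
# Hypothetical hardware specification: the parsing rule is "the first 4 bits
# of a bit group are the opcode"; each opcode maps to (name, cycle count).
SPEC = {
    "opcode_bits": 4,
    "instructions": {"0101": ("first", 30), "0010": ("second", 100)},
}


def acquire(raw):
    """Acquisition unit 601: return the bit groups and the specification."""
    return [group.strip() for group in raw.split(",")], SPEC


def parse(bit_groups, spec):
    """Parsing unit 602: decode each bit group with the parsing rule."""
    n = spec["opcode_bits"]
    return [spec["instructions"][group[:n]] for group in bit_groups]


def simulate(instructions, t0=0):
    """Simulation unit 603 (prediction part only): (end_time, name) pairs
    sorted into the order in which the instructions would be executed."""
    return sorted((t0 + cycles, name) for name, cycles in instructions)
```

Calling simulate(parse(*acquire("0101001100, 0010100110"))) yields the first instruction at cycle 30 followed by the second at cycle 100 when t0 is 0.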

In this embodiment, for the specific processing performed by the acquisition unit 601, the parsing unit 602, and the simulation unit 603 of the apparatus 600 for simulating data processing of an artificial intelligence chip, reference may be made to steps 201, 202, and 203 in the embodiment corresponding to FIG. 2.

In some optional implementations of this embodiment, the module information includes at least one of the following: interconnection information between modules, hardware structure information of the modules, and protocol information for interaction between modules.

In some optional implementations of this embodiment, the simulation unit 603 is further configured to call a preset function to simulate the function of the instruction.

In some optional implementations of this embodiment, the simulation unit 603 is further configured to: write the instruction into a shared queue; and fetch the instruction from the shared queue and call a preset function to simulate the function of the instruction.

In some optional implementations of this embodiment, the simulation unit 603 is further configured to: simulate the process by which the instruction is executed according to the module information of the modules involved in the instruction; determine the completion time of the instruction according to the simulated process of executing the instruction; and determine the simulation end time of the instruction according to the current simulation time and the completion time.

In some optional implementations of this embodiment, the simulation unit 603 is further configured to: determine the internal processing time of the modules involved in the instruction according to the hardware structure information of those modules; decompose the interaction between the modules during execution of the instruction into multiple transactions according to the protocol information for interaction between the modules involved in the instruction, and determine the pipeline time and queuing time of the transactions; and determine the completion time of the instruction according to the internal processing time of the modules involved in the instruction, the pipeline time of the transactions during execution of the instruction, and the queuing time.

Referring now to FIG. 7, a schematic structural diagram is shown of a computer system 700 suitable for implementing the electronic device (the terminal device/server shown in FIG. 1) of the embodiments of the present application. The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), or the like, and a speaker or the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods illustrated in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above-described functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.

Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, a parsing unit, and a simulation unit. In some cases the names of these units do not limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires the bit group sequence to be processed and the hardware specification information of the artificial intelligence chip".

In another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire the bit group sequence to be processed and the hardware specification information of the artificial intelligence chip, where the hardware specification information includes instruction parsing rules, a supported instruction set, and module information of the modules involved in the instructions of the instruction set; parse at least one instruction from the bit group sequence according to the instruction parsing rules; and, for each instruction of the at least one instruction, predict the simulation end time of the instruction according to the module information of the modules involved in the instruction, and simulate execution of the instruction in response to detecting that the current simulation time has reached the simulation end time of the instruction.

The above description covers only preferred embodiments of the present application and the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.