Disclosure of Invention
Based on the above problems, the present disclosure is directed to a computing method and device for solving at least one of the above problems.
In order to achieve the above object, as one aspect of the present disclosure, the present disclosure proposes an arithmetic method comprising the steps of:
when the input data comprises data to be processed, network structure and weight data, executing the following steps:
step 11, inputting and reading input data;
step 12, constructing an offline model according to the network structure and the weight data;
step 13, analyzing the off-line model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 14, according to the operation instruction, operating the data to be processed to obtain an operation result for outputting;
when the input data comprises the data to be processed and the off-line model, executing the following steps:
step 21, inputting and reading input data;
step 22, analyzing the offline model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 23, according to the operation instruction, operating the data to be processed to obtain an operation result for outputting;
when the input data only comprises the data to be processed, the following steps are executed:
step 31, inputting and reading input data;
and step 32, calling the cached operation instruction, and operating the data to be processed to obtain an operation result for outputting.
Further, the step of operating on the data to be processed according to the operation instruction to obtain the operation result is implemented by a neural network processing unit.
Further, the neural network processing unit has an instruction cache unit for caching the operation instruction for subsequent calculation call.
Further, the offline model may be any of various neural network models, including Cambricon_model, AlexNet_model, GoogleNet_model, VGG_model, R-CNN_model, GAN_model, LSTM_model, RNN_model, ResNet_model, and the like.
Further, the data to be processed is input which can be processed by a neural network.
Further, the data to be processed includes a continuous single picture, voice or video stream.
Further, the network structure includes AlexNet, GoogleNet, ResNet, VGG, R-CNN, GAN, LSTM, RNN, and various other neural network structures.
In order to achieve the above object, as another aspect of the present disclosure, the present disclosure proposes an arithmetic device comprising:
the input module is used for inputting data, and the data comprises data to be processed, network structure and weight data and/or offline model data;
the model generation module is used for constructing an offline model according to the input network structure and the weight data;
the neural network operation module is used for generating an operation instruction based on the offline model, caching the operation instruction, and operating the data to be processed based on the operation instruction to obtain an operation result;
the output module is used for outputting the operation result;
the control module is used for detecting the type of input data and executing the following operations:
when the input data comprises data to be processed, a network structure and weight data, the control module controls the input module to input the network structure and the weight data into the model generation module to construct an offline model, and controls the neural network operation module to operate on the data to be processed input by the input module, based on the offline model input by the model generation module;
when the input data comprises data to be processed and an offline model, the control module controls the input module to input the data to be processed and the offline model into the neural network operation module, and controls the neural network operation module to generate and cache an operation instruction based on the offline model and to operate on the data to be processed based on the operation instruction;
when the input data only comprises the data to be processed, the control module controls the input module to input the data to be processed into the neural network operation module, and controls the neural network operation module to call the cached operation instruction to operate on the data to be processed.
Further, the neural network operation module comprises a model analysis unit and a neural network processing unit, wherein:
the model analysis unit is used for generating an operation instruction based on the offline model;
the neural network processing unit is used for caching the operation instruction for subsequent calculation and calling, or, when the input data only comprises the data to be processed, for calling the cached operation instruction and operating on the data to be processed based on the operation instruction to obtain an operation result.
Further, the neural network processing unit has an instruction cache unit for caching the operation instruction for subsequent calculation call.
The operation method and the device provided by the disclosure have the following beneficial effects:
1. according to the method and the device, after the off-line model is generated, operation can be directly performed according to the off-line model, and extra overhead caused by running of the whole software framework including a deep learning framework is avoided;
2. the device and the method provided by the disclosure realize more efficient function reconstruction of the neural network processor, so that the neural network processor can fully exploit its performance in application environments with low memory and strong real-time requirements, and the operation process is more concise and faster.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
In this specification, the various embodiments described below are meant to be illustrative only and should not be construed in any way to limit the scope of the disclosure. The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of exemplary embodiments of the present disclosure as defined by the claims and their equivalents. The following description includes various specific details to aid understanding, but such details are to be regarded as illustrative only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Moreover, descriptions of well-known functions and constructions are omitted for clarity and conciseness. Moreover, throughout the drawings, the same reference numerals are used for similar functions and operations.
The present disclosure discloses an operation method, comprising the steps of:
when the input data comprises data to be processed, network structure and weight data, executing the following steps:
step 11, inputting and reading input data;
step 12, constructing an offline model according to the network structure and the weight data;
step 13, analyzing the off-line model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 14, according to the operation instruction, operating the data to be processed to obtain an operation result for outputting;
when the input data comprises the data to be processed and the off-line model, executing the following steps:
step 21, inputting and reading input data;
step 22, analyzing the offline model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 23, according to the operation instruction, operating the data to be processed to obtain an operation result for outputting;
when the input data only comprises the data to be processed, the following steps are executed:
step 31, inputting and reading input data;
and step 32, calling the cached operation instruction, and operating the data to be processed to obtain an operation result for outputting.
In some embodiments of the present disclosure, the neural network processing unit operates on the data to be processed according to the operation instruction to obtain an operation result; preferably, the neural network processing unit has an instruction cache unit configured to cache the received operation instruction, where a pre-cached operation instruction is an operation instruction of a previous operation cached by the instruction cache unit.
In some embodiments of the present disclosure, the neural network processing unit further includes a data caching unit, configured to cache the data to be processed.
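The instruction cache unit and data caching unit can be modeled as two simple buffers on the processing unit. This is an illustrative software sketch of the caching behavior, not the disclosed hardware; the class and method names are hypothetical.

```python
# Hypothetical model of a processing unit with instruction and data caches.
class NeuralNetworkProcessingUnit:
    def __init__(self):
        self.instruction_cache = []  # caches operation instructions for reuse
        self.data_cache = []         # caches data to be processed

    def load_instructions(self, instructions):
        # Replace the cached instructions with those of the current operation.
        self.instruction_cache = list(instructions)

    def load_data(self, data):
        self.data_cache.append(data)

    def run(self):
        # Apply each cached instruction, in order, to the next cached datum.
        result = self.data_cache.pop(0)
        for instruction in self.instruction_cache:
            result = instruction(result)
        return result
```

Because the instruction cache survives across calls to `run`, later data can be processed without reloading instructions, mirroring the third input case.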
Based on the above operation method, the present disclosure also discloses an operation device, including:
the input module is used for inputting data, and the data comprises data to be processed, network structure and weight data and/or offline model data;
the model generation module is used for constructing an offline model according to the input network structure and the weight data;
the neural network operation module is used for generating an operation instruction based on the offline model, caching the operation instruction, and operating the data to be processed based on the operation instruction to obtain an operation result;
the output module is used for outputting the operation result;
the control module is used for detecting the type of input data and executing the following operations:
when the input data comprises data to be processed, a network structure and weight data, the control module controls the input module to input the network structure and the weight data into the model generation module to construct an offline model, and controls the neural network operation module to operate on the data to be processed input by the input module, based on the offline model input by the model generation module;
when the input data comprises data to be processed and an offline model, the control module controls the input module to input the data to be processed and the offline model into the neural network operation module, and controls the neural network operation module to generate and cache an operation instruction based on the offline model and to operate on the data to be processed based on the operation instruction;
when the input data only comprises the data to be processed, the control module controls the input module to input the data to be processed into the neural network operation module, and controls the neural network operation module to call the cached operation instruction to operate on the data to be processed.
The neural network operation module comprises a model analysis unit and a neural network processing unit, wherein:
the model analysis unit is used for generating an operation instruction based on the offline model;
the neural network processing unit is used for caching the operation instruction for subsequent calculation and calling, or, when the input data only comprises the data to be processed, for calling the cached operation instruction and operating on the data to be processed based on the operation instruction to obtain an operation result.
In some embodiments of the present disclosure, the neural network processing unit has an instruction cache unit, configured to cache the operation instruction for a subsequent computation call.
In some embodiments of the disclosure, the offline model is a text file defined according to a specific structure, and can be any of various neural network models, such as Cambricon_model, AlexNet_model, GoogleNet_model, VGG_model, R-CNN_model, GAN_model, LSTM_model, RNN_model, ResNet_model, etc., but is not limited to the models provided in this embodiment.
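Since the offline model is described as a structured text file, one hypothetical serialization is JSON. The layout below is purely illustrative; the actual Cambricon_model format is not specified in this disclosure.

```python
import json

# Hypothetical offline-model text format: name plus a list of layers
# carrying the weight data. Written once at generation time, parsed later
# by the model analysis unit into operation instructions.
offline_model = {
    "name": "AlexNet_model",
    "layers": [
        {"type": "conv", "weights": [0.1, 0.2]},
        {"type": "relu"},
    ],
}

text = json.dumps(offline_model)  # model generation: serialize to a text file
restored = json.loads(text)       # model analysis: read the text file back
```

The point of the text form is that it can be produced once and reused across runs without the original deep learning framework being present.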
In some embodiments of the present disclosure, the data to be processed is input that can be processed by a neural network, such as a continuous single picture, voice, or a video stream.
In some embodiments of the present disclosure, the network structure may be any of various neural network structures, such as AlexNet, GoogleNet, ResNet, VGG, R-CNN, GAN, LSTM, RNN, etc., but is not limited to the structures proposed in this embodiment.
Specifically, according to the difference of the input data of the input module, the arithmetic device of the present disclosure has the following three working principles:
1. when the data input by the input module is the network structure, the weight data and the data to be processed, the control module controls the input module to transmit the network structure and the weight data to the model generation module and to transmit the data to be processed to the model analysis unit; the control module controls the model generation module to generate an offline model according to the network structure and the weight data, and to transmit the generated offline model to the model analysis unit; the control module controls the model analysis unit to analyze the received offline model to obtain an operation instruction that can be recognized by the neural network processing unit, and to transmit the operation instruction and the data to be processed to the neural network processing unit; the neural network processing unit operates on the data to be processed according to the received operation instruction to obtain a determined operation result, and transmits the operation result to the output module for output.
2. When the data input by the input module is the offline model and the data to be processed, the control module controls the input module to directly transmit the offline model and the data to be processed to the model analysis unit, and the subsequent working principle is the same as that in the first case.
3. When the data input by the input module only contains the data to be processed, the control module controls the input module to transmit the data to be processed directly to the neural network processing unit through the model analysis unit, and the neural network processing unit operates on the data to be processed according to the cached operation instruction to obtain an operation result. This case typically does not occur on the first use of the neural network processor, so as to ensure that operation instructions are already present in the instruction cache.
Therefore, when the offline model of the current network operation differs from that of the last network operation, the data input by the input module comprises a network structure, weight data and data to be processed, and the model generation module generates a new offline model before the subsequent network operation is performed; when the current network operation is the first network operation and a corresponding offline model has been obtained in advance, the data input by the input module comprises the offline model and the data to be processed; when the current network operation is not the first and uses the same offline model as the last network operation, the data input by the input module only comprises the data to be processed.
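The rule in the preceding paragraph, i.e. which inputs the input module must supply for a given operation, can be sketched as a small decision function. The function and its flag names are illustrative only.

```python
def select_inputs(first_run, model_changed, has_prebuilt_model):
    # Hypothetical helper: decide what the input module supplies this run.
    if model_changed:
        # Offline model differs from the last run: rebuild it from scratch.
        return {"network_structure", "weights", "data"}
    if first_run and has_prebuilt_model:
        # First run with a model obtained in advance: reuse it directly.
        return {"offline_model", "data"}
    # Same model as last run: the cached instructions suffice.
    return {"data"}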
In some embodiments of the present disclosure, the computing device described in the present disclosure is integrated as a sub-module into a central processor module of an entire computer system. The data to be processed and the off-line model are controlled by the central processing unit and transmitted to the arithmetic device. The model analysis unit analyzes the transmitted neural network offline model and generates an operation instruction. Then the operation instruction and the data to be processed are transmitted into the neural network processing unit, the operation result is obtained through operation processing, and the operation result is returned to the main memory unit. In the subsequent calculation process, the network structure is not changed any more, and the neural network calculation can be completed only by continuously transmitting data to be processed, so that an operation result is obtained.
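The steady-state flow described above, where the offline model is parsed once and thereafter only data is transmitted, might look like the following sketch. `parse` and `process` are hypothetical stand-ins for the model analysis unit and the neural network processing unit.

```python
# Hypothetical steady state: parse the offline model once, then stream data.
def parse(offline_model):
    # Stand-in for the model analysis unit: turn the model into instructions.
    return [lambda x, w=w: x * w for w in offline_model["weights"]]

def process(instructions, data):
    # Stand-in for the neural network processing unit.
    for instruction in instructions:
        data = instruction(data)
    return data

instructions = parse({"weights": [2, 3]})                 # done once
results = [process(instructions, d) for d in [1, 2, 3]]   # data streamed in
```

Because `parse` runs only once, each subsequent datum incurs only the processing cost, which is the saving the disclosure attributes to the offline mode.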
The following describes the computing device and method proposed in the present disclosure in detail by specific embodiments.
Example 1
As shown in fig. 2, the present embodiment provides an operation method, including the following steps:
when the input data comprises data to be processed, network structure and weight data, executing the following steps:
step 11, inputting and reading input data;
step 12, constructing an offline model according to the network structure and the weight data;
step 13, analyzing the off-line model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 14, according to the operation instruction, operating the data to be processed to obtain a neural network operation result for outputting;
when the input data comprises the data to be processed and the off-line model, executing the following steps:
step 21, inputting and reading input data;
step 22, analyzing the offline model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 23, according to the operation instruction, operating the data to be processed to obtain a neural network operation result for outputting;
when the input data only comprises the data to be processed, the following steps are executed:
step 31, inputting and reading input data;
and step 32, calling the cached operation instruction, and operating the data to be processed to obtain a neural network operation result for outputting.
Processing the data to be processed according to the operation instruction through a neural network processing unit to obtain an operation result; the neural network processing unit is provided with an instruction cache unit and a data cache unit, and is used for caching the received operation instruction and the data to be processed respectively.
The input network structure provided in this embodiment is AlexNet, the weight data is bvlc_alexnet.caffemodel, the data to be processed is a continuous single picture, and the offline model is Cambricon_model.
In summary, the method provided by this embodiment can greatly simplify the operation flow of the neural network processor and avoid the extra memory and IO overhead of invoking a whole conventional programming framework. By applying this method, a neural network accelerator can fully exploit its operation performance in environments with low memory and strong real-time requirements.
Example 2
As shown in fig. 3, the present embodiment provides an arithmetic device, including: an input module 101, a model generation module 102, a neural network operation module 103, an output module 104 and a control module 105, wherein the neural network operation module 103 comprises a model analysis unit 106 and a neural network processor 107.
The key point of the device is that it executes in an offline mode, namely, an offline model is first generated, the offline model is then used to directly generate the related operation instructions, the weight data is transmitted, and the data to be processed is processed. More specifically:
the input module 101 is configured to input a combination of a network structure, weight data, and data to be processed, or a combination of an offline model and data to be processed. When the input is the network structure, the weight data and the data to be processed, the network structure and the weight data are transmitted to the model generation module 102 to generate an offline model for executing the subsequent operations. When the input is the offline model and the data to be processed, the offline model and the data to be processed are transmitted directly to the model analysis unit 106 to perform the subsequent operations.
The output module 104 is configured to output the determined operation data generated according to the specific network structure and the set of data to be processed, wherein the output data is computed by the neural network processor 107.
The model generation module 102 is configured to generate an offline model for use by the lower layer according to the input network structure parameters and weight data.
The model analysis unit 106 is configured to analyze the incoming offline model, generate an operation instruction that can be directly sent to the neural network processor 107, and send the data to be processed, which is sent from the input module 101, to the neural network processor 107.
The neural network processor 107 is configured to perform an operation according to the transmitted operation instruction and data to be processed, obtain a determined operation result, and transmit the operation result to the output module 104; it has an instruction cache unit and a data cache unit.
The control module 105 is configured to detect the type of the input data and perform the following operations:
when the input data comprises data to be processed, a network structure and weight data, the control module 105 controls the input module 101 to input the network structure and the weight data into the model generation module 102 to construct an offline model, and controls the neural network operation module 103 to perform a neural network operation on the data to be processed input by the input module 101, based on the offline model input by the model generation module 102;
when the input data comprises data to be processed and an offline model, the control module 105 controls the input module 101 to input the data to be processed and the offline model into the neural network operation module 103, and controls the neural network operation module 103 to generate and cache an operation instruction based on the offline model and to perform a neural network operation on the data to be processed based on the operation instruction;
when the input data only includes the data to be processed, the control module 105 controls the input module 101 to input the data to be processed into the neural network operation module 103, and controls the neural network operation module 103 to call the cached operation instruction to perform a neural network operation on the data to be processed.
The input network structure provided in this embodiment is AlexNet, the weight data is bvlc_alexnet.caffemodel, and the data to be processed is a continuous single picture. The model generation module 102 generates a new offline model Cambricon_model according to the input network structure and the weight data, and the generated offline model Cambricon_model can also be used alone as the next input; the model analysis unit 106 parses the offline model Cambricon_model, thereby generating a series of operation instructions. The model analysis unit 106 then transmits the generated operation instructions to the instruction cache unit on the neural network processor 107, and transmits the input image transmitted by the input module 101 to the data cache unit on the neural network processor 107.
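The data flow of this embodiment (input module 101 → model generation module 102 → model analysis unit 106 → neural network processor 107 → output module 104) can be sketched as follows. The classes are hypothetical software stand-ins for the numbered hardware modules, not the disclosed implementation.

```python
# Illustrative wiring of the Example 2 modules.
class ModelGeneration:                    # stand-in for module 102
    def generate(self, structure, weights):
        return {"structure": structure, "weights": weights}

class ModelAnalysis:                      # stand-in for unit 106
    def parse(self, offline_model):
        # Produce instructions the processor can consume directly.
        return ["instr:" + offline_model["structure"]]

class Processor:                          # stand-in for unit 107
    def __init__(self):
        self.instruction_cache = []       # instruction cache unit
    def run(self, instructions, data):
        self.instruction_cache = instructions
        return [(instr, data) for instr in instructions]

gen, ana, proc = ModelGeneration(), ModelAnalysis(), Processor()
model = gen.generate("AlexNet", "bvlc_alexnet.caffemodel")  # from module 101
instrs = ana.parse(model)
result = proc.run(instrs, "picture_0")                      # to module 104
```

A subsequent picture could be passed to `proc` with the already cached instructions, which is exactly the third working principle of the device.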
The processes or methods depicted in the preceding figures may be performed by processing logic that comprises hardware, software, or a combination of both. Although the processes or methods are described above in terms of some sequential operations, it should be understood that some of the operations described may be performed in a different order. Further, some operations may be performed in parallel rather than sequentially.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.