


Technical Field
The invention belongs to the technical field of speech recognition, and in particular relates to a speech recognition method based on a photonic spiking neural network.
Background Art
A spiking neural network (SNN) is a new generation of biologically inspired artificial neural network model. It uses spiking neurons as its basic units and has strong biological grounding: its operation closely mirrors the way neural signals are processed in the brain, making it an effective tool for complex spatio-temporal information processing. Compared with a traditional ANN it offers better biological plausibility, and its sparse spike-based coding makes it hardware-friendly and energy-efficient.
For the speech recognition task, most current solutions of this kind are designed around spiking neural networks. For example, Method 1 (Dennis J, Tran H D, Chng E S. Overlapping sound event recognition using local spectrogram features and the generalised hough transform [J]. Pattern Recognition Letters, 2013, 34(9): 1085-1093) extracts Local Spectrogram Features (LSF) directly from the spectrogram, converts the two-dimensional time and frequency information of each LSF into the corresponding firing information of a spiking neural network, and feeds the result into a voting system for classification. Method 2 (Dennis J, Yu Q, Tang H, et al. Temporal coding of local spectrogram features for robust sound recognition [C] // 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 803-807) builds on Method 1: the extracted LSF features are first fed into an SOM (Self-Organising Maps) network, the SOM output serves as the input to the spiking neural network, and the Tempotron learning algorithm is then used for training. Another class of solutions treats the spectrogram as a special kind of image and borrows the processing style of convolutional neural networks. For example, Method 3 (Dong M, Huang X, Xu B. Unsupervised speech recognition through spike-timing-dependent plasticity in a convolutional spiking neural network [J]. PLoS ONE, 2018, 13(11): e0204596) uses a convolutional spiking neural network as a front-end feature extractor and a softmax classifier for recognition; Method 4 (Zhang Z, Liu Q. Spike-Event-Driven Deep Spiking Neural Network With Temporal Encoding [J]. IEEE Signal Processing Letters, 2021, 28: 484-488) likewise uses a convolutional spiking neural network for front-end feature extraction, but convolves with kernels of three different sizes; the extracted features are passed into a spiking neural network trained with the STDBP algorithm.
However, the speech recognition algorithms above rely on simple, idealised spiking neuron models in which a spike is represented by an impulse function. Such a representation omits the internal mechanism by which spikes are actually generated, so it cannot accurately model how spikes are generated and propagated in biological neural networks, nor properties of spiking neurons such as the absolute and relative refractory periods. This greatly limits the achievable network complexity and makes the approach unsuitable for larger-scale networks.
Summary of the Invention
To solve the above problems in the prior art, the present invention provides a speech recognition method based on a photonic spiking neural network. The technical problem to be solved by the present invention is realised through the following technical solutions:
A speech recognition method based on a photonic spiking neural network, comprising:
Step 1: preprocessing an original speech data set to obtain FBank feature values, wherein the original speech data set includes a training set and a test set, so that the obtained FBank feature values include FBank feature values corresponding to the training set and FBank feature values corresponding to the test set;
Step 2: constructing a convolutional spiking neural network and training it with the FBank feature values corresponding to the training set, to obtain a trained convolutional spiking neural network;
Step 3: processing the FBank feature values corresponding to the training set and to the test set with the trained convolutional spiking neural network, to obtain the high-dimensional features Feature corresponding to the training set and to the test set;
Step 4: constructing a photonic spiking neural network and training it with the high-dimensional features Feature corresponding to the training set, to obtain a trained photonic spiking neural network;
Step 5: processing the high-dimensional features Feature corresponding to the test set with the trained photonic spiking neural network, to obtain a speech recognition result.
In one embodiment of the present invention, in Step 1, preprocessing the original speech data set to obtain FBank feature values includes:
extracting effective speech segments from the original speech data set using endpoint detection;
performing feature extraction on the effective speech segments to obtain the FBank feature values.
In one embodiment of the present invention, Step 2 includes:
21) constructing a convolutional spiking neural network comprising one convolutional layer and one pooling layer, and temporally encoding the FBank feature values corresponding to the training set so as to convert them into firing times of the IF neurons of the convolutional layer;
22) obtaining the firing information of each IF neuron in the convolutional layer;
23) setting inhibition strategies to determine the IF neurons whose weights need to be updated;
24) based on the firing information and the inhibition strategies, updating the weights of the IF neurons with the STDP algorithm so as to train the network;
25) repeating steps 23)-24) until a preset maximum number of training iterations is reached, to obtain the trained convolutional spiking neural network.
在本发明的一个实施例中,步骤23)包括:In one embodiment of the present invention, step 23) includes:
设定点火抑制策略:Set the ignition suppression strategy:
针对不同特征图的同一位置,保留点火时间最早的IF神经元;其中,若存在点火时间相同的IF神经元,则从中选择在点火时刻对应的膜电压最大的IF神经元保留;For the same position of different feature maps, retain the IF neuron with the earliest ignition time; among them, if there are IF neurons with the same ignition time, select the IF neuron with the largest membrane voltage corresponding to the ignition time to retain;
设定更新抑制策略:Set update suppression policy:
针对同一张特征图,若相邻位置存在多个IF神经元点火的情况,则确定一个最早点火的IF神经元,并对该IF神经元进行权值更新,剩余相邻的IF神经元不更新;其中,若相邻位置的多个IF神经元点火时间相同,则选取膜电压最大的IF神经元进行权值更新,剩余相邻的IF神经元不更新。For the same feature map, if there are multiple IF neurons ignited in adjacent positions, determine the earliest ignited IF neuron, and update the weight of the IF neuron, and the remaining adjacent IF neurons will not be updated ; Among them, if multiple IF neurons in adjacent positions have the same firing time, the IF neuron with the largest membrane voltage is selected for weight update, and the remaining adjacent IF neurons are not updated.
In one embodiment of the present invention, in step 24), the STDP algorithm updates the weights of the IF neurons according to the following formula:
where ti denotes the firing time of input neuron i and tj the firing time of output neuron j; wij denotes the connection weight between neuron i and neuron j; Δwij denotes the weight update between neuron i and neuron j; α+ is the learning rate of the update expression when neuron i fires earlier than neuron j, and α− is the learning rate of the update expression when neuron i does not fire earlier than neuron j.
In one embodiment of the present invention, Step 4 includes:
41) constructing a photonic spiking neural network comprising an input layer, an output layer, and a decision layer;
42) initialising the photonic spiking neural network, and temporally encoding the high-dimensional features Feature corresponding to the training set so as to convert them into firing times for the output-layer VCSEL neurons;
43) obtaining the firing times of the output-layer VCSEL neurons;
44) setting a decision rule, and updating the weights of the output-layer VCSEL neurons with a time-based weight update so as to train the network;
45) repeating steps 42)-44) until a preset maximum number of training iterations is reached, to obtain the trained photonic spiking neural network.
In one embodiment of the present invention, in step 44), setting the decision rule includes:
assigning each output VCSEL neuron to one class of samples, such that when a sample of a given class is input, the output VCSEL neuron corresponding to that class fires first, and the remaining VCSEL neurons fire after it or remain at rest.
In one embodiment of the present invention, in step 44), the time-based update of the output-layer VCSEL neuron weights includes:
for the output VCSEL neuron nref corresponding to the current input sample, if and only if this neuron has not fired, updating the weights of neuron nref according to:
Δw = α1·K(tmax − ti − tdelay), ti < tmax
where Δw denotes the weight increment, α1 the corresponding positive learning rate, tmax the time at which the output power of the output-layer VCSEL neuron reaches its maximum within the simulation deadline, ti the firing time of the input-layer VCSEL neuron, tdelay the delay of that VCSEL neuron, and K the function corresponding to the STDP curve.
In one embodiment of the present invention, in step 44), the time-based update of the output-layer VCSEL neuron weights further includes:
for every VCSEL neuron no other than the output VCSEL neuron nref corresponding to the current input sample:
if its firing time to is earlier than the firing time tref of the output VCSEL neuron nref corresponding to the current input sample, updating the weights of neuron nref according to:
Δw = α1·K(tmax − ti − tdelay), ti < tmax
if to is later than tref and the difference between them does not exceed a set time threshold, updating the weights of neuron no according to:
where α2 denotes a negative constant learning rate and tthre the set time threshold;
if to is later than tref and the difference between them exceeds the time threshold tthre, not updating the weights.
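The three cases above can be sketched as follows. This is an illustrative sketch only: the patent states the potentiation formula Δw = α1·K(tmax − ti − tdelay) but does not reproduce the depression formula for case 2, so the K-based depression term with the negative rate α2, the exponential shape of the STDP-curve kernel K, and all learning-rate and threshold values here are assumptions.

```python
import math

def k_stdp(s, tau=1.0):
    # Illustrative STDP-curve kernel K (exponential shape assumed).
    return math.exp(-abs(s) / tau)

def time_based_updates(t_o, t_ref, t_max, t_in, t_delay,
                       alpha1=0.1, alpha2=-0.05, t_thre=2.0):
    """Returns (dw_ref, dw_o) for one non-target output neuron n_o.
    Case 1: n_o fires before n_ref -> potentiate n_ref's weight via
            dw = alpha1 * K(t_max - t_in - t_delay), valid while t_in < t_max.
    Case 2: n_o fires after n_ref within t_thre -> depress n_o's weight
            with the negative rate alpha2 (assumed K-based form).
    Case 3: n_o fires later than t_ref + t_thre -> no update."""
    dw_ref, dw_o = 0.0, 0.0
    if t_o < t_ref:                      # case 1: wrong neuron fired first
        if t_in < t_max:
            dw_ref = alpha1 * k_stdp(t_max - t_in - t_delay)
    elif t_o - t_ref <= t_thre:          # case 2: late, but within threshold
        dw_o = alpha2 * k_stdp(t_o - t_ref)
    # case 3: t_o - t_ref > t_thre -> leave both weights unchanged
    return dw_ref, dw_o
```

A caller would apply dw_ref to the target neuron's weight and dw_o to the competing neuron's weight after each presented sample.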
In one embodiment of the present invention, Step 5 includes:
inputting the high-dimensional features Feature corresponding to the test set into the trained photonic spiking neural network for processing;
computing the firing behaviour of the VCSEL neurons of the output layer of the photonic spiking neural network;
determining the predicted label of the current input sample according to the firing behaviour of the output-layer VCSEL neurons, thereby completing the speech recognition.
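The decision step above can be sketched as follows, assuming the label is taken from the output-layer VCSEL neuron that fires first (consistent with the earliest-fires-first decision rule); the handling of a no-spike outcome is an assumption, since the patent only says the label follows the firing behaviour.

```python
import numpy as np

def predict_label(fire_times):
    """Predicted label = index of the earliest-firing output VCSEL neuron.
    Neurons that remained at rest are marked with np.inf; returns None
    when no output neuron fired at all (assumed convention)."""
    fire_times = np.asarray(fire_times, dtype=float)
    if not np.isfinite(fire_times).any():
        return None
    return int(np.argmin(fire_times))
```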
Beneficial effects of the present invention:
1. In the speech recognition method based on a photonic spiking neural network provided by the present invention, a convolutional spiking neural network first extracts features from the speech data, ensuring that the extracted feature values are discrete; a photonic spiking neural network then performs encoding and recognition to obtain the speech recognition result. The method not only has the advantages of low power consumption, high speed, and short latency, but also supports higher network complexity; it can classify and recognise larger-scale standard data sets and is applicable to larger networks.
2. In the speech recognition method based on a photonic spiking neural network provided by the present invention, the photonic spiking neural network structure is designed on the basis of an actual laser neuron model and takes various practical constraints into account, and is therefore suitable for hardware inference.
The present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a speech recognition method based on a photonic spiking neural network provided by an embodiment of the present invention;
Fig. 2 is an algorithm framework diagram of a speech recognition method based on a photonic spiking neural network provided by an embodiment of the present invention;
Fig. 3 shows results obtained with the speech recognition method based on a photonic spiking neural network provided by the present invention.
Detailed Description of the Embodiments
The present invention is described in further detail below in conjunction with specific embodiments, but the embodiments of the present invention are not limited thereto.
Embodiment 1
For the speech recognition task, the present invention proposes a learning algorithm based on a photonic spiking neural network. First, a convolutional spiking neural network extracts features from the speech signal, ensuring that the extracted feature values are discrete, which facilitates encoding by the photonic spiking neural network. Within the photonic spiking neural network, a new decision criterion is proposed, and a weight update algorithm is designed according to this criterion. Combining the two realises the application of photonic spiking neural networks to speech recognition, enabling the classification and recognition of larger-scale standard data sets.
Specifically, refer jointly to Figs. 1-2: Fig. 1 is a schematic diagram of a speech recognition method based on a photonic spiking neural network provided by an embodiment of the present invention, and Fig. 2 is an algorithm framework diagram of the method.
It should first be noted that the network architecture provided by the present invention requires a GPU for network training and testing, so the host used must be equipped with an NVIDIA GPU device. In a concrete implementation, the original speech data set can be placed in the current working directory, and the data storage path can be set to the Result folder.
Specifically, the implementation steps of the present invention include:
Step 1: preprocess the original speech data set to obtain FBank feature values.
The original speech data set includes a training set and a test set, so the obtained FBank feature values include FBank feature values corresponding to the training set and FBank feature values corresponding to the test set.
In the speech-signal preprocessing part of this embodiment, the key parameters are first set, and endpoint detection is used to extract effective speech segments from the original speech data set.
For example, the classic double-threshold detection method is applied to the original .wav speech files to extract effective speech segments and remove invalid ones.
Then, feature extraction is performed on the effective speech segments to obtain the FBank feature values.
Specifically, the number of frames Frames and the number of filters Mels in the Mel filter bank are set as the parameters for extracting the FBank feature values.
Each effective speech segment undergoes pre-emphasis, framing, windowing, Fourier transform, and Mel filter-bank filtering to obtain the FBank feature matrix Arr_feature. Rendered as an image, Arr_feature is a spectrogram, which is fed into the convolutional spiking neural network as a special grayscale image. Frames and Mels are, respectively, the number of frames used in the framing operation and the number of filters in the Mel filter bank, so the size of Arr_feature is {Frames, Mels}.
The above operations are applied to the entire speech data set, including the training set and the test set, yielding the FBank feature values corresponding to the training set and to the test set.
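The FBank pipeline above can be sketched in a few lines of NumPy. All parameter values here (sampling rate, frame count, filter count, FFT size, pre-emphasis coefficient, framing overlap) are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def fbank(signal, sr=16000, n_frames=40, n_mels=26):
    """Sketch of the described pipeline: pre-emphasis, framing, windowing,
    Fourier transform, and Mel filter-bank filtering, producing an
    Arr_feature of size {Frames, Mels}."""
    # Pre-emphasis: y[t] = x[t] - 0.97 * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Split into exactly n_frames frames (Frames) with 50% overlap
    hop = len(emphasized) // (n_frames + 1)
    frame_len = 2 * hop
    frames = np.stack([emphasized[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    # Hamming window + power spectrum
    frames = frames * np.hamming(frame_len)
    n_fft = 512
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular Mel filter bank with n_mels filters (Mels)
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    filters = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        filters[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        filters[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filter-bank energies -> Arr_feature of size {Frames, Mels}
    return np.log(power @ filters.T + 1e-10)
```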
Step 2: construct a convolutional spiking neural network and train it with the FBank feature values corresponding to the training set, to obtain a trained convolutional spiking neural network.
21) Construct a convolutional spiking neural network comprising one convolutional layer and one pooling layer, and temporally encode the FBank feature values corresponding to the training set so as to convert them into firing times of the IF neurons of the convolutional layer.
Specifically, the convolutional spiking neural network constructed in this embodiment comprises one convolutional layer and one pooling layer: the convolutional layer convolves the FBank feature values, and the pooling layer derives the higher-dimensional features Feature from the convolution result.
Training the convolutional spiking neural network mainly amounts to determining the weights of the neurons in the convolutional layer.
Before training, the Arr_feature of the training set obtained in Step 1 is temporally encoded: Arr_feature is converted, in inverse proportion, into firing times of the input-layer IF neurons.
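The inverse-proportional temporal coding can be sketched as below: larger feature values fire earlier, smaller ones later. The linear mapping and the simulation horizon t_max are illustrative assumptions.

```python
import numpy as np

def time_encode(arr_feature, t_max=100.0):
    """Inverse-proportional temporal coding of Arr_feature: the largest
    feature value is mapped to time 0, the smallest to t_max."""
    a = np.asarray(arr_feature, dtype=float)
    lo, hi = a.min(), a.max()
    norm = (a - lo) / (hi - lo) if hi > lo else np.zeros_like(a)
    return t_max * (1.0 - norm)
```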
22) Obtain the firing information of each IF neuron in the convolutional layer.
In this embodiment, the firing information of each IF neuron in the convolutional layer is obtained under the convolution settings of local connection and local weight sharing, as follows:
For an IF neuron of the convolutional layer, the membrane voltage is computed and the firing moment is determined as follows:
tconv = t and vconv = V(t), when V(t) ≥ IF_threshold
where si and wi denote, respectively, the spike amplitude generated by input-layer IF neuron i at time ti and the corresponding connection weight; the spike amplitude si of the input-layer IF neurons is set to 1.
According to the above expression, combined with the local-connection and local-weight-sharing strategies, once the membrane voltage of a convolutional-layer IF neuron exceeds the firing threshold IF_threshold, its firing time and membrane voltage {tconv, vconv} are recorded, and the neuron produces no further spikes after firing.
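The single-spike IF behaviour above can be sketched as follows. The membrane equation (a running sum of weighted unit-amplitude input spikes, with no leak) is an assumption consistent with the surrounding definitions, since the patent's formula image is not reproduced in the text.

```python
import numpy as np

def if_fire(spike_times, weights, threshold, t_end=100.0):
    """Convolutional-layer IF neuron: accumulates weighted input spikes
    (si = 1) in time order, fires once when the voltage first reaches the
    threshold, and stays silent afterwards. Returns (t_conv, v_conv),
    or (None, final voltage) if the neuron never fires."""
    order = np.argsort(spike_times)
    v = 0.0
    for i in order:
        t = spike_times[i]
        if t > t_end:
            break
        v += weights[i]  # si = 1, so each spike contributes its weight wi
        if v >= threshold:
            return t, v
    return None, v
```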
Both the local connection and the local weight sharing differ considerably from their counterparts in a traditional convolutional neural network, as follows:
Local connection: exploiting the harmonic correlation of the spectrogram, the convolution-kernel window covers all frequency bands along the frequency axis (Mels) but keeps the local-connection property along the time axis (Frames). The size of the kernel window can thus be written {Δt, f}. Unlike in a CNN, the feature map obtained by convolving the spectrogram with this kernel is therefore one-dimensional, representing local features extracted in different time periods.
Local weight sharing: building on the local connection, since the feature map represents local features of different time periods, the feature map is divided into several segments corresponding to different time periods. Different segments of the same feature map use different convolution kernels, while neurons within the same segment share one kernel, realising local weight sharing.
23) Set inhibition strategies to determine the IF neurons whose weights need to be updated.
After the firing information [tconv, vconv] of the convolutional-layer IF neurons is obtained, inhibition strategies must be set to determine the neurons whose weights are to be updated, as follows:
First, the firing inhibition strategy: at the same position across different feature maps, only one firing neuron may exist, and all others are inhibited. Concretely, the firing inhibition strategy can be described as:
for the same position across different feature maps, retain only the IF neuron with the earliest firing time; if several IF neurons share the same firing time, the one with the largest membrane voltage at the firing moment is retained.
Then, the update inhibition strategy:
for the same feature map, if several IF neurons fire at adjacent positions, determine the earliest-firing IF neuron and update only its weights, while the remaining adjacent IF neurons are not updated; if several adjacent IF neurons share the same firing time, the IF neuron with the largest membrane voltage is selected for the weight update, and the remaining adjacent IF neurons are not updated.
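Both inhibition strategies reduce to the same winner selection: earliest firing time wins, ties broken by the largest membrane voltage. A minimal sketch (the array encoding, with np.inf marking un-fired neurons, is an assumption):

```python
import numpy as np

def select_winner(fire_times, voltages):
    """Among competing IF neurons, return the index of the one with the
    earliest firing time, breaking ties by the largest membrane voltage
    at the firing moment; None if no neuron fired."""
    fire_times = np.asarray(fire_times, dtype=float)
    voltages = np.asarray(voltages, dtype=float)
    if not np.isfinite(fire_times).any():
        return None
    t_min = fire_times.min()
    tied = np.where(fire_times == t_min)[0]
    return int(tied[np.argmax(voltages[tied])])
```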
24) Based on the firing information and the inhibition strategies, update the weights of the IF neurons with the STDP algorithm to train the network.
Specifically, after the neurons requiring weight updates are determined, the STDP algorithm performs the update, with the update expression:
where ti denotes the firing time of input neuron i and tj the firing time of output neuron j; wij denotes the connection weight between neuron i and neuron j; Δwij denotes the weight update between neuron i and neuron j; α+ is the learning rate of the update expression when neuron i fires earlier than neuron j, and α− is the learning rate of the update expression when neuron i does not fire earlier than neuron j.
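The STDP formula itself is not reproduced in the extracted text, so the sketch below uses the common simplified multiplicative STDP rule, which matches the variable definitions above (potentiate with rate α+ when the input neuron fires no later than the output neuron, depress with rate α− otherwise); the multiplicative w·(1−w) form and the rate values are assumptions.

```python
def stdp_update(w, t_pre, t_post, a_plus=0.004, a_minus=-0.003):
    """Simplified STDP: potentiation when the input (pre) neuron fires no
    later than the output (post) neuron, depression otherwise. The
    w * (1 - w) factor keeps the weight inside [0, 1]."""
    lr = a_plus if t_pre <= t_post else a_minus
    dw = lr * w * (1.0 - w)
    return w + dw
```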
25) Repeat steps 23)-24) until the preset maximum number of training iterations is reached, to obtain the trained convolutional spiking neural network.
At this point the weights of the convolutional spiking neural network are determined.
Step 3: process the FBank feature values corresponding to the training set and to the test set with the trained convolutional spiking neural network, to obtain the high-dimensional features Feature corresponding to the training set and to the test set.
31) First initialise: read the trained weights obtained in Step 2, and feed the FBank feature data of the training set and of the test set into the network respectively.
32) Then introduce the pooling layer into the convolutional spiking neural network to extract the higher-dimensional features Feature.
Specifically, the firing of the convolutional-layer neurons is obtained as in Step 2 (here only the firing inhibition is applied; no update is performed). A pooling layer is added after the convolutional layer to perform a pooling (statistical) operation on the neurons within each window of the convolutional layer, as follows:
The segments of a feature map divided by the local weight sharing of step 22) are connected to the corresponding pooling-layer IF neurons, and a statistical pooling operation is performed: the membrane voltage of each pooling-layer IF neuron is computed, where the spike firing amplitude of a convolutional-layer IF neuron is 1 and the connection weight between the convolutional layer and the pooling layer equals 1. Computing the membrane voltage of a pooling-layer IF neuron is therefore equivalent to counting the number of convolutional-layer neurons that fired during the corresponding time segment.
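With unit spike amplitudes and unit connection weights, the statistical pooling reduces to a count of fired neurons per segment, as sketched below; the (start, end) segment encoding and the np.inf marker for un-fired neurons are assumed conventions.

```python
import numpy as np

def statistical_pooling(fire_times, segments):
    """Pooling-layer IF membrane voltages: the count of fired
    convolutional-layer neurons within each feature-map segment."""
    fire_times = np.asarray(fire_times, dtype=float)
    fired = np.isfinite(fire_times)          # np.inf = never fired
    return np.array([fired[s:e].sum() for s, e in segments])
```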
Processing by the convolutional layer and the pooling layer yields the high-dimensional features Feature corresponding to the training set and to the test set.
Step 4: construct a photonic spiking neural network and train it with the high-dimensional features Feature corresponding to the training set, to obtain a trained photonic spiking neural network.
41) Construct a photonic spiking neural network comprising an input layer, an output layer, and a decision layer.
42) Initialise the photonic spiking neural network, and temporally encode the high-dimensional features Feature corresponding to the training set so as to convert them into firing times for the output-layer VCSEL neurons.
In this embodiment, initialising the photonic spiking neural network covers the sizes of the input and output layers, the initialisation of the training weights, and the temporal encoding of the higher-dimensional features Feature.
Specifically, the output-layer size is determined by the data set used, the input-layer size by the dimension of the features Feature extracted in Step 3, and the network weights are initialised at the same time.
Meanwhile, owing to the statistical pooling of step 32), the obtained feature values are {f1, f2, …, fn}, so the corresponding feature values need to be converted into firing times within the simulation deadline T.
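A sketch of this conversion is shown below. The patent only requires that the resulting firing times fall within T; the concrete mapping (larger pooled count fires earlier, linearly in the count) and the value of T are assumptions.

```python
import numpy as np

def counts_to_times(features, T=10.0):
    """Map pooled feature values {f1, ..., fn} to firing times within the
    simulation deadline T: the largest count fires at time 0, a zero
    count fires at T."""
    f = np.asarray(features, dtype=float)
    f_max = f.max() if f.size else 1.0
    if f_max <= 0:
        return np.full_like(f, T)
    return T * (1.0 - f / f_max)
```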
43)获取所述输出层VCSEL神经元的点火时间。43) Obtain the firing time of the VCSEL neuron in the output layer.
Specifically, the output power Pout of each output-layer VCSEL neuron is computed from the VCSEL neuron model expression, and whether the neuron fires is decided by whether Pout exceeds the firing threshold.
Whether or not the neuron fires, the time tmax at which Pout reaches its maximum can always be found within the simulation cutoff time T:

tmax = argmax_{0 ≤ t ≤ T} Pout(t)

The firing time tout of a VCSEL neuron can therefore be expressed as tout = tmax when Pout(tmax) exceeds the firing threshold; otherwise the neuron remains at rest and tout is set to a sentinel value far beyond T (1 s in Embodiment Two).
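The firing-time rule above can be sketched as follows; the toy power trace, threshold value, and sentinel constant are illustrative assumptions:

```python
import numpy as np

T_CUTOFF = 15e-9      # simulation cutoff time T = 15 ns (Embodiment Two)
T_REST = 1.0          # sentinel for a neuron that never fires (1 s >> T)

def firing_time(p_out, t_axis, p_threshold):
    """Return the firing time t_out of a VCSEL neuron from its output-power
    trace: t_max is where P_out peaks within [0, T]; the neuron fires only
    if that peak exceeds the threshold."""
    i_max = int(np.argmax(p_out))        # t_max always exists, fired or not
    t_max = t_axis[i_max]
    return t_max if p_out[i_max] > p_threshold else T_REST

t_axis = np.linspace(0.0, T_CUTOFF, 1501)
p_out = np.exp(-((t_axis - 6e-9) / 1e-9) ** 2)  # toy power trace, peak at 6 ns
assert abs(firing_time(p_out, t_axis, 0.5) - 6e-9) < 1e-11   # fires at t_max
assert firing_time(p_out, t_axis, 2.0) == T_REST             # below threshold
```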
44) Set the discrimination rule, and update the weights of the output-layer VCSEL neurons with a time-based update rule to train the network.
First, the discrimination rule is set.
Since the output-layer size equals the number of sample classes in the data set, each output VCSEL neuron is assigned to one class. When a sample of a given class is input, the output VCSEL neuron assigned to that class should fire earliest, while the remaining VCSEL neurons fire after it or stay at rest.
Second, the weight-update algorithm is set.
In this embodiment, the update of the output VCSEL neuron nref corresponding to the current input sample and the update of the other VCSEL neurons no are treated separately.
Case 1: For the output VCSEL neuron nref corresponding to the current input sample, its weights need to be increased if and only if the neuron did not fire; the weights of nref are then updated according to:
Δw = α1·K(tmax − ti − tdelay),  ti < tmax
where Δw is the weight increment, α1 the corresponding positive learning rate, tmax the time at which the output-layer VCSEL neuron's output power peaks within the simulation cutoff time, ti the firing time of the input-layer VCSEL neuron, tdelay the delay of that VCSEL neuron, and K the function corresponding to the STDP curve.
Specifically, the K function maps part of the STDP curve onto the interval [0, tthre], giving the update expression Δw = K(Δt) a higher resolution.
Case 2: For the VCSEL neurons no other than the output VCSEL neuron nref corresponding to the current input sample:
If its firing time to is earlier than the firing time tref of the output VCSEL neuron nref of the current input sample, the weights of nref still need to be increased, and they are updated as in Case 1.
If to is later than tref and their time difference does not exceed the set time threshold, the weights of neuron no are updated by a depression rule defined by the negative constant learning rate α2 and the set time threshold tthre; its update magnitude decreases as the time difference to − tref grows.
If to is later than tref and their time difference exceeds the time threshold tthre, the case is considered not to affect the judgment, and the weights are left unchanged.
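The two update cases can be sketched as below. The exact shapes of the K function and of the Case-2 depression expression are not reproduced in this text, so the exponential potentiation and linear depression used here are assumptions; only the conditions (fire/no-fire, relative timing, the tthre window) follow the description above:

```python
import numpy as np

def k_stdp(dt, t_thre=4e-9, a_plus=1.0, tau=1e-9):
    """K maps part of the STDP curve onto [0, t_thre]; the exponential
    decay shape is an assumption, not the patent's exact curve."""
    return a_plus * np.exp(-dt / tau) if 0.0 <= dt <= t_thre else 0.0

def update_ref(w, t_in, t_max, t_delay, alpha1=0.1):
    """Case 1: the target neuron n_ref did not fire -> potentiate every
    input synapse whose spike arrived before t_max."""
    for i, ti in enumerate(t_in):
        if ti < t_max:
            w[i] += alpha1 * k_stdp(t_max - ti - t_delay)
    return w

def update_other(w, t_o, t_ref, alpha2=-0.05, t_thre=4e-9):
    """Case 2: a non-target neuron fired after n_ref within t_thre ->
    depress, with magnitude shrinking as the gap grows (linear form is an
    illustrative assumption; alpha2 is the negative learning rate)."""
    dt = t_o - t_ref
    if 0.0 < dt <= t_thre:
        w += alpha2 * (1.0 - dt / t_thre)
    return w
```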
45) Repeat steps 42)-44) until the preset maximum number of training iterations is reached, yielding the trained photonic spiking neural network.
At this point, the weights of the photonic spiking neural network have been determined.
Step 5: Process the high-dimensional feature Feature of the test set with the trained photonic spiking neural network to obtain the speech recognition result.
51) Input the high-dimensional feature Feature of the test set into the trained photonic spiking neural network for processing;
52) Compute the firing of the output-layer VCSEL neurons of the photonic spiking neural network;
53) From the firing of the output-layer VCSEL neurons, combined with the discrimination rule set in step 44), determine the predicted label of the current input sample, completing the speech recognition.
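A minimal sketch of this discrimination rule (names and the rest sentinel are illustrative): the earliest-firing output neuron gives the predicted class, and a sample is undeterminable when no output neuron spikes:

```python
def predict_label(firing_times, t_rest=1.0):
    """Earliest-firing output VCSEL neuron gives the predicted class.
    Neurons that stayed at rest carry the sentinel time t_rest; if every
    output neuron is at rest, the sample type cannot be determined."""
    t_min = min(firing_times)
    if t_min >= t_rest:
        return None                      # no spike at all -> undeterminable
    return firing_times.index(t_min)

assert predict_label([9e-9, 1.0, 6e-9, 1.0]) == 2   # neuron 2 fires first
assert predict_label([1.0, 1.0]) is None            # no neuron fired
```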
In the speech recognition method based on a photonic spiking neural network provided by the present invention, a convolutional spiking neural network first extracts features from the speech data, ensuring that the extracted feature values are discrete; a photonic spiking neural network then performs the encoding and recognition, yielding the speech recognition result. The method not only offers low power consumption, high speed, and short latency, but also supports considerable network complexity, so it can classify and recognize larger standard data sets and scales to larger networks. Moreover, because the photonic spiking network structure of the present invention is designed from a realistic laser-neuron model with practical constraints taken into account, it is suitable for hardware inference.
Embodiment Two
A concrete example is given below to illustrate the speech recognition method based on the photonic spiking neural network provided by the present invention.
Step 1: Preprocess the original speech data set
Assume the original speech data set contains 1600 training samples and 400 test samples; the speech data processed by the double-threshold detection method are saved.
To obtain the FBank features, the number of frames is set to Frames = 41 and the number of filters in the Mel filter bank to Mels = 40, giving the FBank feature matrix Arr_feature with sizeof(Arr_feature) = {40, 41}. Arr_feature is flattened into a one-dimensional vector and stored in a .csv file together with the speech file name, which is used to obtain the label information.
Step 2: Train the convolutional spiking neural network
First, time-encode the Arr_feature of each speech sample;
Then train the network parameters:
Using the local-connection property, the convolution kernel size is set to {40×6}, where 6 is the local connection along the time axis, and the stride to Stride = 1, so the resulting feature map has size {36×1}.
Using the local weight-sharing property, the segment size within a feature map is set to Local_weight_sharing_size = 4: neurons within the same segment share one Kernel, while neurons in different segments (of the same feature map) use different Kernels.
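The segment-to-Kernel mapping implied by local weight sharing can be sketched as (function name illustrative):

```python
def kernel_index(neuron_idx, sharing_size=4):
    """Local weight sharing: the neurons of a feature map are split into
    segments of Local_weight_sharing_size; all neurons within a segment
    share one Kernel, so the Kernel index is the segment index."""
    return neuron_idx // sharing_size

# 36 neurons per feature map with segment size 4 -> 9 distinct Kernels
assert [kernel_index(i) for i in (0, 3, 4, 35)] == [0, 0, 1, 8]
```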
The firing threshold of the convolutional-layer IF neurons is set to IF_threshold = 33 and the number of feature maps to Feature_map = 50, and the membrane voltages of the convolutional-layer IF neurons are computed. The convolutional layer thus yields Arr_t and Arr_v matrices of size {36×50}, storing the firing information and the membrane voltage at each firing time.
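A minimal integrate-and-fire sketch of how an Arr_t / Arr_v entry could be produced; the weighted-spike accumulation shown is a simplification of the actual convolution, and the names are illustrative:

```python
def if_neuron_fire_time(spike_times, weights, threshold=33.0):
    """Integrate-and-fire: the membrane voltage accumulates the weighted
    input spikes in time order; the neuron fires when the voltage first
    reaches IF_threshold. Returns (fire_time, voltage_at_fire), or
    (None, final_voltage) if the threshold is never reached."""
    v = 0.0
    for t, w in sorted(zip(spike_times, weights)):
        v += w
        if v >= threshold:
            return t, v          # -> one entry of Arr_t and Arr_v
    return None, v

t, v = if_neuron_fire_time([1.0, 2.0, 3.0], [20.0, 10.0, 5.0])
assert t == 3.0 and v == 35.0    # crosses IF_threshold = 33 on third spike
```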
According to the inhibition strategy, the neurons {tconv, vconv} whose weights need updating are selected from Arr_t and Arr_v and updated with the STDP update expression.
The number of training epochs of the convolutional spiking neural network is set to 6; the above steps are repeated until training ends, and the trained weights w are saved.
Step 3: Obtain the higher-dimensional feature Feature with the trained convolutional spiking neural network
A pooling layer is introduced to perform the statistical pooling operation on the convolutional layer; the pooling window size equals Local_weight_sharing_size = 4, so the pooling layer has size {9×50}.
With the trained weights, features are extracted from the FBank features of the training and test sets, producing the high-dimensional feature Feature; the {9×50} map is flattened into a one-dimensional vector {1×450} and saved.
Step 4: Train the photonic spiking neural network
From the statistical pooling operation and the Local_weight_sharing_size of the convolutional spiking neural network, the feature values of Feature have a discrete, finite range with possible values {0, 1, 2, 3, 4}. These are converted by inverse-proportional time encoding into the firing times {T, 9 ns, 8 ns, 7 ns, 6 ns}, where the simulation cutoff time is T = 15 ns; the firing time of an output neuron is then computed with t < T, and a neuron that does not fire is assigned t = 1 s.
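The inverse-proportional encoding above maps each discrete feature value to a firing time (larger values fire earlier, 0 maps to the cutoff T); a direct sketch:

```python
T = 15e-9  # simulation cutoff time T = 15 ns

# Inverse-proportional time encoding of the discrete feature values
# {0, 1, 2, 3, 4} into firing times {T, 9 ns, 8 ns, 7 ns, 6 ns}.
ENCODING = {0: T, 1: 9e-9, 2: 8e-9, 3: 7e-9, 4: 6e-9}

def encode(feature_vector):
    """Convert a vector of pooled feature values into input spike times."""
    return [ENCODING[f] for f in feature_vector]

assert encode([4, 0, 2]) == [6e-9, 15e-9, 8e-9]
```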
The input-layer size of the photonic spiking neural network is set to 450, matching the dimensionality of Feature, and the output-layer size to 10, matching the number of sample classes; the initialized weight matrix has size {450×10};
Assume the current input sample has label = 0. After obtaining the firing times {t0, t1, …, t9} of the 10 output neurons, the neuron corresponding to {t0} and the neurons corresponding to {t1, t2, …, t9} are updated separately.
For {t0}, its weights are updated if and only if t0 = 1 s, i.e., the neuron did not fire. Whether or not it fires, its tref is recorded; if it did not fire, tref = tmax.
For {t1, t2, …, t9}, an update is needed if and only if the corresponding neuron fired; the weights of that neuron are then updated based on its firing time to, the tref of {t0}, and the effective time difference tthre = 4 ns.
The number of training epochs of the photonic spiking neural network is set to 50; the above steps are repeated until training ends, and the trained weights w are saved.
Step 5: Perform speech recognition with the trained network
Specifically, the test-set samples are input and the trained weights are loaded to infer the sample classes. Please refer to FIG. 3, which shows the results obtained with the speech recognition method based on the photonic spiking neural network provided by the present invention.
As FIG. 3 shows, the recognition accuracy of the method of the present invention reaches 93.3%, as expected. A key factor limiting the recognition accuracy of the photonic spiking neural network, however, is that sometimes none of the output-layer VCSEL neurons produces a spike, making the sample type undeterminable (this predicted type corresponds to X on the abscissa of FIG. 3); this problem has a far greater impact than misclassification, and it is precisely why photonic spiking neural networks are harder to train.
The above further describes the present invention in detail with reference to specific preferred embodiments, and it should not be construed that the specific implementation of the present invention is limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, all of which shall be regarded as falling within the protection scope of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211192057.7ACN115762478B (en) | 2022-09-28 | 2022-09-28 | Speech recognition method based on photon pulse neural network |
| Publication Number | Publication Date |
|---|---|
| CN115762478Atrue CN115762478A (en) | 2023-03-07 |
| CN115762478B CN115762478B (en) | 2025-04-15 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211192057.7AActiveCN115762478B (en) | 2022-09-28 | 2022-09-28 | Speech recognition method based on photon pulse neural network |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116994573A (en)* | 2023-05-16 | 2023-11-03 | 北京理工大学 | End-to-end voice recognition method and system based on impulse neural network |
| CN119252276A (en)* | 2024-12-04 | 2025-01-03 | 华东交通大学 | An unknown audio event recognition algorithm based on spiking neural network |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180039883A1 (en)* | 2016-08-02 | 2018-02-08 | International Business Machines Corporation | Neural network based acoustic models for speech recognition by grouping context-dependent targets |
| CN109660342A (en)* | 2018-12-24 | 2019-04-19 | 江苏亨通智能物联系统有限公司 | Wireless speech transfers net system based on quantum cryptography |
| CN112068555A (en)* | 2020-08-27 | 2020-12-11 | 江南大学 | A voice-controlled mobile robot based on semantic SLAM method |
| CN112633497A (en)* | 2020-12-21 | 2021-04-09 | 中山大学 | Convolutional pulse neural network training method based on reweighted membrane voltage |
| US20210290171A1 (en)* | 2020-03-20 | 2021-09-23 | Hi Llc | Systems And Methods For Noise Removal In An Optical Measurement System |
| CN114595807A (en)* | 2022-03-17 | 2022-06-07 | 西安电子科技大学 | A device for classification based on VCSEL-SA multilayer photonic pulse neural network |
| Publication number | Publication date |
|---|---|
| CN115762478B (en) | 2025-04-15 |
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||