

Technical Field
The present invention relates to the field of communication technology, and in particular to a method for sending and receiving arbitrary digital signals with a vocoder for transmission over a voice channel.
Background Art
Human speech signals are digitally encoded before transmission in modern telecommunication networks. Owing to factors such as transmission-channel bandwidth limits and voice-quality requirements, several different coding techniques coexist in modern telecommunication networks. In the fixed public telephone network, speech is usually waveform-coded with pulse code modulation (PCM) or adaptive differential pulse code modulation (ADPCM) and transmitted at 64 kbps (PCM) or 32 kbps (ADPCM). Waveform coding, however, cannot achieve higher compression, for example compressing speech below 16 kbps. In wireless mobile telephone networks, constrained by the available channel bandwidth, speech is encoded by vocoders, which exploit model parameters of the human vocal tract and the mechanism of speech production to compress the signal below 16 kbps while preserving acceptable perceptual quality. For example, in the GSM full-rate mode, speech is encoded by the RPE-LTP vocoder and transmitted at 13 kbps; the GSM enhanced full-rate vocoder and the EVRC vocoder used in CDMA networks are both based on ACELP and compress speech to 8-13 kbps with almost no loss of call quality; and the CELP vocoder used by the US Department of Defense (DoD) compresses speech to 4.8 kbps while still maintaining good call quality.
Although vocoder technology, being highly dependent on the characteristics of the signal source, achieves high-compression coding of speech, its working principle makes it unable to encode non-speech signals. Modem techniques for transmitting arbitrary digital signals over voice channels have long been widely used in fixed telephone networks that employ waveform coding (PCM or ADPCM). In general, a varying digital bit stream is represented by modulating certain properties of a sinusoidal carrier, such as its frequency, amplitude, or phase. Data modems on the plain old telephone service (POTS) network in common use today reach rates of 56 kbps. The signals produced by these modulation techniques, however, no longer have the characteristics of human speech; after vocoder encoding and decoding, waveform properties such as amplitude, frequency, and phase are not preserved, so digital signals cannot be carried over the voice channels of vocoder-based wireless mobile networks (such as GSM and CDMA).
Although wireless mobile networks (such as GSM and CDMA) provide data channels (such as CSD/HSCSD, GPRS/EDGE, and UMTS) that solve the basic problem of transmitting digital signals, their high transmission delay (0.5 to 2 seconds) and transmission jitter cannot meet the quality-of-service requirements of interactive real-time signals. Moreover, operators' data-channel coverage is far narrower than their voice coverage and the services differ from one operator to another, so using data channels across operators, networks, or countries raises many interconnection difficulties.
Summary of the Invention
The object of the present invention is to provide a method for sending and receiving digital signals with a vocoder.
To achieve the above object, the present invention adopts the following technical solution: a method for sending and receiving digital signals with a vocoder, characterized in that the source digital signal to be transmitted is converted, by parameter mapping, into the key speech characteristic parameters of a speech synthesis model; at the sending end a speech signal is generated by speech synthesis; the synthesized speech signal is sent through a GSM or CDMA vocoder; and at the receiving end the key speech characteristic parameters are extracted by speech analysis and restored to the original digital signal.
The above method for sending and receiving digital signals with a vocoder specifically comprises the following steps: (1) the source digital signal to be transmitted is divided into frames, each frame being used to synthesize a short-term speech signal, and each frame is further subdivided into subframes of unequal length, the number of subframes being at least three; (2) from the subframes, a line spectral frequency (LSP) index, a generalized excitation vector parameter index, and a generalized excitation gain index are generated; (3) the index values generated in step (2) are looked up in the LSP parameter table, the generalized excitation vector parameter table, and the generalized excitation gain parameter table to obtain the LSP parameters, the generalized excitation vector parameters, and the generalized excitation gain parameters; (4) the parameters generated in step (3) are synthesized into a speech signal according to the principle of a CELP vocoder; (5) the synthesized speech signal is sent through a CDMA or GSM vocoder; (6) after receiving the synthesized speech signal, the receiving end performs speech analysis on it and extracts the LSP parameters, the generalized excitation vector parameters, and the generalized excitation gain parameters; (7) the parameters obtained in step (6) are looked up in their corresponding tables (the LSP parameter table, the excitation vector parameter table, and the excitation gain parameter table) to reversely generate the LSP index, the generalized excitation vector index, and the generalized excitation gain index; (8) the index values generated in step (7) are reversely restored to subframes, and the subframes are reassembled into one frame, restoring the original digital signal. In step (1), the digital bit stream is divided into frames, each frame being used to generate a short-term speech signal of 10-30 milliseconds. In step (4), the generalized excitation vector parameter and the generalized excitation gain parameter are first combined by an excitation signal generator into an excitation signal; the LSP parameters are converted by inverse vector quantization into linear prediction coefficients; and finally the linear prediction coefficients and the excitation signal are fed into a linear predictive speech synthesis filter to produce the speech signal.
The above method for sending and receiving digital signals with a vocoder specifically comprises the following steps: (1) the source digital signal to be transmitted is divided into frames, each frame being used to synthesize a short-term speech signal; a frame of N bits is further subdivided into four subframes of unequal length, namely bit streams of X bits, Y bits, Z bits, and G bits; (2) the X-bit stream is mapped to an LSP parameter index value, the Y-bit stream to a pitch parameter index value, the Z-bit stream to an excitation vector parameter index value, and the G-bit stream to an excitation gain parameter index value; (3) the index values generated in step (2) are looked up in the LSP parameter table, the pitch parameter table, the excitation vector parameter table, and the excitation gain parameter table to obtain the actual vector parameters: the LSP parameters, pitch parameters, excitation vector parameters, and excitation gain parameters; (4) the parameters generated in step (3) are synthesized into a speech signal according to the principle of a CELP vocoder; (5) the synthesized speech signal is sent through a CDMA or GSM vocoder; (6) after receiving the synthesized speech signal, the receiving end performs speech analysis and extracts the LSP parameters, pitch parameters, excitation vector parameters, and excitation gain parameters; (7) the parameters extracted in step (6) are looked up in their corresponding tables to reversely generate the LSP parameter index, pitch parameter index, excitation vector parameter index, and excitation gain parameter index; (8) the index values generated in step (7) are reversely restored to subframes, and the subframes are reassembled into one frame, restoring the original digital signal. In step (1), the digital bit stream is divided into frames, each frame being used to generate a short-term speech signal of 10-30 milliseconds. In step (4), the quantized LSP vector corresponding to the X-bit stream is converted, by the inverse of split vector quantization, into linear prediction coefficients for the linear predictive speech synthesis filter; the pitch parameter vector corresponding to the Y-bit stream generates a pitch excitation signal through pitch synthesis; the excitation vector parameter corresponding to the Z-bit stream and the excitation gain parameter corresponding to the G-bit stream are fed to the excitation signal synthesis module to generate an excitation signal; and this excitation signal, together with the pitch excitation signal, drives the linear predictive speech synthesis filter, which models the vocal tract, to produce an artificially synthesized speech signal.
Owing to the above design, the present invention has the following advantages:
1. The method of the present invention transparently transmits arbitrary digital signals at a certain bit rate, with high quality, over analog or digital voice channels, in a manner independent of the switching and transmission equipment of the telecommunication network; transmission delay and jitter are far lower than over data channels, guaranteeing the quality of service of interactive real-time communication.
2. Since the present invention only requires the operator's voice service, interconnection is guaranteed and the scope of use is greatly widened: users can transmit arbitrary digital signals at a certain bit rate, with guaranteed quality of service, anywhere in the world where voice service is available.
3. The present invention can be applied to wireless mobile terminals (GSM and CDMA handsets, satellite phones, etc.), fixed telephones, and computer equipment, enabling a variety of special and value-added services: (1) improving the voice quality of the Push-to-Talk (PTT) wireless group-call service while freeing it from dependence on wireless data channels, so that PTT can be operated independently; (2) providing key technical support for secure voice and data communication over the voice channels of wireless mobile networks: since a heavily encrypted digital signal is highly random and no longer has any speech characteristics, this technique and apparatus allow users to conduct secure voice and data communication, independent of existing network switching and transmission equipment, anywhere in the world covered by fixed telephone networks (POTS) or GSM/CDMA mobile networks.
4. In steps (2) and (3) of the present invention, the digital bit stream of each subframe is mapped to a parameter index value rather than to the parameter itself, giving the flexibility of pre-selecting the key parameters used to synthesize speech: from the full value space of each parameter, a subset of values that differ greatly from one another and are easy to extract is chosen and entered into the corresponding parameter table, indexed by the values mapped from the subframe bit streams. In this way, at the cost of a lower transmission bit rate, similar input digital signals are guaranteed to produce analog continuous-wave speech signals that differ sufficiently, so that speech analysis at the receiving end yields correct results and the bit error rate is effectively reduced.
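The table-design idea behind advantage 4, populating each parameter table with entries that are maximally separated in parameter space so that different index values map to easily distinguishable signals, can be sketched as a greedy max-min selection. The candidate pool, the Euclidean distance metric, and the table size below are illustrative assumptions, not values prescribed by the invention:

```python
import random

def build_spread_codebook(candidates, table_size):
    """Greedily pick `table_size` vectors from `candidates` so that the
    minimum pairwise distance among chosen entries stays large."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    table = [candidates[0]]  # seed with an arbitrary entry
    while len(table) < table_size:
        # pick the candidate farthest from its nearest already-chosen entry
        best = max(candidates, key=lambda c: min(dist(c, t) for t in table))
        table.append(best)
    return table

random.seed(0)
# hypothetical pool of 256 two-dimensional parameter vectors
pool = [(random.random(), random.random()) for _ in range(256)]
codebook = build_spread_codebook(pool, 16)
```

Such a table trades capacity (fewer usable entries than the full value space) for robustness, which matches the bit-rate-versus-error-rate trade-off described above.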
Brief Description of the Drawings
Fig. 1 is a structural block diagram of the present invention.
Fig. 2 is a structural block diagram of an embodiment of the present invention.
Detailed Description
A vocoder is a high-compression speech coding technique based on a parametric model of the human vocal tract and the mechanism of speech production. It is widely used in wireless mobile communication (GSM and CDMA), satellite communication, and other network systems to encode and transmit speech at low bit rates while preserving a certain perceptual quality. Its working principle, however, means that a vocoder cannot effectively encode or transmit signals that lack speech characteristics. The present invention proposes a technique for sending and receiving digital signals through a voice vocoder, achieving high-quality, low-delay, low-jitter transmission of arbitrary digital signals without using a data channel. The technique can be applied in wireless mobile and fixed communication terminals to transmit arbitrary digital signals over analog or digital voice channels in a manner independent of network switching and transmission equipment.
As shown in Fig. 1, the method for sending and receiving digital signals with a vocoder provided by the present invention, following the principle of the CELP vocoder, converts the source digital signal to be transmitted into the key speech characteristic parameters of a speech synthesis model by parameter mapping; at the sending end a speech signal is generated by speech synthesis; the synthesized speech signal can be transmitted over GSM, CDMA, or other voice channels; and at the receiving end the key speech characteristic parameters are extracted by speech analysis and the original digital signal is restored, thereby achieving transmission and reception of arbitrary digital signals.
Specifically, the method comprises the following steps: (1) the source digital signal to be transmitted is divided into frames, each frame being used to generate a short-term speech signal of 10-30 milliseconds; according to the parameters and mechanism of the speech synthesis model, each frame is further subdivided into subframes of unequal length. Since each subframe produces a key parameter value of the speech synthesis model by parameter mapping, the number and length (in bits) of the subframes depend on the kinds of model parameters used to synthesize the speech signal and on the number of entries in each parameter table. Line spectral frequencies (LSP), generalized excitation parameters, and generalized excitation gains are the three kinds of parameters common to CELP-based speech synthesis models, so the number of subframes is generally at least three, corresponding to these three key parameters. (2) The subframes realize parameter mapping by table lookup: a certain number of key parameters are stored in parameter tables in advance, and each subframe serves as an index value into its table, namely the LSP parameter table index, the generalized excitation vector table index, and the generalized excitation gain table index. (3) The index values generated in step (2) are looked up in the LSP parameter table, the generalized excitation vector table, and the generalized excitation gain table to obtain the LSP parameters, generalized excitation vector parameters, and generalized excitation gain parameters. (4) The parameters generated in step (3) are synthesized into a speech signal according to the CELP mechanism. (5) The synthesized speech signal is sent through a vocoder (such as a GSM or CDMA voice vocoder) or another voice channel. (6) After receiving the synthesized speech signal, the receiving end performs speech analysis and extracts the LSP parameters, generalized excitation vector parameters, and generalized excitation gain parameters. (7) The parameters obtained in step (6) are looked up in their corresponding tables (the LSP parameter table, the generalized excitation vector parameter table, and the generalized excitation gain parameter table) to reversely generate the LSP index, the generalized excitation vector index, and the generalized excitation gain index. (8) The index values generated in step (7) are reversely restored to subframes, and the subframes are reassembled into one frame, restoring the original digital signal.
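Steps (1)-(3) at the sender and steps (7)-(8) at the receiver amount to splitting each frame of source bits into fixed-width subframes, using each subframe as a table index, and inverting the mapping after analysis. A minimal round-trip sketch with hypothetical subframe widths (the real widths depend on the parameter tables chosen):

```python
SUBFRAME_BITS = (5, 4, 3)  # hypothetical widths: LSP, excitation vector, gain indices

def frame_to_indices(frame_bits):
    """Split one frame (a bit string) into per-subframe table index values."""
    indices, pos = [], 0
    for width in SUBFRAME_BITS:
        indices.append(int(frame_bits[pos:pos + width], 2))
        pos += width
    return indices

def indices_to_frame(indices):
    """Inverse mapping performed at the receiving end (steps 7-8)."""
    return "".join(format(i, f"0{w}b") for i, w in zip(indices, SUBFRAME_BITS))

frame = "110100101011"                 # 5 + 4 + 3 = 12 source bits
idx = frame_to_indices(frame)          # [26, 5, 3]
assert indices_to_frame(idx) == frame  # lossless round trip
```

Because the mapping is a pure table lookup in both directions, the source bits are recovered exactly whenever the analysis stage identifies the correct table entries.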
In step (4) above, the generalized excitation vector parameter and the generalized excitation gain parameter are first combined by an excitation signal generator into an excitation signal; the LSP parameters are converted into linear prediction coefficients by inverse vector quantization; and finally the linear prediction coefficients and the excitation signal are fed into the linear predictive (LPC) speech synthesis filter to produce the speech signal. Unlike ordinary speech synthesis, the synthesis described here is concerned only with clearly expressing the characteristic parameters carried by the signal; the signal itself need not have any linguistic meaning.
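The LPC synthesis filter used in step (4) is the standard all-pole filter: each output sample is the excitation sample minus a weighted sum of past outputs, s[n] = e[n] − Σ a[k]·s[n−k]. A direct-form sketch with made-up toy coefficients and a toy pulse-train excitation (not values from the invention):

```python
def lpc_synthesize(excitation, lpc_coeffs):
    """All-pole LPC synthesis: s[n] = e[n] - sum_k a[k] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out

# toy excitation: a gain-scaled pulse train, as for voiced speech
gain = 0.5
excitation = [gain * (1.0 if n % 4 == 0 else 0.0) for n in range(8)]
# a stable 2-tap predictor (poles well inside the unit circle)
speech = lpc_synthesize(excitation, [-0.9, 0.4])
```

Here speech[0] = 0.5 and speech[1] = 0.45: the filter's feedback spreads each excitation pulse over following samples, which is what shapes the excitation into a speech-like waveform.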
Furthermore, in step (1) above, each frame of the digital bit stream is used to generate a short-term speech signal of 10-30 milliseconds chiefly so as to fully contain the pitch frequency information of speech (which requires more than 10 milliseconds) and to preserve the statistical stationarity of the speech signal (which requires less than 30 milliseconds). This ensures, at the receiving end, that the linear prediction filter can effectively describe the short-term autocorrelation of the signal, i.e. the vocal-tract model of speech production, and that the pitch analysis filter can correctly extract the pitch parameters.
The generalized excitation parameters described above usually take one of two forms: a pulse train with pitch-period characteristics, used to synthesize voiced speech; or a random signal (such as Gaussian noise), used to synthesize unvoiced speech. The generalized excitation gains correspondingly include gain parameters for adjusting the pulse-train excitation and the random-signal excitation. To raise the transmission bit rate, the number of subframes can be increased so as to lengthen the frame (in bits); for example, the pitch frequency parameters (the pitch delay and pitch gain that express pitch frequency information) can be mapped as independent speech characteristic parameters, allowing more subframes to be used in synthesis, in which case the excitation parameters may contain only random-signal (e.g. Gaussian) excitation. Accordingly, in a concrete implementation of the present invention, the pitch frequency parameters (delay and gain) may be introduced as an independent excitation source for synthesizing speech, and a frame may then be subdivided into four subframes of unequal length. As shown in Fig. 2, at the sending end a frame of the digital bit stream, N bits long, is divided into streams of X, Y, Z, and G bits, forming four subframes. The X-bit stream is mapped to an LSP parameter index value; the Y-bit stream is mapped to pitch parameter index values (a pitch delay index and a pitch gain index); the Z-bit stream is mapped to an excitation vector parameter index value; and the G-bit stream is mapped to an excitation gain parameter index value. Each index value is then looked up in the corresponding table (the LSP parameter table, the pitch parameter table, the excitation vector parameter table, and the excitation gain parameter table) to obtain the actual vector parameters: the LSP parameters, the pitch parameters (pitch delay and pitch gain), the excitation vector parameters, and the excitation gain parameters.
Further, the quantized LSP vector corresponding to the X-bit stream is converted, by the inverse of split vector quantization (Split VQ), into linear prediction (LPC) coefficients for the LPC speech synthesis filter; the pitch parameter vector (pitch delay/gain) corresponding to the Y-bit stream generates a pitch excitation signal through pitch synthesis; the excitation vector parameter corresponding to the Z-bit stream and the excitation gain parameter corresponding to the G-bit stream are fed to the excitation signal generator to produce an excitation signal; and this excitation signal, together with the pitch excitation signal, drives the LPC speech synthesis filter, which models the vocal tract, to produce an artificially synthesized speech signal for transmission.
The duration of this speech signal is generally taken to be between 10 and 30 milliseconds. Below 10 milliseconds the pitch frequency information cannot be fully recovered; above 30 milliseconds the speech signal is no longer statistically stationary, so the linear prediction model ceases to be valid. Typically, each frame of the digital signal is used to synthesize a 20-millisecond (as in ACELP, QCELP, etc.) or 30-millisecond (as in the FS1016 DoD CELP) speech signal. With T denoting the length of the synthesized speech signal in milliseconds, the digital rate R that can theoretically be transmitted is R = (N / T × 1000) bps.
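The rate formula follows directly: each synthesized frame carries N bits, and 1000/T frames are sent per second. A one-line check (the 40-bit/20 ms figures are an illustrative assumption; the 66-bit/30 ms figures are from the embodiment below):

```python
def vocoder_data_rate(bits_per_frame, frame_ms):
    """Achievable digital rate in bits per second: R = N / T * 1000."""
    return bits_per_frame / frame_ms * 1000

assert vocoder_data_rate(40, 20) == 2000  # hypothetical 40-bit frame, 20 ms
assert vocoder_data_rate(66, 30) == 2200  # the embodiment's figures
```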
The speech analysis performed at the receiving end of the signal is the reverse of the speech synthesis described above: the received signal is analyzed in the minimum-mean-square-error sense to extract the linear prediction filter coefficients, the excitation-vector parameter, the excitation-gain parameter and the pitch parameters. Specifically, the input speech signal is first fed to the linear prediction (LPC) analysis module, which uses a 20 ms or 30 ms sampling window (matching the sender's setting) to perform an autocorrelation computation and obtains the LPC filter coefficients with the Levinson-Durbin algorithm. The LPC filter coefficients are converted to frequency-domain LSP coefficients by a Chebyshev-polynomial evaluation, and a split vector quantization (Split VQ) algorithm yields the quantized line spectral frequency (LSP) parameters. Pitch analysis of the input speech signal is completed by the pitch analysis module; it may use either the computationally heavier closed-loop search model or the simplified open-loop search model. In the open-loop model, the residual of the input speech signal after the linear prediction (LPC) synthesis filter is fed into the pitch prediction filter of the pitch analysis module, generating a pitch residual signal; minimizing the mean square error of this pitch residual yields the optimal estimates of the two key parameters of the pitch prediction filter, namely the pitch delay and the pitch gain. The excitation signal is determined by searching and matching the excitation codebook: the speech signal synthesized from a candidate excitation signal (the excitation vector scaled by the excitation gain) through the linear prediction filter and the pitch synthesis filter forms a residual against the input speech signal, and minimizing the mean square error of this residual yields the optimal excitation signal, represented by an excitation vector and an excitation-gain parameter. The index values of the excitation vector and the excitation gain in their respective parameter tables constitute part of the source digital signal; likewise, the parameters obtained earlier by linear prediction analysis and pitch analysis are matched against their respective parameter coding tables to obtain the LSP parameter index and the pitch parameter indices. Aggregating these index values over the subframes in a fixed order yields an output digital bit stream of N bits per frame.
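As an illustration of the LPC analysis step described above, the following Python sketch implements the Levinson-Durbin recursion on a frame's autocorrelation sequence. The signal, window length and prediction order are illustrative choices, not values mandated by the method:

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: solve the Toeplitz normal equations
    for the LPC coefficients a[1..order] from the autocorrelation r.
    Returns the coefficient vector (with a[0] == 1) and the final
    prediction-error energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for stage i
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

# Illustrative 20 ms frame at 8 kHz (160 samples): a noisy sinusoid.
fs = 8000
rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 440 * np.arange(160) / fs)
frame = frame + 0.1 * rng.standard_normal(160)
# Autocorrelation lags r[0..10], then a 10th-order LPC fit.
r = np.correlate(frame, frame, mode="full")[159:159 + 11]
lpc_a, pred_err = levinson_durbin(r, order=10)
```

The prediction-error energy `pred_err` shrinks at every stage as long as the reflection coefficients stay inside the unit interval, which is why each new filter order can only improve the fit.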
Specific embodiment:
(1) Partition the source digital signal to be transmitted into frames of 66 bits each, each frame being used to synthesize a speech signal 30 ms long; each frame is further subdivided into four subframes: subframe 1 is 16 bits, subframe 2 is 24 bits, subframe 3 is 16 bits, and subframe 4 is 10 bits; each subframe generates key parameter values of the speech synthesis model by parameter mapping.
(2) Subframe 1 (16 bits) serves as an index into a line spectral frequency (LSP) parameter table of 65,536 entries, each entry a 34-bit quantized LSP vector; the high 14 bits of subframe 2 (24 bits) index a pitch-delay parameter table of 16,384 entries, each entry a 28-bit pitch-delay parameter, while its low 10 bits index a pitch-gain parameter table of 1,024 entries, each entry a 20-bit pitch-gain parameter; subframe 3 (16 bits) indexes an excitation-vector parameter table of 65,536 entries, each entry a 36-bit excitation vector; subframe 4 (10 bits) indexes an excitation-gain parameter table of 1,024 entries, each entry a 20-bit excitation-gain parameter.
(3) Look up the index values generated in step (2) in the LSP parameter table, the pitch parameter tables, the excitation-vector table and the excitation-gain table to obtain the LSP parameters, the pitch parameters (delay and gain), the excitation-vector parameter and the excitation-gain parameter.
(4) Synthesize the parameters generated in step (3) into a speech signal according to the CELP mechanism: the excitation signal formed by scaling the excitation vector with the excitation gain, together with the pitch excitation signal generated from the pitch parameters (delay/gain) by pitch synthesis, is fed into the linear prediction (LPC) speech synthesis filter unit, the coefficients of the LPC filter being obtained from the quantized LSP vector by inverse vector quantization.
(5) Send the synthesized speech signal through a vocoder (such as a GSM or CDMA speech vocoder).
(6) After receiving the synthesized speech signal, the receiving end performs speech analysis on it to extract the LSP parameters, pitch parameters, excitation-vector parameter and excitation-gain parameter: first, the input speech signal is fed to the LPC analysis module, which, using a 30 ms sampling window (matching the sender's setting), performs an autocorrelation computation and obtains the LPC filter coefficients with the Levinson-Durbin algorithm; these coefficients are converted to frequency-domain LSP coefficients by a Chebyshev-polynomial evaluation, and a split vector quantization (Split VQ) algorithm yields the quantized 34-bit LSP parameter. Pitch analysis of the input speech signal is completed by the pitch analysis module using the open-loop search model: the residual of the input speech signal after the LPC filter is fed into the pitch prediction filter of the pitch analysis module, generating a pitch residual signal; minimizing the mean square error of this pitch residual yields the optimal estimates of the two key parameters of the pitch prediction filter, the 28-bit pitch delay and the 20-bit pitch gain. The excitation signal is determined by searching and matching the excitation codebook: the speech signal synthesized from a candidate excitation signal (the excitation vector scaled by the excitation gain) through the linear prediction filter and the pitch synthesis filter forms a residual against the input speech signal, and minimizing the mean square error of this residual yields the optimal excitation signal, representable by a 36-bit excitation vector and a 20-bit excitation-gain parameter.
(7) Look up the parameters extracted in step (6) in the corresponding tables: the LSP parameter table (65,536 entries of 34-bit quantized LSP vectors), the pitch parameter tables (16,384 entries of 28-bit pitch-delay parameters and 1,024 entries of 20-bit pitch-gain parameters), the excitation-vector table (65,536 entries of 36-bit excitation vectors) and the excitation-gain table (1,024 entries of 20-bit excitation-gain parameters), thereby regenerating, in reverse, the LSP parameter index, the pitch parameter indices, the excitation-vector index and the excitation-gain index.
(8) Restore the index values generated in step (7) to subframes and reassemble the subframes into one frame, recovering a 66-bit frame of the source digital signal. The transmission bit rate achievable in this embodiment is therefore R = 66/30 × 1000 = 2200 bit/s.
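The bit-level bookkeeping of steps (1), (2) and (8) can be sketched as follows. The function and variable names are illustrative, but the subframe widths (16/24/16/10), the 14/10-bit split of subframe 2, and the resulting 2200 bit/s rate come directly from the embodiment:

```python
def split_frame(bits):
    """Step (2): split one 66-bit frame (a string of '0'/'1') into the
    five table indices: LSP (16 bits), pitch delay (high 14 bits of
    subframe 2), pitch gain (low 10 bits of subframe 2), excitation
    vector (16 bits), excitation gain (10 bits)."""
    assert len(bits) == 66
    widths = [16, 24, 16, 10]
    fields, pos = [], 0
    for w in widths:
        fields.append(int(bits[pos:pos + w], 2))
        pos += w
    lsp_idx, sf2, exc_vec_idx, exc_gain_idx = fields
    pitch_delay_idx = sf2 >> 10       # high 14 bits of subframe 2
    pitch_gain_idx = sf2 & 0x3FF      # low 10 bits of subframe 2
    return lsp_idx, pitch_delay_idx, pitch_gain_idx, exc_vec_idx, exc_gain_idx

def join_frame(lsp_idx, pitch_delay_idx, pitch_gain_idx,
               exc_vec_idx, exc_gain_idx):
    """Step (8): reassemble the five indices into the 66-bit frame."""
    sf2 = (pitch_delay_idx << 10) | pitch_gain_idx
    return (format(lsp_idx, "016b") + format(sf2, "024b")
            + format(exc_vec_idx, "016b") + format(exc_gain_idx, "010b"))

# 66 bits per 30 ms frame -> 2200 bit/s, as computed in the embodiment.
rate_bps = 66 * 1000 / 30
```

The round trip `join_frame(*split_frame(bits)) == bits` holds for any 66-bit string, mirroring the lossless mapping between source bits and parameter indices that the embodiment relies on.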
Owing to the above design, the present invention has the following features:
1. The method proposed by the present invention transparently transmits arbitrary digital signals at a given bit rate, with high quality, through analog or digital voice channels, in a manner independent of the telecommunication network's switching and transmission equipment; the transmission delay and jitter are far lower than those of a data channel, guaranteeing the quality of service of interactive real-time messaging.
2. Because the present invention requires only the operator's voice service, interconnection and interoperability are guaranteed and the scope of use is greatly broadened: users can transmit arbitrary digital signals at a given bit rate, with guaranteed quality of service, anywhere in the world where voice service is available.
3. The present invention can be applied to wireless mobile terminals (GSM and CDMA handsets, satellite phones, etc.), fixed telephones and computer equipment, enabling a variety of special and value-added services: (1) it improves the voice transmission quality of the Push-to-Talk (PTT) wireless group-call value-added service and frees that service from dependence on the wireless data channel, allowing PTT to be operated independently; (2) it provides key technical support for secure voice and data communication over the voice channels of wireless mobile networks: because a heavily encrypted, digitized voice signal is highly random and retains no speech characteristics, this technique and apparatus let users conduct secure voice and data communication, independent of existing network switching and transmission equipment, anywhere in the world covered by the fixed telephone network (POTS) or GSM/CDMA mobile networks.
4. In steps (2) and (3) of the present invention, mapping the digital bit stream of each subframe to a parameter index value rather than to the parameter itself provides the flexibility to pre-select the key parameters used for synthesizing speech: from the full value space of each parameter, a subset of values that differ strongly from one another and are easy to extract is placed in the corresponding parameter code table, corresponding to the index values mapped from the subframe bit streams. In this way, at the cost of a lower transmission bit rate, similar input digital signals are guaranteed to produce analog continuous-wave speech signals that differ sufficiently, so that the speech analysis at the receiving end obtains correct results and the bit error rate is effectively reduced.
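Feature 4's idea of populating each parameter code table only with mutually well-separated values can be sketched with a simple greedy max-min-distance selection. The candidate pool, vector dimension and table size below are toy values for illustration, not part of the invention:

```python
import numpy as np

def greedy_codebook(candidates, size):
    """Pick `size` rows of `candidates` that are mutually far apart
    (greedy max-min Euclidean distance), so that distinct index values
    map to synthesis parameters the receiver can tell apart reliably."""
    chosen = [0]                      # seed with the first candidate
    dist = np.linalg.norm(candidates - candidates[0], axis=1)
    while len(chosen) < size:
        nxt = int(np.argmax(dist))    # farthest from everything chosen
        chosen.append(nxt)
        dist = np.minimum(
            dist, np.linalg.norm(candidates - candidates[nxt], axis=1))
    return candidates[chosen]

rng = np.random.default_rng(1)
pool = rng.standard_normal((500, 4))  # toy parameter value space
table = greedy_codebook(pool, 16)     # a 16-entry code table
```

The selected entries trade table density (and hence transmission bit rate) for inter-entry distance, which is exactly the trade-off feature 4 describes.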
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2005101177279ACN1964244B (en) | 2005-11-08 | 2005-11-08 | A Method of Sending and Receiving Digital Signals Using a Vocoder |
| Publication Number | Publication Date |
|---|---|
| CN1964244A CN1964244A (en) | 2007-05-16 |
| CN1964244Btrue CN1964244B (en) | 2010-04-07 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2005101177279AExpired - Fee RelatedCN1964244B (en) | 2005-11-08 | 2005-11-08 | A Method of Sending and Receiving Digital Signals Using a Vocoder |
| Country | Link |
|---|---|
| CN (1) | CN1964244B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014161994A2 (en)* | 2013-04-05 | 2014-10-09 | Dolby International Ab | Advanced quantizer |
| CN103474067B (en)* | 2013-08-19 | 2016-08-24 | 科大讯飞股份有限公司 | speech signal transmission method and system |
| CN112883206B (en)* | 2021-02-01 | 2022-07-01 | 西北师范大学 | A long-sequence biohashing ciphertext speech retrieval method based on feature fusion |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1159639A (en)* | 1991-06-11 | 1997-09-17 | 夸尔柯姆股份有限公司 | Rate changeable vocoder |
| EP1465158A2 (en)* | 2003-04-01 | 2004-10-06 | Digital Voice Systems, Inc. | Half-rate vocoder |
| Title |
|---|
| Wang Bingxi, He Yinghua. Research on a 4.8 kbps CELP vocoder algorithm. Journal of Information Engineering Institute, 1995, 14(4): 10-18. |
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| ASS | Succession or assignment of patent right | Owner name:BEIJING ZHAOTONG ZHISHENG TECHNOLOGY CO., LTD. Free format text:FORMER OWNER: TOPSCIENTIFIC SYSTEMS INC. Effective date:20110922 | |
| C41 | Transfer of patent application or patent right or utility model | ||
| COR | Change of bibliographic data | Free format text:CORRECT: ADDRESS; FROM: 361009 XIAMEN, FUJIAN PROVINCE TO: 100600 HAIDIAN, BEIJING | |
| TR01 | Transfer of patent right | Effective date of registration:20110922 Address after:100600, No. two, No. 223, Section 1, building No. 3, 3rd floor, information industry base, Beijing, Haidian District Patentee after:Beijing Zhaotong Zhisheng Technology Co.,Ltd. Address before:361009, Weiye building, pioneer zone, torch hi tech Zone, Fujian, Xiamen province S206 Patentee before:XIAMEN ZHISHENG TECHNOLOGY Co.,Ltd. | |
| CP03 | Change of name, title or address | Address after:100010 Galaxy SOHO, Block D, 6th floor 50605, Chaoyang Mennei Street, Dongcheng District, Beijing Patentee after:Beijing Hezhong Sizhuang Space-time Material Union Technology Co.,Ltd. Address before:100600, No. two, No. 223, Section 1, building No. 3, 3rd floor, information industry base, Beijing, Haidian District Patentee before:Beijing Zhaotong Zhisheng Technology Co.,Ltd. | |
| CP03 | Change of name, title or address | ||
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date:20100407 | |
| CF01 | Termination of patent right due to non-payment of annual fee |