JPH0573092A


Info

Publication number: JPH0573092A
Application number: JP3262889A
Authority: JP
Inventors: Maki Miyamoto; 牧宮本; Yukio Mitome; 幸夫三留
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1991-09-13
Filing date: 1991-09-13
Publication date: 1993-03-26

Abstract

PURPOSE: In speech synthesis of arbitrary words, where unnaturalness is reduced by selecting data for each unit speech from a variety of continuous utterances, to simplify the selection process by using various tables. CONSTITUTION: Speech parameters 7 obtained by analyzing natural speech uttered continuously in advance are stored. For each unit speech, the correspondence between the unit speech and the speech parameters, together with the phoneme sequence of the utterance from which the parameters were obtained, is stored in a unit speech data table 4. Based on the phoneme and prosody information of an input character string, the table is consulted for each unit speech, and the best unit speech data is selected according to a predetermined selection criterion. Extraction 6 then takes the speech parameters of the selected unit speech data out of the stored speech parameters 7 according to the information in the unit speech data table, and these parameters are used to synthesize speech.

Description

Translated from Japanese

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a rule-based speech synthesis system that synthesizes and outputs speech from an input character string.

[0002]

[Prior Art] As a first conventional example, Mitome et al. presented "Formant, CV-VC type rule-based synthesis" in Speech Research Group material S85-31 (1985-7). This is a speech synthesis method for arbitrary words which, to cope with the variety of output content, edits speech parameters in units of CV and VC (C: consonant, V: vowel) and outputs synthesized speech. Many similar methods synthesize speech using phonemes, CVC, VCV and the like as the unit speech. Normally, exactly one instance of each unit speech is prepared in advance.

[0003] However, it has been pointed out that creating natural synthesized speech requires multiple unit speeches (or speech units) to match changes in the output content.

[0004] A second conventional example is Japanese Unexamined Patent Publication No. H1-209500. To address the problem pointed out above, it proposes a synthesis method that prepares in advance a set of speech units consisting of phoneme concatenation units of arbitrary length, picks out every candidate that could be used for the given input, and decides which speech units to use from among them based on a predetermined criterion.

[0005]

[Problems to Be Solved by the Invention] In the synthesis method of the first conventional example, only one instance of each unit speech is prepared in advance. Consequently, however varied the output content, the same data is selected and used wherever the same unit speech occurs. Speech synthesized from such uniform data sounds unnatural. To create more natural synthesized speech, it is necessary to prepare more kinds of unit speech, or more data for each unit speech, and to select an appropriate unit speech efficiently.

[0006] However, the second conventional approach, which prepares many kinds of speech units to create synthesized speech, uses phoneme concatenation units of arbitrary length as its speech units. This gave rise to new problems: decomposing the input character string into speech units becomes complicated, the data is not diversified efficiently unless the policy for extending the unit types is appropriate, and the criteria for deciding which unit to use become complex.

[0007] In addition, in synthesis methods that use fixed unit speech, each unit speech has conventionally often been uttered in isolation when recorded. The frequency characteristics of unit speech data forcibly uttered in isolation differ greatly from those of the same unit speech as it appears in normal continuous speech, so even when such data is concatenated to create continuous synthesized speech, it is difficult to obtain speech that sounds natural to the ear.

[0008] The present invention is intended to solve the above problems. Its object is to provide a speech synthesis system that, with the kinds of unit speech limited in advance, selects the optimum continuous utterance from among many continuously uttered speech samples containing each unit speech by using a predetermined evaluation function, extracts the unit speech from the speech parameters of the selected utterance, and can thereby easily create synthesized speech that sounds more natural.

[0009]

[Means for Solving the Problems] The present invention stores speech parameters obtained by analyzing natural speech uttered continuously in advance; stores in a unit speech data table, for each predetermined unit speech, the correspondence between that unit speech and the speech parameters together with the phoneme sequence of the utterance from which the speech parameters were obtained; decomposes the input character string into unit speeches based on the phoneme and prosody information of the input character string; refers to the unit speech data table for each unit speech and selects, from the unit speech data, the entry best suited to that unit speech; and performs speech synthesis using the speech parameters of the selected unit speech data.

[0010] The first invention is characterized in that, when the entry best suited to a unit speech is selected from the unit speech data, the phoneme sequences before and after the unit speech in the unit speech data are weighted according to their position relative to the unit speech; the degree of match between the phoneme sequences before and after the unit speech in the input phoneme information and those before and after the unit speech in the unit speech data is evaluated with a predetermined evaluation function; and the unit speech data judged optimum is selected.

[0011] The second invention is characterized in that, when the entry best suited to a unit speech is selected from the unit speech data, the phoneme sequences and accent symbol sequences before and after the unit speech in the unit speech data are weighted according to their position relative to the unit speech; the degree of match between the phoneme sequences and accent symbol sequences before and after the unit speech in the input phoneme information and those before and after the unit speech in the continuous speech serving as the unit speech data is evaluated, taking both together, with a predetermined evaluation function; and the unit speech data judged optimum is selected.

[0012] The third invention is characterized in that, when the entry best suited to a unit speech is selected from the unit speech data, the result of syntactic or morphological analysis of the unit speech data is compared with the linguistic environment in which the unit speech is placed in the input phoneme and prosody information; in addition, the phoneme symbol sequences before and after the unit speech in the unit speech data are weighted according to their position relative to the unit speech, and the degree of match between the phoneme symbol sequences before and after the unit speech in the input phoneme information and those before and after the unit speech in the unit speech data is evaluated with a predetermined evaluation function; and the unit speech data judged optimum on the basis of both the linguistic environment comparison result and the phoneme match evaluation result is selected.

[0013]

[Operation] In the speech synthesis system of this invention, a large amount of unit speech data from continuous speech is accumulated in table form for each prescribed unit speech contained in the continuous utterances, so the continuous utterance to be used can be selected easily. When the optimum entry is selected from the continuous speech, the invention of claim 1 takes into account the phoneme sequence before and after the unit speech; the invention of claim 2 takes prosodic information into account in addition to the phoneme sequence; and the invention of claim 3 takes linguistic information into account in addition to phoneme and prosodic information. In each case the unit speech is extracted from the continuous speech and synthesized speech is created.

[0014] When the surrounding phonemes and accent positions are taken into account in this way, the unit speech selected is closer to the changes in mouth and throat movement that would occur if a person actually uttered the desired speech, and as a result more natural speech can be synthesized. This solves the problem of the first conventional example.

[0015] In addition, because the method concatenates predetermined unit speeches and uses a selection criterion based on properties that do not depend on the form of the continuous speech (word, phrase, clause, sentence, etc.), the optimum data can be selected from a large amount of data quickly and simply by a uniform procedure.

[0016]

[Embodiments] Embodiments of the invention are described with reference to the drawings. FIG. 1 is a block diagram outlining the speech synthesis system according to the inventions of claims 1, 2 and 3. In FIG. 1, a string of phoneme symbols, prosodic symbols and syntactic information symbols corresponding to the speech to be synthesized is input to input terminal 1. Here, phoneme symbols represent individual phonemes; prosodic symbols represent information such as accent, intonation and boundary positions; and syntactic information symbols represent linguistic information such as syntactic information, part of speech and case.

[0017] The input phoneme symbol string is fed to the synthesis unit generation unit 2 and decomposed into a sequence of unit speeches. The unit speeches are predetermined, fixed units. The resulting unit speech decomposition is sent to the speech data search unit 3.
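The text does not say how the synthesis unit generation unit 2 performs the decomposition. As a minimal sketch only, assuming romanized phoneme strings and a fixed unit inventory, a greedy longest-match split could look like the following. The function name `decompose` and the sample inventory are hypothetical and do not come from the patent.

```python
def decompose(phonemes, inventory):
    """Greedy longest-match split of a phoneme string into fixed units.

    This is an assumed strategy for illustration; the patent only states
    that the unit speeches are predetermined and fixed.
    """
    longest = max(len(unit) for unit in inventory)
    units = []
    i = 0
    while i < len(phonemes):
        # Try the longest possible unit first, then shorter ones.
        for size in range(min(longest, len(phonemes) - i), 0, -1):
            piece = phonemes[i:i + size]
            if piece in inventory:
                units.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no unit covers position {i} of {phonemes!r}")
    return units


# Hypothetical inventory of CV-like units.
print(decompose("sakana", {"sa", "ka", "na", "a"}))
```

Because the inventory is fixed, each resulting unit can be used directly as a lookup key for candidate data in the next stage.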

[0018] The speech data search unit 3 obtains, for each unit speech, a group of candidate continuous speech data from the unit speech data table 4, in which continuous speech data is classified and registered with each unit speech as the key.

[0019] The unit speech data table file for each unit speech chosen from unit speech data table 4 is sent to the unit speech data selection unit 5, which consults its contents in order. For each unit speech data entry in the table it is given, the unit speech data selection unit selects the entry judged optimum for the unit speech according to the selection criterion defined for each claim. Each selected unit speech data entry is sent to the unit speech extraction unit 6.

[0020] The unit speech extraction unit 6, referring to unit speech data table 4, extracts the portion corresponding to the unit speech from the speech parameters of the selected unit speech data held in the speech parameter storage unit 7. The speech parameter storage unit stores the speech parameters obtained by analyzing continuously uttered natural speech.
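The extraction step can be pictured as slicing a stored frame sequence. The sketch below assumes that the stored speech parameters are kept as a list of per-frame vectors for each utterance, and that the table supplies a 0-based start frame and a frame count. The names `extract_unit` and `parameter_store`, and the 0-based indexing, are assumptions made for illustration.

```python
def extract_unit(parameter_store, word_code, start_frame, frame_count):
    """Return the frames of one unit speech from a stored utterance.

    parameter_store maps a word code to a list of per-frame parameter
    vectors (assumed layout). start_frame is taken to be 0-based.
    """
    frames = parameter_store[word_code]
    end = start_frame + frame_count
    if start_frame < 0 or end > len(frames):
        raise ValueError("unit lies outside the stored utterance")
    return frames[start_frame:end]


# Invented one-dimensional "parameters" for a five-frame utterance.
store = {7: [[0.0], [0.1], [0.2], [0.3], [0.4]]}
print(extract_unit(store, 7, 1, 3))
```

Only the frames belonging to the unit are returned, so the rest of the recorded utterance is left untouched in the store.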

[0021] Each unit speech extracted by the unit speech extraction unit 6 is passed to the unit speech connection unit 8, where it is edited and concatenated. According to the concatenation result of the unit speech connection unit, the sound source generation unit 9 generates the excitation source waveform for the synthesis filter. The result from the unit speech connection unit is given to the speech synthesis unit 10, which excites the synthesis filter with the source generated by the sound source generation unit 9 to produce synthesized speech. The generated speech is output at the output terminal.

[0022] FIG. 2 shows an example of the unit speech data table used in the embodiment of the invention. This table shows part of the unit speech data for the unit speech "ka". To make the final choice of candidate for each unit speech more efficient, the file lists every word that contains the unit speech "ka", organized into fields such as word code No., phonetic transcription of the word, accent type, phoneme start time, start frame No., duration, number of frames, morpheme and syntactic information. Once the unit speech is determined, consulting this table makes it easy to narrow down the data without searching all words, and at the same time yields information on some of the characteristics of the unit speech data.
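One possible in-memory layout for such a table is sketched below, with the unit speech as the key and one record per word containing it. The field names follow the list in the text, but the class name `UnitEntry`, the units (milliseconds, frame indices) and every sample value are invented for illustration and are not taken from FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitEntry:
    word_code: int         # word code No.
    transcription: str     # phonetic transcription of the whole word
    accent_type: int
    start_time_ms: float   # where the unit starts in the word (unit assumed)
    start_frame: int
    duration_ms: float
    frame_count: int
    linguistic_info: str   # morpheme / syntactic information


def build_table(rows):
    """Group (unit, entry) pairs so each unit speech keys its candidates."""
    table = {}
    for unit, entry in rows:
        table.setdefault(unit, []).append(entry)
    return table


def candidates_for(table, unit):
    """Look up candidates for one unit without scanning every word."""
    return table.get(unit, [])


# Invented sample rows, for illustration only.
SAMPLE_ROWS = [
    ("ka", UnitEntry(101, "sakana", 0, 120.0, 12, 90.0, 9, "noun")),
    ("ka", UnitEntry(102, "kasa", 1, 0.0, 0, 80.0, 8, "noun")),
    ("sa", UnitEntry(102, "kasa", 1, 80.0, 8, 100.0, 10, "noun")),
]
TABLE = build_table(SAMPLE_ROWS)
print([e.transcription for e in candidates_for(TABLE, "ka")])
```

Keying the table by unit speech is what lets the search stage narrow the data with a single lookup instead of a search over all words.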

[0023] FIG. 3 is a block diagram showing an example of the unit speech data selection unit of claim 1. The output of the speech data search unit is input at input terminal 12. The phoneme sequence comparison unit 13 compares the phoneme sequences before and after the unit speech in the input phoneme information with those before and after the unit speech in the unit speech data. The similarity evaluation unit 14 applies weights according to position relative to the unit speech in the unit speech data, computes a similarity from the result of the phoneme sequence comparison unit 13, and selects the entry with the highest evaluation value as the optimum unit speech data. The selection result is output from output terminal 15.

[0024] Similarity is evaluated by weighting the degree of match of the phoneme sequence appropriately according to phoneme position and then computing a cost. The degree of match can be evaluated in various ways, for example:
a) A value is given only to phonemes that match exactly.
b) Phonemes are classified into categories, and the distance between the categories of the corresponding phonemes is given as the value.
c) Degrees of match predetermined for each combination of corresponding phonemes are tabulated, and the table values are used.
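The three ways of scoring a pair of corresponding phonemes might be sketched as follows. The phoneme categories, the category distances and the pair table are all invented values. Converting the category distance into a match value as one minus the distance is also an assumption, since the text only says that the distance is used as the value.

```python
def exact_match(a, b):
    """Method a): a value is given only when the phonemes are identical."""
    return 1.0 if a == b else 0.0


# Hypothetical phoneme categories and distances between categories.
CATEGORY = {"a": "vowel", "i": "vowel", "k": "plosive",
            "t": "plosive", "s": "fricative"}
CATEGORY_DISTANCE = {
    frozenset(["vowel"]): 0.0,
    frozenset(["plosive"]): 0.0,
    frozenset(["fricative"]): 0.0,
    frozenset(["plosive", "fricative"]): 0.5,
    frozenset(["vowel", "plosive"]): 1.0,
    frozenset(["vowel", "fricative"]): 1.0,
}


def category_match(a, b):
    """Method b): value derived from the distance between categories."""
    key = frozenset((CATEGORY[a], CATEGORY[b]))
    return 1.0 - CATEGORY_DISTANCE[key]  # assumed conversion to a match value


# Hypothetical table of predetermined match degrees per phoneme pair.
PAIR_MATCH = {frozenset(("k", "t")): 0.6, frozenset(("a", "i")): 0.3}


def table_match(a, b):
    """Method c): look the pair up in a table, else fall back to method a)."""
    return PAIR_MATCH.get(frozenset((a, b)), exact_match(a, b))


print(exact_match("k", "t"), category_match("k", "s"), table_match("t", "k"))
```

Method a) is the cheapest to evaluate, while b) and c) allow a near miss, such as one plosive in place of another, to still contribute to the score.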

[0025] For example, consider the case in which, on both sides of the unit speech, a value is given only to phonemes that exactly match those before and after the unit speech in the input phoneme sequence, the match values are weighted in a step-like manner, and a cost is computed over the positionally corresponding phoneme sequences in the input speech and the unit speech data. Suppose weights of 4, 3 and 2 are assigned in order of closeness to the unit speech, and a uniform weight of 1 is assigned from the fourth character onward. If the input phoneme sequence and the phoneme sequence in the word match for two characters before the unit speech and one character after it, the cost is 3 + 4 + 4 = 11. Similarly, if only the four preceding characters match, the cost is 1 + 2 + 3 + 4 = 10.
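The arithmetic of this example can be reproduced with a short sketch. It assumes that each context is listed nearest-first (index 0 is the phoneme adjacent to the unit speech) and that positions are compared one by one. The function names and the placeholder phoneme strings are hypothetical. Note that a larger value here means a better match, in line with selecting the highest evaluation value.

```python
def position_weight(distance):
    """Step weights: 4, 3, 2 for the three nearest phonemes, then 1."""
    return {1: 4, 2: 3, 3: 2}.get(distance, 1)


def side_score(target, candidate):
    """Sum the weights of positions where both contexts hold the same phoneme.

    Both sequences are listed nearest-first relative to the unit speech.
    """
    return sum(
        position_weight(distance)
        for distance, (t, c) in enumerate(zip(target, candidate), start=1)
        if t == c
    )


def context_score(target_left, target_right, cand_left, cand_right):
    """Total score over the preceding and following contexts."""
    return side_score(target_left, cand_left) + side_score(target_right, cand_right)


# Two phonemes match before the unit and one after: (4 + 3) + 4 = 11.
print(context_score(list("abcd"), list("xy"), list("abzz"), list("xq")))
# Only the four preceding phonemes match: 4 + 3 + 2 + 1 = 10.
print(context_score(list("abcd"), list("xy"), list("abcd"), list("qq")))
```

With these weights a short match on both sides outscores a longer match on one side only, which is exactly what the two example totals show.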

[0026] FIG. 4 is a block diagram showing an example of the unit speech data selection unit of claim 2. The output of the speech data search unit is input at input terminal 12. The phoneme sequence comparison unit 13 compares the phoneme sequences before and after the unit speech in the input phoneme information with those before and after the unit speech in the unit speech data. The accent symbol sequence comparison unit 16 compares the accent symbol sequence attached to the phonemes before and after the unit speech in the input phoneme information with the accent symbol sequence attached to the phoneme sequences before and after the unit speech in the unit speech data. The similarity evaluation unit 17 applies weights according to position relative to the unit speech in the unit speech data, computes a similarity from the results of the phoneme sequence comparison unit 13 and the accent symbol sequence comparison unit 16, and selects the entry with the highest evaluation value as the optimum unit speech data. The selection result is output from output terminal 15.
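A sketch of how phoneme and accent matches might be combined into one similarity is given below. The text does not specify how the two comparison results are merged, so the weighted sum, the `accent_weight` parameter, the "H"/"L" accent marks and the dictionary layout are all assumptions. The step weights 4, 3, 2, 1 are borrowed from the example in paragraph [0025].

```python
def step_weight(distance):
    """Step weights by distance from the unit speech: 4, 3, 2, then 1."""
    return {1: 4, 2: 3, 3: 2}.get(distance, 1)


def weighted_matches(target, candidate):
    """Sum weights where symbols at the same distance agree (nearest-first)."""
    return sum(
        step_weight(d)
        for d, (t, c) in enumerate(zip(target, candidate), start=1)
        if t == c
    )


def similarity(target, candidate, accent_weight=0.5):
    """Phoneme match plus a scaled accent match over both sides (assumed form)."""
    total = 0.0
    for side in ("left", "right"):
        total += weighted_matches(target[side]["phonemes"],
                                  candidate[side]["phonemes"])
        total += accent_weight * weighted_matches(target[side]["accents"],
                                                  candidate[side]["accents"])
    return total


def select_best(target, candidates):
    """Pick the candidate with the highest evaluation value."""
    return max(candidates, key=lambda cand: similarity(target, cand))
```

For instance, two candidates with identical phoneme contexts would be separated by how well their accent marks line up with those of the input.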

[0027] FIG. 5 is a block diagram showing an example of the unit speech data selection unit of claim 3. The output of the speech data search unit is input at input terminal 12. The phoneme sequence comparison unit 13 compares the phoneme sequences before and after the unit speech in the input phoneme information with those before and after the unit speech in the unit speech data. The accent symbol sequence comparison unit 16 compares the accent symbol sequence attached to the phonemes before and after the unit speech in the input phoneme information with the accent symbol sequence attached to the phoneme sequences before and after the unit speech in the unit speech data. Meanwhile, from information such as the syntactic and morphological analysis results attached to the input character string, the linguistic condition comparison unit 18 evaluates similarity based on a predetermined criterion according to syntax, part of speech and the like. The similarity evaluation unit 19 combines the comparison results output from the phoneme sequence comparison unit 13 and the accent symbol sequence comparison unit 16 with the comparison result output from the linguistic condition comparison unit 18, applies weights according to position relative to the unit speech in the unit speech data, computes an overall similarity, and selects the entry with the highest evaluation value as the optimum unit speech data. The selection result is output from output terminal 15.
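The linguistic comparison can be sketched as one more term added to an already computed context score. The part-of-speech similarity table, the `linguistic_weight` value and the additive combination are assumptions, as the text says only that a predetermined criterion is used and that the results are combined.

```python
# Hypothetical similarity between differing parts of speech.
POS_SIMILARITY = {
    frozenset(("noun", "verb")): 0.2,
    frozenset(("noun", "adjective")): 0.4,
}


def linguistic_similarity(input_pos, candidate_pos):
    """1.0 for identical parts of speech, else a table value (default 0.0)."""
    if input_pos == candidate_pos:
        return 1.0
    return POS_SIMILARITY.get(frozenset((input_pos, candidate_pos)), 0.0)


def overall_similarity(context_score, input_pos, candidate_pos,
                       linguistic_weight=5.0):
    """Context score plus a scaled linguistic term (assumed combination)."""
    return context_score + linguistic_weight * linguistic_similarity(
        input_pos, candidate_pos)


def select_best(input_pos, candidates):
    """candidates: (name, context_score, part_of_speech) tuples."""
    best = max(candidates,
               key=lambda c: overall_similarity(c[1], input_pos, c[2]))
    return best[0]


# Invented candidates: the lower context score wins on linguistic agreement.
print(select_best("noun", [("w1", 11, "verb"), ("w2", 10, "noun")]))
```

How strongly the linguistic term should count against the phoneme and accent terms is a tuning choice that the text leaves open.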

[0028] The above embodiment describes the case in which word data is used as the unit speech database, but the data used in the invention need only be continuous speech and is not limited to words.

[0029]

[Effects of the Invention] When synthesized speech is generated continuously, the unnaturalness previously caused by the use of fixed unit speech data is reduced. Furthermore, because various tables are used when selecting the optimum entry from many unit speech data entries for each unit speech, the selection process is simple. In addition, because the unit speech to be used is cut out with the phonemic context and accent position taken into account, changes in spectral pattern similar to those of actual human speech can be realized. This reduces the burden on the listener, for example speech that is harder to understand precisely because each phoneme is too distinct in isolation, and makes it possible to generate synthesized speech that is less tiring to listen to.

[Brief Description of the Drawings]

FIG. 1 is a block diagram outlining the speech synthesis system according to the inventions of claims 1, 2 and 3.

FIG. 2 is a diagram showing an example of the unit speech data table used in the embodiment of the invention.

FIG. 3 is a block diagram showing an example of the unit speech data selection unit of claim 1.

FIG. 4 is a block diagram showing an example of the unit speech data selection unit of claim 2.

FIG. 5 is a block diagram showing an example of the unit speech data selection unit of claim 3.

[Explanation of Reference Numerals]

1 Input terminal
2 Synthesis unit generation unit
3 Speech data search unit
4 Unit speech data table
5 Unit speech data selection unit
6 Unit speech extraction unit
7 Speech parameter storage unit
8 Unit speech connection unit
9 Sound source generation unit
10 Speech synthesis unit
11 Output terminal
12 Input terminal
13 Phoneme sequence comparison unit
14 Similarity evaluation unit
15 Output terminal
16 Accent symbol sequence comparison unit
17 Similarity evaluation unit
18 Linguistic environment comparison unit
19 Similarity evaluation unit

Claims


[Claims]

1. A speech synthesis system that converts an input character string consisting of phoneme symbols and prosodic symbols into speech and synthesizes speech for arbitrary words, characterized by:
storing speech parameters obtained by analyzing natural speech uttered continuously in advance;
storing in a unit speech data table, for each predetermined unit speech, the correspondence between that unit speech and the speech parameters together with the phoneme sequence of the utterance from which the speech parameters were obtained;
determining unit speech information based on the phoneme and prosody information of the input character string;
referring to the unit speech data table based on the determined unit speech information, and weighting the phoneme sequences before and after the unit speech in each unit speech data entry in the unit speech data table according to their position relative to the unit speech;
evaluating, with a predetermined evaluation function, the degree of match between the phoneme sequences before and after the unit speech in the input phoneme information and the phoneme sequences before and after the unit speech in the unit speech data, and selecting the unit speech data judged optimum;
extracting the speech parameters of the selected unit speech data from the stored speech parameters based on the information in the unit speech data table; and
performing speech synthesis using the extracted speech parameters.

2. A speech synthesis system that converts an input character string consisting of phoneme symbols and prosodic symbols into speech and synthesizes speech for arbitrary words, characterized by:
storing speech parameters obtained by analyzing natural speech uttered continuously in advance;
storing in a unit speech data table, for each predetermined unit speech, the correspondence between that unit speech and the speech parameters together with the phoneme sequence of the utterance from which the speech parameters were obtained;
determining unit speech information based on the phoneme and prosody information of the input character string;
referring to the unit speech data table based on the determined unit speech information, and weighting the phoneme sequences and accent symbols before and after the unit speech in each unit speech data entry in the unit speech data table according to their position relative to the unit speech;
evaluating, with a predetermined evaluation function and taking both together, the degree of match between the phoneme sequences and accent symbol sequences before and after the unit speech in the input phoneme information and input prosody information and the phoneme sequences and accent symbol sequences before and after the unit speech in the unit speech data, and selecting the unit speech data judged optimum;
extracting the speech parameters of the selected unit speech data from the stored speech parameters based on the information in the unit speech data table; and
performing speech synthesis using the extracted speech parameters.

3. A speech synthesis system that converts an input character string consisting of at least phoneme symbols, prosodic symbols and syntactic information into speech and synthesizes speech for arbitrary words, characterized by:
storing speech parameters obtained by analyzing natural speech uttered continuously in advance;
storing in a unit speech data table, for each predetermined unit speech, the correspondence between that unit speech and the speech parameters together with the phoneme sequence of the utterance from which the speech parameters were obtained;
determining unit speech information based on the phoneme and prosody information of the input character string;
referring to the unit speech data table based on the determined unit speech information, comparing the linguistic environment in which the unit speech is placed in the input phoneme information and input prosody information with the analysis result of the unit speech data, and at the same time weighting the phoneme symbol sequences before and after the unit speech in the unit speech data according to their position relative to the unit speech;
evaluating, with a predetermined evaluation function, the degree of match between the phoneme symbol sequences before and after the unit speech in the input phoneme information and the phoneme symbol sequences before and after the unit speech in the unit speech data, also evaluating similarity based on a predetermined criterion according to syntax and part of speech, and selecting the optimum unit speech data;
extracting the speech parameters of the selected unit speech data from the stored speech parameters based on the information in the unit speech data table; and
performing speech synthesis using the extracted speech parameters.