JP3771565B2


Info

Publication number: JP3771565B2
Application number: JP2004079113A
Authority: JP
Inventors: 弓子加藤; 孝浩釜井; 紀代原; 謙二松井
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1997-11-28
Filing date: 2004-03-18
Publication date: 2006-04-26
Anticipated expiration: 2018-11-24
Also published as: JP2004206144A

Abstract

PROBLEM TO BE SOLVED: When a fundamental frequency pattern is generated, fluctuations of the fundamental frequency within a mora cannot be determined precisely, and distortion on the real-time axis makes prosody, as typified by accent, unnatural.

SOLUTION: The device includes: a phoneme time length standardized fundamental frequency database 351, which stores, by the number of moras of an accent phrase, reference points such as the rise reference point of the i-th mora (the peak of the fundamental frequency pattern) as positions relative to the time length of a phoneme of the specified mora; a fundamental frequency pattern deformation database 350, which stores, by the position of the accent phrase within the phrase, the amount of deformation of the fundamental frequency at the peak and at the end of the accent phrase; a time length setting part 40, which sets the time length of each phoneme by referring to a phoneme time length database 30 according to the phoneme information and other data output from a character string analysis part 20; a generation part 60, which generates a fundamental frequency pattern by referring to the database 351 according to the output phoneme information and the phoneme time lengths set by the time length setting part 40; and others.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

Translated from Japanese

The present invention relates to a fundamental frequency pattern generation device, a fundamental frequency pattern generation method, and a program recording medium, all used for speech synthesis.

Among conventional methods for generating a speech fundamental frequency pattern, some, as in Patent Document 1, focus on the accent type, take the start point of the mora or the start point of its vowel as a reference, and determine the fundamental frequency pattern with a critically damped second-order linear system on the logarithmic frequency axis. Others, as in Patent Document 2, determine the fundamental frequency of each mora by focusing on the accent type, the phoneme type, and the mora position within the word or phrase.
JP-A-5-173590; JP-A-5-88690

However, these conventional methods either cannot precisely determine fluctuations of the fundamental frequency within a mora, or introduce distortion on the real time axis because mora time lengths differ, so that prosody, as typified by accent, becomes unnatural.

In view of the above problems of the conventional speech fundamental frequency pattern generation methods, an object of the present invention is to provide a fundamental frequency pattern generation method and a program recording medium capable of generating a fundamental frequency pattern that is more natural than those of the prior art.

A first aspect of the present invention is a fundamental frequency pattern generation device comprising:
a character string analysis unit that divides an input character string into accent phrases and outputs information on the number of moras and the accent type of each accent phrase;
a fundamental frequency database that stores fundamental frequency patterns classified by the number of moras and the accent type of the accent phrase; and
a fundamental frequency pattern generation unit that acquires a predetermined fundamental frequency pattern from the fundamental frequency database based on the information on the number of moras and the accent type from the character string analysis unit, and generates the fundamental frequency pattern of the accent phrase,
wherein the fundamental frequency patterns in the fundamental frequency database include
a fundamental frequency pattern (a) of the mora having the peak fundamental frequency among the fundamental frequencies of the accent phrase, a fundamental frequency pattern (b) of the mora having the accent nucleus of the accent phrase, a fundamental frequency pattern (c) of the mora following the mora having the accent nucleus, and fundamental frequency patterns (d) of the last k moras (k being plural) of the accent phrase, and
wherein the fundamental frequency pattern generation unit,
when the fundamental frequency database contains no fundamental frequency pattern corresponding to n moras as the number of moras and type m as the accent type of the accent phrase received from the character string analysis unit,
(1) when the accent phrase received from the character string analysis unit is not of the flat type,
selects from the fundamental frequency database the fundamental frequency patterns (a) to (d) for which the accent nucleus of the accent phrase is at the m-th mora and the peak fundamental frequency of the accent phrase is at the i-th mora, and
[A] when m ≦ i + 1, interpolates the fundamental frequency pattern between the fundamental frequency pattern (c) and the fundamental frequency pattern (d),
[B] when i + 1 < m ≦ n − k, interpolates the fundamental frequency pattern between the fundamental frequency pattern (a) and the fundamental frequency pattern (b), and also interpolates the fundamental frequency pattern between the fundamental frequency pattern (c) and the fundamental frequency pattern (d),
[C] when m > n − k, interpolates the fundamental frequency pattern between the fundamental frequency pattern (a) and the fundamental frequency pattern (b), and
(2) when the accent phrase received from the character string analysis unit is of the flat type,
selects from the fundamental frequency database the fundamental frequency pattern (a) whose peak fundamental frequency is at the i-th mora and the fundamental frequency pattern (d), and interpolates the fundamental frequency pattern between the fundamental frequency pattern (a) and the fundamental frequency pattern (d).
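The case analysis in the first aspect can be sketched as a small selection function. This is only an illustration under stated assumptions, not the patented implementation: the function name `interpolation_spans`, the tuple return format, and the convention that accent type 0 denotes the flat type are all hypothetical choices made here. The cases are tested in the order the text lists them.

```python
def interpolation_spans(n, m, i, k):
    """Return the pairs of stored patterns between which F0 is interpolated.

    n: number of moras in the accent phrase
    m: accent type, i.e. the mora index of the accent nucleus
       (assumption: 0 stands for the flat type)
    i: mora index at which the fundamental frequency peaks
    k: number of phrase-final moras covered by pattern (d)
    """
    if m == 0:
        # (2) flat type: interpolate from the peak mora to the final moras
        return [("a", "d")]
    if m <= i + 1:
        # [A] the nucleus is at or right next to the peak mora
        return [("c", "d")]
    if m <= n - k:
        # [B] gaps exist both before the nucleus and after it
        return [("a", "b"), ("c", "d")]
    # [C] the nucleus falls inside the final k moras
    return [("a", "b")]


print(interpolation_spans(n=8, m=5, i=2, k=2))
```

With n = 8, i = 2, and k = 2, an accent nucleus at the 5th mora falls in case [B], so both gaps are interpolated.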

A second aspect of the present invention is the fundamental frequency pattern generation device according to the first aspect, wherein the interpolation is performed with a function on the real time axis.

A third aspect of the present invention is the fundamental frequency pattern generation device according to claim 1, wherein the interpolation is performed with a straight line on the real time axis.

A fourth aspect of the present invention is a fundamental frequency pattern generation method for generating the fundamental frequency pattern of an accent phrase of an input character string, using:
a character string analysis step of dividing the input character string into accent phrases and outputting information on the number of moras and the accent type of each accent phrase;
a storing step of storing, in a fundamental frequency database, fundamental frequency patterns classified by the number of moras and the accent type of the accent phrase; and
a fundamental frequency pattern generation step of acquiring a predetermined fundamental frequency pattern from the fundamental frequency database based on the information on the number of moras and the accent type output in the character string analysis step, and generating the fundamental frequency pattern of the accent phrase,
wherein the fundamental frequency patterns in the fundamental frequency database include
a fundamental frequency pattern (a) of the mora having the peak fundamental frequency among the fundamental frequencies of the accent phrase, a fundamental frequency pattern (b) of the mora having the accent nucleus of the accent phrase, a fundamental frequency pattern (c) of the mora following the mora having the accent nucleus, and fundamental frequency patterns (d) of the last k moras (k being plural) of the accent phrase, and
wherein, in the fundamental frequency pattern generation step,
when the fundamental frequency database contains no fundamental frequency pattern corresponding to n moras as the number of moras and type m as the accent type of the accent phrase received from the character string analysis unit,
(1) when the accent phrase received from the character string analysis step is not of the flat type,
the fundamental frequency patterns (a) to (d) for which the accent nucleus of the accent phrase is at the m-th mora and the peak fundamental frequency of the accent phrase is at the i-th mora are selected from the fundamental frequency database, and
[A] when m ≦ i + 1, the fundamental frequency pattern between the fundamental frequency pattern (c) and the fundamental frequency pattern (d) is interpolated,
[B] when i + 1 < m ≦ n − k, the fundamental frequency pattern between the fundamental frequency pattern (a) and the fundamental frequency pattern (b) is interpolated, and the fundamental frequency pattern between the fundamental frequency pattern (c) and the fundamental frequency pattern (d) is also interpolated,
[C] when m > n − k, the fundamental frequency pattern between the fundamental frequency pattern (a) and the fundamental frequency pattern (b) is interpolated, and
(2) when the accent phrase received from the character string analysis step is of the flat type,
the fundamental frequency pattern (a) whose peak fundamental frequency is at the i-th mora and the fundamental frequency pattern (d) are selected from the fundamental frequency database, and the fundamental frequency pattern between the fundamental frequency pattern (a) and the fundamental frequency pattern (d) is interpolated.

A fifth aspect of the present invention is a computer-processable program recording medium on which is recorded a program for causing a computer to execute the character string analysis step, the storing step, and the fundamental frequency generation step of the fundamental frequency pattern generation method according to the fourth aspect.

As is apparent from the above, the present invention has the advantage of generating a fundamental frequency pattern that is more natural than those of the prior art.

Hereinafter, embodiments of the present invention, and embodiments of other inventions related to the present invention, will be described with reference to FIGS. 1 to 20.

(Embodiment 1)
FIG. 1 is a functional block diagram of a fundamental frequency pattern generation device showing an embodiment of another invention related to the present invention. The configuration of this embodiment is described with reference to FIG. 1.

In FIG. 1, reference numeral 10 denotes a character string input unit for inputting a character string to be synthesized as speech. Reference numeral 20 denotes a character string analysis unit that analyzes the character string input from the character string input unit 10 and outputs phoneme information of the speech to be synthesized and prosodic information such as accents and pauses. Reference numeral 30 denotes a phoneme time length database that stores the time length of each phoneme for conditions such as the speech rate and the position of the phoneme within the utterance, and 40 denotes a time length setting unit that sets the time length of each phoneme by referring to the phoneme time length database 30 based on the phoneme information and prosodic information output from the character string analysis unit 20. Reference numeral 50 denotes a mora time length standardized fundamental frequency database that stores, for each mora, a fundamental frequency pattern standardized by the time length of that mora, for conditions on the determinants of prosody such as the number of moras of the accent phrase, the accent type, and the phoneme sequence. Reference numeral 60 denotes a fundamental frequency pattern generation unit that generates a fundamental frequency pattern by referring to the mora time length standardized fundamental frequency database 50 based on the prosodic information output from the character string analysis unit 20 and the phoneme time lengths set by the time length setting unit 40. Reference numeral 70 denotes a vocal cord vibration generation unit that generates vocal cord vibration, that is, the sound source vibration of the synthesized speech, based on the fundamental frequency pattern output from the fundamental frequency pattern generation unit. FIG. 2 shows an example of a fundamental frequency pattern according to the present invention.

The operation of the fundamental frequency pattern generation device configured as described above is explained below.

First, a character string to be converted into speech (the character string "オンセーゴーセー" shown in FIG. 2) is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosodic information indicating the number of moras and the accent type of each accent phrase together with the phoneme information indicating the phoneme sequence. The time length setting unit 40 sets the time length of each phoneme by referring to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosodic information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40.

First, as shown at a) in FIG. 2, the fundamental frequency pattern of the first mora of the accent phrase is acquired from the mora time length standardized fundamental frequency database 50. Next, the mora at which the fundamental frequency takes its maximum value is identified from the number of moras and the accent type of the accent phrase, and, as shown at b) in FIG. 2, the fundamental frequency pattern of the identified mora is acquired from the mora time length standardized fundamental frequency database 50. As shown at c) and d) in FIG. 2, the fundamental frequency patterns of the accent nucleus and of the mora following the accent nucleus, and the fundamental frequency pattern of the final mora of the accent phrase, are acquired from the mora time length standardized fundamental frequency database 50. Between the reference moras, such as between b) and c) and between c) and d) in FIG. 2, linear interpolation on the real time axis is used to determine the fundamental frequency patterns e), f), and g) in FIG. 2. The vocal cord vibration generation unit 70 generates the vocal cord vibration of the synthesized speech according to the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.
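The two operations in this procedure, mapping a duration-normalized mora pattern onto real time and then filling the gap between reference moras by linear interpolation, can be sketched as follows. This is a minimal sketch under assumptions: the function names, the representation of a pattern as `(relative position, F0 in Hz)` pairs, and interpolating in Hz rather than in log frequency are all choices made here for illustration and are not specified by the text.

```python
def place_pattern(norm_pattern, mora_start, mora_duration):
    """Map a pattern normalized by mora duration onto the real time axis.

    norm_pattern: list of (relative position in 0..1, F0 in Hz) pairs
    """
    return [(mora_start + pos * mora_duration, f0) for pos, f0 in norm_pattern]


def linear_fill(left, right, times):
    """Linearly interpolate F0 at `times` between two (time, F0) points."""
    (t0, f0), (t1, f1) = left, right
    return [(t, f0 + (f1 - f0) * (t - t0) / (t1 - t0)) for t in times]


# Hypothetical values: a peak mora ending at 220 Hz, a nucleus mora starting at 200 Hz
peak = place_pattern([(0.0, 180.0), (1.0, 220.0)], mora_start=0.10, mora_duration=0.12)
nucleus = place_pattern([(0.0, 200.0), (1.0, 150.0)], mora_start=0.40, mora_duration=0.10)
gap = linear_fill(peak[-1], nucleus[0], times=[0.25, 0.30, 0.35])
print(gap)
```

Because the fill is computed in seconds rather than in mora units, moras of different lengths inside the gap do not bend the slope, which is the point the text makes about avoiding distortion on the real time axis.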

The timing and angle of the rise of the accent phrase and of the fall at the accent nucleus strongly affect the naturalness of speech. Applying fundamental frequency patterns standardized by the time length of the mora to these parts reproduces the fluctuation of the fundamental frequency within the mora in detail and achieves high naturalness. For the parts that have little effect on perception, interpolation on the real time axis removes the sense of discontinuity that arises from mora-by-mora control and also allows the fundamental frequency pattern database to be made smaller.

(Embodiment 2)
FIG. 4 is a functional block diagram of a device showing an embodiment of another invention related to the present invention. It is the same as FIG. 1 except that the mora time length standardized fundamental frequency database 50 is replaced by a vowel time length standardized fundamental frequency database 150a, which, for conditions on the determinants of prosody such as the number of moras of the accent phrase, the accent type, and the phoneme sequence, divides the time length of the vowel part of each mora into four equal sections and stores the representative value of the fundamental frequency of each section as the value at the center point of that section.

FIG. 3 shows an example of a fundamental frequency pattern according to the present invention. The operation is described below. First, a character string to be converted into speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosodic information indicating the number of moras and the accent type of each accent phrase together with the phoneme information indicating the phoneme sequence. The time length setting unit 40 sets the time length of each phoneme by referring to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosodic information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40.

First, based on the number of moras of the accent phrase, the accent type, the phoneme sequence, and so on, the following reference points are acquired from the vowel time length standardized fundamental frequency database 150a:
a) the rise reference point, at the center of the third of four equal sections of the vowel-equivalent part of the mora at which the fundamental frequency takes its maximum value;
b) the fall reference point, at the center of the third of four equal sections of the vowel-equivalent part of the mora carrying the accent nucleus;
c) the fall reference point, at the center of the third of four equal sections of the vowel-equivalent part of the mora following the accent nucleus;
d) the accent phrase end reference point, at the center of the second of four equal sections of the vowel-equivalent part of the final mora of the accent phrase; and
e) the utterance-final reference point, at the center of the third of four equal sections of the vowel-equivalent part of the final mora.
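The position of a reference point defined as "the center of the j-th of four equal sections" of a vowel can be computed directly from the vowel's start time and duration. The helper below is a small sketch; its name and signature are hypothetical and the vowel timings in the usage lines are made-up values.

```python
def section_center(vowel_start, vowel_duration, section, sections=4):
    """Real-time position of the centre of the `section`-th (1-based)
    of `sections` equal parts of a vowel."""
    if not 1 <= section <= sections:
        raise ValueError("section must lie in 1..sections")
    return vowel_start + vowel_duration * (section - 0.5) / sections


# Centre of the 3rd quarter lies 62.5% into the vowel, of the 2nd quarter 37.5%
rise_point = section_center(vowel_start=0.20, vowel_duration=0.08, section=3)
phrase_end_point = section_center(vowel_start=0.90, vowel_duration=0.10, section=2)
print(rise_point, phrase_end_point)
```

Storing the section index instead of an absolute time is what lets the same database entry be reused for vowels of any duration.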

Next, each reference point is placed at its relative position with respect to the vowel time length of the corresponding mora. The span from the beginning of the accent phrase to a) the rise reference point is interpolated on the real time axis, using a critically damped second-order linear system on the logarithmic frequency axis, so that a) the rise reference point becomes the maximum value. Each section between adjacent reference points from a) to d) is likewise interpolated between its two end points on the real time axis using a critically damped second-order linear system on the logarithmic frequency axis. Further, when the end of the accent phrase is also the end of the utterance, the span between d) the accent phrase end reference point and e) the utterance-final reference point is interpolated with an utterance-final function, which is a function on the real time axis. The vocal cord vibration generation unit 70 generates the vocal cord vibration of the synthesized speech according to the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.
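One way to interpolate between two reference points with a critically damped second-order linear system on the logarithmic frequency axis is to use that system's unit step response, s(t) = 1 - (1 + βt)e^{-βt}, as the transition shape for ln F0. The sketch below is an illustration only: the text does not give the formula or its parameters, so the rate constant `beta`, the function names, and the rescaling that makes the curve pass exactly through the second point are assumptions made here.

```python
import math


def step_response(t, beta):
    """Unit step response of a critically damped second-order linear system."""
    if t <= 0.0:
        return 0.0
    return 1.0 - (1.0 + beta * t) * math.exp(-beta * t)


def damped_interpolate(point_a, point_b, t, beta=20.0):
    """F0 in Hz at real time t between two (time in s, F0 in Hz) reference points.

    The transition is applied to ln F0. The response is rescaled (an
    assumption of this sketch) so that the curve reaches point_b exactly.
    """
    (ta, fa), (tb, fb) = point_a, point_b
    if tb <= ta:
        raise ValueError("point_b must be later than point_a")
    weight = step_response(t - ta, beta) / step_response(tb - ta, beta)
    return math.exp(math.log(fa) + (math.log(fb) - math.log(fa)) * weight)


# Hypothetical rise from 120 Hz at 0.00 s to 180 Hz at 0.10 s
for t in (0.0, 0.025, 0.05, 0.075, 0.10):
    print(round(t, 3), round(damped_interpolate((0.0, 120.0), (0.10, 180.0), t), 1))
```

Because the curve is a function of elapsed seconds, its slope does not change when the phonemes underneath it are longer or shorter, which matches the stated reason for using a real-time function for the rise and fall.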

The timing of the rise of the accent phrase and of the fall at the accent nucleus strongly affects the naturalness of speech. Setting this timing on a time axis standardized by the vowel length of the mora reproduces the timing of fundamental frequency fluctuation within the mora in detail. For the angles of the rise and fall, using a function on the real time axis yields a smooth fundamental frequency pattern with stable rises and falls that is unaffected by differences in time length between phonemes, and achieves high naturalness. For the parts that have little effect on perception, interpolation on the real time axis removes the sense of discontinuity that arises from mora-by-mora control and also allows the fundamental frequency pattern database to be made smaller.

(Embodiment 3)
The functional block diagram of a device showing this embodiment of another invention related to the present invention is the same as FIG. 4, except that the database 150a of Embodiment 2 is replaced by a vowel time length standardized fundamental frequency database 150b, which, for conditions on the determinants of prosody such as the number of moras of the accent phrase, the accent type, and the phoneme sequence, stores the fundamental frequency pattern of the vowel part of each mora, standardized by the time length of that vowel part, together with the head fundamental frequency of the accent phrase. The diagram is therefore omitted.

FIG. 5 shows an example of a fundamental frequency pattern according to the present invention.

First, a character string to be converted into speech (the character string "oNse-go-se-" shown in FIG. 5) is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosodic information indicating the number of moras and the accent type of each accent phrase together with the phoneme information indicating the phoneme sequence. The time length setting unit 40 refers to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, sets the vowel time length of each mora or, for a single-vowel syllable, a moraic nasal, or a long vowel, the time length of the vowel-equivalent part, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosodic information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40.

First, as shown at A in FIG. 5, the head fundamental frequency of the accent phrase is acquired from the vowel time length standardized fundamental frequency database 150b. Next, as shown at a) in FIG. 5, the fundamental frequency pattern of the vowel part of the first mora of the accent phrase is acquired from the vowel time length standardized fundamental frequency database 150b. In this example the first mora is a single-vowel syllable, so, as shown at a) in FIG. 5, the fundamental frequency pattern acquired from the database 150b is applied to the latter half of the time length of that mora. For b), c), d), e), f), g), and h), the fundamental frequency pattern of the vowel part of each mora is acquired from the database 150b in the same way. For b), which is a moraic nasal, and for d), f), and h), which are long vowels, the acquired fundamental frequency pattern is applied to the latter half of the time length of the mora, as for a). Next, the fundamental frequencies of a'), b'), d'), e'), f'), and h'), which are the first halves of single-vowel syllables, moraic nasals, and long vowels, or voiced consonants, are generated from the preceding and following fundamental frequencies by linear interpolation on the real time axis. The vocal cord vibration generation unit 70 generates the vocal cord vibration of the synthesized speech according to the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.
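For a mora with no separate consonant (a single-vowel syllable, a moraic nasal, or a long vowel), the procedure above applies the stored vowel pattern to the latter half of the mora and fills the first half by linear interpolation from the preceding fundamental frequency. The sketch below illustrates that for one mora. The function name, the `(relative position, F0 in Hz)` pattern format, the requirement that the pattern start at position 0.0, and the number of sample points are assumptions made for this illustration.

```python
def mora_contour(mora_start, mora_duration, vowel_pattern, prev_f0, steps=4):
    """Build (time, F0) points for a mora that has no separate consonant.

    vowel_pattern: list of (relative position in 0..1, F0 in Hz) pairs,
                   assumed to begin at relative position 0.0
    prev_f0:       F0 immediately before this mora (for the first mora,
                   the head fundamental frequency of the accent phrase)
    """
    half = mora_duration / 2.0
    middle = mora_start + half
    # Latter half: the stored, duration-normalized vowel pattern
    back = [(middle + pos * half, f0) for pos, f0 in vowel_pattern]
    # First half: straight line in real time from prev_f0 to the pattern's first value
    target = back[0][1]
    front = [
        (mora_start + half * j / steps, prev_f0 + (target - prev_f0) * j / steps)
        for j in range(steps)
    ]
    return front + back


# Hypothetical first mora: head F0 of 180 Hz, vowel pattern rising from 200 to 210 Hz
for point in mora_contour(0.0, 0.2, [(0.0, 200.0), (1.0, 210.0)], prev_f0=180.0, steps=2):
    print(point)
```

A voiced consonant between two vowel patterns could be filled with the same straight-line step, using the end of the previous vowel pattern and the start of the next one as end points.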

The timing and angle of the rise of the accent phrase and of the fall at the accent nucleus strongly affect the naturalness of speech. Applying fundamental frequency patterns standardized by the vowel length of the mora to these parts reproduces the fluctuation of the fundamental frequency within the mora in detail and achieves high naturalness. For the parts that have little effect on perception, interpolation on the real time axis removes the sense of discontinuity that arises from mora-by-mora control and also allows the fundamental frequency pattern database to be made smaller.

(Embodiment 4)
In the fourth embodiment, the vowel time length standardized fundamental frequency database 150a stores, for conditions of prosodic determinants such as the number of morae of the accent phrase, the accent type, and the phoneme string, A) the initial fundamental frequency, B) the rise reference point, C) the fall reference point (accent nucleus), D) the fall reference point (immediately after the accent nucleus), E) the accent phrase end reference point, and F) the ending reference point, each as a position relative to the vowel time length of the mora containing that reference point. In all other respects the configuration of the apparatus is the same as in FIG. 4. FIG. 6 shows an example of a fundamental frequency pattern according to the present invention. The operation is described below.

First, a character string to be converted into speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string, outputs phoneme information indicating the phoneme string to the time length setting unit 40, divides the character string into accent phrases, and outputs prosodic information indicating the number of morae and the accent type of each accent phrase, together with the phoneme information indicating the phoneme string, to the fundamental frequency pattern generation unit 60. The time length setting unit 40 sets the time length of each phoneme by referring to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosodic information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40. First, the reference points A) to F) are acquired from the vowel time length standardized fundamental frequency database 150a according to the number of morae of the accent phrase, the accent type, the phoneme string, and the like. Next, each reference point is set at its position relative to the vowel length of the corresponding mora. The section from A) the initial fundamental frequency to B) the rise reference point is generated using a function on the real time axis. The fundamental frequency pattern between the reference points from B) onward is then generated by straight-line interpolation on the real time axis.
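As an illustration only, the placement of reference points at positions relative to the vowel time length, followed by interpolation on the real time axis, may be sketched in Python as follows. The function names, the tuple layout of the reference points, and the sample values are hypothetical and do not appear in the specification. The specification does not state which function is used between A) and B), so this sketch assumes a straight line for that section as well.

```python
from bisect import bisect_right

def place_reference_points(vowel_spans, ref_points):
    """Convert relative positions into absolute times.

    vowel_spans: list of (start_s, end_s) of the vowel of each mora.
    ref_points:  list of (mora_index, relative_position, f0_hz), where
                 relative_position is 0.0 at vowel onset and 1.0 at vowel end.
    Returns a time-sorted list of (time_s, f0_hz).
    """
    placed = []
    for mora_index, rel_pos, f0 in ref_points:
        start, end = vowel_spans[mora_index]
        placed.append((start + rel_pos * (end - start), f0))
    return sorted(placed)

def interpolate_f0(placed, t):
    """Straight-line interpolation between neighbouring reference points.

    Times outside the first and last reference point are clamped
    (an assumption made for this sketch).
    """
    times = [p[0] for p in placed]
    if t <= times[0]:
        return placed[0][1]
    if t >= times[-1]:
        return placed[-1][1]
    k = bisect_right(times, t)
    (t0, f0), (t1, f1) = placed[k - 1], placed[k]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
```

Because only the relative positions are stored, the same database entry can be reused when the time length setting unit assigns different vowel lengths; only `vowel_spans` changes.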

The vocal cord vibration generation unit 70 generates the vocal cord vibration of the synthesized sound according to the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.

(Embodiment 5)
FIG. 7 is a functional block diagram of an apparatus showing an embodiment of another invention related to the present invention. It is the same as FIG. 4 except for two points. First, the vowel time length standardized fundamental frequency database 150a stores, for the conditions of the number of morae and the accent type of the accent phrase, a) the rise reference point, b) the fall reference point (accent nucleus), c) the fall reference point (immediately after the accent nucleus), d) the accent phrase end reference point, and e) the ending reference point, each as a position relative to the time length of the vowel or vowel-equivalent part of the mora containing that reference point. Second, a micro-prosody database 250 is added, which stores fine fluctuations of the fundamental frequency caused by phonemes or phoneme strings as differences from the reference points stored in the database 150a and from the values interpolated between those reference points, standardized by the time length of the phoneme.

FIG. 8 is a schematic diagram of the micro-prosody components stored in the micro-prosody database 250, and FIGS. 9(A) to 9(C) show an example of a fundamental frequency pattern according to the present invention.

First, a character string to be converted into speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string, outputs phoneme information indicating the phoneme string to the time length setting unit 40, divides the character string into accent phrases, and outputs prosodic information indicating the number of morae and the accent type of each accent phrase, together with the phoneme information indicating the phoneme string, to the fundamental frequency pattern generation unit 60. The time length setting unit 40 sets the time length of each phoneme of each mora by referring to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosodic information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40. First, according to the number of morae and the accent type of the accent phrase, the following reference points are acquired from the vowel time length standardized fundamental frequency database: a) the rise reference point, located at the center of the third of four equal sections of the vowel-equivalent part of the mora in which the fundamental frequency takes its maximum value; b) the fall reference point, located at the center of the third of four equal sections of the vowel-equivalent part of the mora at the accent nucleus; c) the fall reference point, located at the center of the third of four equal sections of the vowel-equivalent part of the mora following the accent nucleus; d) the accent phrase end reference point, located at the center of the second of four equal sections of the vowel-equivalent part of the final mora of the accent phrase; and e) the ending reference point, located at the center of the third of four equal sections of the vowel-equivalent part of the final mora.

Next, each reference point is set at its position relative to the phoneme time length of the corresponding mora. The section from the beginning of the accent phrase to a) the rise reference point is interpolated on the real time axis by a critically damped second-order linear system on the logarithmic frequency axis, such that a) the rise reference point is the maximum. Between the reference points a) to e), each section between two adjacent points is likewise interpolated on the real time axis by a critically damped second-order linear system on the logarithmic frequency axis, and a fundamental frequency pattern such as that in FIG. 9(A) is generated. Next, the fine fluctuations of the fundamental frequency corresponding to each phoneme are acquired from the micro-prosody database 250, stretched or compressed to match the time length of each phoneme, and applied as shown in FIG. 9(B). The fine fluctuations of FIG. 9(B) are added to the fundamental frequency pattern of FIG. 9(A) to generate a fundamental frequency pattern such as that in FIG. 9(C). The vocal cord vibration generation unit 70 generates the vocal cord vibration of the synthesized sound according to the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.
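The two steps described above, interpolation by a critically damped second-order linear system on the logarithmic frequency axis and the addition of time-stretched micro-prosody components, may be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the specification: the time constant `beta`, the normalization that makes the curve reach the end point exactly at the end of the section, the addition of the micro-prosody in Hz, and all function names are hypothetical.

```python
import math

def critically_damped_step(t, beta):
    """Unit step response of a critically damped second-order linear system."""
    return 1.0 - (1.0 + beta * t) * math.exp(-beta * t)

def interp_log_f0(f_start, f_end, duration, t, beta=20.0):
    """Move from f_start to f_end (Hz) along the step response, in log frequency.

    Assumptions: duration > 0, and the response is normalized by its value at
    t = duration so that the section ends exactly on f_end.
    """
    t = min(max(t, 0.0), duration)
    w = critically_damped_step(t, beta) / critically_damped_step(duration, beta)
    log_f = math.log(f_start) + w * (math.log(f_end) - math.log(f_start))
    return math.exp(log_f)

def stretch(component, n_out):
    """Resample a duration-normalized micro-prosody contour to n_out samples."""
    if n_out == 1:
        return [component[0]]
    out = []
    for k in range(n_out):
        x = k * (len(component) - 1) / (n_out - 1)  # position in the stored contour
        lo = int(x)
        hi = min(lo + 1, len(component) - 1)
        out.append(component[lo] + (component[hi] - component[lo]) * (x - lo))
    return out

def apply_microprosody(base, component):
    """Add the stretched micro-prosody contour to the base contour of one phoneme."""
    return [b + m for b, m in zip(base, stretch(component, len(base)))]
```

The step response rises monotonically, so a section that starts at the beginning of the accent phrase and ends at a) never exceeds the value at a), which is consistent with a) being the maximum.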

By setting the timing of the rise of the accent phrase and of the fall at the accent nucleus on an axis standardized by the phoneme time length of the mora concerned, the timing of the fluctuation of the fundamental frequency within a mora is reproduced in detail. Furthermore, by adding the fine fluctuations of the fundamental frequency, which affect the naturalness and clarity of speech, high naturalness and clarity are achieved.

(Embodiment 6)
FIG. 10 is a functional block diagram of an apparatus showing an embodiment of the present invention. It is the same as FIG. 1 except for two points. First, the mora time length standardized fundamental frequency database 50 is replaced by a phoneme time length standardized fundamental frequency database 351, which stores, for the conditions of the number of morae and the accent type of the accent phrase, a) the rise reference point in the i-th mora, which is the peak of the fundamental frequency pattern, b) the fall reference point (accent nucleus), c) the fall reference point (immediately after the accent nucleus), and d) the accent phrase end reference points of the k morae at the end of the accent phrase, each as a position relative to the phoneme time length of the mora containing that reference point. Second, a fundamental frequency pattern deformation database 350 is added, which stores the amount of deformation of the fundamental frequency at the peak and at the end of the accent phrase for each position of the accent phrase within the phrase.

FIGS. 11, 12, 13, and 14 are schematic diagrams of the fundamental frequency patterns generated when the phoneme time length standardized fundamental frequency database 351 contains no fundamental frequency pattern data corresponding to the number of morae and the accent type of the accent phrase for which the fundamental frequency is to be generated. FIG. 15 is a schematic diagram of the fundamental frequency pattern of a sentence generated by connecting the fundamental frequency patterns of a plurality of accent phrases. The operation is described below.

First, a character string to be converted into speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string, outputs phoneme information indicating the phoneme string to the time length setting unit 40, divides the character string into accent phrases, and outputs prosodic information indicating the number of morae and the accent type of each accent phrase, together with the phoneme information indicating the phoneme string, to the fundamental frequency pattern generation unit 60. The time length setting unit 40 sets the time length of each phoneme by referring to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosodic information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40.

First, according to the number of morae of the accent phrase, the accent type, the phoneme string, and the like, a) the rise reference point, b) the fall reference point, c) the fall reference point, and d) the accent phrase end reference points, or alternatively d') the final morae, are acquired from the phoneme time length standardized fundamental frequency database 351.

Suppose that the phoneme time length standardized fundamental frequency database 351 contains no fundamental frequency pattern data corresponding to the number of morae and the accent type of the accent phrase for which the fundamental frequency is to be generated, and let that accent phrase have n morae and accent type m. If m is i+1 or less, then, as shown in FIG. 11(A), a) to d) of the l-mora, type-m fundamental frequency pattern whose accent type is m and whose number of morae is closest to n are acquired from the database 351. As shown in FIG. 11(B), the d) acquired from the database 351 is set as the reference points of the (n−k+1)-th to n-th morae of the accent phrase for which the fundamental frequency is to be generated.

If m is greater than i+1 and not greater than n−k, then, as shown in FIG. 12(A), a) to d) of the l-mora, type-j fundamental frequency pattern whose accent nucleus mora position j is greater than i+1 and not greater than l−k and whose number of morae is closest to n are acquired from the phoneme time length standardized fundamental frequency database 351. As shown in FIG. 12(B), the b) and c) acquired from the database 351 are set as the reference points of the m-th and (m+1)-th morae of the accent phrase for which the fundamental frequency is to be generated, and the d) acquired from the database 351 is set as the reference points of the (n−k+1)-th to n-th morae of that accent phrase.

If m is greater than n−k, then, as shown in FIG. 13(A), a) to d') of the l-mora, type-j fundamental frequency pattern whose accent nucleus mora position j is greater than l−k and whose number of morae is closest to n are acquired from the phoneme time length standardized fundamental frequency database 351. As shown in FIG. 13(B), the d') acquired from the database 351, which includes b) and c), is set as the reference points of the (n−k+1)-th to n-th morae of the accent phrase for which the fundamental frequency is to be generated. If the accent phrase for which the fundamental frequency is to be generated is an n-mora flat type, then, as shown in FIG. 14(A), a) and d) of the l-mora flat-type fundamental frequency pattern whose number of morae is closest to n are acquired from the database 351, and, as shown in FIG. 13(B), the d) acquired from the database 351 is set as the reference points of the (n−k+1)-th to n-th morae of that accent phrase.
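The case analysis above, which decides which stored pattern is borrowed when no entry exists for an n-mora, type-m accent phrase, may be sketched as follows. The function names and return labels are hypothetical. The sketch also assumes that the flat type is encoded as accent type 0 and that a tie between two equally close mora counts is resolved toward the smaller count; neither point is stated in the specification.

```python
def fallback_case(n, m, i, k):
    """Classify an n-mora, type-m accent phrase into one of the four cases.

    i: mora index of the rise reference point a) (the pattern peak).
    k: number of phrase-final morae covered by the reference points d).
    """
    if m == 0:              # flat type (assumed encoding), FIG. 14
        return "flat"
    if m <= i + 1:          # nucleus at or near the peak, FIG. 11
        return "nucleus_near_peak"
    if m <= n - k:          # nucleus in the middle of the phrase, FIG. 12
        return "nucleus_in_middle"
    return "nucleus_in_final_morae"   # nucleus within the last k morae, FIG. 13

def nearest_mora_count(available_counts, n):
    """Pick the stored mora count l closest to n (ties go to the smaller l)."""
    return min(available_counts, key=lambda l: (abs(l - n), l))
```

For example, with i = 1 and k = 2, an 8-mora type-4 phrase falls into the middle case, so b) and c) of the borrowed pattern are moved to the 4th and 5th morae while d) is attached to the 7th and 8th morae.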

Next, the fundamental frequency pattern of each accent phrase, whether acquired from the phoneme time length standardized fundamental frequency database 351 or generated from reference points acquired from that database, is modified. According to the deformation amounts stored in the fundamental frequency deformation database 350 for each position of the accent phrase within the phrase, the maximum value of the fundamental frequency of each accent phrase and the fundamental frequencies of the reference points a) to d) or d') are changed.

First, according to the deformation amount of the first accent phrase stored in the fundamental frequency deformation database 350, the fundamental frequencies of b), c), and d) are changed, as shown at A) in FIG. 15, so that the difference between the fundamental frequencies of a) and d) becomes 90% of the difference acquired from the phoneme time length standardized fundamental frequency database 351. For the second accent phrase, as shown at B) in FIG. 15, the fundamental frequency of a) is changed to 75% of the fundamental frequency acquired from the database 351, and the fundamental frequencies of b), c), and d) are changed so that the difference between the fundamental frequencies of a) and d) becomes 70% of the difference acquired from the database 351. Similarly, for the third accent phrase, as shown at C) in FIG. 15, the fundamental frequency of a) is changed to 70% of the fundamental frequency acquired from the database 351, and the fundamental frequencies of b), c), and d) are changed so that the difference between the fundamental frequencies of a) and d) becomes 68% of the difference acquired from the database 351.

If no deformation amount corresponding to the n-th accent phrase is stored in the fundamental frequency deformation database 350, the deformation amount of the accent phrase position that is smaller than n and closest to n is applied. This example shows the case in which the deformation amount of the fourth accent phrase is not stored in the fundamental frequency deformation database 350.

The deformation amount of the third accent phrase, whose position value is smaller than 4 and closest to 4, is applied, and the same deformation as for the third accent phrase is made, as shown at D) in FIG. 15. For the final accent phrase, which is the end of the phrase, the deformation amount corresponding to the final accent phrase is acquired from the fundamental frequency deformation database 350. As shown at E) in FIG. 15, the fundamental frequency of a) is changed to 48% of the fundamental frequency acquired from the phoneme time length standardized fundamental frequency database 351, and the fundamental frequencies of b), c), and d) are changed so that the difference between the fundamental frequencies of a) and d) becomes 60% of the difference acquired from the database 351.
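The position-dependent deformation in this example may be sketched as follows, using the percentages given above. The table layout and the function names are hypothetical. The specification states only the target for the a)-to-d) difference, so the sketch assumes that b) and c) keep their proportional depth below a). It also assumes a peak ratio of 100% for the first accent phrase, for which no change of a) is described.

```python
# position in phrase -> (ratio applied to a), ratio applied to the a)-d) difference)
DEFORMATION = {1: (1.00, 0.90), 2: (0.75, 0.70), 3: (0.70, 0.68)}
FINAL_DEFORMATION = (0.48, 0.60)   # entry for the phrase-final accent phrase

def lookup_deformation(position, is_final=False):
    """Return the stored ratios, falling back to the nearest earlier position."""
    if is_final:
        return FINAL_DEFORMATION
    if position in DEFORMATION:
        return DEFORMATION[position]
    earlier = [p for p in DEFORMATION if p < position]
    return DEFORMATION[max(earlier)]

def deform(points, position, is_final=False):
    """Deform reference point frequencies (Hz) given as {'a':..,'b':..,'c':..,'d':..}."""
    peak_ratio, range_ratio = lookup_deformation(position, is_final)
    a_old = points["a"]
    a_new = a_old * peak_ratio
    # Each point keeps its proportional depth below a) (assumption of this sketch).
    return {name: a_new - range_ratio * (a_old - f) for name, f in points.items()}
```

With this table, the fourth accent phrase has no entry of its own and therefore receives the same ratios as the third, which matches D) in FIG. 15 as described in the text.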

Next, for each accent phrase, the fundamental frequency from the beginning of the accent phrase to a) is generated using a function on the real time axis, as in the second or fourth embodiment. The sections between the reference points are then interpolated on the real time axis, and the fundamental frequency pattern up to the end point of the accent phrase is generated.

The timing of the rise of the accent phrase and of the fall at the accent nucleus greatly affects the naturalness of speech. By setting this timing on a time axis standardized by the phoneme time length of the mora concerned, a smooth fundamental frequency pattern with stable rises and falls is obtained without being affected by differences in time length between phonemes, and high naturalness is achieved. Furthermore, extending the fundamental frequency patterns makes it possible to reduce the size of the database. In addition, by deforming the fundamental frequency pattern according to the position of the accent phrase within the phrase, the phrase is given coherence as a unit and natural sentence speech can be realized.

(Embodiment 7)
FIG. 17 is a schematic diagram of the fundamental frequency pattern of a sentence generated by connecting the fundamental frequency patterns of a plurality of accent phrases. The configuration of the apparatus is the same as in FIG. 1. The operation is described below.

As shown in FIG. 17, first, the fundamental frequency pattern 1711 corresponding to the number of morae and the accent type of the first accent phrase 1701 is acquired from the mora time length standardized fundamental frequency database 50 and applied.

Equation 1 is then obtained. It gives the maximum fundamental frequency of the n-th accent phrase as a line that passes through the maximum value a of the fundamental frequency of the first accent phrase 1701 and falls by 10% of a each time the value of i, which indicates the position of the n-th accent phrase, increases by one.
(Equation 1)
(−0.1i + 1)a … Equation 1
Here, a is the maximum value of the fundamental frequency of the first accent phrase 1701. The accent phrase number i indicates how many accent phrases the n-th accent phrase lies after the first accent phrase, and is equal to n − 1.

Equation 2 is also obtained. It gives the frequency at the end of the n-th accent phrase as a line that passes through the frequency b at the end of the first accent phrase 1701 and falls by 5% of b each time the value of i, which indicates the position of the n-th accent phrase, increases by one.
(Equation 2)
(−0.05i + 1)b … Equation 2
Here, b is the frequency at the end of the first accent phrase 1701.

Next, the fundamental frequency pattern 1712 (shown by a dotted line in the figure) corresponding to the number of morae and the accent type of the second accent phrase 1702 is acquired from the mora time length standardized fundamental frequency database 50. Since the accent phrase number i of the second accent phrase is 1, this value is substituted into Equation 1 to obtain the maximum value a₂ of the fundamental frequency pattern 1712 after deformation. In the same way, the frequency b₂ at the end of the accent phrase of the fundamental frequency pattern 1712 after deformation is obtained from Equation 2.

The fundamental frequency pattern 1712 acquired from the mora time length standardized fundamental frequency database 50 is deformed so as to match the maximum value a₂ and the accent phrase end frequency b₂ obtained in this way, and the deformed fundamental frequency pattern 1713 is used as the fundamental frequency pattern of the second accent phrase 1702.

For the n-th accent phrase as well, if it is not the final accent phrase (end of sentence), the fundamental frequency pattern corresponding to its number of morae and accent type is acquired from the mora time length standardized fundamental frequency database 50. The pattern acquired from the database 50 is then deformed so that its maximum value matches the value obtained from Equation 1 and its accent phrase end frequency matches the value obtained from Equation 2, and the result is used as the fundamental frequency pattern of the n-th accent phrase.

Furthermore, if the accent phrase for which the fundamental frequency is to be generated is at the end of the sentence, the fundamental frequency pattern corresponding to its number of morae and accent type is acquired from the mora time length standardized fundamental frequency database 50. The acquired pattern is deformed so that its maximum value matches the maximum value of the immediately preceding accent phrase lowered by 15% and its accent phrase end frequency matches the accent phrase end frequency of the immediately preceding accent phrase lowered by 10%, and the result is applied.
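The target values given by Equation 1 and Equation 2, together with the separate rule for the sentence-final accent phrase, may be sketched as follows. The function names are hypothetical. The specification does not state how a pattern is deformed to match both targets, so `warp` assumes a simple linear rescaling of frequency that maps the pattern maximum to the new maximum and the last value to the new end value; it requires the original maximum to be higher than the original end value.

```python
def phrase_targets(a, b, n_phrases):
    """Return [(peak_hz, end_hz), ...] for accent phrases 1..n_phrases.

    a: maximum fundamental frequency of the first accent phrase.
    b: accent phrase end frequency of the first accent phrase.
    """
    targets = []
    for n in range(1, n_phrases + 1):
        i = n - 1                       # accent phrase number used in the equations
        if n == n_phrases and n > 1:    # sentence-final accent phrase
            prev_peak, prev_end = targets[-1]
            targets.append((0.85 * prev_peak, 0.90 * prev_end))
        else:
            targets.append(((-0.1 * i + 1) * a,      # Equation 1
                            (-0.05 * i + 1) * b))    # Equation 2
    return targets

def warp(pattern, new_peak, new_end):
    """Rescale a pattern (list of Hz) so that max -> new_peak and last -> new_end."""
    peak, end = max(pattern), pattern[-1]
    scale = (new_peak - new_end) / (peak - end)
    return [new_end + scale * (f - end) for f in pattern]
```

For a three-phrase sentence with a = 200 Hz and b = 100 Hz, the second accent phrase is targeted at 180 Hz and 95 Hz, and the sentence-final third accent phrase at 153 Hz and 85.5 Hz.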

If the mora time length standardized fundamental frequency database 50 contains no data for the corresponding fundamental frequency pattern, the fundamental frequency pattern of the accent phrase is generated as in the sixth embodiment and is then deformed.

By setting the pattern on a time axis standardized by the mora time length of the mora concerned, a smooth fundamental frequency pattern with stable rises and falls is obtained without being affected by differences in time length between phonemes, and high naturalness is achieved. In addition, by deforming the fundamental frequency pattern according to the position of the accent phrase within the phrase, the phrase is given coherence as a unit and natural sentence speech can be realized.

In the above embodiment, the frequency at a predetermined position of the immediately preceding accent phrase is used as the reference and lowered by a predetermined ratio only when the accent phrase for which the fundamental frequency is to be generated is at the end of the sentence. As a modification of the above embodiment, the frequency values of accent phrases other than the sentence-final one may also be compressed by the same kind of rule. In that case, for example as shown in FIG. 18, for each of the second to n-th accent phrases excluding the end of the sentence, a value obtained by lowering the maximum value of the immediately preceding accent phrase by 10% (for example, a₂ in the figure) and a value obtained by lowering the accent phrase end frequency of the immediately preceding accent phrase by 5% (for example, b₂ in the figure) are obtained.

Then, for the second accent phrase for example, the fundamental frequency pattern 1712 acquired from the mora time length standardized fundamental frequency database 50 is deformed so as to match the maximum value a₂ and the accent phrase end frequency b₂ obtained in this way, and the deformed fundamental frequency pattern 1713 is used as the fundamental frequency pattern of the second accent phrase 1702. The same applies to the n-th accent phrase. If the accent phrase for which the fundamental frequency is to be generated is at the end of the sentence, the same method as in FIG. 17 is used.
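In this modification each target is derived from the immediately preceding accent phrase rather than from the first one, so the reductions compound. A minimal sketch, with a hypothetical function name and with the sentence-final rule taken to be the 15% and 10% reductions described for FIG. 17, is:

```python
def chained_targets(a, b, n_phrases):
    """Return [(peak_hz, end_hz), ...] where each phrase is scaled from the previous one.

    a, b: maximum and accent phrase end frequency of the first accent phrase.
    """
    targets = [(a, b)]
    for n in range(2, n_phrases + 1):
        prev_peak, prev_end = targets[-1]
        if n == n_phrases:   # sentence-final accent phrase: 15% and 10% lower
            targets.append((0.85 * prev_peak, 0.90 * prev_end))
        else:                # other accent phrases: 10% and 5% lower
            targets.append((0.90 * prev_peak, 0.95 * prev_end))
    return targets
```

The difference from Equation 1 appears from the third accent phrase onward: with a = 200 Hz, Equation 1 gives 160 Hz for a non-final third accent phrase, whereas the chained rule gives 0.9 × 180 = 162 Hz.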

(Embodiment 8)
FIG. 19 is a schematic diagram of the fundamental frequency pattern of a sentence generated by connecting the fundamental frequency patterns of a plurality of accent phrases. The configuration of the apparatus is the same as in FIG. 1. The operation is described below.

As shown in FIG. 19, the fundamental frequency pattern 1811 corresponding to the mora count and accent type of the first accent phrase 1801 is first acquired from the mora time length standardized fundamental frequency database 50 and applied.

Next, Equation 3 is obtained. It gives the maximum fundamental frequency of an accent phrase as a function of the cumulative mora count j: it passes through the maximum value a of the fundamental frequency of the first accent phrase 1801, and the maximum falls by 2% of a for each additional mora counted from the mora position containing that maximum.
(Equation 3)
$(-0.02\,j + 1)\,a$ … Equation 3
Here a is the maximum value of the fundamental frequency of the first accent phrase 1801, and the cumulative mora count j is the number of moras counted from the mora position containing that maximum (taken as the origin of the horizontal axis in the figure).

Equation 4 is obtained in the same way. It gives the frequency at the accent phrase end as a function of the cumulative mora count j: it passes through the accent phrase end frequency b of the first accent phrase 1801, and that frequency falls by 1% of b for each additional mora counted from the mora position containing b.
(Equation 4)
$(-0.01\,j + 1)\,b$ … Equation 4
Here b is the frequency at the accent phrase end of the first accent phrase 1801.
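Equations 3 and 4 are straight lines in the cumulative mora count j, so they can be evaluated directly. The sketch below is only an illustration; the function names and the sample values of a, b and j are assumptions, not values taken from the specification.

```python
def max_target_hz(a_hz, j, slope=-0.02):
    """Equation 3: target maximum for a phrase whose peak is j moras from the origin."""
    return (slope * j + 1.0) * a_hz


def end_target_hz(b_hz, j, slope=-0.01):
    """Equation 4: target phrase-end frequency at cumulative mora count j."""
    return (slope * j + 1.0) * b_hz


# Illustrative values: a = 200 Hz, b = 100 Hz.
print(max_target_hz(200.0, 5))   # peak located 5 moras after the origin
print(end_target_hz(100.0, 8))   # phrase end located 8 moras after the origin
```

The slopes are exposed as parameters because the description allows coefficients other than −0.02 and −0.01.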

Next, the fundamental frequency pattern 1812 (shown by a dotted line in the figure) corresponding to the mora count and accent type of the second accent phrase 1802 is acquired from the mora time length standardized fundamental frequency database 50. The mora at which its maximum value 1812a occurs is found to be the j_2a-th mora from the origin mora, and j_2a is substituted into Equation 3 as the cumulative mora count to obtain the deformed maximum value a₂ of the fundamental frequency pattern 1812. Likewise, the accent phrase end 1812b of the second accent phrase 1802 is found to be the j_2b-th mora from the origin mora, and j_2b is substituted into Equation 4 as the cumulative mora count to obtain the deformed accent phrase end frequency b₂ of the fundamental frequency pattern 1812.

The fundamental frequency pattern 1812 acquired from the mora time length standardized fundamental frequency database 50 is deformed so that it matches the deformed maximum value a₂ and the deformed accent phrase end frequency b₂ obtained in this way, and the result is used as the fundamental frequency pattern of the second accent phrase 1802.

For the n-th accent phrase as well, when it is not the final accent phrase (sentence-final), the fundamental frequency pattern corresponding to its mora count and accent type is acquired from the mora time length standardized fundamental frequency database 50. The position of the mora at which the pattern takes its maximum value, counted from the origin mora, is determined and substituted into Equation 3 as the cumulative mora count to obtain the deformed maximum value of the pattern. The position of the accent phrase end, counted from the origin mora, is then determined and substituted into Equation 4 as the cumulative mora count to obtain the deformed accent phrase end frequency of the pattern.

The fundamental frequency pattern acquired from the mora time length standardized fundamental frequency database 50 is deformed so that it matches the deformed maximum value and the deformed accent phrase end frequency obtained in this way, and is used as the fundamental frequency pattern of the n-th accent phrase.
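The text states that the acquired pattern is deformed to match the target maximum and the target accent phrase end frequency, but it does not give the mapping. One possible reading, shown here purely as an assumption, is an affine rescaling along the frequency axis that sends the pattern's own maximum and final value onto the two targets. The list representation of the pattern and the name `deform_pattern` are hypothetical.

```python
def deform_pattern(pattern_hz, target_max_hz, target_end_hz):
    """Rescale a pattern so its maximum and last value hit the given targets.

    Assumptions (not from the specification): the pattern is a list of
    frequencies in Hz, its last element is the accent phrase end, and the
    deformation is a single affine map along the frequency axis.
    """
    source_max = max(pattern_hz)
    source_end = pattern_hz[-1]
    if source_max == source_end:
        raise ValueError("pattern maximum and phrase end coincide; map undefined")
    scale = (target_max_hz - target_end_hz) / (source_max - source_end)
    return [target_end_hz + (f - source_end) * scale for f in pattern_hz]


# Illustrative pattern deformed to a maximum of 180 Hz and an end of 95 Hz.
print(deform_pattern([120.0, 200.0, 150.0, 100.0], 180.0, 95.0))
```

An affine map keeps the shape of the contour while moving both anchor values, which is why it is used for this sketch; other deformations that satisfy the same two constraints are equally compatible with the text.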

When the accent phrase whose fundamental frequency is to be generated is sentence-final, the fundamental frequency pattern corresponding to its mora count and accent type is acquired from the mora time length standardized fundamental frequency database 50. The acquired pattern is deformed and applied so that its maximum value matches the maximum value of the immediately preceding accent phrase lowered by 15%, and its accent phrase end frequency matches the accent phrase end frequency of the immediately preceding accent phrase lowered by 10%. If the mora time length standardized fundamental frequency database 50 holds no data for the corresponding fundamental frequency pattern, the fundamental frequency pattern of the accent phrase is generated and deformed as in Embodiment 6.
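Taken together, the rules of this embodiment assign a pair of target values to every accent phrase in a sentence. The following sketch combines them under stated assumptions: non-final phrases use Equations 3 and 4 with the first phrase's a and b, and the sentence-final phrase uses the 15% and 10% reductions relative to the targets of the phrase before it. The function name and the input layout are hypothetical.

```python
def sentence_targets(a_hz, b_hz, later_phrases):
    """Return (maximum, phrase-end) targets for all accent phrases.

    later_phrases holds one (j_max, j_end) pair of cumulative mora counts
    for each accent phrase after the first; the last pair is assumed to
    belong to the sentence-final phrase.
    """
    targets = [(a_hz, b_hz)]
    last_index = len(later_phrases) - 1
    for index, (j_max, j_end) in enumerate(later_phrases):
        if index == last_index:
            # Sentence-final: relative to the immediately preceding phrase.
            prev_max, prev_end = targets[-1]
            targets.append((prev_max * 0.85, prev_end * 0.90))
        else:
            # Non-final: Equations 3 and 4.
            targets.append(((-0.02 * j_max + 1.0) * a_hz,
                            (-0.01 * j_end + 1.0) * b_hz))
    return targets


# Three accent phrases; the third is sentence-final.
print(sentence_targets(200.0, 100.0, [(5, 8), (10, 13)]))
```

For the sentence-final phrase the cumulative mora counts are accepted but unused, since its targets depend only on the preceding phrase.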

Because the pattern is set on a time axis standardized by the time length of each mora, a smooth fundamental frequency pattern with stable rises and falls is obtained without being affected by differences in duration between phonemes, and high naturalness is achieved. In addition, deforming the fundamental frequency pattern according to the cumulative mora position within the phrase gives the phrase coherence and yields natural sentence speech.

(Embodiment 9)
FIG. 16 is a functional block diagram of an apparatus according to an embodiment of another invention related to the present invention. It is the same as FIG. 1 except that the mora time length standardized fundamental frequency database 50 is replaced by an accent phrase position fundamental frequency database 450. For the first through third accent phrases, the database 450 stores, for each mora, the fundamental frequency pattern of the vowel part standardized by the time length of that vowel part, classified by whether the accent phrase is sentence-final and by the factors that determine prosody, such as the mora count of the accent phrase, the accent type, and the phoneme string.

First, a character string to be converted into speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information representing the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosodic information giving the mora count and accent type of each accent phrase and the position of the accent phrase within the phrase, together with the phoneme information representing the phoneme sequence.

Based on the phoneme information input from the character string analysis unit 20, the time length setting unit 40 refers to the phoneme time length database 30 and sets the vowel time length of each mora, or the time length of the vowel-equivalent part of a single-vowel syllable, moraic nasal, or long vowel, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosodic information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40. This example describes the generation of the fundamental frequency of a sentence composed of five accent phrases.

First, for the first accent phrase, the fundamental frequency pattern for a non-sentence-final first accent phrase that corresponds to the mora count and accent type of the accent phrase being generated is acquired from the accent phrase position fundamental frequency database 450. Fundamental frequency patterns for the second and third accent phrases are acquired from the accent phrase position fundamental frequency database 450 in the same way.

For the fourth accent phrase, the accent phrase position fundamental frequency database 450 holds no fundamental frequency pattern for the fourth position. The pattern corresponding to its mora count and accent type is therefore taken from the non-sentence-final patterns of the third accent phrase, whose position is closest to that of the fourth accent phrase.

For the fifth accent phrase, which is the final accent phrase, the accent phrase position fundamental frequency database 450 likewise holds no matching fundamental frequency pattern. The pattern corresponding to its mora count and accent type is therefore taken from the sentence-final patterns of the third accent phrase, whose position is closest. As in Embodiment 3 or Embodiment 4, the portions for which no fundamental frequency pattern exists are interpolated on the real time axis to generate the fundamental frequency pattern.
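The lookup with fallback to the nearest stored accent phrase position can be sketched as follows. The dictionary layout, the key order, and the name `lookup_pattern` are assumptions made for illustration; the specification does not describe how the accent phrase position fundamental frequency database 450 is organized internally.

```python
def lookup_pattern(database, position, is_final, mora_count, accent_type):
    """Fetch a pattern, falling back to the nearest stored phrase position.

    Assumed key layout: (position, is_final, mora_count, accent_type).
    Returns None when no pattern exists, in which case the caller would
    generate one by interpolation.
    """
    if not database:
        return None
    stored_positions = sorted({key[0] for key in database})
    nearest = min(stored_positions, key=lambda p: abs(p - position))
    return database.get((nearest, is_final, mora_count, accent_type))


# Toy database holding positions 1 to 3 only (pattern values are made up).
toy_db = {
    (1, False, 4, 1): [0.2, 0.9, 0.6, 0.4],
    (3, False, 4, 1): [0.1, 0.6, 0.4, 0.3],
    (3, True, 4, 1): [0.1, 0.5, 0.3, 0.1],
}
print(lookup_pattern(toy_db, 4, False, 4, 1))  # falls back to position 3
print(lookup_pattern(toy_db, 5, True, 4, 1))   # sentence-final, position 3
```

Keeping the sentence-final flag in the key mirrors the text: the fourth phrase draws on non-final patterns and the fifth on sentence-final patterns of the third position.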

Using fundamental frequency patterns standardized by the vowel length of each mora reproduces the variation of the fundamental frequency within a mora in detail, and selecting the patterns according to the position of the accent phrase and whether it is sentence-final reproduces the phrase-level variation of the fundamental frequency accurately. The phrase therefore gains coherence, and natural sentence speech can be realized.

(Embodiment 10)
FIGS. 20(A) and 20(B) are schematic diagrams of the junction between fundamental frequency patterns when the fundamental frequency patterns of accent phrases are connected to generate a sentence. The configuration of the fundamental frequency pattern generation apparatus according to this embodiment of another invention related to the present invention is the same as in FIG. 1. Its operation is described below.

First, the fundamental frequency pattern corresponding to the mora count and accent type of each accent phrase to be generated is acquired from the mora time length standardized fundamental frequency database 50 and applied. The fundamental frequency pattern acquired from the mora time length standardized fundamental frequency database 50 is then deformed for each accent phrase by the method of Embodiment 6, Embodiment 7, or Embodiment 8.

Among the deformed fundamental frequency patterns of the accent phrases, for each n-th accent phrase that is not sentence-final, the difference between the fundamental frequency of the vowel part of the final mora of that accent phrase and the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase is obtained (e) in FIG. 20).

When there is no pause between the n-th and (n+1)-th accent phrases, the pattern is handled as follows. If the difference e) between the fundamental frequency of the vowel part of the final mora of the n-th accent phrase and that of the vowel part of the first mora of the (n+1)-th accent phrase is 40 Hz or more, and the accent nucleus of the n-th accent phrase is not among the last three moras of the phrase, the starting mora is the first mora of the accent phrase end reference points, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 40 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase. The fundamental frequency pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction so that the n-th and (n+1)-th accent phrases are connected smoothly, as shown in f) of FIG. 20. If the difference e) is 40 Hz or more and the accent nucleus of the n-th accent phrase is among the last three moras of the phrase, the starting mora is instead the accent nucleus mora, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 40 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase. The fundamental frequency pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction, and the n-th and (n+1)-th accent phrases are connected smoothly.

When there is a pause of less than 50 msec between the n-th and (n+1)-th accent phrases, the same procedure is used with a value of 50. If the difference e) between the fundamental frequency of the vowel part of the final mora of the n-th accent phrase and that of the vowel part of the first mora of the (n+1)-th accent phrase is 50 Hz or more, and the accent nucleus of the n-th accent phrase is not among the last three moras of the phrase, the starting mora is the first mora of the accent phrase end reference points, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 50 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase. The fundamental frequency pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction. If the difference e) is 50 Hz or more and the accent nucleus of the n-th accent phrase is among the last three moras of the phrase, the starting mora is the accent nucleus mora, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 50 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and the fundamental frequency pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction.

When there is a pause of 50 msec or more and less than 100 msec between the n-th and (n+1)-th accent phrases, the value is 70. If the difference e) between the fundamental frequency of the vowel part of the final mora of the n-th accent phrase and that of the vowel part of the first mora of the (n+1)-th accent phrase is 70 Hz or more, and the accent nucleus of the n-th accent phrase is not among the last three moras of the phrase, the starting mora is the first mora of the accent phrase end reference points, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 70 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase. The fundamental frequency pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction. If the difference e) is 70 Hz or more and the accent nucleus of the n-th accent phrase is among the last three moras of the phrase, the starting mora is the accent nucleus mora, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 70 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and the fundamental frequency pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction.

When there is a pause of 100 msec or more and less than 150 msec between the n-th and (n+1)-th accent phrases, the threshold is 80 Hz. If the difference e) between the fundamental frequency of the vowel part of the final mora of the n-th accent phrase and that of the vowel part of the first mora of the (n+1)-th accent phrase is 80 Hz or more, and the accent nucleus of the n-th accent phrase is not among the last three moras of the phrase, the starting mora is the first mora of the accent phrase end reference points, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 80 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase. The fundamental frequency pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction. If the difference e) is 80 Hz or more and the accent nucleus of the n-th accent phrase is among the last three moras of the phrase, the starting mora is the accent nucleus mora, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 70 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and the fundamental frequency pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction.
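The pause-dependent thresholds of this embodiment amount to a small decision table. The sketch below encodes only the threshold selection and the test on the junction difference; it does not attempt the compression itself. Treating the difference as an absolute value and the function names are assumptions, and the "no change at 150 msec or more" branch follows the description's statement that no modification is made for pauses of that length.

```python
from typing import Optional


def junction_threshold_hz(pause_ms: float) -> Optional[float]:
    """Return the junction threshold in Hz for a given pause duration.

    None means the pause is long enough that no modification is made.
    """
    if pause_ms <= 0:
        return 40.0   # no pause
    if pause_ms < 50:
        return 50.0
    if pause_ms < 100:
        return 70.0
    if pause_ms < 150:
        return 80.0
    return None


def needs_compression(last_f0_hz: float, next_first_f0_hz: float,
                      pause_ms: float) -> bool:
    """Check whether the junction difference reaches the threshold.

    Assumption: the difference is taken as an absolute value.
    """
    threshold = junction_threshold_hz(pause_ms)
    if threshold is None:
        return False
    return abs(next_first_f0_hz - last_f0_hz) >= threshold


print(needs_compression(100.0, 150.0, 0))    # 50 Hz gap, no pause
print(needs_compression(100.0, 150.0, 120))  # same gap, 120 msec pause
```

Longer pauses tolerate larger jumps, so the same 50 Hz gap triggers compression without a pause but not across a 120 msec pause.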

Deforming the tail of the fundamental frequency pattern generated for each accent phrase according to the length of the pause before the following accent phrase smooths the junction between accent phrases and yields natural sentence speech.

In the description above, Embodiments 1, 3, and 4 use a straight line as the interpolation function, and Embodiment 2 uses a critically damped second-order linear system on the logarithmic frequency axis. A critically damped second-order linear system may instead be used in Embodiments 1, 3, and 4 and a straight line in Embodiment 2, and other functions on the real time axis can be used in the same way.

In Embodiment 2, the fundamental frequency from the beginning of the accent phrase to the rise reference point is interpolated using a critically damped second-order linear system on the logarithmic frequency axis, and in Embodiment 4 it is interpolated by fitting a fundamental frequency pattern expressed on the real time axis. The fundamental frequency pattern expressed on the real time axis may instead be applied in Embodiment 2, and the critically damped second-order linear system on the logarithmic frequency axis in Embodiment 4.

In Embodiment 2, the vowel time length standardized fundamental frequency database 150a divides the time length of the vowel part of each mora into four equal sections and stores a representative fundamental frequency value for each section. Any other form may be used as long as the fundamental frequency pattern is standardized by the time length of each phoneme.

In Embodiments 2 and 5, the accent rise reference point is the center of the third of the four equal sections into which the vowel length of the mora is divided. Any other value may be used as long as it is a relative position in the latter half of the vowel.

In Embodiment 5, the vowel time length standardized fundamental frequency database 150a divides the time length of the vowel part of each mora into four equal sections and stores a representative fundamental frequency value for each section. Any other form may be used as long as the fundamental frequency pattern is standardized by the time length of each vowel.

In Embodiments 2 and 5, the two fall reference points are the center of the third of the four equal sections of the vowel length of the accent nucleus mora and the center of the third of the four equal sections of the vowel length of the mora following the accent nucleus. Any other values may be used as long as they are relative positions in the latter half of the vowel.

In Embodiments 2 and 5, the accent phrase end reference point is the center of the second of the four equal sections into which the vowel length of the final mora of the accent phrase is divided. Any other value may be used as long as it is a relative position in the first half of the vowel.

In Embodiments 2 and 5, the utterance end reference point is the center of the third of the four equal sections into which the vowel length of the final mora of the utterance is divided. Any other value may be used as long as it is a relative position in the latter half of the vowel.
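Each of the reference points above is the center of one of four equal sections of a vowel, so its location follows from simple arithmetic: the center of section k lies at the relative position (k − 0.5)/4 of the vowel length. The helper below is a sketch; its name, its millisecond units, and the sample vowel are assumptions.

```python
def section_center_ms(vowel_start_ms, vowel_duration_ms, section, n_sections=4):
    """Return the time of the center of one equal section of a vowel.

    section is 1-based; with four sections, section 3 gives the relative
    position 0.625 and section 2 gives 0.375.
    """
    if not 1 <= section <= n_sections:
        raise ValueError("section must lie between 1 and n_sections")
    relative_position = (section - 0.5) / n_sections
    return vowel_start_ms + relative_position * vowel_duration_ms


# A vowel starting at 100 msec and lasting 80 msec (illustrative values).
print(section_center_ms(100.0, 80.0, 3))  # rise, fall, utterance-end points
print(section_center_ms(100.0, 80.0, 2))  # accent phrase end point
```

Because the position is relative to the vowel length, the reference point scales with whatever duration the time length setting assigns to the vowel.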

In Embodiment 5, the fundamental frequency pattern to which microprosody is added is generated in the same way as in Embodiment 2, but it may be generated as in Embodiment 1, 3, or 4.

In Embodiment 6, the fundamental frequency pattern of the accent phrase is generated in the same way as in Embodiment 2, but it may be generated as in Embodiment 1, 3, or 4.

In Embodiment 6, interpolation is performed after the reference points of the fundamental frequency pattern have been changed according to the deformation amounts acquired from the database, but the fundamental frequency pattern may instead be deformed after interpolation.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the difference between the maximum value and the accent phrase end is compressed to 90% in the first accent phrase, but another value in the range from 70% to less than 100% may be used.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the maximum value is compressed to 75% in the second accent phrase and to 70% in the third and n-th accent phrases, but other values in the range from 50% to 90% may be used.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the difference between the maximum value and the accent phrase end is compressed to 70% in the second accent phrase and to 68% in the third and n-th accent phrases, but other values in the range from 50% to 90% may be used.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the maximum value of the final accent phrase is compressed to 48%, but another value in the range from 30% to 70% may be used.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the difference between the maximum value and the accent phrase end of the final accent phrase is compressed to 60%, but another value in the range from 40% to 80% may be used.

In Embodiment 7, the coefficient of i in Equation 1 is −0.1, but another value in the range from −0.05 to −0.4 may be used.

In Embodiment 7, the coefficient of j in Equation 2 is −0.05, but another value in the range from −0.2 up to a maximum of 0 may be used.

In Embodiments 7 and 8, the maximum value of the fundamental frequency of the final accent phrase is the maximum value of the immediately preceding accent phrase lowered by 15%, but another reduction in the range from 10% to 40% may be used.

Likewise, the accent phrase end is the accent phrase end of the immediately preceding accent phrase lowered by 10%, but another reduction in the range from 5% to 40% may be used.

In Embodiment 8, the coefficient of j in Equation 3 is −0.02, but it is not limited to this value, and another value in the range from −0.01 to −0.2 may be used.

In Embodiment 8, the coefficient of j in Equation 4 is −0.01, but it is not limited to this value, and another value in the range from −0.01 to −0.1 may be used.

In Embodiment 10, the fundamental frequency pattern acquired from the mora time length standardized fundamental frequency database 50 is deformed as in Embodiment 6, 7, or 8, but the fundamental frequency pattern may instead be acquired from the accent phrase position fundamental frequency database 450 according to the position of the accent phrase, as in Embodiment 9.

In Embodiment 10, when there is no pause between the n-th and (n+1)-th accent phrases, the fundamental frequency pattern is deformed so that the difference in fundamental frequency between the center of the vowel part of the final mora of the n-th accent phrase and the center of the vowel part of the first mora of the (n+1)-th accent phrase is 40 Hz or less, but another value between 20 Hz and 60 Hz may be used.

なお、実施の形態１０においてアクセント句立ち下がり、アクセント句末、語尾の基本周波数の変更の基準として、第ｎアクセント句と第ｎ＋１アクセント句の間のポーズの持続時間を５０msec未満、５０msec以上１００msec未満、１００msec以上１５０msec未満、１５０msec以上の４段階に分類したが、１ないし８の他の数の段階に分類しても良い。 In the tenth embodiment, as the criterion for changing the fundamental frequency at the accent phrase fall, the accent phrase end, and the ending, the duration of the pause between the n-th and (n+1)-th accent phrases is classified into four stages (less than 50 msec, 50 msec or more and less than 100 msec, 100 msec or more and less than 150 msec, and 150 msec or more); however, it may be classified into any other number of stages from 1 to 8.

なお、実施の形態１０において第ｎアクセント句と第ｎ＋１アクセント句の間のポーズの持続時間が１５０msec以上の場合はアクセント句立ち下がり、アクセント句末、語尾の基本周波数の変更を行わないものとしたが、変更を行うポーズの持続時間の上限は１２０msecから２００msecの間のほかの値としても良い。 In the tenth embodiment, the fundamental frequency at the accent phrase fall, the accent phrase end, and the ending is not changed when the pause between the n-th and (n+1)-th accent phrases lasts 150 msec or more; however, the upper limit of the pause duration for which the change is made may be another value between 120 msec and 200 msec.
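The four-stage classification of the pause duration and the 150 msec cutoff described above could be sketched in Python as follows. This is only an illustration: the function names, the zero-based stage numbering, and the tuple of boundaries are assumptions introduced here, not part of the specification.

```python
def pause_stage(pause_ms, bounds=(50.0, 100.0, 150.0)):
    """Classify a pause duration into stages 0..len(bounds).

    With the default bounds: 0 = under 50 msec, 1 = 50 to under 100 msec,
    2 = 100 to under 150 msec, 3 = 150 msec or more.
    """
    for stage, upper in enumerate(bounds):
        if pause_ms < upper:
            return stage
    return len(bounds)


def needs_f0_change(pause_ms, cutoff_ms=150.0):
    """True when the pause is short enough that the phrase-final F0 is adjusted."""
    return pause_ms < cutoff_ms
```

Passing a different `bounds` tuple gives a different number of stages, and `cutoff_ms` can be moved within the 120 msec to 200 msec range mentioned above.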

なお、実施の形態１０においてアクセント句立ち下がり、アクセント句末、語尾の基本周波数の変更の基準として、第ｎアクセント句と第ｎ＋１アクセント句の間のポーズの持続時間を４段階に分類し、第ｎアクセント句の最終モーラの母音部中央と第ｎ＋１アクセント句の先頭モーラの母音部中央の基本周波数の差の上限をポーズの持続時間の段階毎に設定したが、ポーズの持続時間ｔに対する一次式（式５）
（数５）
ａｔ＋ｂ（Ｈｚ） …式５
ただし ０＜ａ＜０．４、２０＜ｂ＜６０
によって設定するとしても良い。
In the tenth embodiment, as the criterion for changing the fundamental frequency at the accent phrase fall, the accent phrase end, and the ending, the duration of the pause between the n-th and (n+1)-th accent phrases is classified into four stages, and the upper limit of the difference between the fundamental frequency at the center of the vowel part of the last mora of the n-th accent phrase and that at the center of the vowel part of the first mora of the (n+1)-th accent phrase is set for each stage of pause duration. However, this upper limit may instead be set by a linear expression in the pause duration t (Expression 5):
(Equation 5)
at + b (Hz) ... Expression 5
where 0 < a < 0.4 and 20 < b < 60.
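As a minimal sketch of Expression 5, the upper limit of the fundamental frequency difference can be computed from the pause duration as below. The function name and the default coefficients are assumptions: a = 0.2 is an arbitrary value inside the stated range, and b = 40 is chosen so that a pause of zero reproduces the 40 Hz limit of the tenth embodiment. The unit of t is assumed here to be msec, which this passage does not state explicitly.

```python
def max_f0_gap_hz(pause_ms, a=0.2, b=40.0):
    """Upper limit (Hz) of the F0 difference across an accent phrase boundary.

    Implements a*t + b with 0 < a < 0.4 and 20 < b < 60 (Expression 5).
    """
    if not 0.0 < a < 0.4:
        raise ValueError("a must satisfy 0 < a < 0.4")
    if not 20.0 < b < 60.0:
        raise ValueError("b must satisfy 20 < b < 60")
    return a * pause_ms + b
```

With these assumed coefficients the allowed difference grows continuously with the pause, instead of jumping at the stage boundaries of the four-stage scheme.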

なお、本発明はプログラムによって実現し、これをフロッピー（登録商標）ディスク、光ディスク、ＩＣカード、ＲＯＭカセット等のプログラムを記録することのできる記録媒体に記録して移送することにより、独立した他のコンピュータシステムで容易に実施することができる。 The present invention can be realized as a program; by recording the program on a recording medium capable of storing programs, such as a floppy (registered trademark) disk, an optical disk, an IC card, or a ROM cassette, and transferring it, the invention can easily be carried out on another, independent computer system.

又、本発明の音韻は、上記実施の形態では、主にモーラに該当するものとして説明したが、これに限らず例えば、音節であっても良い。即ち、上記の様に、基本周波数データベースとして、モーラ単位又は音素単位でデータを格納している場合に限らず例えば、音節単位又は音節に含まれる音素単位でデータを格納した基本周波数データベースを用いても勿論良く、この場合でも、上記と同様の効果を発揮する。即ち、上述した全ての実施の形態において、「モーラ」を「音節」と読み替えた構成としても、上記と同様の効果を発揮する。 Although the phonological unit of the present invention has been described in the above embodiments as corresponding mainly to the mora, it is not limited to this and may be, for example, the syllable. That is, the fundamental frequency database is not limited to one that stores data in units of morae or phonemes as described above; a fundamental frequency database that stores data in units of syllables, or of the phonemes contained in syllables, may of course be used, and the same effects as above are obtained. In other words, in all of the embodiments described above, the same effects are obtained when "mora" is read as "syllable".

又、上記実施の形態では、基本周波数データベースが、末尾から３モーラまでの基本周波数パタンを保持している場合について述べたが、最大限末尾から４モーラまでの基本周波数パタンを保持しておけば十分な効果を発揮する。 In the above embodiments, the case where the fundamental frequency database holds the fundamental frequency patterns of up to three morae from the end has been described; holding the fundamental frequency patterns of at most four morae from the end is enough to obtain a sufficient effect.

上記の様に、本発明の第１の方法は、アクセント句のモーラ位置毎に当該モーラの音素の時間長で標準化した基本周波数パタンを記憶した音素時間長標準化基本周波数データベースを用い、アクセント句の基本周波数の最大値を含むモーラ、アクセント核とアクセント核の次のモーラ、およびアクセント句の末尾の１モーラあるいは複数のモーラのおのおのについて、前記のデータベースを参照して各モーラ内での基本周波数パタンを設定し、基本周波数がデータベースより設定されない区間については、データベースから設定された基本周波数の間を実時間軸上の関数により補間して基本周波数パタンを生成する基本周波数パタン生成方法である。 As described above, the first method of the present invention is a fundamental frequency pattern generation method that uses a phoneme time length standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the time length of the phonemes of that mora. For each of the mora containing the maximum fundamental frequency of the accent phrase, the accent nucleus and the mora following it, and the last one or more morae of the accent phrase, the fundamental frequency pattern within the mora is set by referring to the database; for sections where the fundamental frequency is not set from the database, the fundamental frequency pattern is generated by interpolating between the fundamental frequencies set from the database with a function on the real time axis.

又、第２の方法は、アクセント句のモーラ位置毎に当該モーラの音素の時間長で標準化した基本周波数パタンを記憶した音素時間長標準化基本周波数データベースを用い、アクセント句の基本周波数の最大値を与える立ち上がり基準点、アクセントの立ち下がりを与える立ち下がり基準点、アクセント句の終了時の基本周波数を与えるアクセント句末基準点および発話終了時の基本周波数を与える語尾基準点を当該モーラの母音長に対して一定比である時間点に設定し、おのおのの基準点について前記のデータベースを参照して基本周波数を設定し、それらの基準点間の基本周波数については実時間軸上の関数により補間して基本周波数パタンを生成する基本周波数パタン生成方法である。 The second method is a fundamental frequency pattern generation method that uses a phoneme time length standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the time length of the phonemes of that mora. A rise reference point giving the maximum fundamental frequency of the accent phrase, a fall reference point giving the accent fall, an accent phrase end reference point giving the fundamental frequency at the end of the accent phrase, and an ending reference point giving the fundamental frequency at the end of the utterance are each set at a time point that is a fixed ratio of the vowel length of the relevant mora; the fundamental frequency at each reference point is set by referring to the database; and the fundamental frequency between the reference points is interpolated with a function on the real time axis to generate the fundamental frequency pattern.
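The reference-point scheme of the second method can be sketched in Python under simplifying assumptions. The sketch places each reference point at a fixed ratio of a vowel's duration and fills the gaps by straight-line interpolation on the real time axis, which is one possible choice of function (the claims mention linear interpolation). The function names, the (time, F0) tuple layout, and the flat extension outside the first and last points are assumptions made for illustration.

```python
from bisect import bisect_right


def reference_time(vowel_start_ms, vowel_len_ms, ratio):
    """Time of a reference point placed at a fixed ratio of the vowel length."""
    return vowel_start_ms + ratio * vowel_len_ms


def interpolate_f0(points, t_ms):
    """Linearly interpolate F0 at t_ms from (time_ms, f0_hz) points sorted by time.

    Outside the first and last reference points the nearest value is held.
    """
    times = [t for t, _ in points]
    if t_ms <= times[0]:
        return points[0][1]
    if t_ms >= times[-1]:
        return points[-1][1]
    k = bisect_right(times, t_ms)          # times[k-1] <= t_ms < times[k]
    (t0, f0), (t1, f1) = points[k - 1], points[k]
    return f0 + (f1 - f0) * (t_ms - t0) / (t1 - t0)
```

For instance, a rise reference point at the midpoint of a vowel starting at 100 msec and lasting 80 msec falls at 140 msec; with 220 Hz there and 180 Hz at a fall reference point at 300 msec, the interpolated value at 220 msec is 200 Hz.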

又、第３の方法は、アクセント句のモーラ位置毎に当該モーラの母音あるいは母音相当部の時間長で標準化した基本周波数パタンを記憶した音素時間長標準化基本周波数データベースを用い、アクセント句の基本周波数の最大値を含むモーラ、アクセント核とアクセント核の次のモーラ、およびアクセント句末尾の１ないし複数のモーラのおのおのについて、前記のデータベースを参照して各モーラ内での基本周波数パタンを設定し、基本周波数がデータベースより設定されない区間については、データベースから設定された基本周波数の間を実時間軸上の関数により補間して基本周波数パタンを生成する基本周波数パタン生成方法である。 The third method is a fundamental frequency pattern generation method that uses a phoneme time length standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the time length of the vowel or vowel-equivalent part of that mora. For each of the mora containing the maximum fundamental frequency of the accent phrase, the accent nucleus and the mora following it, and the last one or more morae of the accent phrase, the fundamental frequency pattern within the mora is set by referring to the database; for sections where the fundamental frequency is not set from the database, the fundamental frequency pattern is generated by interpolating between the fundamental frequencies set from the database with a function on the real time axis.

又、第４の方法は、アクセント句のモーラ位置毎に当該モーラの母音あるいは母音相当部の時間長で標準化した基本周波数パタンを記憶した音素時間長標準化基本周波数データベースを用い、アクセント句の基本周波数の最大値を与える立ち上がり基準点、アクセントの立ち下がりを与える立ち下がり基準点、アクセント句の終了時の基本周波数を与えるアクセント句末基準点および発話終了時の基本周波数を与える語尾基準点を当該モーラの母音長に対して一定比である時間点に設定し、おのおのの基準点について前記のデータベースを参照して基本周波数を設定し、それらの基準点間の基本周波数については実時間軸上の関数により補間して基本周波数パタンを生成する基本周波数パタン生成方法である。 The fourth method is a fundamental frequency pattern generation method that uses a phoneme time length standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the time length of the vowel or vowel-equivalent part of that mora. A rise reference point giving the maximum fundamental frequency of the accent phrase, a fall reference point giving the accent fall, an accent phrase end reference point giving the fundamental frequency at the end of the accent phrase, and an ending reference point giving the fundamental frequency at the end of the utterance are each set at a time point that is a fixed ratio of the vowel length of the relevant mora; the fundamental frequency at each reference point is set by referring to the database; and the fundamental frequency between the reference points is interpolated with a function on the real time axis to generate the fundamental frequency pattern.

又、第５の方法は、アクセント句のモーラ位置毎に当該モーラの音素時間長で標準化した基本周波数パタンを記憶した音素時間長標準化基本周波数データベースと、音素あるいは音韻列ごとの基本周波数を音素時間長で標準化した値と基本周波数パタンとの差を記憶したマイクロプロソディデータベースとを用い、音素時間長標準化基本周波数データベースから取得された基本周波数パタンにマイクロプロソディデータを加算あるいは減算することにより基本周波数パタンを生成する基本周波数パタン生成方法である。 The fifth method is a fundamental frequency pattern generation method that uses a phoneme time length standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the phoneme time lengths of that mora, together with a microprosody database storing, for each phoneme or phoneme string, the difference between the fundamental frequency standardized by phoneme time length and the fundamental frequency pattern. The fundamental frequency pattern is generated by adding the microprosody data to, or subtracting it from, the fundamental frequency pattern acquired from the phoneme time length standardized fundamental frequency database.
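The addition step of the fifth method might look like the following Python sketch. It assumes that both the base pattern and the microprosody component are available as lists of frequency values, and that a duration-normalized component is stretched to the actual number of samples by linear resampling before being added. The function names and the list representation are hypothetical; the specification does not prescribe them.

```python
def resample(contour, n):
    """Linearly resample a duration-normalized contour to n samples (n >= 2)."""
    if len(contour) < 2 or n < 2:
        raise ValueError("need at least two input and two output samples")
    last = len(contour) - 1
    out = []
    for k in range(n):
        pos = k * last / (n - 1)           # position on the normalized axis
        i = min(int(pos), last - 1)        # left neighbour index
        frac = pos - i
        out.append(contour[i] * (1.0 - frac) + contour[i + 1] * frac)
    return out


def add_microprosody(base_f0, micro):
    """Add a microprosody component (Hz offsets) to a base F0 pattern."""
    if len(base_f0) != len(micro):
        raise ValueError("patterns must have the same length")
    return [b + m for b, m in zip(base_f0, micro)]
```

Subtraction, which the method also allows, corresponds to storing the offsets with the opposite sign.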

又、第６の方法は、アクセント句のモーラ位置毎に当該モーラの音素時間長で標準化した基本周波数パタンを記憶した音素時間長標準化基本周波数データベースを用い、アクセント句ごとの基本周波数パタンを生成する基本周波数パタン生成方法において、基本周波数を生成しようとするアクセント句のモーラ数およびアクセント型に該当する基本周波数パタンが音素時間長標準化基本周波数データベース内にない場合、データベース内の基本周波数パタンを利用し、基本周波数を生成しようとするアクセント句をｎモーラｍ型、データベースから取得した基本周波数パタンをｌモーラｊ型、取得した基本周波数パタンの最大値を含むモーラの位置をｉ、取得した基本周波数パタンのアクセント句末尾のモーラ数をｋとするとき、ｍ≦ｉ＋１のとき第１から第ｍ＋１モーラまではデータベースより取得した基本周波数パタンの第１から第ｊ＋１モーラまでを適用し、第ｎ−ｋ＋１から第ｎモーラまではデータベースより取得した基本周波数パタンの第ｌ―ｋ＋１から第ｌモーラを適用し、その間のモーラについては実時間軸上で補間することにより基本周波数パタンを生成する。またｉ＋１＜ｍ≦ｎ−ｋ＋１のとき第１から第ｉモーラまではデータベースより取得した基本周波数パタンの第１から第ｉモーラまでを適用し、第ｍ、第ｍ＋１モーラにはデータベースより取得した基本周波数パタンの第ｊ、第ｊ＋１モーラを適用し、第ｎ−ｋ＋１から第ｎモーラまではデータベースより取得した基本周波数パタンの第ｌ―ｋ＋１から第ｌモーラまでを適用し、その間のモーラについては実時間軸上で補間することにより基本周波数パタンを生成する。またｍ＞ｎ−ｋ＋１のとき第１から第ｉモーラまではデータベースより取得した基本周波数パタンの第１から第ｉモーラまでを適用し、第ｍから第ｎモーラまではデータベースより取得した基本周波数パタンの第ｊモーラから第ｌモーラまでを適用し、その間のモーラについては実時間軸上で補間することにより基本周波数パタンを生成する基本周波数生成方法である。 The sixth method is a fundamental frequency generation method for generating a fundamental frequency pattern for each accent phrase using a phoneme time length standardized fundamental frequency database that stores, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the phoneme time lengths of that mora. When the database contains no fundamental frequency pattern matching the number of morae and the accent type of the accent phrase to be generated, a fundamental frequency pattern that is in the database is used instead. Let the accent phrase to be generated be an n-mora, type-m phrase, let the fundamental frequency pattern acquired from the database be an l-mora, type-j pattern, let i be the position of the mora containing the maximum value of the acquired pattern, and let k be the number of morae at the end of the accent phrase in the acquired pattern. When m ≤ i+1, the 1st to (j+1)-th morae of the acquired pattern are applied to the 1st to (m+1)-th morae, the (l−k+1)-th to l-th morae of the acquired pattern are applied to the (n−k+1)-th to n-th morae, and the morae in between are interpolated on the real time axis. When i+1 < m ≤ n−k+1, the 1st to i-th morae of the acquired pattern are applied to the 1st to i-th morae, the j-th and (j+1)-th morae of the acquired pattern are applied to the m-th and (m+1)-th morae, the (l−k+1)-th to l-th morae of the acquired pattern are applied to the (n−k+1)-th to n-th morae, and the morae in between are interpolated on the real time axis. When m > n−k+1, the 1st to i-th morae of the acquired pattern are applied to the 1st to i-th morae, the j-th to l-th morae of the acquired pattern are applied to the m-th to n-th morae, and the morae in between are interpolated on the real time axis.
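The three cases of the sixth method can be read as a rule that maps morae of the target accent phrase onto morae of the pattern taken from the database. The Python sketch below is one such reading, not a definitive implementation. The function name and the dictionary representation are hypothetical, mora positions are 1-based, and two points are assumptions added here: each pair of spans must have equal length (otherwise an error is raised), and where spans overlap the later assignment takes precedence. Target morae missing from the result are the ones to be interpolated on the real time axis.

```python
def map_morae(n, m, l, j, i, k):
    """Map target morae (n-mora, type m) to source morae (l-mora, type j).

    i: source mora holding the F0 maximum; k: number of phrase-final morae.
    Returns {target_mora: source_mora}; unmapped target morae are interpolated.
    """
    def pair(t_first, t_last, s_first, s_last):
        # Assumption: spans are matched one-to-one, so lengths must agree.
        if t_last - t_first != s_last - s_first:
            raise ValueError("source and target spans differ in length")
        return {t_first + d: s_first + d for d in range(t_last - t_first + 1)}

    if m <= i + 1:
        mapping = pair(1, m + 1, 1, j + 1)           # head through the nucleus
        mapping.update(pair(n - k + 1, n, l - k + 1, l))
    elif m <= n - k + 1:
        mapping = pair(1, i, 1, i)                   # head up to the F0 peak
        mapping.update(pair(m, m + 1, j, j + 1))     # nucleus and next mora
        mapping.update(pair(n - k + 1, n, l - k + 1, l))
    else:
        mapping = pair(1, i, 1, i)
        mapping.update(pair(m, n, j, l))             # nucleus through the end
    return mapping
```

As an illustration, a 7-mora type-4 target built from a 6-mora type-3 pattern with i = 2 and k = 2 takes morae 1, 2 from source morae 1, 2, morae 4, 5 from source morae 3, 4, and morae 6, 7 from source morae 5, 6, leaving mora 3 to interpolation.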

又、第７の方法は、アクセント句の基本周波数パタンをフレーズのアクセント句の位置および文末であるか否かによって分類した基本周波数データベースを用いて基本周波数パタンを生成する基本周波数生成方法である。 The seventh method is a fundamental frequency generation method that generates a fundamental frequency pattern using a fundamental frequency database in which the fundamental frequency patterns of accent phrases are classified by the position of the accent phrase within the phrase and by whether or not it is at the end of the sentence.

又、第８の方法は、アクセント句の基本周波数パタンを記憶した基本周波数データベースと、フレーズのアクセント句の位置および文末であるか否かによって、基本周波数パタンの変形量を記憶した変形データベースを用い、基本周波数データより取得した基本周波数パタンを変形データベースより取得した変形量に従って変形し基本周波数パタンを生成する基本周波数パタン生成方法である。 The eighth method is a fundamental frequency pattern generation method that uses a fundamental frequency database storing the fundamental frequency patterns of accent phrases and a deformation database storing deformation amounts of the fundamental frequency pattern according to the position of the accent phrase within the phrase and whether or not it is at the end of the sentence. The fundamental frequency pattern acquired from the fundamental frequency data is deformed according to the deformation amount acquired from the deformation database to generate the fundamental frequency pattern.

又、第９の方法は、アクセント句の基本周波数パタンを記憶した基本周波数データベースを用い、基本周波数データより取得した基本周波数パタンをフレーズ内でのアクセント句の位置ｉの関数により基本周波数パタンを変形する基本周波数パタン生成方法である。 The ninth method uses a fundamental frequency database that stores the fundamental frequency pattern of an accent phrase, and transforms the fundamental frequency pattern obtained from the fundamental frequency data by a function of the accent phrase position i in the phrase. This is a basic frequency pattern generation method.

又、第１０の方法は、アクセント句の基本周波数パタンを記憶した基本周波数データベースを用い、基本周波数データより取得した基本周波数パタンを基本周波数パタンを決定する基準になるモーラに対してそのモーラのフレーズ内での位置ｊの関数により基本周波数パタンを変形する基本周波数パタン生成方法である。 The tenth method is a fundamental frequency pattern generation method that uses a fundamental frequency database storing the fundamental frequency patterns of accent phrases and deforms the fundamental frequency pattern acquired from the fundamental frequency data, for each mora serving as a reference for determining the fundamental frequency pattern, by a function of the position j of that mora within the phrase.

又、第１１の方法は、アクセント句ごとに基本周波数パタンを生成し、当該アクセント句のアクセント末尾、および終了点の周波数と次のアクセント句の開始点の周波数の差があらかじめ定められた値以下になるよう当該アクセント句のアクセントの立ち下がり、アクセント末尾および終了点の特性を変更する基本周波数パタン生成方法である。 The eleventh method is a fundamental frequency pattern generation method that generates a fundamental frequency pattern for each accent phrase and changes the characteristics of the accent fall, the accent end, and the end point of that accent phrase so that the difference between the frequency at its accent end and end point and the frequency at the start point of the next accent phrase is no greater than a predetermined value.
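A very reduced Python sketch of the boundary constraint in the eleventh method is given below. It only moves the end-point frequency of the current accent phrase toward the start of the next phrase until the difference is within the limit; how the accent fall and accent end are reshaped to match is left out. The function name is hypothetical, and the 40 Hz default is the no-pause value used in the tenth embodiment.

```python
def clamp_phrase_end(end_hz, next_start_hz, limit_hz=40.0):
    """Return an end-point F0 whose distance to next_start_hz is at most limit_hz."""
    gap = end_hz - next_start_hz
    if abs(gap) <= limit_hz:
        return end_hz                      # already within the allowed difference
    if gap > 0:
        return next_start_hz + limit_hz    # end was too far above the next start
    return next_start_hz - limit_hz        # end was too far below the next start
```

For example, an end point of 100 Hz followed by a phrase starting at 180 Hz would be raised to 140 Hz, while an end point of 150 Hz would be left unchanged.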

以上説明したように、本発明によれば、アクセント句の立ち上がりとアクセント核での立ち下がりのタイミングと角度を当該モーラの母音長で標準化した基本周波数パタンを当てはめることにより、モーラ内での基本周波数の変動を詳細に再現し、高い自然性を実現するとともに、データベースのパタンを当てはめない実時間軸上で補間を行うことにより、モーラ単位で制御する際の不連続感をなくし、基本周波数パタンデータベースもより小さくすることができる。あるいはアクセント句の立ち上がりとアクセント核での立ち下がりのタイミングを当該モーラの母音長で標準化した時間軸上で設定することにより、モーラ内での基本周波数の変動のタイミングを詳細に再現し、立ち上がり、立ち下がりの角度については実時間軸上の関数を用いることによって、音韻による時間長の差に影響されることなく立ち上がり、立ち下がりの安定したなめらかな基本周波数のパタンを得ることができ、モーラ単位で制御する際の不連続感をなくし、高い自然性を実現する。さらに補間を用いることにより基本周波数パタンデータベースもより小さくすることができ、その実用的効果は大きい。 As described above, according to the present invention, fitting fundamental frequency patterns in which the timing and slope of the rise of the accent phrase and of the fall at the accent nucleus are standardized by the vowel length of the relevant mora reproduces the fluctuation of the fundamental frequency within a mora in detail and achieves high naturalness. In addition, interpolating on the real time axis where no database pattern is fitted eliminates the sense of discontinuity that arises with mora-by-mora control and allows the fundamental frequency pattern database to be made smaller. Alternatively, setting the timing of the rise of the accent phrase and of the fall at the accent nucleus on a time axis standardized by the vowel length of the relevant mora reproduces the timing of fundamental frequency changes within a mora in detail, while using a function on the real time axis for the slopes of the rise and fall yields a smooth fundamental frequency pattern with stable rises and falls that is unaffected by differences in time length between phonemes; this eliminates the sense of discontinuity of mora-by-mora control and achieves high naturalness. Furthermore, the use of interpolation allows the fundamental frequency pattern database to be made smaller, so the practical effect is great.

本発明に係る基本周波数パタン生成方法等は、従来に比べてより一層自然性の高い基本周波数パタンを生成出来るという長所を有し、基本周波数パタン生成方法等として有用である。 The fundamental frequency pattern generation method and the like according to the present invention have the advantage of being able to generate a fundamental frequency pattern with higher naturalness than conventional methods, and are useful as a fundamental frequency pattern generation method and the like.

本発明及び／又は本発明に関連する他の発明（以下、本発明等という）による基本周波数生成装置の機能ブロック図 Functional block diagram of a fundamental frequency generation device according to the present invention and/or another invention related to the present invention (hereinafter, "the present invention etc.")
本発明等の実施の形態１により生成される基本周波数パタンの１例を示す図 Diagram showing an example of the fundamental frequency pattern generated by Embodiment 1 of the present invention etc.
本発明等の実施の形態２により生成される基本周波数パタンの１例を示す図 Diagram showing an example of the fundamental frequency pattern generated by Embodiment 2 of the present invention etc.
本発明等の一実施の形態を示す装置の機能ブロック図 Functional block diagram of a device showing an embodiment of the present invention etc.
本発明による基本周波数パタンの一例を示す図 Diagram showing an example of a fundamental frequency pattern according to the present invention
本発明による基本周波数パタンの一例を示す図 Diagram showing an example of a fundamental frequency pattern according to the present invention
本発明等の一実施の形態を示す装置の機能ブロック図 Functional block diagram of a device showing an embodiment of the present invention etc.
マイクロプロソディデータベース２５０に記憶されているマイクロプロソディ成分の模式図 Schematic diagram of the microprosody components stored in the microprosody database 250
（Ａ）：実施の形態５の基本周波数データベースより生成される基本周波数パタンを示す図（Ｂ）：同実施の形態のマイクロプロソディデータベースより取得したマイクロプロソディ成分を示す図（Ｃ）：図９（Ａ）のパタンに図９（Ｂ）のパタンを加算して生成した基本周波数パタンを示す図 (A): Diagram showing the fundamental frequency pattern generated from the fundamental frequency database of Embodiment 5. (B): Diagram showing the microprosody component acquired from the microprosody database of the same embodiment. (C): Diagram showing the fundamental frequency pattern generated by adding the pattern of FIG. 9(B) to the pattern of FIG. 9(A).
本発明の一実施の形態を示す装置の機能ブロック図 Functional block diagram of a device showing an embodiment of the present invention
（Ａ）、（Ｂ）：本発明による基本周波数パタンの一例を示す図 (A), (B): Diagrams showing an example of a fundamental frequency pattern according to the present invention
（Ａ）、（Ｂ）：本発明による基本周波数パタンの一例を示す図 (A), (B): Diagrams showing an example of a fundamental frequency pattern according to the present invention
（Ａ）、（Ｂ）：本発明による基本周波数パタンの一例を示す図 (A), (B): Diagrams showing an example of a fundamental frequency pattern according to the present invention
（Ａ）、（Ｂ）：本発明による基本周波数パタンの一例を示す図 (A), (B): Diagrams showing an example of a fundamental frequency pattern according to the present invention
本発明の基本周波数パタンの模式図 Schematic diagram of the fundamental frequency pattern of the present invention
本発明等の一実施の形態を示す装置の機能ブロック図 Functional block diagram of a device showing an embodiment of the present invention etc.
本発明等の一実施の形態の基本周波数パタンの模式図 Schematic diagram of the fundamental frequency pattern of an embodiment of the present invention etc.
本発明等の変形例の基本周波数パタンの模式図 Schematic diagram of the fundamental frequency pattern of a modification of the present invention etc.
本発明等の基本周波数パタンの模式図 Schematic diagram of the fundamental frequency pattern of the present invention etc.
（Ａ）、（Ｂ）：本発明等の基本周波数パタンのアクセント句接続部の模式図 (A), (B): Schematic diagrams of the accent phrase connection part of the fundamental frequency pattern of the present invention etc.

符号の説明 Explanation of symbols

１０ 文字列入力部 10 Character string input part
２０ 文字列解析部 20 Character string analysis part
３０ 音韻時間長データベース 30 Phoneme time length database
４０ 時間長設定部 40 Time length setting part
５０ モーラ時間長標準化基本周波数データベース 50 Mora time length standardized fundamental frequency database
６０ 基本周波数パタン生成部 60 Fundamental frequency pattern generation part
７０ 声帯振動生成部 70 Vocal cord vibration generation part
１５０、１５０ａ、１５０ｂ 母音時間長標準化基本周波数データベース 150, 150a, 150b Vowel time length standardized fundamental frequency database
２５０ マイクロプロソディデータベース 250 Microprosody database
３５０ 基本周波数パタン変形データベース 350 Fundamental frequency pattern deformation database
４５０ アクセント句位置基本周波数データベース 450 Accent phrase position fundamental frequency database

Claims

前記補間は、実時間上の関数で補間することである、請求項１に記載の基本周波数パタン生成装置。 The fundamental frequency pattern generation device according to claim 1, wherein the interpolation is performed by a function in real time.

前記補間は、実時間上の直線で補間することである、請求項１に記載の基本周波数パタン生成装置。 The fundamental frequency pattern generation device according to claim 1, wherein the interpolation is performed by a straight line in real time.

請求項４に記載の基本周波数パタン生成方法の前記文字列解析工程と、前記記憶工程と、前記基本周波数生成工程とをコンピュータにより実行させるためのプログラムを記録した、コンピュータにより処理可能なプログラム記録媒体。 A computer-processable program recording medium recording a program for causing a computer to execute the character string analysis step, the storage step, and the fundamental frequency generation step of the fundamental frequency pattern generation method according to claim 4.