JP3393532B2


Info

Publication number: JP3393532B2
Application number: JP06039197A
Authority: JP
Inventors: 仁一村上; 博和鈴木
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Current assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Priority date: 1997-03-14
Filing date: 1997-03-14
Publication date: 2003-04-07
Anticipated expiration: 2017-03-14
Also published as: JPH10254493A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a method for normalizing the volume of recorded speech and to an apparatus that implements the method. More particularly, it relates to a method and apparatus that set a volume standard value serving as the volume reference for each vowel of the recorded speech to be adjusted and, for each audio file of that speech, control the volume so that the average volume of the vowel frames equals the volume standard value.

[0002]

[Prior Art] Providing telephone services to users by means of speech requires a large amount of speech to be recorded and stored, and the stored recordings are required to have a uniform volume. However, the conditions under which the speaker talks and the recording environment cannot always be held constant, so it is difficult to record at the same level every time, and the volume of what is actually recorded varies. In other words, the volume often differs from one audio file (the unit of recording) to another. For example, an audio file containing the recording "Thank you for using our service." and an audio file containing the recording "Thank you very much." end up with different volumes.

[0003] Conventionally, the volume of recorded speech has been made uniform by having an operator adjust the volume level by hand while watching the speech waveform, and then recording at that fixed level.

[0004]

[Problems to Be Solved by the Invention] Recording in this way, with an operator adjusting the volume level by hand while watching the waveform and then recording at the fixed level, takes a great deal of labor and time, and is too inefficient for processing a large amount of speech. Moreover, because the work is manual, it is difficult to keep the recorded volume constant every time, and the recorded volumes may end up uneven.

[0005] The present invention provides a method for normalizing the volume of recorded speech, and an apparatus implementing the method, that make the recorded volume uniform without manual work. They do so by setting a volume standard value serving as the volume reference for each vowel of the recorded speech to be adjusted and, for each audio file of that speech, controlling the volume so that the average volume of the vowel frames equals the volume standard value.

[0006]

[Means for Solving the Problems]

Claim 1: A method for normalizing the volume of recorded speech is provided in which a volume standard value is set for each vowel as the reference for volume adjustment, and control is performed so that the volume of each vowel in the recorded speech to be adjusted equals its respective volume standard value. Claim 2: In the method of claim 1, the control that makes the volume of each vowel equal to its volume standard value is performed for each audio file, the audio file being the unit of recording.

[0007] Claim 3: In the method of claim 2, a volume normalization number is calculated from the volume standard value for each vowel frame of the recorded speech to be adjusted; the average of the volume normalization numbers is calculated for each audio file of that speech; the average is stored as the file volume normalization number; and the volume of the recorded speech is normalized for each audio file on the basis of its file volume normalization number.

[0008] Claim 4: In the method of any one of claims 1 to 3, recorded speech samples that have been phoneme-labeled in advance are divided into frames, phoneme recognition is performed on each frame, the volume of the vowel frames is detected, the average vowel-frame volume is calculated for each vowel, and that per-vowel average is set as the volume standard value.

[0009] Claim 5: In the method of any one of claims 1 to 3, the recorded speech to be adjusted is divided into frames, phoneme recognition is performed on each frame, the volume of each vowel frame is detected and stored, the average vowel-frame volume is calculated for each vowel, and that per-vowel average is set as the volume standard value.

[0010] Claim 6: In the method of any one of claims 1 to 3, the volume standard values are set by entering a known volume standard value for each vowel. Claim 7: In the method of claim 3, the recorded speech to be adjusted is divided into frames, phoneme recognition is performed on each frame, the volume of each vowel frame is detected, and the volume normalization number of each vowel frame is calculated from the vowel-frame volume and the volume standard value.

[0011] Claim 8: In the method of any one of claims 3 to 5, the normalization numbers that fall within a predetermined range are extracted from the normalization numbers of the vowel frames, and their average is set as the file volume normalization number. Claim 9: An apparatus for normalizing the volume of recorded speech is provided that comprises a vowel volume standard value setting unit 1, which sets the volume standard value of each vowel as the reference for adjusting the volume of the recorded speech to be adjusted; a vowel frame volume normalization number calculation unit 2, which calculates a volume normalization number from the volume standard value for each vowel frame of the recorded speech to be adjusted; a normalization number average calculation unit 4, which calculates the average of the volume normalization numbers for each audio file of that speech; a file volume normalization number storage unit 5, which stores the average as the file volume normalization number; and a volume normalization control unit 6, which normalizes the volume of the recorded speech for each audio file on the basis of the file volume normalization number.

[0012] Claim 10: In the apparatus of claim 9, a within-threshold normalization number extraction unit 3 is provided that extracts the frame volume normalization numbers lying within a predetermined range and outputs the result to the normalization number average calculation unit 4. Claim 11: In the apparatus of claim 9 or claim 10, the vowel volume standard value setting unit 1 consists of a sample speech analysis unit 1a, which divides recorded speech samples phoneme-labeled in advance into frames, performs phoneme recognition on each frame, and detects the volume of the vowel frames, and a volume standard value calculation unit 1b, which calculates the average vowel-frame volume for each vowel; and the vowel frame volume normalization number calculation unit 2 consists of a recorded speech analysis unit 2a, which divides the recorded speech to be adjusted into frames, performs phoneme recognition on each frame, and detects the volume of each vowel frame, and a normalization number calculation unit 2b, which calculates the volume normalization number of each vowel frame from the vowel-frame volume and the volume standard value.

[0013] Claim 12: In the apparatus of claim 9 or claim 10, the vowel volume standard value setting unit 11 consists of a recorded speech analysis unit 11c, which divides the recorded speech to be adjusted into frames, performs phoneme recognition on each frame, and detects the volume of each vowel frame; a vowel frame volume storage unit 11d, which stores the volume of each vowel frame; and a volume standard value calculation unit 11e, which calculates the average vowel-frame volume for each vowel.

[0014] Claim 13: In the apparatus of claim 9 or claim 10, the vowel volume standard value setting unit 11 has an input unit for entering a known volume standard value for each vowel.

[0015]

[Embodiments of the Invention] The present invention focuses on the vowels "a", "i", "u", "e", and "o", which are the factors that determine the volume of recorded speech. A volume standard value is set in advance for each of these vowels, and control is performed so that the volume of each vowel in the recorded speech to be adjusted equals its respective volume standard value. This is done for each audio file, the unit of recording. An audio file here means the unit in which recorded speech is captured. In general, the volume of recorded speech is uneven from one audio file to the next, and the invention adjusts the volume of the entire recorded speech by adjusting the volume of each audio file.

[0016] In the volume normalization method of the invention, the volume standard value of each vowel is first set as the reference for adjusting the volume of the recorded speech to be adjusted. A volume normalization number is then calculated from the volume standard value for each vowel frame of that speech, the average of the volume normalization numbers is calculated for each audio file, the average is stored as the file volume normalization number, and the volume of the recorded speech in each audio file is normalized on the basis of its file volume normalization number.

[0017] In the volume normalization apparatus of the invention, the vowel volume standard value setting unit sets the volume standard values, that is, the standard volumes of the vowels "a", "i", "u", "e", and "o" that determine the volume of recorded speech. Next, the volume normalization number calculation unit calculates, for each vowel frame of the recorded speech to be adjusted, the volume normalization number that brings its volume to the fixed level.

[0018] The normalization number average calculation unit calculates the average of the volume normalization numbers for each audio file, the unit of recording, and stores it in the file volume normalization number storage unit as the file volume normalization number. Finally, the volume normalization control unit normalizes the volume by adjusting the volume of the recorded speech in each audio file on the basis of that file's file volume normalization number.

[0019]

[Examples] An embodiment of the invention is described concretely with reference to the example of FIG. 1. FIG. 1 illustrates the volume normalization method of the invention. The method proceeds as follows:
(Step 1) Set the volume standard value of each vowel as the reference for adjusting the volume of the recorded speech to be adjusted.
(Step 2) Calculate a volume normalization number from the volume standard value for each vowel frame of the recorded speech to be adjusted.
(Step 3) Calculate the average of the volume normalization numbers for each audio file of the recorded speech to be adjusted.
(Step 4) Store the average as the file volume normalization number.
(Step 5) Normalize the volume of the recorded speech in each audio file on the basis of the file volume normalization number.

[0020] A first embodiment of the invention is described more concretely with reference to FIG. 3. In FIG. 3, the first embodiment of the volume normalization apparatus consists of a vowel volume standard value setting unit 1, made up of a sample speech analysis unit 1a and a volume standard value calculation unit 1b; a vowel frame volume normalization number calculation unit 2, made up of a recorded speech analysis unit 2a and a normalization number calculation unit 2b; a within-threshold normalization number extraction unit 3; a normalization number average calculation unit 4; a file volume normalization number storage unit 5; and a volume normalization control unit 6.

[0021] The vowel volume standard value setting unit 1 sets, for each vowel, the volume standard value that serves as the reference for the vowel volume of the recorded speech to be adjusted. Its sample speech analysis unit 1a divides recorded speech samples, whose phoneme sequence has been explicitly phoneme-labeled in advance, into time frames and extracts the vowel frames, that is, the frames containing a vowel phoneme.

[0022] The volume standard value calculation unit 1b of the vowel volume standard value setting unit 1 calculates the average vowel-frame volume for each of the vowels "a", "i", "u", "e", and "o" from the volumes of the vowel frames extracted from the recorded speech samples, and sets each of these averages as the volume standard value of the corresponding vowel. The vowel frame volume normalization number calculation unit 2 then uses the volume standard values output by the vowel volume standard value setting unit 1 to calculate, for each audio file (the recording unit of the recorded speech to be adjusted), the normalization number that normalizes the volume of each vowel frame.
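The per-vowel averaging performed by unit 1b can be sketched as follows. This is a minimal illustration, assuming the labeled sample has already been reduced to (phoneme, volume) pairs, one per frame; the function name `vowel_standards` and the sample values are hypothetical and do not come from the patent.

```python
from collections import defaultdict

VOWELS = ("a", "i", "u", "e", "o")


def vowel_standards(labeled_frames):
    """Return the mean frame volume of each vowel.

    labeled_frames: iterable of (phoneme, volume) pairs, one per frame.
    Frames whose phoneme is not a vowel are ignored, and vowels that
    never occur are left out of the result.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for phoneme, volume in labeled_frames:
        if phoneme in VOWELS:
            totals[phoneme] += volume
            counts[phoneme] += 1
    return {v: totals[v] / counts[v] for v in VOWELS if counts[v]}


# Hypothetical labeled sample frames (values invented for illustration).
sample = [("k", 2.0), ("a", 7.0), ("a", 9.0), ("s", 1.0), ("i", 4.0)]
standards = vowel_standards(sample)
print(standards)
```

A real sample set would need enough frames of every vowel for all five standard values to be defined.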

[0023] First, the recorded speech analysis unit 2a of the vowel frame volume normalization number calculation unit 2 divides each audio file, the recording unit of the recorded speech to be adjusted, into frames of fixed duration, recognizes the phoneme contained in each frame, and detects the volume of the vowel frames, that is, the frames containing a vowel phoneme. Next, the normalization number calculation unit 2b calculates the normalization number of each vowel frame by dividing the volume standard value of the vowel concerned by the volume of that vowel frame. That is, the normalization number of a vowel frame is as follows.
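The framing step of unit 2a can be sketched as below. The patent does not say how "volume" is measured, so this sketch assumes the root-mean-square amplitude of the samples in a frame; the function name `frame_volumes` and the choice to drop a trailing partial frame are also assumptions. Phoneme recognition is outside the scope of the sketch.

```python
import math


def frame_volumes(samples, frame_len):
    """Split samples into consecutive fixed-length frames and return
    the RMS amplitude of each frame (an assumed volume measure).

    A trailing partial frame is dropped.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    volumes = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        volumes.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return volumes


# Nine samples, four per frame: two full frames, one leftover sample.
vols = frame_volumes([3, -3, 3, -3, 0, 0, 0, 0, 5], frame_len=4)
print(vols)
```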

[0024] Normalization number of a vowel frame = (volume standard value of the vowel) / (volume of the vowel frame)

Suppose that the vowel volume standard value setting unit 1 has set the volume standard value P(a) of the vowel "a" to 8.0, P(i) of the vowel "i" to 4.0, P(u) of the vowel "u" to 6.0, P(e) of the vowel "e" to 10.0, and P(o) of the vowel "o" to 6.0. Suppose further that the recorded speech analysis unit 2a has divided an audio file containing the recording "kaki" into frames, recognized the phonemes, and detected the volume of each vowel frame, with the following result.

Frame No. | 1 | 2    | 3    | 4    | 5 | 6 | 7   | 8
Phoneme   | k | a    | a    | a    | k | k | i   | i
Volume    |   | 10.0 | 20.0 | 15.0 |   |   | 4.6 | 1.6

The normalization numbers of the vowel frames are then:

Frame No. 2, phoneme (a): P(a) / 10.0 = 8.0 / 10.0 = 0.8
Frame No. 3, phoneme (a): P(a) / 20.0 = 8.0 / 20.0 = 0.4
Frame No. 4, phoneme (a): P(a) / 15.0 = 8.0 / 15.0 = 0.53
Frame No. 7, phoneme (i): P(i) / 4.6 = 4.0 / 4.6 = 0.87
Frame No. 8, phoneme (i): P(i) / 1.6 = 4.0 / 1.6 = 2.5
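The worked "kaki" calculation can be reproduced with a few lines of code. This is a sketch only: the data layout and the function name `frame_normalization_numbers` are hypothetical, and frame 8 uses the volume 1.6 that appears as the divisor in the worked calculation.

```python
# Volume standard values P(a) ... P(o) from the example.
STANDARDS = {"a": 8.0, "i": 4.0, "u": 6.0, "e": 10.0, "o": 6.0}

# Frames of the "kaki" file as (frame number, phoneme, volume).
# Consonant frames carry no measured volume in the example.
KAKI = [
    (1, "k", None), (2, "a", 10.0), (3, "a", 20.0), (4, "a", 15.0),
    (5, "k", None), (6, "k", None), (7, "i", 4.6), (8, "i", 1.6),
]


def frame_normalization_numbers(frames, standards):
    """Map each vowel frame number to standard value / frame volume."""
    return {
        number: standards[phoneme] / volume
        for number, phoneme, volume in frames
        if phoneme in standards and volume
    }


numbers = frame_normalization_numbers(KAKI, STANDARDS)
for number, value in numbers.items():
    print(f"frame {number}: {value:.2f}")
```

Frames with a missing or zero volume are skipped so that the division is always defined; the patent does not discuss that case.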

[0025] The within-threshold normalization number extraction unit 3 extracts, from the normalization numbers calculated for the vowel frames by the vowel frame volume normalization number calculation unit 2, only those that lie within a predetermined range. A normalization number below the range means the vowel frame is too loud, so the frame is taken to contain noise that was not the intended recording target. Conversely, a normalization number above the range means the vowel frame is too quiet, which is likewise taken to be recorded noise.

[0026] If the range of normalization numbers is set to 0.5 or more and 2.0 or less, then in the example above the normalization number 0.4 of frame No. 3 and the normalization number 2.5 of frame No. 8 are rejected, and the normalization numbers 0.8 of frame No. 2, 0.53 of frame No. 4, and 0.87 of frame No. 7 are extracted. The normalization number average calculation unit 4 calculates the average of the normalization numbers within the predetermined range. In the example above, the average of the normalization numbers 0.8, 0.53, and 0.87 is 0.73.
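The extraction and averaging performed by units 3 and 4 can be sketched as a single function. The range 0.5 to 2.0 is the one used in the example; the function name `file_normalization_number` and the choice to return `None` when no frame survives the filter are assumptions, since the patent does not describe that case.

```python
def file_normalization_number(numbers, low=0.5, high=2.0):
    """Average the normalization numbers that lie within [low, high].

    Returns None if no number falls inside the range (an assumed
    fallback, not specified in the source).
    """
    kept = [n for n in numbers if low <= n <= high]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Normalization numbers of the vowel frames in the "kaki" example.
gain = file_normalization_number([0.8, 0.4, 0.53, 0.87, 2.5])
print(round(gain, 2))
```

Both bounds are treated as inclusive here, matching the wording "0.5 or more and 2.0 or less".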

[0027] The file volume normalization number storage unit 5 sets the average of the normalization numbers within the predetermined range as the file volume normalization number, which adjusts the volume of the audio file, and stores it. In the example above, 0.73 is set and stored as the file volume normalization number of the audio file containing the recording "kaki". The vowel frame volume normalization number calculation unit 2, the within-threshold normalization number extraction unit 3, the normalization number average calculation unit 4, and the file volume normalization number storage unit 5 repeat this processing for each audio file of the recorded speech to be adjusted, so that the file volume normalization number storage unit 5 holds a file volume normalization number for adjusting the volume of every audio file.

[0028] The volume normalization control unit 6 controls the volume of each audio file on the basis of the file volume normalization number stored for that file in the file volume normalization number storage unit 5. The operation of the first embodiment of the volume normalization apparatus is now described with reference to FIG. 4.

[0029] (Step 100) Prepare the recorded speech samples used to set the volume standard value of each vowel, which serves as the reference for adjusting the volume of the recorded speech to be adjusted.
(Step 101) In the sample speech analysis unit 1a of the vowel volume standard value setting unit 1, divide the recorded speech samples into frames of fixed duration and recognize the phoneme contained in each frame.

[0030] (Step 102) In the sample speech analysis unit 1a of the vowel volume standard value setting unit 1, extract the vowel frames, that is, the frames containing a vowel phoneme, and measure the volume of each extracted vowel frame.
(Step 103) In the volume standard value calculation unit 1b of the vowel volume standard value setting unit 1, calculate the average vowel-frame volume for each vowel, and let the volume standard values be P(a) for the vowel a, P(i) for the vowel i, P(u) for the vowel u, P(e) for the vowel e, and P(o) for the vowel o.

[0031] (Step 104) Prepare the recorded speech to be adjusted.
(Step 105) In the vowel frame volume normalization number calculation unit 2, select the first of the audio files, which are the recording units of the recorded speech to be adjusted. In FIG. 2, the first audio file is "Tokyo".

[0032] The processing from step 106 to step 111 below is performed for each audio file.
(Step 106) In the recorded speech analysis unit 2a of the vowel frame volume normalization number calculation unit 2, divide the audio file into frames and recognize the phoneme of each frame.

[0033] (Step 107) Also in the recorded speech analysis unit 2a, extract the vowel frames and measure the volume of each.
(Step 108) In the normalization number calculation unit 2b, calculate for each vowel frame the normalization number that adjusts its volume, from the volume of the vowel frame and the volume standard value of the vowel.

[0034] Normalization number of a vowel frame = (volume standard value of the vowel) / (volume of the vowel frame)
(Step 109) In the within-threshold normalization number extraction unit 3, extract the vowel-frame normalization numbers calculated in step 108 that lie within the predetermined range. In the normalization number average calculation unit 4, obtain the average of the normalization numbers within the predetermined range and set it as the file volume normalization number, which adjusts the volume of the audio file.

[0035] (Step 110) In the file volume normalization number storage unit 5, store the file volume normalization number of each audio file.
(Step 111) Determine whether the processing of steps 106 to 110 has been completed for all audio files of the recorded speech to be adjusted. If YES, proceed to step 113; if NO, proceed to step 112.

[0036] (Step 112) Select the next audio file and return to step 106.
(Step 113) The file volume normalization numbers of all audio files have now been obtained and stored, so select the first audio file and move to the state in which volume adjustment of the recorded speech to be adjusted is carried out starting from that file.

[0037] (Step 114) In the volume normalization control unit 6, control the volume of the audio file on the basis of the file volume normalization number stored for it. That is, adjust the volume by multiplying the volume of the original audio file by the file volume normalization number.
(Step 115) Determine whether the processing of step 114 has been completed for all audio files of the recorded speech to be adjusted. If YES, proceed to step 117; if NO, proceed to step 116.
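Step 114 multiplies the volume of the original file by its file volume normalization number. A minimal sketch of that multiplication is given below, under two assumptions that the patent does not state: the audio is held as signed 16-bit PCM samples, and "volume" is an amplitude-like quantity, so that scaling every sample by the factor scales the volume by the same factor. The function name `apply_gain` and the clipping behaviour are hypothetical.

```python
def apply_gain(samples, gain, limit=32767):
    """Scale integer PCM samples by gain and clip to the 16-bit range.

    Clipping is an assumed safeguard for gains above 1.0; the source
    does not describe overflow handling.
    """
    adjusted = []
    for sample in samples:
        scaled = int(round(sample * gain))
        adjusted.append(max(-limit - 1, min(limit, scaled)))
    return adjusted


# 0.73 is the file volume normalization number of the "kaki" example.
adjusted = apply_gain([1000, -2000, 30000], 0.73)
print(adjusted)
```

If volume were instead measured as power, the samples would have to be scaled by the square root of the normalization number.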

[0038] (Step 116) Select the next audio file and return to step 114.
(Step 117) Volume adjustment is complete for all audio files of the recorded speech to be adjusted.
According to the first embodiment described above, a file volume normalization number that makes the volume uniform is calculated for each audio file, and the volume of each audio file is adjusted automatically on that basis, so the volume can be adjusted far more easily and more accurately than by hand.

【００３９】この発明の第２の実施例を図５を参照して
説明する。図５において、図３における参照符号と共通
する参照符号は同一の部材を意味するものとし、その詳
細な説明は省略する。図５において、録音音声の音量正
規化装置の第２の実施例は、母音音量標準値設定部１１
と、音量正規化数算出部１２と、閾値内正規化数抽出部
３と、正規化数平均算出部４と、ファイル音量正規化数
格納部５と音量正規化制御部６とを具備し、ここで、母
音音量標準値設定部１１は録音音声分析部１１ｃと、母
音フレーム音量格納部１１ｄと、音量標準値算出部１１
ｅより成る。A second embodiment of the present invention will be described with reference to FIG. 5. In FIG. 5, reference numerals shared with FIG. 3 denote the same members, and their detailed description is omitted. In FIG. 5, the second embodiment of the recorded voice volume normalization apparatus comprises a vowel volume standard value setting unit 11, a volume normalization number calculation unit 12, an in-threshold normalization number extraction unit 3, a normalization number average calculation unit 4, a file volume normalization number storage unit 5, and a volume normalization control unit 6. Here, the vowel volume standard value setting unit 11 consists of a recorded voice analysis unit 11c, a vowel frame volume storage unit 11d, and a volume standard value calculation unit 11e.

【００４０】この第２の実施例において、第１の実施例
と異なるところは、音量調整の基準となる音量標準値を
設定するに際して、第１の実施例は、音素ラベリングさ
れた録音音声サンプルから抽出された母音フレームの音
量の平均値を基準に設定したが、この第２の実施例は、
調整されるべき録音音声そのものから抽出された母音フ
レームの音量の平均値を基準に設定している。The second embodiment differs from the first in how the volume standard value serving as the reference for volume adjustment is set. The first embodiment uses as the reference the average volume of vowel frames extracted from a phoneme-labeled recorded voice sample, whereas the second embodiment uses the average volume of vowel frames extracted from the very recorded voice that is to be adjusted.

【００４１】母音音量標準値設定部１１は、録音音声サ
ンプルから各母音の音量標準値を設定する。即ち、母音
音量標準値設定部１１の録音音声分析部１１ｃは、音量
調整する被調整録音音声を一定時間毎にフレームに区分
し、各フレーム毎の音素を認識し、母音音素を有するフ
レームである母音フレームを抽出し、各母音フレームの
音量を測定する。The vowel volume standard value setting unit 11 sets the volume standard value of each vowel from the recorded voice sample. Specifically, the recorded voice analysis unit 11c of the vowel volume standard value setting unit 11 divides the recorded voice to be adjusted into frames at fixed time intervals, recognizes the phoneme of each frame, extracts the vowel frames (the frames containing a vowel phoneme), and measures the volume of each vowel frame.

【００４２】母音フレーム音量格納部１１ｄは、各母音
フレームの音量を記憶格納する。これは、音量正規化数
算出部１２において、各母音フレームの正規化数を算出
するに際して、再び各母音フレームの音量を利用するか
らである。これについては後で説明する音量標準値算出
部１１ｅは、各母音フレームの音量に基づいて各母音毎
の母音フレームの音量の平均値を算出し、これを各母音
の音量調整基準となる音量標準値とする。The vowel frame volume storage unit 11d stores the volume of each vowel frame. This is because the volume normalization number calculation unit 12 uses the volume of each vowel frame again when calculating the normalization number of each vowel frame, as explained later. The volume standard value calculation unit 11e calculates, for each vowel, the average volume of its vowel frames based on the volume of each vowel frame, and uses this average as the volume standard value that serves as the volume adjustment reference for that vowel.

【００４３】音量正規化数算出部１２は、各母音の音量
標準値と各母音フレームの音量から各母音フレームの正
規化数を第１の実施例と同様にして算出する。即ち、算
出式は下記の通りである。母音フレームの正規化数＝（母音の音量標準値）／（母
音フレームの音量）そして、各母音フレームの正規化数に基づいて音声ファ
イル毎にファイル音量正規化数を求め、ファイル音量正
規化数を設定記憶し、音声ファイル毎のファイル音量正
規化数に基づいて音声ファイル毎に音量を制御すること
は、第１の実施例の場合と同様に実施される。The volume normalization number calculation unit 12 calculates the normalization number of each vowel frame from the volume standard value of each vowel and the volume of each vowel frame, in the same manner as in the first embodiment. The formula is: vowel frame normalization number = (volume standard value of the vowel) / (volume of the vowel frame). The subsequent processing, namely obtaining a file volume normalization number for each audio file from the normalization numbers of its vowel frames, storing the file volume normalization numbers, and controlling the volume of each audio file based on its file volume normalization number, is carried out in the same way as in the first embodiment.
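The calculation described for units 11e and 12 might be sketched as follows. This is an illustrative sketch only: it assumes phoneme recognition and volume measurement have already produced, for each audio file, a list of (vowel label, frame volume) pairs, and it omits the in-threshold extraction of unit 3 for brevity. The type alias `Frame` and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Frame = Tuple[str, float]  # (vowel label, measured volume of the vowel frame)


def volume_standards(files: Dict[str, List[Frame]]) -> Dict[str, float]:
    """Unit 11e: per-vowel average frame volume over the voice to be adjusted."""
    by_vowel = defaultdict(list)
    for frames in files.values():
        for vowel, volume in frames:
            by_vowel[vowel].append(volume)
    return {vowel: mean(vols) for vowel, vols in by_vowel.items()}


def frame_norms(frames: List[Frame], standards: Dict[str, float]) -> List[float]:
    """Unit 12: normalization number = standard value / frame volume."""
    # Skipping zero-volume frames is an added guard (assumption) to avoid
    # division by zero; the source does not discuss this case.
    return [standards[vowel] / volume for vowel, volume in frames if volume > 0]


def file_norm(frames: List[Frame], standards: Dict[str, float]) -> float:
    """Average of the frame normalization numbers for one audio file."""
    return mean(frame_norms(frames, standards))
```

With two files whose vowel "a" frames have volumes 2.0 and 4.0, the standard value for "a" is 3.0, so the quieter frame receives a normalization number of 1.5 and the louder one 0.75.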

【００４４】第２の実施例の場合、音量調整する被調整
録音音声そのものから各母音の音量平均値を音量標準値
として設定し、被調整録音音声の音声ファイル毎にこの
音量標準値に等しくする音量の調整制御を実施する。第
２の実施例は、音量標準値を設定するのに録音音声サン
プルを必要とせず、被調整録音音声の各母音フレームの
音量を格納する母音フレーム音量格納部１１ｄを準備す
ることにより、音量調整に人手を要しないことその他、
第１の実施例と同様の効果が得られる。ただ、多数の被
調整録音音声を調整録音処理する場合は、音量基準とな
る録音音声サンプルを使用する方が、全ての被調整録音
音声に対して統一された音量に調整することができて好
適である。In the second embodiment, the average volume of each vowel is taken from the recorded voice to be adjusted itself and set as the volume standard value, and volume adjustment control is performed on each audio file of the recorded voice to be adjusted so that its volume equals this standard value. By providing the vowel frame volume storage unit 11d, which stores the volume of each vowel frame of the recorded voice to be adjusted, the second embodiment needs no recorded voice sample to set the volume standard value, and it otherwise achieves the same effects as the first embodiment, including volume adjustment without manual work. However, when a large number of recorded voices are to be adjusted, it is preferable to use a recorded voice sample as the volume reference, because all of the recorded voices can then be adjusted to a uniform volume.

【００４５】この発明は、以上の実施例の他に、様々な
実施の態様をとることができる。先の実施例において
は、音量標準値を録音音声サンプル、或は被調整録音音
声から求めたが、各母音毎の音量標準値が既知の場合
は、これら各母音の音量標準値を所定の入力部より音量
正規化装置に入力して被調整録音音声の各母音をこれら
の音量標準値に等しくする音量制御を実施することがで
きる。The present invention can be implemented in various forms other than the embodiments above. In the embodiments above, the volume standard value was obtained from a recorded voice sample or from the recorded voice to be adjusted. When the volume standard value of each vowel is already known, however, these values can be entered into the volume normalization apparatus through a predetermined input unit, and volume control can be performed to make each vowel of the recorded voice to be adjusted equal to its volume standard value.

【００４６】また、先の各実施例においては、被調整録
音音声の音量を各音声ファイル毎に音量を調整して音量
標準値に等しくする音量制御を音声ファイル毎のファイ
ル音量正規化数を設定して実施したが、母音フレームの
正規化数に基づいて各母音フレーム毎に音量の調整制御
を実施することができる。Further, in each of the embodiments above, the volume control that makes the volume of the recorded voice to be adjusted equal to the volume standard value was performed per audio file, by setting a file volume normalization number for each audio file. Alternatively, volume adjustment control can be performed per vowel frame, based on the normalization number of each vowel frame.

【００４７】[0047]

【発明の効果】以上の通りであって、この発明によれ
ば、被調整録音音声の各母音の音量基準となる音量標準
値を設定して、被調整録音音声の各音声ファイル毎に各
母音フレームの音量の平均値が音量標準値に等しくする
制御を人手に依らずして自動的に実施するので、手間が
かからず、正確に音量調整を実施することができる。As described above, according to the present invention, a volume standard value serving as the volume reference for each vowel of the recorded voice to be adjusted is set, and control that makes the average volume of the vowel frames equal to the volume standard value is performed automatically, without manual work, for each audio file of the recorded voice to be adjusted. Volume adjustment can therefore be carried out accurately and with little effort.

【００４８】そして、音量基準となる録音音声サンプル
を使用することにより、多数の被調整録音音声を調整録
音処理する場合に、全ての被調整録音音声に対して統一
された音量に調整することができて好適である。また、
被調整録音音声の各母音フレームの音量を格納する母音
フレーム音量格納部を準備することにより、音量標準値
を設定するのに録音音声サンプルを必要とせず、録音音
声の音量正規化装置を簡略化することができる。Using a recorded voice sample as the volume reference is preferable when a large number of recorded voices are to be adjusted, because all of them can be adjusted to a uniform volume. In addition, by providing a vowel frame volume storage unit that stores the volume of each vowel frame of the recorded voice to be adjusted, no recorded voice sample is needed to set the volume standard value, and the recorded voice volume normalization apparatus can be simplified.

【００４９】更に、音量標準値を設定するに際して各母
音の既知の音量標準値を入力する構成を採用することに
より、録音音声の音量正規化装置を更に簡略化すること
ができる。また、母音フレーム音量正規化数算出部にお
いて算出された各母音フレームの正規化数の内から所定
範囲内の正規化数のみ抽出して使用することにより、録
音対象ではない大音量の騒音および小音量の雑音を母音
フレーム音量正規化数の算出から排除することができ、
適正な母音フレーム音量正規化数を求めることができ
る。Further, by adopting a configuration in which a known volume standard value of each vowel is input when the volume standard value is set, the recorded voice volume normalization apparatus can be simplified further. In addition, by extracting and using only the normalization numbers that fall within a predetermined range from among the normalization numbers of the vowel frames calculated by the vowel frame volume normalization number calculation unit, loud noise and faint noise that are not recording targets can be excluded from the calculation of the vowel frame volume normalization number, and an appropriate vowel frame volume normalization number can be obtained.

【図面の簡単な説明】[Brief description of drawings]

【図１】実施例を説明する図。FIG. 1 is a diagram illustrating an example.

【図２】音声ファイルを説明する図。FIG. 2 is a diagram illustrating an audio file.

【図３】他の実施例を説明する図。FIG. 3 is a diagram illustrating another embodiment.

【図４】実施例の動作を説明する図。FIG. 4 is a diagram for explaining the operation of the embodiment.

【図５】更に、他の実施例を説明する図。FIG. 5 is a diagram for explaining another embodiment.

【符号の説明】[Explanation of symbols]

１、１１母音音量標準値設定部１ａサンプル音声分析部１ｂ、１１ｅ音量標準値算出部２母音フレーム音量正規化数算出部２ａ、１１ｃ録音音声分析部２ｂ正規化数算出部３閾値内正規化数抽出部４正規化数平均算出部５ファイル音量正規化数格納部６音量正規化制御部１１ｄ母音フレーム音量格納部
1, 11: vowel volume standard value setting unit
1a: sample voice analysis unit
1b, 11e: volume standard value calculation unit
2: vowel frame volume normalization number calculation unit
2a, 11c: recorded voice analysis unit
2b: normalization number calculation unit
3: in-threshold normalization number extraction unit
4: normalization number average calculation unit
5: file volume normalization number storage unit
6: volume normalization control unit
11d: vowel frame volume storage unit

フロントページの続き (56)参考文献特開平８−263520（ＪＰ，Ａ) 特開平９−297596（ＪＰ，Ａ) 特許2992324（ＪＰ，Ｂ２) 特公平４−65391（ＪＰ，Ｂ２) 特表平11−501409（ＪＰ，Ａ) (58)調査した分野(Int.Cl.⁷，ＤＢ名) G10L 21/02 G10L 15/20
Continuation of front page. (56) References: JP-A-8-263520 (JP, A); JP-A-9-297596 (JP, A); Japanese Patent No. 2992324 (JP, B2); JP-B-4-65391 (JP, B2); JP-T-11-501409 (JP, A). (58) Fields searched (Int. Cl.⁷, DB name): G10L 21/02; G10L 15/20

Claims

Translated from Japanese

(57)【特許請求の範囲】(57) [Claims]

【請求項１】被調整録音音声の音量調整の基準となる
各母音の音量標準値を設定しておき、録音音量を調整されるべき被調整録音音声の各母音の音
量をそれぞれの音量標準値に等しくする制御を実施する
録音音声の音量正規化方法。1. A method for normalizing the volume of recorded voice, comprising: setting in advance a volume standard value for each vowel that serves as the reference for adjusting the volume of a recorded voice to be adjusted; and performing control that makes the volume of each vowel of the recorded voice to be adjusted, whose recording volume is to be adjusted, equal to the corresponding volume standard value.

【請求項２】請求項１に記載される録音音声の音量正
規化方法において、被調整録音音声の各母音の音量をそれぞれの音量標準値
に等しくする制御は録音単位である音声ファイル毎に実
施することを特徴とする録音音声の音量正規化方法。2. The method for normalizing the volume of recorded voice according to claim 1, wherein the control that makes the volume of each vowel of the recorded voice to be adjusted equal to the corresponding volume standard value is performed for each audio file, the audio file being the unit of recording.

【請求項３】請求項２に記載される録音音声の音量正
規化方法において、被調整録音音声の母音フレーム毎に音量標準値に基づい
て音量正規化数を算出し、被調整録音音声の各音声ファイル毎に音量正規化数の平
均値を算出し、音量正規化数の平均値をファイル音量正規化数として記
憶格納し、ファイル音量正規化数に基づいて音声ファイル毎の被調
整録音音声の音量正規化を行うことを特徴とする録音音
声の音量正規化方法。3. The method for normalizing the volume of recorded voice according to claim 2, wherein a volume normalization number is calculated for each vowel frame of the recorded voice to be adjusted based on the volume standard value, the average of the volume normalization numbers is calculated for each audio file of the recorded voice to be adjusted, the average is stored as a file volume normalization number, and the volume of the recorded voice to be adjusted is normalized for each audio file based on the file volume normalization number.

【請求項４】請求項１ないし請求項３の内の何れかに
記載される録音音声の音量正規化方法において、予め音素ラベリングされた録音音声サンプルをフレーム
に区分し、各フレーム毎に音素認識を行い、母音フレー
ムの音量を検出し、各母音毎に母音フレームの音量の平
均値を算出し、この各母音毎の音量の平均値を音量標準
値として設定することを特徴とする録音音声の音量正規
化方法。4. The method for normalizing the volume of recorded voice according to any one of claims 1 to 3, wherein a recorded voice sample that has been phoneme-labeled in advance is divided into frames, phoneme recognition is performed for each frame, the volume of each vowel frame is detected, the average volume of the vowel frames is calculated for each vowel, and the average volume for each vowel is set as the volume standard value.

【請求項５】請求項１ないし請求項３の内の何れかに
記載される録音音声の音量正規化方法において、被調整録音音声をフレームに区分し、各フレーム毎に音
素認識を行い、各母音フレームの音量を検出して記憶格
納し、各母音毎に母音フレームの音量の平均値を算出
し、この各母音毎の平均値を音量標準値として設定する
ことを特徴とする録音音声の音量正規化方法。5. The method for normalizing the volume of recorded voice according to any one of claims 1 to 3, wherein the recorded voice to be adjusted is divided into frames, phoneme recognition is performed for each frame, the volume of each vowel frame is detected and stored, the average volume of the vowel frames is calculated for each vowel, and the average for each vowel is set as the volume standard value.

【請求項６】請求項１ないし請求項３の内の何れかに
記載される録音音声の音量正規化方法において、音量標準値を設定するに際して各母音の既知の音量標準
値を入力することを特徴とする録音音声の音量正規化方
法。6. The method for normalizing the volume of recorded voice according to any one of claims 1 to 3, wherein a known volume standard value of each vowel is input when the volume standard value is set.

【請求項７】請求項３に記載される録音音声の音量正
規化方法において、被調整録音音声をフレームに区分し、各フレーム毎に音
素認識を行い、各母音フレームの音量を検出し、母音フ
レームと音量標準値に基づいて母音フレーム毎の音量正
規化数を算出することを特徴とする録音音声の音量正規
化方法。7. The method for normalizing the volume of recorded voice according to claim 3, wherein the recorded voice to be adjusted is divided into frames, phoneme recognition is performed for each frame, the volume of each vowel frame is detected, and the volume normalization number of each vowel frame is calculated based on the vowel frame and the volume standard value.

【請求項８】請求項３ないし請求項５の内の何れかに
記載される録音音声の音量正規化方法において、各母音フレームの正規化数の内から所定範囲内の正規化
数を抽出してこれらの平均値をファイル音量正規化数と
して設定することを特徴とする録音音声の音量正規化方
法。8. The method for normalizing the volume of recorded voice according to any one of claims 3 to 5, wherein normalization numbers within a predetermined range are extracted from among the normalization numbers of the vowel frames, and their average is set as the file volume normalization number.

【請求項１０】請求項９に記載される録音音声の音
量正規化装置において、所定範囲内のフレーム音量正規化数を抽出する閾値内正
規化数抽出部を具備して、抽出結果を正規化数平均算出
部４に出力する、ことを特徴とする録音音声の音量正規化装置。10. The apparatus for normalizing the volume of recorded voice according to claim 9, further comprising an in-threshold normalization number extraction unit that extracts frame volume normalization numbers within a predetermined range and outputs the extraction result to the normalization number average calculation unit 4.

【請求項１１】請求項９および請求項１０の内の何
れかに記載される録音音声の音量正規化装置において、母音音量標準値設定部は、予め音素ラベリングされた録
音音声サンプルをフレームに区分し、各フレーム毎に音
素認識を行い、母音フレームの音量を検出するサンプル
音声分析部および各母音毎に母音フレームの音量の平均
値を算出する音量標準値算出部より成り、母音フレーム音量正規化数算出部は、被調整録音音声を
フレームに区分し、各フレーム毎に音素認識を行い、各
母音フレームの音量を検出する録音音声分析部２ａおよ
び母音フレームの音量と音量標準値に基づいて母音フレ
ーム毎の音量正規化数を算出する正規化数算出部より成
る、ことを特徴とする録音音声の音量正規化装置。11. The apparatus for normalizing the volume of recorded voice according to claim 9 or claim 10, wherein the vowel volume standard value setting unit comprises a sample voice analysis unit that divides a recorded voice sample phoneme-labeled in advance into frames, performs phoneme recognition for each frame, and detects the volume of each vowel frame, and a volume standard value calculation unit that calculates the average volume of the vowel frames for each vowel; and the vowel frame volume normalization number calculation unit comprises a recorded voice analysis unit 2a that divides the recorded voice to be adjusted into frames, performs phoneme recognition for each frame, and detects the volume of each vowel frame, and a normalization number calculation unit that calculates the volume normalization number of each vowel frame based on the volume of the vowel frame and the volume standard value.

【請求項１２】請求項９および請求項１０の内の何
れかに記載される録音音声の音量正規化装置において、母音音量標準値設定部は、被調整録音音声をフレームに
区分し、各フレーム毎に音素認識を行い、各母音フレー
ムの音量を検出する録音音声分析部と、各母音フレーム
の音量を記憶格納する母音フレーム音量格納部と、各母
音毎に母音フレームの音量の平均値を算出する音量標準
値算出部より成る、ことを特徴とする録音音声の音量正規化装置。12. The apparatus for normalizing the volume of recorded voice according to claim 9 or claim 10, wherein the vowel volume standard value setting unit comprises a recorded voice analysis unit that divides the recorded voice to be adjusted into frames, performs phoneme recognition for each frame, and detects the volume of each vowel frame; a vowel frame volume storage unit that stores the volume of each vowel frame; and a volume standard value calculation unit that calculates the average volume of the vowel frames for each vowel.

【請求項１３】請求項９および請求項１０の内の何
れかに記載される録音音声の音量正規化装置において、母音音量標準値設定部は、各母音の既知の音量標準値を
入力する入力部を有するものであることを特徴とする録
音音声の音量正規化装置。13. The apparatus for normalizing the volume of recorded voice according to claim 9 or claim 10, wherein the vowel volume standard value setting unit has an input unit for inputting a known volume standard value of each vowel.