JP2007256619A


Info

Publication number: JP2007256619A
Application number: JP2006080812A
Authority: JP
Inventors: Noriyuki Hata; 紀行畑
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-03-23
Filing date: 2006-03-23
Publication date: 2007-10-04

Abstract

PROBLEM TO BE SOLVED: To provide an evaluation method that incorporates the subjective evaluation of listeners.

SOLUTION: A storage unit 32 of a server 3 stores in advance the results of evaluators' subjective evaluations of various singing voices (sample voice data). A control unit 31 transmits the evaluation of the singing voice (sample voice data) that is similar to the practitioner's singing voice (practitioner voice data) to a karaoke device 2 as the evaluation of the practitioner's singing voice. The listeners' subjective evaluation given in advance to a singing voice similar to the practitioner's thereby becomes the evaluation of the practitioner's singing voice.

COPYRIGHT: (C)2008, JPO&INPIT

Description

Translated from Japanese

The present invention relates to a technique for evaluating singing and musical performance.

Various methods have been proposed for evaluating how skillfully a singer sings on a karaoke apparatus. One such method objectively evaluates the singing voice using the notes of the musical score as the evaluation standard (see, for example, Patent Documents 1 and 2). Under such an objective evaluation method, singing that is faithful to the content of the score is rated highly. Such a method can be applied not only to singing but also to the playing of musical instruments.

On the other hand, the evaluation of singing and performance is in part left to the listener's subjectivity. A skilled singer, such as a professional vocalist, rarely sings in strict accordance with the score. Most intentionally shift the start or end of a phrase, vary their voice quality or volume, or use singing techniques such as vibrato and kobushi to express a swell of emotion in the song. Singers express such emotion in many different ways, and listeners' evaluations of that expression likewise vary with their subjectivity. Patent Document 3 therefore proposes a method in which a plurality of listeners hear a sample singing performance, their evaluations of it are collected by questionnaire, and an evaluation standard incorporating the listeners' subjectivity is formulated in light of the results.

JP-A-62-040488; Japanese Patent No. 2890659; JP-A-2000-99024

Recently, there has been a demand for establishing a subjective evaluation method that incorporates such listener subjectivity. Accordingly, an object of the present invention is to provide an evaluation method that incorporates listeners' subjective evaluation through a mechanism different from conventional ones.

To solve the above problem, the present invention provides an evaluation apparatus comprising: storage means for storing, in association with each other, a plurality of feature data items representing the features of a plurality of mutually different singing voices or performance sounds and evaluation reference data representing listeners' evaluations of each of those singing voices or performance sounds; acquisition means for acquiring feature data representing the features of a practitioner's singing voice or performance sound; selection means for selecting, from the feature data stored by the storage means, one or more feature data items similar to the feature data acquired by the acquisition means; and output means for reading from the storage means the evaluation reference data associated with the feature data selected by the selection means and outputting it as the evaluation result for the practitioner's singing voice or performance sound.

This evaluation apparatus desirably further comprises: practitioner voice data storage means for storing a plurality of voice data items representing the singing voices or performance sounds; communication means for performing data communication with a plurality of sound reproduction apparatuses via a network; distribution means for distributing the voice data stored by the practitioner voice data storage means to the sound reproduction apparatuses through the communication means; evaluation reference data acquisition means for acquiring, from the sound reproduction apparatuses, evaluation reference data representing listeners' evaluations of the singing voices or performance sounds reproduced by those apparatuses; and registration means for causing the storage means to store the feature data representing the features of the singing voices or performance sounds in association with the evaluation reference data acquired by the evaluation reference data acquisition means.

The present invention also provides a control method for an evaluation apparatus that comprises control means and storage means, the storage means storing, in association with each other, a plurality of feature data items representing the features of a plurality of mutually different singing voices or performance sounds and evaluation reference data representing listeners' evaluations of each of those singing voices or performance sounds. The control method comprises: a first step in which the control means acquires feature data representing the features of a practitioner's singing voice or performance sound; a second step in which the control means selects, from the feature data stored by the storage means, one or more feature data items similar to the feature data acquired in the first step; and a third step in which the control means reads from the storage means the evaluation reference data associated with the feature data selected in the second step and outputs it as the evaluation result for the practitioner's singing voice or performance sound. The present invention may also take the form of a program that causes a computer to realize these functions.

In the present invention, a plurality of feature data items representing the features of mutually different singing voices or performance sounds are stored in association with evaluation reference data representing listeners' evaluations of each of those singing voices or performance sounds. When feature data representing the features of a practitioner's singing voice or performance sound is acquired, one or more feature data items similar to the acquired feature data are selected from the stored feature data, and the evaluation reference data associated with the selected feature data is output as the evaluation result for the practitioner's singing voice or performance sound. In other words, the listeners' subjective evaluation of a singing voice or performance sound similar to the practitioner's is output as the evaluation of the practitioner's own singing voice or performance sound. According to the present invention, an evaluation method that incorporates listeners' subjective evaluation can thus be realized through a mechanism different from conventional ones.
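The store, select, and output steps summarized above can be sketched as a nearest-neighbour lookup. This is a minimal illustration only: the three-element feature vectors, the Euclidean distance, and the names `Sample`, `distance`, and `evaluate` are assumptions made for the example, since the text does not specify how similarity between feature data is measured.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One stored sample performance (hypothetical layout)."""
    features: list    # assumed feature vector, e.g. pitch/timing/power scores
    evaluation: int   # listeners' rating on the 1-5 scale


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def evaluate(practitioner_features, samples, k=1):
    """Return the ratings of the k stored samples most similar to the practitioner."""
    ranked = sorted(samples, key=lambda s: distance(s.features, practitioner_features))
    return [s.evaluation for s in ranked[:k]]


# Illustrative stored samples, not data from the source.
samples = [
    Sample([0.9, 0.8, 0.7], 5),
    Sample([0.5, 0.5, 0.5], 3),
    Sample([0.1, 0.2, 0.3], 1),
]
result = evaluate([0.85, 0.75, 0.7], samples)
```

Choosing `k` greater than 1 corresponds to selecting "one or more" similar feature data items; how several selected evaluations would be combined is left open here.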

Next, the best mode for carrying out the present invention will be described.
In the following description, a person who sings as a sample for determining the evaluation criteria is called a “singer”, a person who listens to the singer's singing and evaluates it subjectively is called an “evaluator (listener)”, and a person whose singing is evaluated on the basis of those subjective evaluation results is called a “practitioner”.

[1. Configuration]
FIG. 1 is a block diagram showing the overall configuration of an evaluation system 1 according to the present embodiment. The evaluation system 1 includes a plurality of karaoke apparatuses 2a, 2b, and 2c, a server apparatus 3, and a network 4 connecting them. The karaoke apparatuses 2a, 2b, and 2c are installed in ordinary homes and in various establishments such as karaoke boxes and restaurants, and function as sound reproduction apparatuses that reproduce voice data and emit it as sound. The server apparatus 3 functions as an evaluation apparatus that evaluates singing performed by a practitioner using the karaoke apparatuses 2a, 2b, and 2c. The network 4 is, for example, an ISDN (Integrated Services Digital Network) or the Internet, and includes wired or wireless sections. Although FIG. 1 shows three karaoke apparatuses, the number of karaoke apparatuses in the evaluation system 1 is not limited to three and may be larger or smaller. Because the karaoke apparatuses 2a, 2b, and 2c all have the same configuration and operation, they are referred to simply as “karaoke apparatus 2” when there is no need to distinguish them.

FIG. 2 is a block diagram showing the configuration of the karaoke apparatus 2. The control unit 21 is, for example, a CPU, and controls each unit of the karaoke apparatus 2 by reading and executing a computer program stored in the storage unit 22. The display unit 23 is, for example, a liquid crystal display and, under the control of the control unit 21, displays various screens such as a menu screen for operating the karaoke apparatus 2 and a karaoke screen in which a lyrics telop is superimposed on a background image. The operation unit 24 has various keys and outputs a signal corresponding to the pressed key to the control unit 21. The microphone 25 picks up the voice produced by the singer. The sound processing unit 26 converts the sound (analog data) picked up by the microphone 25 into digital data and outputs it to the control unit 21. The speaker 27 emits the sound output from the sound processing unit 26. The communication unit 28 performs data communication with the server apparatus 3 via the network 4 under the control of the control unit 21.

The storage unit 22 is a large-capacity storage means such as a hard disk, and has an accompaniment/lyrics data storage area 22a, a practitioner voice data storage area 22b, and a score sound data storage area 22c. The accompaniment/lyrics data storage area 22a stores, in association with each other, accompaniment data in which the performance sounds of the various instruments accompanying a song are written along the progress of the song, and lyrics data indicating the lyrics of the song. The accompaniment data is in a data format such as MIDI (Musical Instruments Digital Interface) and is reproduced when the practitioner sings karaoke. The lyrics data is displayed on the display unit 23 as a lyrics telop during the karaoke singing. The practitioner voice data storage area 22b stores, as practitioner voice data, the voice data that has been A/D converted by the sound processing unit 26 from the output of the microphone 25. The practitioner voice data is, for example, in WAVE format or MP3 (MPEG Audio Layer-3) format. The score sound data storage area 22c stores score sound data representing the singing sounds defined by the musical score of a song. The score sound data is in a data format such as MIDI and includes the pitch of each singing sound and its sounding timing. The score sound data is used to evaluate various singing techniques such as “vibrato”, “shakuri”, “kobushi”, “falsetto”, “tsukkomi”, “tame”, and “breath”.

Next, FIG. 3 is a block diagram showing the configuration of the server apparatus 3.
In FIG. 3, the control unit 31 is, for example, a CPU, and controls each unit of the server apparatus 3 by reading and executing a computer program stored in the storage unit 32. The storage unit 32 is a large-capacity storage means such as a hard disk, and has a sample voice data storage area 32a, an evaluation reference data storage area 32b, a practitioner voice feature data storage area 32c, and a score sound data storage area 32d. The communication unit 33 performs data communication with the karaoke apparatus 2 via the network 4 under the control of the control unit 31.

The sample voice data storage area 32a stores voice data representing the singing voices of a plurality of different singers. Because this voice data represents singing voices that serve as evaluation samples, it is called “sample voice data”. The sample voice data is transmitted from the server apparatus 3 to the karaoke apparatus 2 and reproduced by the karaoke apparatus 2 as a singing voice. An evaluator listens to this singing voice and enters an evaluation on an input screen displayed on the karaoke apparatus 2, such as the one shown in FIG. 4. The evaluation is on a five-level scale: “good”, “somewhat good”, “cannot say either way”, “somewhat bad”, and “bad”. The entered evaluation result is transmitted from the karaoke apparatus 2 to the server apparatus 3 and stored as evaluation reference data in the evaluation reference data storage area 32b. The server apparatus 3 evaluates the practitioner's singing on the basis of the evaluation reference data stored in this way.

FIG. 5 shows an example of the evaluation reference data. FIG. 5 illustrates a case in which the singer assigned singer ID “a1” sang the song assigned song ID (identification information) “m1”, and the evaluators assigned evaluator IDs “p1”, “p2”, “p3”, ... evaluated that singing. The evaluation reference data also includes the evaluators' attributes (sex, age, and so on) and the date and time at which the evaluation was made (the date and time at which the evaluation reference data was generated, hereinafter called the evaluation date and time). In the following, the song assigned song ID “m1” is called song m1, the singer assigned singer ID “a1” is called singer a1, and the evaluators assigned evaluator IDs “p1”, “p2”, “p3”, ... are called evaluators p1, p2, p3, ... respectively. Identification information such as the song title or karaoke song number may be used as the song ID. The singer ID and evaluator ID may be identification information such as a membership number or user ID issued by an establishment such as a karaoke box, or may be the place where the singing or evaluation took place (such as the room number of a karaoke box or the store number of an establishment). The date and time at which the singing was performed or the evaluation was made can also be used as the singer ID or evaluator ID.

As shown in FIG. 5, each of the evaluators p1, p2, p3, ... has evaluated the pitch, timing, volume (power), technique, and voice quality (spectrum) of the A melody, B melody, and chorus of song m1, and has also given an overall evaluation. For example, evaluator p1's evaluation levels for the pitch of the A melody, B melody, and chorus of song m1 are “4”, “3”, and “4”. Similarly, evaluator p2's evaluation levels for the A melody, B melody, and chorus of song m1 are “5”, “5”, and “5”, and evaluator p3's are “2”, “2”, and “2”. A larger evaluation level means a better evaluation: evaluation level 1 corresponds to “bad” in FIG. 4, level 2 to “somewhat bad”, level 3 to “cannot say either way”, level 4 to “somewhat good”, and level 5 to “good”.

The evaluation results from the evaluators p1, p2, p3, ... are then tallied, and the tallied values are recorded in the “tally result” field of FIG. 5. In the illustrated example, for the pitch of the A melody when singer a1 sang song m1, out of a total of 200 evaluators, 34 chose evaluation level 1, 36 chose level 2, 45 chose level 3, 56 chose level 4, and 29 chose level 5. That is, the largest number of evaluators chose level 4 (somewhat good). For the pitch of the B melody, out of the 200 evaluators, 4 chose level 1, 27 chose level 2, 85 chose level 3, 64 chose level 4, and 20 chose level 5. That is, the largest number of evaluators chose level 3 (cannot say either way). For the pitch of the chorus, out of the 200 evaluators, 27 chose level 1, 35 chose level 2, 35 chose level 3, 48 chose level 4, and 55 chose level 5. That is, the largest number of evaluators chose level 5 (good).
A large amount of evaluation reference data of this kind is stored for each song in the evaluation reference data storage area 32b.
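The tallying described for FIG. 5 can be sketched as follows. The per-level counts are the ones quoted in the description for singer a1, song m1, pitch; the function names, the dictionary layout, and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter

# Level labels as they appear on the five-level input screen.
LEVEL_LABELS = {1: "bad", 2: "somewhat bad", 3: "cannot say either way",
                4: "somewhat good", 5: "good"}


def tally(levels):
    """Count how many evaluators chose each level 1..5."""
    counts = Counter(levels)
    return [counts.get(level, 0) for level in range(1, 6)]


def most_common_level(counts):
    """Return the level with the highest count (ties go to the lower level, an assumption)."""
    return max(range(1, 6), key=lambda level: counts[level - 1])


# Tallies quoted in the description (200 evaluators per section).
pitch_counts = {
    "A melody": [34, 36, 45, 56, 29],
    "B melody": [4, 27, 85, 64, 20],
    "chorus": [27, 35, 35, 48, 55],
}
modes = {section: most_common_level(c) for section, c in pitch_counts.items()}
a_melody_label = LEVEL_LABELS[modes["A melody"]]
```

Here `modes` reproduces the most frequent levels stated in the text (4, 3, and 5); whether the server uses the mode, a mean, or the full distribution when reporting an evaluation is not specified at this point.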

Returning to the description of FIG. 3.
The practitioner voice feature data storage area 32c stores practitioner voice feature data representing the features of the practitioner voice data transmitted from the karaoke apparatus 2. The score sound data storage area 32d stores score sound data representing the singing sounds defined by the musical score of a song, in the same way as the score sound data stored in the karaoke apparatus 2. This score sound data is also used to evaluate various singing techniques.

[2. Operation]
Next, the operation of the present embodiment will be described.
The operation of the present embodiment is broadly divided into three parts: registering a singer's sample voice data in the server apparatus 3; collecting evaluators' evaluation results for the sample voice data and accumulating them in the server apparatus 3 as evaluation reference data; and evaluating a practitioner's singing using the accumulated evaluation reference data. These are described in order below.

[2-1. Registering sample voice data]
First, the operation of registering a singer's sample voice data in the server apparatus 3 will be described.
In the sequence diagram of FIG. 6, the singer operates the operation unit 24 of the karaoke apparatus 2 to instruct that his or her singing be registered as sample voice data, and then specifies the song ID of the desired song and instructs reproduction of the accompaniment data. At this time, either the singer enters his or her singer ID through the operation unit 24 or the control unit 21 generates a singer ID. The control unit 21 then starts the karaoke accompaniment in response to the instruction (step S1). Specifically, the control unit 21 reads the accompaniment data from the accompaniment/lyrics data storage area 22a and supplies it to the sound processing unit 26, and the sound processing unit 26 converts the accompaniment data into an analog signal and supplies it to the speaker 27 to be emitted as sound. At the same time, the control unit 21 causes the display unit 23 to display a message prompting the singer to sing, such as “Please sing along with the accompaniment”, and then reads the lyrics data from the accompaniment/lyrics data storage area 22a and causes the display unit 23 to display the lyrics telop. The singer sings along with the accompaniment emitted from the speaker 27 while referring to the displayed lyrics telop. The singer's voice is picked up by the microphone 25, converted into a sound signal, and output to the sound processing unit 26. The voice data A/D converted by the sound processing unit 26 is stored (recorded) in the storage unit 22 together with information indicating the elapsed time from the start of the accompaniment (step S2).

When reproduction of the accompaniment data ends, the control unit 21 ends the process of recording the singer's voice. Next, the control unit 21 transmits the voice data stored in the storage unit 22, together with the song ID and singer ID, from the communication unit 28 to the server apparatus 3 (step S3). When the control unit 31 of the server apparatus 3 detects that the communication unit 33 has received the voice data, song ID, and singer ID, it stores the voice data as sample voice data in the sample voice data storage area 32a of the storage unit 32 and stores the song ID and singer ID in association with that sample voice data (step S4). Next, the control unit 31 divides the sample voice data stored in the storage unit 32 into frames of a predetermined time length and calculates the pitch, spectrum, and power for each frame (step S5).
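The frame-by-frame analysis of step S5 can be sketched as below for pitch and power (the spectrum is omitted for brevity). This is an illustrative sketch under assumptions: the text does not say how pitch is estimated, so a plain autocorrelation peak search is used here, and the frame length, pitch search range, and function names are all hypothetical.

```python
import math


def split_frames(samples, frame_len):
    """Split a sample list into consecutive non-overlapping frames, dropping a short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def frame_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate pitch in Hz from the autocorrelation peak within [fmin, fmax] (assumed method)."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # Return 0.0 when no positive correlation peak was found (e.g. silence).
    return sample_rate / best_lag if best_lag else 0.0


# Synthetic check: a 200 Hz sine sampled at 8 kHz, analysed in 50 ms frames.
rate = 8000
signal = [math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 5)]
frames = split_frames(signal, 400)
pitches = [frame_pitch(f, rate) for f in frames]
powers = [frame_power(f) for f in frames]
```

A real recording would need a more robust pitch tracker and overlapping, windowed frames; the sketch only shows the shape of the per-frame data that the later technique detection works on.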

Next, the control unit 31 extracts the techniques from the sample voice data (step S6). As mentioned above, the techniques are “vibrato”, “shakuri”, “kobushi”, “falsetto”, “tsukkomi”, “tame”, and “breath”. “Vibrato” is a technique of raising and lowering the pitch very slightly and continuously to produce a trembling tone. “Shakuri” is a technique of starting from a pitch lower than the target note and smoothly bringing the pitch up to the target note. “Kobushi” is a technique of adding an ornamental, undulating turn to the melody. “Falsetto” is a technique of singing in the so-called head voice. “Tsukkomi” is a technique of starting to sing earlier than the original timing. “Tame” is a technique of starting to sing later than the original timing. “Breath” means the timing at which the practitioner takes a breath.

First, the control unit 31 identifies (detects) the sections in which each of these techniques is used. For example, “vibrato” and “shakuri” can be detected on the basis of the pitch of the sample voice data. “Kobushi” and “falsetto” can be detected on the basis of the spectrum of the sample voice data. “Tame” and “tsukkomi” can be detected on the basis of the pitch of the sample voice data and the score sound data stored in the score sound data storage area 32d. “Breath” can be detected on the basis of the power of the sample voice data and the score sound data stored in the score sound data storage area 32d.

The specific detection methods are as follows.
On the basis of the correspondence between the sample voice data and the score sound data, and the pitch calculated from the sample voice data, the control unit 31 identifies sections in which the start time of a sound in the sample voice data differs from the start time of the corresponding sound in the score sound data. For a section in which the pitch of the sample voice data changes earlier than the pitch of the score sound data, that is, a section in which the start time of a sound in the sample voice data is earlier than the start time of the corresponding sound in the score sound data, the control unit 31 identifies the section as one in which the “tsukkomi” singing technique is used. The control unit 31 associates the section information of the section identified in this way with identification information indicating “tsukkomi”.

Conversely, on the basis of the correspondence between the sample voice data and the score sound data, and the pitch calculated from the sample voice data, the control unit 31 detects sections in which the pitch of the sample voice data changes later than the pitch of the score sound data, that is, sections in which the start time of a sound in the sample voice data is later than the start time of the corresponding sound in the score sound data, and identifies each detected section as one in which the “tame” singing technique is used.
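The onset comparison behind “tsukkomi” and “tame” can be sketched as follows, assuming the sung notes have already been aligned one-to-one with the score notes. The `tolerance` parameter is a hypothetical addition (the text simply compares which start time is earlier), as are the function name and the “on time” label.

```python
def classify_onsets(sung_onsets, score_onsets, tolerance=0.05):
    """Label each note by comparing its sung onset with the score onset (times in seconds)."""
    labels = []
    for sung, score in zip(sung_onsets, score_onsets):
        delta = sung - score
        if delta < -tolerance:
            labels.append("tsukkomi")   # sung earlier than written
        elif delta > tolerance:
            labels.append("tame")       # sung later than written
        else:
            labels.append("on time")    # within the assumed tolerance
    return labels


# Illustrative onsets, not data from the source.
labels = classify_onsets([0.90, 2.00, 3.20], [1.00, 2.00, 3.00])
```

Setting `tolerance=0` would make any difference in onset count as a technique, which is closer to the literal wording but likely too sensitive for real recordings.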

The control unit 31 also analyzes the pattern of temporal change in the pitch calculated from the sample voice data, detects sections in which the pitch fluctuates continuously within a predetermined range above and below a center frequency, and identifies each detected section as one in which the "vibrato" singing technique is used.
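One way to test a pitch contour for this kind of fluctuation is sketched below. The text does not specify the range or how "continuous" fluctuation is measured, so the depth limits and the crossing count are assumed values chosen only for illustration, and `is_vibrato` is a hypothetical name.

```python
def is_vibrato(pitch_hz, min_depth_hz=2.0, max_depth_hz=15.0, min_crossings=4):
    """Return True if a pitch contour oscillates around its center.

    pitch_hz: per-frame pitch values (Hz) for one candidate section.
    min_depth_hz / max_depth_hz: assumed bounds on the peak deviation
    from the center frequency (the "predetermined range").
    min_crossings: assumed minimum number of times the contour must cross
    the center, so that a single bend is not mistaken for vibrato.
    """
    center = sum(pitch_hz) / len(pitch_hz)
    deviation = [p - center for p in pitch_hz]
    peak = max(abs(d) for d in deviation)
    if not (min_depth_hz <= peak <= max_depth_hz):
        return False
    # Count sign changes between consecutive frames.
    crossings = sum(1 for a, b in zip(deviation, deviation[1:]) if a * b < 0)
    return crossings >= min_crossings
```

A flat note fails the depth test and a steady glide fails either the depth test or the crossing test, so only a contour that repeatedly swings above and below its center passes.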

The control unit 31 also analyzes the pattern of temporal change in the pitch calculated from the sample voice data, detects sections in which the pitch changes continuously from a low pitch to a high pitch, and identifies each detected section as one in which the "shakuri" (scooping) singing technique is used. This processing may instead be based on the correspondence with the score sound data. That is, based on the correspondence between the sample voice data and the score sound data, the control unit 31 may detect sections in which the pitch of the sample voice data continuously approaches the pitch of the score sound data from a lower pitch.

Based on the correspondence between the sample voice data and the score sound data, and on the power calculated from the sample voice data, the control unit 31 also detects sections in which the score sound data is voiced but the power of the sample voice data is below a predetermined threshold, and identifies each detected section as a "breath" section.
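The breath rule reduces to a per-frame test followed by grouping consecutive frames into sections. The sketch below assumes the power values and the score's voiced/unvoiced flags are already aligned frame by frame; `find_breath_sections` and the threshold value used with it are hypothetical.

```python
def find_breath_sections(power, score_voiced, threshold):
    """Return (start, end) frame ranges, end exclusive, judged to be breaths.

    power: per-frame power of the sung voice.
    score_voiced: per-frame booleans, True where the score has a note.
    threshold: the "predetermined threshold" (its value is an assumption).
    """
    n = min(len(power), len(score_voiced))
    sections = []
    start = None
    for i in range(n):
        quiet = score_voiced[i] and power[i] < threshold
        if quiet and start is None:
            start = i                      # a breath section begins
        elif not quiet and start is not None:
            sections.append((start, i))    # the section ended at frame i
            start = None
    if start is not None:
        sections.append((start, n))        # section runs to the last frame
    return sections
```

Frames where the score itself is silent are ignored, because low power there is a rest rather than a breath taken in the middle of a phrase.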

The control unit 31 also analyzes the pattern of temporal change in the spectrum calculated from the sample voice data, detects sections in which the spectral characteristics shift abruptly to a predetermined state, and identifies each detected section as one in which the "falsetto" singing technique is used. Here, the predetermined state is one in which the harmonic components of the spectrum are extremely reduced. For example, a chest (natural) voice contains many harmonic components, whereas in falsetto the magnitude of the harmonic components becomes extremely small. In this case, the control unit 31 may also check whether the pitch has shifted substantially upward. This is because, although falsetto is sometimes used to produce the same pitch as the chest voice, it is generally a technique for producing high notes that cannot be sung in the chest voice. Accordingly, the system may be configured to detect "falsetto" only when the pitch of the sample voice data is at or above a predetermined pitch. Furthermore, since the pitch range in which falsetto is used generally differs between male and female voices, the singer's gender may be detected from the vocal range of the sample voice data or from formants detected in it, and the pitch range for falsetto detection may be set based on the result.

The control unit 31 also detects sections in which the spectral characteristics switch among various patterns within a short time, and identifies each detected section as one in which the "kobushi" singing technique is used. "Kobushi" is a technique that adds a growl-like flavor by changing the tone color and vocalization within a short span, so the spectral characteristics vary widely in sections where it is used.

In this way, the control unit 31 detects from the voice data the sections in which each technique is used, and associates section information indicating each detected section with type information indicating the singing technique. The control unit 31 then generates sample voice feature data containing the pitch, spectrum, and power calculated in step S5 and the section information and type information generated in step S6 (step S7). Next, the control unit 21 stores the generated sample voice feature data in the sample voice data storage area 32a together with the song ID and the singer ID (step S8).
Through the above processing, the sample voice data, song ID, singer ID, and sample voice feature data are stored in association with one another in the sample voice data storage area 32a of the server device 3.

[2-2. Collection and accumulation of evaluation reference data]
Next, the operation of collecting evaluators' ratings of the sample voice data and accumulating them in the server device 3 as evaluation reference data will be described.
In the sequence diagram of FIG. 7, an evaluator (listener) operates the operation unit 24 of the karaoke device 2, specifies a song ID, and instructs the device to start evaluating singing. The control unit 21 transmits the specified song ID from the communication unit 28 to the server device 3 (step S11). On receiving the song ID, the control unit 31 of the server device 3 reads all singer IDs stored in association with that song ID in the sample voice data storage area 32a and transmits them to the karaoke device 2 (step S12). The control unit 21 of the karaoke device 2 displays the received singer IDs on the display unit 23 in list form, as shown in FIG. 8 (step S13). From the singer IDs displayed in this way, the evaluator operates the operation unit 24 of the karaoke device 2 to specify one desired singer ID and selects the soft button labeled "Evaluate this singer". On accepting this operation, the control unit 21 transmits the specified singer ID to the server device 3 (step S14).

The control unit 31 of the server device 3 reads the sample voice data associated with the received singer ID from the sample voice data storage area 32a and transmits it to the karaoke device 2 (step S15). The control unit 21 of the karaoke device 2 plays back the singer's voice based on the received sample voice data (step S16). That is, the control unit 21 supplies the sample voice data to the audio processing unit 26, which converts it into an analog signal and outputs it from the speaker 27.

When playback of the sample voice data ends, the control unit 21 displays an evaluation screen such as the one shown in FIG. 4 on the display unit 23 and prompts the evaluator to rate the singing (step S17). For each of the A melody, B melody, and chorus, the evaluator selects one of evaluation level 5 (good) through evaluation level 1 (bad) for each evaluation item, namely pitch, timing, volume, technique, voice quality, and overall evaluation, and also enters attributes such as his or her own gender and age. At this point, either the control unit 21 generates an evaluator ID or the evaluator enters his or her own evaluator ID using the operation unit 24. When the evaluator selects the soft button labeled "Evaluate with this content", the control unit 21 transmits the selected evaluation levels and attribute data representing the entered attributes, together with the evaluator ID and the evaluation date and time, from the communication unit 28 to the server device 3 (step S18). The evaluation date and time may be the current date and time obtained by the control unit 21 executing a clock program (not shown).

The control unit 31 of the server device 3 stores the received evaluation levels, attribute data, evaluator ID, and evaluation date and time in the evaluation reference data storage area 32b (step S19). The control unit 31 then updates the tally illustrated in FIG. 5 based on the stored evaluation levels (step S20).
As described above, each time an evaluator rates a performance, the result is transmitted to the server device 3 and stored as evaluation reference data.

[2-3. Evaluation of singing]
Next, the operation of evaluating a practitioner's singing using the evaluation reference data will be described.
In the sequence diagram of FIG. 9, the practitioner operates the operation unit 24 of the karaoke device 2 to select the song ID of the song he or she wants to sing, and instructs playback of the karaoke accompaniment. In response to this operation, the control unit 21 starts the karaoke accompaniment (step S21). That is, the control unit 21 reads the accompaniment data corresponding to the specified song ID from the accompaniment/lyrics data storage area 22a and supplies it to the audio processing unit 26, which converts the accompaniment data into an analog signal and outputs it from the speaker 27. At the same time, the control unit 21 displays on the display unit 23 a message prompting the practitioner to sing, such as "Please sing along with the accompaniment", and then reads the lyrics data from the accompaniment/lyrics data storage area 22a and displays the lyrics captions on the display unit 23. The practitioner sings along with the accompaniment output from the speaker 27 while referring to the displayed lyrics captions. The practitioner's voice is picked up by the microphone 25, converted into an audio signal, and output to the audio processing unit 26. The practitioner voice data, A/D-converted by the audio processing unit 26, is stored (recorded) in the practitioner voice data storage area 22b of the storage unit 22 together with information indicating the time elapsed since the start of the accompaniment (step S22).

When playback of the accompaniment data ends, the control unit 21 ends the process of recording the practitioner's singing voice. The control unit 21 then divides the practitioner voice data stored in the practitioner voice data storage area 22b into frames of a predetermined length and calculates the pitch, spectrum, and power for each frame (step S23). Next, the control unit 21 extracts techniques from the practitioner voice data (step S24). That is, it detects the sections of the practitioner voice data in which the various techniques are used, and associates section information indicating each detected section with type information indicating the singing technique. The control unit 21 then generates practitioner voice feature data containing all of the calculated pitch, spectrum, and power and the section information and type information of the extracted techniques (step S25). After that, the control unit 21 transmits the generated practitioner voice feature data together with the song ID from the communication unit 28 to the server device 3 (step S26).
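The framing in step S23 can be illustrated with a short sketch that covers only the power feature. Pitch and spectrum estimation are omitted, the frame length is left to the caller because the text says only "predetermined", and the mean-square definition of power and both function names are assumptions of this example.

```python
def split_frames(samples, frame_len):
    """Split a sample sequence into consecutive, non-overlapping frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def frame_power(frame):
    """Mean-square power of one frame (one common definition of power)."""
    return sum(x * x for x in frame) / len(frame)

def power_contour(samples, frame_len):
    """Per-frame power values for a whole recording."""
    return [frame_power(f) for f in split_frames(samples, frame_len)]
```

The resulting contour has one value per frame, which is the form the later comparison between practitioner and sample feature data relies on.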

On receiving the practitioner voice feature data and the song ID, the control unit 31 of the server device 3 compares the received practitioner voice feature data with all the sample voice feature data stored in the sample voice data storage area 32a in association with that song ID, and selects from among them the sample voice feature data most similar to the practitioner voice feature data (step S27). More specifically, for each of the A melody, B melody, and chorus sections, the control unit 31 integrates, over the whole of the section, the difference between the pitch represented by the practitioner voice feature data and the pitch represented by each sample voice feature data. Similarly, the control unit 31 integrates the difference between the power represented by the practitioner voice feature data and the power represented by each sample voice feature data over the whole of the A melody, B melody, and chorus sections. The same is done for the spectrum and timing. For techniques as well, the control unit 31 integrates the difference between the sections indicated by the technique section information in the practitioner voice feature data and the sections indicated by the technique section information in each sample voice feature data. Naturally, only sections in which the same technique is used are compared with each other. The control unit 31 then accumulates the integrals obtained in this way for each sample voice feature data, and selects the sample voice feature data with the smallest accumulated value as the one most similar to the practitioner voice data.
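The selection in step S27 can be sketched as a sum of frame-wise differences accumulated per sample. The example below is illustrative only: it uses the absolute difference (the text says just "difference"), treats every feature as a per-frame list of equal length, does not normalize features with different units, and leaves out technique sections. The names `integrated_difference` and `most_similar` are hypothetical.

```python
def integrated_difference(a, b, frame_period):
    """Approximate the time integral of |a - b| with a frame-wise sum."""
    return sum(abs(x - y) for x, y in zip(a, b)) * frame_period

def most_similar(practitioner, samples, frame_period=0.01):
    """Return (singer_id, accumulated_difference) for the closest sample.

    practitioner: dict mapping a feature name to per-frame values.
    samples: dict mapping a singer ID to a dict of the same shape.
    frame_period: assumed frame spacing in seconds.
    """
    best = None
    for singer_id, features in samples.items():
        total = sum(
            integrated_difference(practitioner[name], features[name], frame_period)
            for name in practitioner
        )
        # The smallest accumulated difference means the highest similarity.
        if best is None or total < best[1]:
            best = (singer_id, total)
    return best
```

In practice pitch and power live on very different scales, so some per-feature scaling would probably be needed before the integrals are accumulated; the text does not say how this is handled.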

Next, the control unit 31 of the server device 3 reads from the evaluation reference data storage area 32b the evaluation reference data corresponding to the selected sample voice feature data (that is, a tally such as the one illustrated in FIG. 5) (step S28), and transmits (outputs) the tally to the karaoke device 2 (step S29). The control unit 21 of the karaoke device 2 displays the received tally on the display unit 23 in a form such as that shown in FIG. 10 (step S30). The example in FIG. 10 shows the ratings of the A melody when the aforementioned singer a1 sang song m1. For pitch, out of a total of 200 evaluators, 34 chose evaluation level 1 (bad), 36 chose evaluation level 2 (somewhat bad), 45 chose evaluation level 3 (neither), 56 chose evaluation level 4 (somewhat good), and 29 chose evaluation level 5 (good). FIG. 10 shows only the ratings of the A melody, but selecting the soft button labeled "B melody evaluation" in the figure displays the ratings of the B melody in the same form as FIG. 10, and selecting the soft button labeled "Chorus evaluation" displays the ratings of the chorus in the same form.
By referring to these results, the practitioner can take the ratings given to a singer whose singing resembles his or her own as ratings of his or her own singing.

The sequence shown in FIG. 9 is an example of operation in which neither the evaluators' attributes nor the evaluation date and time are specified, but the practitioner can also specify them, as described here. Specifically, when the recording of the practitioner's voice in step S22 of FIG. 9 ends, the control unit 21 displays a screen such as the one shown in FIG. 11. On this screen the practitioner can specify the evaluators' attributes. For example, the field for selecting the evaluators' gender offers the options "male", "female", and "no gender specified", and the practitioner chooses the desired one. The field for selecting the evaluators' age offers the options "teens", "20s", "30s", "40s", "50s", "60s and over", and "no age specified", and the practitioner chooses the desired one.

On this screen the practitioner can also specify a range of evaluation dates. This is useful, for example, when the practitioner wants to know how his or her singing in 2006 would have been rated in the 1980s, roughly 30 years before the time of singing. Subjective evaluation changes in many ways with the background of the times and shifts in fashion, so the same singing voice may be rated highly in one era and poorly in another. If the practitioner can freely specify the range of evaluation dates, he or she can find out which era's standards his or her singing matches, which adds to the enjoyment. In the example shown in FIG. 11, the practitioner chooses the desired range from the options "1960s", "1970s", "1980s", "1990s", "2000s", and "not specified" in the field for selecting the evaluation date and time. Evaluation reference data from before the evaluation system 1 came into operation cannot be collected from evaluators through the evaluation system 1. Therefore, the system designer may, for example, generate simulated past evaluation reference data from the singing styles of singers who were popular in each past era and store it in the storage unit 32.

When the practitioner has chosen these options and selects the soft button labeled "Evaluate with these settings", the control unit 21 performs steps S23, S24, and S25 of FIG. 9 and then, in step S26, transmits to the server device 3 the practitioner voice feature data and the song ID together with attribute data representing the evaluator attributes selected by the practitioner and the selected evaluation date range.

On receiving these data, the control unit 31 of the server device 3 performs step S27 of FIG. 9 and selects the sample voice feature data most similar to the practitioner voice data. In step S28, from among the evaluation reference data corresponding to the selected sample voice feature data, the control unit 31 reads from the evaluation reference data storage area 32b the data that match the attribute conditions indicated by the attribute data and whose evaluation date and time fall within the specified range (era), and tallies them. In step S29, the control unit 31 transmits (outputs) the tally to the karaoke device 2. In step S30, the control unit 21 of the karaoke device 2 displays the received tally on the display unit 23.
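The filtered tally in step S28 might look like the sketch below. The record layout (`Evaluation`), the helper `age_bracket`, and the sample records are all made up for illustration. For brevity the tally covers a single evaluation item, whereas the system keeps one per item and per song section.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Evaluation:
    evaluator_id: str
    gender: str            # "male" or "female"
    age: int
    evaluated_at: datetime
    level: int             # 1 (bad) to 5 (good)

def age_bracket(age):
    """Map an age to a decade option; 60 stands for '60s and over'."""
    return min(age // 10 * 10, 60)

def tally(evaluations, gender=None, bracket=None, years=None):
    """Count ratings per level, keeping only records that match the filters.

    A filter left as None means "not specified". years is an inclusive
    (first_year, last_year) pair such as (1980, 1989).
    """
    counts = {level: 0 for level in range(1, 6)}
    for e in evaluations:
        if gender is not None and e.gender != gender:
            continue
        if bracket is not None and age_bracket(e.age) != bracket:
            continue
        if years is not None and not (years[0] <= e.evaluated_at.year <= years[1]):
            continue
        counts[e.level] += 1
    return counts

# Made-up records, for illustration only.
evaluations = [
    Evaluation("e1", "female", 24, datetime(2006, 5, 1), 4),
    Evaluation("e2", "male", 31, datetime(2006, 6, 2), 2),
    Evaluation("e3", "female", 67, datetime(1985, 1, 9), 5),
]
```

Calling `tally(evaluations, gender="female", years=(2000, 2009))` keeps only the first record, so the returned counts have a single rating at level 4.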

As described above, according to this embodiment, the results of evaluators' subjective ratings of various singing voices (sample voice data) are stored in advance, and the ratings of the singing voice (sample voice data) that resembles the practitioner's singing voice (practitioner voice data) are output as the evaluation of the practitioner's singing voice. This realizes an evaluation method that incorporates the subjective judgment of evaluators.

[3. Modifications]
The embodiment described above may be modified as follows.
[3-1] The embodiment above was described using the evaluation of a practitioner's singing as an example, but the invention is not limited to this, and a practitioner's instrumental performance may be evaluated instead. In this case, sample performance sound data is used in place of the sample voice data, and performance sound data representing the practitioner's performance is used in place of the practitioner voice data. The accompaniment/lyrics data storage area 22a stores performance data for instruments (for example, bass and drums) other than the instrument to be practiced (for example, guitar), and the score sound data storage areas 22c and 32d store score sound data defined in the score as performance sounds. Based on these data, the control unit 31 of the server device 3 evaluates the practitioner's performance through the same processing as described above.

[3-2] In the embodiment above, the single sample voice feature data most similar to the practitioner voice feature data was selected. However, the number selected is not limited to one: several sample voice feature data may be selected in descending order of similarity, and the evaluation reference data (tally) corresponding to each of them may be output. Also, in the embodiment, similarity was judged using the integral of the difference between the practitioner voice feature data and each sample voice feature data. Alternatively, the Euclidean distance between the coordinates of the practitioner voice feature data and those of each sample voice feature data in a multidimensional space may be calculated, and the sample voice feature data with the smallest Euclidean distance may be selected as the most similar.
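The Euclidean-distance variant is compact enough to sketch directly. It assumes each recording has already been reduced to a fixed-length feature vector; how that vector is built is not specified in the text, and `nearest_samples` is a hypothetical name. `math.dist` is the standard-library Euclidean distance, available from Python 3.8.

```python
import math

def nearest_samples(query, samples, k=1):
    """Return the k singer IDs whose feature vectors lie closest to query.

    query: feature vector of the practitioner (a sequence of floats).
    samples: dict mapping a singer ID to a vector of the same length.
    k: how many samples to return, most similar first.
    """
    ranked = sorted(samples, key=lambda sid: math.dist(query, samples[sid]))
    return ranked[:k]
```

With `k=1` this reproduces the single-best selection of the embodiment, and a larger `k` gives the ranked list this modification describes.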

[3-3] In the embodiment above, the pitch, timing, power, techniques, and spectrum of the voice were all used as the sample voice feature data and the practitioner voice feature data. However, any one or more of these may be used on their own, and other feature elements may be used in addition. The practitioner may also be allowed to select which of these feature elements to use via the operation unit 24. Similarly, the practitioner may be allowed to select which of the various techniques to use.

[3-4] In the embodiment above, the practitioner voice feature data was generated by the control unit 21 of the karaoke device 2, but it may instead be generated by the control unit 31 of the server device 3. Alternatively, the control unit 21 of the karaoke device 2 may prompt for input of practitioner voice feature data, and the practitioner may input practitioner voice feature data prepared in advance. In this case, for example, the control unit 21 displays on the display unit 23 a screen prompting for input of the practitioner voice feature data, and the practitioner inputs it to the karaoke device 2 via an interface such as USB (Universal Serial Bus). The practitioner voice feature data may be generated beforehand on a device such as a personal computer. As in the embodiment above, the personal computer picks up the practitioner's voice with a microphone and analyzes it to generate the practitioner voice feature data. The karaoke device 2 may also be provided with an RFID reader that reads an RFID tag on which the practitioner voice feature data has been written.

[3-5] The evaluation reference data (tally) need not be output as a display and may instead be output as a voice message. A message presenting the tally may also be sent by e-mail to the practitioner's mail terminal. The message may also be output to and stored on a storage medium, in which case the practitioner can consult the evaluation result by having a computer read it from the storage medium. In short, any means that conveys (outputs) the evaluation result to the practitioner will do.

[3-6] In the embodiment, the practitioner voice data was recorded during so-called karaoke singing, in which the lyrics are displayed and the practitioner sings while the accompaniment data is played back, but this is not essential. The practitioner may sing with no lyrics displayed and no accompaniment played, and that singing may be recorded and evaluated. Even for a practitioner with considerable singing ability, singing well without lyrics or accompaniment is not easy, so the practitioner's singing ability can be evaluated more rigorously.

[3-7] In the embodiment, the evaluations given by the individual evaluators are all treated equally, but they may instead be weighted.
For example, the evaluation ability of each evaluator is determined, and the evaluation reference data is weighted according to that ability. Specifically, the control unit 31 of the server device 3 tests each evaluator's evaluation ability in advance and classifies it into one of a plurality of levels (for example, ten levels with level values 0.1, 0.2, 0.3, ..., 0.9, 1.0). The level value is stored in the storage unit 32 in association with the evaluator ID. When evaluators evaluate a practitioner's voice and the control unit 31 tallies the number of persons for each evaluation result, it multiplies each evaluator's count by that evaluator's level value. For example, an evaluator at level 9 (level value 0.9) is counted as 0.9 persons.
If determining every evaluator's evaluation ability in advance is too burdensome, then, only in the case where a practitioner acts as an "evaluator" of other people's singing, that practitioner's singing ability level may be used as the evaluation ability level. The idea is that a person who sings well is also likely to evaluate other people's singing well. In this case, the control unit 31 of the server device 3 refers to a tally of the kind shown in FIG. 5, multiplies the tally (number of persons) for evaluation level 1 by 1, for evaluation level 2 by 2, for evaluation level 3 by 3, for evaluation level 4 by 4, and for evaluation level 5 by 5, and adds up all of these products. The control unit 31 then divides this sum by the total number of evaluators across evaluation levels 1 to 5 (200 in FIG. 5) multiplied by 5. The resulting value R (0 ≤ R ≤ 1) is taken as the level value of the evaluation ability and is stored in the storage unit 32 in association with the evaluator ID assigned to the evaluator (that is, the practitioner). When the practitioner (that is, the evaluator) evaluates another practitioner's voice and the number of persons is tallied from the evaluation results, the control unit 31 multiplies by this level value. As in the method described above, if the level value is 0.9, for example, the evaluation by that evaluator (practitioner) is counted as 0.9 persons.
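The two weighting schemes of [3-7] can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation of the control unit 31: the function names `weighted_tally` and `ability_from_tally`, the default weight of 1.0 for evaluators with no stored level value, and the sample figures are all hypothetical.

```python
from collections import defaultdict


def weighted_tally(votes, level_values):
    """Tally evaluation levels, counting each evaluator as `level value` persons.

    votes: iterable of (evaluator_id, evaluation_level) pairs.
    level_values: dict mapping evaluator_id to a level value in [0, 1].
    Evaluators with no stored level value count as 1.0 person (an assumption).
    """
    tally = defaultdict(float)
    for evaluator_id, level in votes:
        tally[level] += level_values.get(evaluator_id, 1.0)
    return dict(tally)


def ability_from_tally(tally, max_level=5):
    """Compute R = sum(level * count) / (max_level * total count)."""
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no evaluations to aggregate")
    weighted_sum = sum(level * count for level, count in tally.items())
    return weighted_sum / (max_level * total)


# Hypothetical tally of 200 evaluators (not the actual values of FIG. 5).
example = {1: 10, 2: 20, 3: 40, 4: 60, 5: 70}
print(ability_from_tally(example))
```

With evaluation levels running from 1 to 5, the smallest value this formula can produce is 0.2 (every evaluator choosing level 1), so R stays within the stated range 0 ≤ R ≤ 1.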

Also, when a plurality of sample voice feature data are selected in descending order of similarity, as described in modification [3-2] above, the plurality of evaluation reference data associated with the selected sample voice feature data may be weighted according to the similarity. For example, when the three most similar sample voice feature data are selected, the control unit 31 multiplies the evaluation reference data (tally) associated with the most similar sample voice feature data by 0.5, the evaluation reference data (tally) associated with the second most similar by 0.3, and the evaluation reference data (tally) associated with the third most similar by 0.2, and outputs the sum of these values as the evaluation result.
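The similarity-weighted combination described in this paragraph can be sketched as follows. The function name `blend_tallies` and the representation of each tally as a dictionary from evaluation level to number of persons are assumptions made for illustration; only the weights 0.5, 0.3, and 0.2 come from the text.

```python
def blend_tallies(ranked_tallies, weights=(0.5, 0.3, 0.2)):
    """Combine the tallies of the most similar samples into one weighted tally.

    ranked_tallies: list of dicts {evaluation_level: persons}, ordered from the
        most similar sample to the least similar one.
    weights: one multiplier per tally, in the same order.
    """
    if len(ranked_tallies) != len(weights):
        raise ValueError("need exactly one weight per tally")
    blended = {}
    for tally, weight in zip(ranked_tallies, weights):
        for level, persons in tally.items():
            blended[level] = blended.get(level, 0.0) + weight * persons
    return blended


# Hypothetical tallies for the three most similar samples.
result = blend_tallies([{5: 10}, {5: 20, 4: 10}, {3: 50}])
print(result)
```

Because the three weights sum to 1, the blended tally stays on the same scale as a single sample's tally.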

[3-8] In the embodiment, the server device 3 extracts sample voice feature data from the sample voice data and stores it in advance. Alternatively, the server device 3 may store only the sample voice data and extract the sample voice feature data from it each time an evaluation is required. Since the sample voice data inherently contains its own features, the server device 3 can be said to store the features of the sample voice data even in this modification. In other words, in either case the server device 3 stores "a plurality of feature data representing features of a plurality of mutually different singing voices". Although the model voice data and the practitioner voice data are described as data in WAVE or MP3 format, the format is not limited to these; any format may be used as long as the data represents voice.

[3-9] In the embodiment described above, the evaluation system 1, in which the karaoke device 2 and the server device 3 are connected by a communication network, realizes all the functions of the embodiment. Alternatively, three or more devices connected by a communication network may share these functions, so that a system comprising those devices realizes the system of the embodiment. A single device may also realize all of the functions.

[3-10] The present invention requires a large number of evaluation results from evaluators. It is therefore desirable to encourage evaluation by granting some reward for the act of evaluating a practitioner's singing. Specifically, the storage unit 32 of the server device 3 stores an amount of money or points granted to each evaluator as a reward, in association with that evaluator's evaluator ID. When the control unit 31 stores the evaluation reference data and the evaluator ID in the storage unit 32 in step S19 of FIG. 7, it increases the amount or points stored in association with that evaluator ID. The amount or points can be applied, for example, to the usage fee when the evaluator enjoys karaoke singing on the karaoke device 2.
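The reward update of [3-10] amounts to crediting the evaluator's balance whenever an evaluation is stored. A minimal sketch follows, assuming in-memory dictionaries in place of the storage unit 32; the function name `record_evaluation`, the reward of 10 points per evaluation, and the level range 1 to 5 are hypothetical choices for illustration.

```python
def record_evaluation(evaluations, rewards, sample_id, evaluator_id, level, reward=10):
    """Store one evaluation and credit the evaluator's reward balance.

    evaluations: dict mapping sample_id to a list of (evaluator_id, level).
    rewards: dict mapping evaluator_id to the accumulated amount or points.
    Returns the evaluator's balance after the update.
    """
    if not 1 <= level <= 5:
        raise ValueError("level must be between 1 and 5")
    # Store the evaluation together with the evaluator ID.
    evaluations.setdefault(sample_id, []).append((evaluator_id, level))
    # Increase the balance associated with that evaluator ID.
    rewards[evaluator_id] = rewards.get(evaluator_id, 0) + reward
    return rewards[evaluator_id]


evaluations, rewards = {}, {}
record_evaluation(evaluations, rewards, "sample-1", "evaluator-7", 4)
print(rewards)
```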

[3-11] The program executed by the control unit 21 of the karaoke device 2 or the control unit 31 of the server device 3 in the embodiment described above can be provided stored on a recording medium such as a magnetic tape, magnetic disk, flexible disk, optical recording medium, magneto-optical recording medium, CD (Compact Disk)-ROM, DVD (Digital Versatile Disk), or RAM. It can also be downloaded to the karaoke device 2 or the server device 3 via a network such as the Internet.

A block diagram showing the configuration of the entire system.
A block diagram showing the configuration of the karaoke device.
A block diagram showing the configuration of the server device.
A diagram showing an example of an evaluation screen displayed by the karaoke device.
A diagram showing an example of the evaluation reference data stored by the server device.
A sequence diagram showing the operation of registering a practitioner's sample voice data in the server device 3.
A sequence diagram showing the operation of collecting evaluators' evaluation results for the sample voice data and accumulating them in the server device 3 as evaluation reference data.
A diagram showing an example of a screen displayed by the karaoke device.
A sequence diagram showing the operation of evaluating a practitioner's singing using the evaluation reference data.
A diagram showing an example of an evaluation screen displayed by the karaoke device.
A diagram showing an example of a screen displayed by the karaoke device.

Explanation of symbols

1: evaluation system; 2a, 2b, 2c: karaoke device; 3: server device; 4: network; 21: control unit; 22: storage unit; 23: display unit; 24: operation unit; 25: microphone; 26: audio processing unit; 27: speaker; 28: communication unit; 31: control unit; 32: storage unit; 33: communication unit.

Claims


An evaluation apparatus comprising:
storage means for storing a plurality of feature data representing features of a plurality of mutually different singing voices or performance sounds, in association with evaluation reference data representing listeners' evaluations of each of the singing voices or performance sounds;
acquisition means for acquiring feature data representing features of a practitioner's singing voice or performance sound;
selection means for selecting, from the feature data stored in the storage means, one or more feature data similar to the feature data acquired by the acquisition means; and
output means for reading, from the storage means, the evaluation reference data associated with the feature data selected by the selection means, and outputting it as an evaluation result for the practitioner's singing voice or performance sound.

The evaluation apparatus according to claim 1, wherein the output means notifies the practitioner of the evaluation reference data as the evaluation result for the practitioner's singing voice or performance sound.

The evaluation apparatus according to claim 1, further comprising:
practitioner voice data storage means for storing a plurality of voice data representing the singing voices or performance sounds;
communication means for performing data communication with a plurality of voice reproduction devices via a network;
distribution means for distributing the voice data stored in the practitioner voice data storage means to the voice reproduction devices through the communication means;
evaluation reference data acquisition means for acquiring, from the voice reproduction devices, evaluation reference data representing listeners' evaluations of the singing voices or performance sounds reproduced by the voice reproduction devices; and
registration means for causing the storage means to store the feature data representing the features of the singing voices or performance sounds in association with the evaluation reference data acquired by the evaluation reference data acquisition means.

The evaluation apparatus according to claim 3, wherein
the evaluation reference data acquisition means acquires, together with the evaluation reference data, an evaluation date and time, which is the date and time at which the evaluation reference data was generated,
the registration means causes the storage means to store the evaluation date and time in association with the feature data and the evaluation reference data, and
the selection means selects one or more feature data similar to the feature data acquired by the acquisition means from among the feature data stored in the storage means in association with evaluation dates and times falling within a specified date and time range.

The evaluation apparatus according to claim 3, wherein
the evaluation reference data acquisition means acquires, together with the evaluation reference data, attribute data indicating an attribute of the listener,
the registration means causes the storage means to store the attribute data in association with the feature data and the evaluation reference data, and
the selection means selects one or more feature data similar to the feature data acquired by the acquisition means from among the feature data stored in the storage means in association with attribute data satisfying a specified attribute condition.

The evaluation apparatus according to claim 1, wherein
the selection means selects, from among the feature data stored in the storage means, a plurality of feature data in descending order of similarity to the feature data acquired by the acquisition means, and
the output means reads, from the storage means, the evaluation reference data associated with the plurality of feature data selected by the selection means, applies weighting according to the similarity to the plurality of evaluation reference data, and outputs the result as the evaluation result.

The evaluation apparatus according to claim 1, further comprising determination means for determining the evaluation ability of the listener, wherein
the storage means stores evaluation reference data weighted according to the evaluation ability determined by the determination means.

The evaluation apparatus according to claim 1, further comprising:
amount storage means for storing an amount of money or points granted to a listener in association with listener identification information assigned to that listener;
accumulation means for acquiring evaluation reference data representing an evaluation of the voice together with the listener identification information assigned to the listener, and causing the storage means to store the acquired evaluation reference data and listener identification information in association with feature data representing features of the voice; and
update means for, when the accumulation means has stored the evaluation reference data and the listener identification information in the storage means, increasing and updating the amount of money or points stored in the amount storage means in association with the stored listener identification information.

The evaluation apparatus according to claim 1, wherein the feature data is data indicating at least one of the pitch, timing, spectrum, and power of the singing voice or performance sound, and technique data indicating the type and section of a technique used in the singing or performance.

A control method for an evaluation apparatus comprising control means and storage means for storing a plurality of feature data representing features of a plurality of mutually different singing voices or performance sounds, in association with evaluation reference data representing listeners' evaluations of each of the singing voices or performance sounds, the control method comprising:
a first step in which the control means acquires feature data representing features of a practitioner's singing voice or performance sound;
a second step in which the control means selects, from the feature data stored in the storage means, one or more feature data similar to the feature data acquired in the first step; and
a third step in which the control means reads, from the storage means, the evaluation reference data associated with the feature data selected in the second step, and outputs it as an evaluation result for the practitioner's singing voice or performance sound.

A program for causing a computer, which comprises storage means for storing a plurality of feature data representing features of a plurality of mutually different singing voices or performance sounds in association with evaluation reference data representing listeners' evaluations of each of the singing voices or performance sounds, to realize:
an acquisition function of acquiring feature data representing features of a practitioner's singing voice or performance sound;
a selection function of selecting, from the feature data stored in the storage means, one or more feature data similar to the feature data acquired by the acquisition function; and
an output function of reading, from the storage means, the evaluation reference data associated with the feature data selected by the selection function, and outputting it as an evaluation result for the practitioner's singing voice or performance sound.