JP2008312002A


Info

Publication number: JP2008312002A
Application number: JP2007158776A
Authority: JP
Inventors: Toshiaki Ishibashi; 利晃石橋; Makoto Tanaka; 田中　　良
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-06-15
Filing date: 2007-06-15
Publication date: 2008-12-25

Abstract

PROBLEM TO BE SOLVED: To provide a television conference apparatus that suppresses changes in sound pickup level depending on speaker position and in which a loudspeaker, a microphone, and a camera are provided adjacent to one another in the vicinity of a monitor. SOLUTION: A level adjusting circuit 19 is provided downstream of an adaptive filter 18. A controller 14 receives selection instruction data from a signal selector 17 and sets, in the level adjusting circuit 19, a level adjustment amount corresponding to the sound pickup beam signal indicated by the selection instruction data. Because the distance between the sound pickup region and the television conference apparatus differs from beam to beam depending on the beam direction, the amplification applied to the sound pickup beam signal is increased for a beam whose sound pickup region is far from the apparatus and decreased for a beam whose sound pickup region is near. COPYRIGHT: (C)2009,JPO&INPIT

Description

Translated from Japanese

The present invention relates to a video conference apparatus in which a loudspeaker, a microphone, and a camera are installed close to one another in the vicinity of a monitor.

In recent years, teleconferencing apparatuses for holding conferences between remote locations have become widespread. A teleconferencing apparatus transmits the sound picked up by its microphone to the other party and receives sound from the other party. More recently, video conference apparatuses that also transmit and receive video data have become widespread (see, for example, Patent Document 1). The apparatus of Patent Document 1 can switch between transmitting an image of the entire conference room and a zoomed-in image of the speaker.

In a video conference, it is natural for each participant to talk while looking toward the monitor that shows the other party's video. It is therefore common to install the loudspeaker and the camera near the monitor.
JP-A-2-202275

However, the apparatus of Patent Document 1 places a microphone at each speaker's position in order to identify where the speaker is. This requires as many microphones as there are speakers, which is costly and lacks versatility.
Alternatively, a directional microphone could be installed near the monitor. However, conference participants generally sit around a conference desk placed in front of the apparatus, so the distance to the apparatus varies with each speaker's position (for example, speakers directly in front of the apparatus are farther away), and the sound pickup level varies greatly from speaker to speaker.

An object of the present invention is to provide a video conference apparatus that suppresses changes in sound pickup level caused by the speaker's position and in which a loudspeaker, a microphone, and a camera are installed close to one another in the vicinity of a monitor.

The video conference apparatus of the present invention includes a camera that captures video, a sound emitting unit that emits sound, and a sound collecting unit that collects sound, arranged close to one another. The sound collecting unit comprises a microphone array in which a plurality of microphones are arranged, and a sound collection control unit that forms sound collection beams in a plurality of directions, identifies the speaker direction by comparing the intensities of the sound collection beams, selects the sound collection beam corresponding to the speaker direction, and outputs the selected beam as a sound collection signal. The apparatus further comprises an input signal processing unit that processes an externally supplied input signal and feeds it to the sound emitting unit; a level adjustment circuit that attenuates or amplifies the level of the sound collection signal; a setting table that stores the adjustment amount of the level adjustment circuit for each speaker direction; and a control unit that receives the identified speaker direction from the sound collection control unit, reads the setting table, and sets the adjustment amount corresponding to that speaker direction in the level adjustment circuit.

In this configuration, sound collection beams are formed in a plurality of directions, and the speaker direction is identified from the intensities of these beams. The control unit reads the level adjustment amount corresponding to the identified speaker direction from the setting table and sets it in the level adjustment circuit. For example, for a speaker direction corresponding to the front of the video conference apparatus, the adjustment amount amplifies the audio signal; for a speaker direction corresponding to the side of the apparatus, the adjustment amount attenuates the audio signal (or passes it unchanged). This is because participants generally sit around a conference desk in front of the apparatus, so that, for example, participants in the frontal direction are farther from the apparatus.

In a further aspect of the invention, the level adjustment circuit comprises a compressor that attenuates or amplifies the sound collection signal and an input gain adjustment unit that adjusts the gain at the input of the compressor. The control unit sets the characteristic of the compressor as the adjustment amount of the level adjustment circuit, detects the level of the sound collection signal, and sets the gain adjustment amount of the input gain adjustment unit based on that level.

In this configuration, the sound collection signal is controlled by the compressor, and a gain adjustment unit is placed upstream of the compressor. The control unit detects the level of the sound collection signal and sets the gain adjustment amount based on this level. For example, the gain is increased when the level of the sound collection signal is too low and decreased when the level is too high. As a result, the compressor receives an input at an appropriate level.

In a further aspect of the invention, when the sound collection control unit selects a sound collection beam, the control unit detects the level of that beam and calculates its time average. If the time average of a sound collection beam exceeds a predetermined level, or if comparison of the time averages of the beams shows that the time average of a particular beam is larger than the others by a predetermined amount or more, the control unit lowers the adjustment amount corresponding to the speaker direction of that beam.

In this configuration, the level of each sound collection beam signal is detected when that beam is selected, and the time average of the level is calculated. If this time average exceeds a predetermined level, or if comparison of the time averages shows that the time average of the beam in a particular direction is higher than that of the other beams (for example, about 1.5 times), the adjustment amount of the level adjustment circuit is lowered. This prevents only some of the speech from becoming loud.
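The rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the apparatus: the class name `BeamLevelMonitor`, the exponential moving average, the smoothing factor `alpha`, the absolute threshold `abs_limit`, and the choice to compare against the largest of the other averages are all hypothetical. Only the ratio of about 1.5 comes from the text, where it is given as an example.

```python
class BeamLevelMonitor:
    """Tracks a time-averaged level per beam, updated only while that beam is selected."""

    def __init__(self, num_beams, abs_limit=0.5, ratio_limit=1.5, alpha=0.1):
        self.avg = [0.0] * num_beams     # running average level per beam (linear scale)
        self.seen = [False] * num_beams  # whether a beam has ever been selected
        self.abs_limit = abs_limit       # assumed absolute threshold
        self.ratio_limit = ratio_limit   # "about 1.5 times" in the text
        self.alpha = alpha               # assumed smoothing factor

    def update(self, beam, level):
        """Fold the level of the currently selected beam into its average."""
        if not self.seen[beam]:
            self.avg[beam] = level
            self.seen[beam] = True
        else:
            self.avg[beam] += self.alpha * (level - self.avg[beam])

    def should_lower(self, beam):
        """Return True when the adjustment amount for this beam should be reduced."""
        if not self.seen[beam]:
            return False
        if self.avg[beam] > self.abs_limit:
            return True
        others = [a for i, a in enumerate(self.avg) if i != beam and self.seen[i]]
        if not others:
            return False
        return self.avg[beam] >= self.ratio_limit * max(others)
```

A controller would call `update` each time the selector reports a beam and consult `should_lower` before choosing the adjustment amount for that direction.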

In a further aspect of the invention, the apparatus includes an adaptive echo canceller that subtracts a pseudo echo signal, obtained by processing the input signal with an adaptive filter, from the sound collection signal output by the sound collecting unit, and outputs the resulting sound collection signal to the level adjustment circuit.

In this configuration, an adaptive echo canceller is provided upstream of the level adjustment circuit. This reduces the echo component.

According to the present invention, the level adjustment amount is set according to the speaker direction, so changes in sound pickup level caused by differences in the distance between the apparatus and each speaker can be suppressed. In addition, because the loudspeaker, microphone, and camera are installed close to one another near the monitor, there is no need to place a microphone at each speaker's position.

A video conference apparatus according to an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is an external view of the video conference apparatus, and FIG. 2 is a block diagram showing its configuration. The video conference apparatus includes loudspeakers SP1 to SP8, microphones M1 to M12, and a camera 11, which are housed close together in a single casing installed on top of a monitor 2.

The loudspeakers SP1 to SP8 are arranged in a straight line to form a loudspeaker array. The microphones M1 to M12 are likewise arranged in a straight line to form a microphone array. This embodiment uses eight loudspeakers and twelve microphones, but the numbers are not limited to this example. The loudspeakers and microphones also need not be equally spaced.

As shown in FIG. 2, in addition to the loudspeakers SP1 to SP8, the microphones M1 to M12, and the camera 11, the video conference apparatus includes an input/output I/F 12, an image data processing unit 13, a control unit 14, an A/D conversion unit 15, a sound collection beam generation unit 16, a signal selection unit 17, an echo canceller 18, a level adjustment circuit 19, a sound emission control unit 20, and a D/A conversion unit 21.

The control unit 14 is implemented by, for example, a microcomputer. It is connected to the camera 11, the sound collection beam generation unit 16, the signal selection unit 17, the level adjustment circuit 19, and the sound emission control unit 20, and performs overall control of the video conference apparatus. For example, in response to user operations entered from a remote control (not shown), it sets the shooting range of the camera 11 and controls the sound pickup level, the sound emission level, and so on. It also sets the level adjustment amount of the level adjustment circuit 19, described later.

The input/output I/F 12 is connected to a network terminal, an audio terminal, and a video terminal, through which it exchanges audio and video with the remote video conference apparatus. When communicating through the network terminal, it receives audio and video data in a network communication data format. Received video data is output to the image data processing unit 13. Received audio data is converted into a digital audio signal and output to the echo canceller 18 and the sound emission control unit 20.

The input/output I/F 12 also transmits the video data input from the image data processing unit 13, and the digital audio signal input from the level adjustment circuit 19, to the remote video conference apparatus in the network communication data format.

The camera 11 captures a range that includes the conference participants in front of the apparatus and outputs a video signal to the image data processing unit 13. When the camera 11 has pan, tilt, and zoom functions, the shooting range is set by the control unit 14. Other shooting settings (contrast and the like) are also set by the control unit 14.

The image data processing unit 13 converts the video signal input from the camera 11 into video data (compressed data) and outputs it to the input/output I/F 12. It also decodes the video data input from the input/output I/F 12 and outputs the result to the monitor 2 as a video signal.

Each of the microphones M1 to M12 of the microphone array picks up the voice of a conference participant (speaker) in front of the apparatus and generates a picked-up audio signal.
The A/D conversion unit 15 includes a sound pickup amplifier 151 and an A/D converter 152 for each of the microphones M1 to M12. The sound pickup amplifier 151 amplifies the picked-up audio signal, and the A/D converter 152 converts the amplified signal into a digital audio signal and outputs it to the sound collection beam generation unit 16.

The sound collection beam generation unit 16 applies predetermined delays to the digital audio signals input from the A/D conversion unit 15 and then combines them, generating sound collection beam signals MB1 to MB8, each of which emphasizes sound arriving from a particular direction. As shown in FIG. 3, the sound collection beam signals MB1 to MB8 pick up sound from mutually different directions along the long face of the casing on which the microphones M1 to M12 are installed. In FIG. 3, the speakers sit around a rectangular conference desk placed in front of the video conference apparatus, so the sound collection beam regions MB11 and MB18, which belong to the most steeply angled beams, are the regions closest to the apparatus. Conversely, the sound collection beam regions MB14 and MB15, which belong to the beams directed toward the center in front of the apparatus, are the regions farthest from the apparatus. The number of sound collection beams and the positions of the regions are not limited to this example. The control unit 14 can change the sound collection beam regions by controlling the delay applied to each digital audio signal.
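The delay-then-combine processing described above corresponds to what is commonly called delay-and-sum beamforming. The following is a minimal sketch under stated assumptions: a uniformly spaced linear array (the text allows unequal spacing), a far-field source, integer-sample delays, and a steering angle measured from the direction perpendicular to the array. The function names and all parameter values are hypothetical and are not taken from the source.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, fs, c=343.0):
    """Integer sample delays that align a plane wave arriving from angle_deg.

    The angle is measured from the array normal; positive angles point toward
    the microphones with larger index, which receive the wavefront earlier and
    therefore need the larger delay.
    """
    raw = [m * spacing_m * math.sin(math.radians(angle_deg)) / c * fs
           for m in range(num_mics)]
    base = min(raw)  # shift so that the smallest delay is zero
    return [int(round(r - base)) for r in raw]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample count and average the results."""
    n = len(channels[0])
    out = [0.0] * n
    for x, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += x[i - d]
    return [v / len(channels) for v in out]
```

Running `delay_and_sum` once per steering angle yields one beam signal per direction; sound from the steered direction adds coherently, while sound from other directions is averaged down.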

The signal selection unit 17 selects the sound collection beam signal with the highest level among MB1 to MB8 and outputs it to the echo canceller 18 as the main sound collection beam signal MS. It also notifies the control unit 14 of which sound collection beam signal was selected.

FIG. 4 is a block diagram showing the main configuration of the signal selection unit 17.
The signal selection unit 17 includes a BPF (band-pass filter) 171, a full-wave rectification circuit 172, a peak detection circuit 173, a level comparator 174, and a signal selection circuit 175.

The BPF 171 is a band-pass filter whose passband is the main frequency band of the human voice. It band-pass filters the sound collection beam signals MB1 to MB8 and outputs them to the full-wave rectification circuit 172. The full-wave rectification circuit 172 full-wave rectifies the sound collection beam signals MB1 to MB8 (takes their absolute values), and the peak detection circuit 173 performs peak detection on the rectified signals and outputs peak value data Ps1 to Ps8. The level comparator 174 compares the peak value data Ps1 to Ps8 and supplies the signal selection circuit 175 with selection instruction data that selects the sound collection beam signal corresponding to the highest peak value data Ps. The level comparator 174 supplies the same selection instruction data to the control unit 14. The signal selection circuit 175 selects the sound collection beam signal indicated by the selection instruction data and outputs it to the echo canceller 18 as the main sound collection beam signal MS.
This relies on the fact that the sound collection beam signal for the region where the speaker is located has a higher signal level than the sound collection beam signals for the other regions.
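The selection chain above (band-pass filter, full-wave rectification, peak detection, level comparison) can be sketched as follows. This is an illustration under assumptions rather than the circuit of FIG. 4: the band-pass filter is approximated by a first-order high-pass followed by a first-order low-pass, the 300 Hz and 3400 Hz corner frequencies are assumed values for a voice band, and the peak is taken over a whole block instead of with a decaying peak hold. The function names are hypothetical.

```python
import math


def bandpass(x, fs, f_lo=300.0, f_hi=3400.0):
    """Crude voice-band filter: one-pole high-pass followed by one-pole low-pass."""
    dt = 1.0 / fs
    rc_hp = 1.0 / (2.0 * math.pi * f_lo)
    rc_lp = 1.0 / (2.0 * math.pi * f_hi)
    a_hp = rc_hp / (rc_hp + dt)
    a_lp = dt / (rc_lp + dt)
    out = []
    hp_prev = 0.0
    x_prev = x[0] if x else 0.0  # start from the first sample to avoid a step transient
    lp_prev = 0.0
    for v in x:
        hp = a_hp * (hp_prev + v - x_prev)
        lp = lp_prev + a_lp * (hp - lp_prev)
        out.append(lp)
        hp_prev, x_prev, lp_prev = hp, v, lp
    return out


def select_beam(beams, fs):
    """Return (index of the beam with the highest in-band peak, list of peaks)."""
    # abs() plays the role of full-wave rectification, max() of peak detection.
    peaks = [max((abs(v) for v in bandpass(b, fs)), default=0.0) for b in beams]
    index = max(range(len(peaks)), key=peaks.__getitem__)
    return index, peaks
```

The returned index plays the role of the selection instruction data: it drives both the signal selection and the notification to the controller.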

The control unit 14 changes the shooting settings of the camera 11 based on the selection instruction data input from the level comparator 174. For example, it sets the pan, tilt, and zoom of the camera 11 so as to capture the region corresponding to the selected sound collection beam signal. The control unit 14 also sets the level adjustment amount of the level adjustment circuit 19 based on the selection instruction data.

The echo canceller 18 includes an adaptive filter 181 and a post processor 182. Based on the input audio signal, the adaptive filter 181 generates a pseudo echo signal that simulates the echo signal leaking from the loudspeaker array into the microphone array. The post processor 182 subtracts the pseudo echo signal from the main sound collection beam signal MS output by the signal selection unit 17 and outputs the result to the level adjustment circuit 19 as the output audio signal MSs. This removes the echo component. The output audio signal MSs is also fed back to the adaptive filter 181, which updates its filter coefficients based on this signal so as to cancel the echo component.
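The source does not state which adaptation algorithm the adaptive filter 181 uses. As an illustration only, the sketch below uses the normalized least-mean-squares (NLMS) update, a common choice for acoustic echo cancellation; the function name, the tap count, and the step size are assumptions.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic.

    far_end: samples sent to the loudspeakers (the input audio signal).
    mic:     selected beam signal containing the echo (MS in the text).
    Returns (residual samples, final filter coefficients); the residual
    corresponds to the output audio signal MSs.
    """
    w = [0.0] * taps    # adaptive filter coefficients
    buf = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        pseudo_echo = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - pseudo_echo                      # echo-reduced output
        norm = sum(b * b for b in buf) + eps     # input power for normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w
```

Because the residual itself drives the coefficient update, the structure matches the feedback from the output audio signal to the adaptive filter described above. A practical canceller would also pause adaptation while the near-end speaker is talking, which this sketch omits.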

FIG. 5 is a block diagram showing the configuration of the level adjustment circuit 19. The level adjustment circuit 19 includes an input gain controller 191, a compressor 192, and an output gain controller 193. The input gain controller 191 adjusts the gain of the audio signal input to the level adjustment circuit 19.

The compressor 192 detects the level of the audio signal input from the input gain controller 191 and adjusts it. Specifically, as shown in FIG. 6, it amplifies or attenuates the input audio signal by a predetermined ratio and outputs the result. The ratio of output to input (the characteristic) is set by the control unit 14.

FIG. 6 shows the level ratio between the input audio signal and the output audio signal. The horizontal axis of the graph is the level of the input signal (IN), and the vertical axis is the level of the output signal (OUT). The figure shows three settings (characteristics) of the output-to-input level ratio. In characteristic 1, the level ratio of the output audio signal to the input audio signal is always 1:1. That is, the input audio signal is output without amplification or attenuation.

In characteristic 2, while the input audio signal is between silence (−∞ dB) and a predetermined level a1, the output level rises faster than the input level. When the input level is a1, the output level is b2 (a1 < b2). While the input audio signal is between the predetermined levels a1 (dB) and a2 (dB), the output level rises 1:1 with the input level; that is, the curve runs parallel to characteristic 1. When the input level is a2, the output level is b4 (a2 < b4). While the input audio signal is between a2 and the maximum (0 dB), the output level rises more slowly than the input level. When the input level is 0 dB, the output level is also 0 dB.

Characteristic 3 follows the same pattern as characteristic 2 but with a smaller amount of amplification. While the input audio signal is between silence (−∞ dB) and the predetermined level a1, the output level rises faster than the input level. When the input level is a1, the output level is b1 (a1 < b1 < b2). While the input audio signal is between a1 (dB) and a2 (dB), the output level rises 1:1 with the input level; that is, the curve runs parallel to characteristics 1 and 2. When the input level is a2, the output level is b3 (a2 < b3 < b4). While the input audio signal is between a2 and the maximum (0 dB), the output level rises more slowly than the input level. When the input level is 0 dB, the output level is also 0 dB.
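The three characteristics can be modeled as one piecewise-linear curve in dB with a single boost parameter: zero for characteristic 1, a larger value for characteristic 2, and a smaller value for characteristic 3. The sketch below is an assumption-laden illustration, not the curve of FIG. 6: the function name, the numeric levels, and the finite floor `floor_db` that stands in for −∞ dB are all hypothetical. It requires `floor_db < a1_db < a2_db < 0` and `a2_db + boost_db <= 0`.

```python
def compressor_out_db(in_db, a1_db, a2_db, boost_db, floor_db=-80.0):
    """Output level in dB for a given input level in dB.

    Below a1: slope greater than 1, rising from (floor, floor) to (a1, a1 + boost).
    a1 to a2: slope exactly 1, shifted up by boost_db.
    Above a2: slope less than 1, reaching 0 dB output at 0 dB input.
    """
    in_db = min(in_db, 0.0)  # 0 dB is the maximum input level
    b_low = a1_db + boost_db   # output at a1 (b1 or b2 in the text)
    b_high = a2_db + boost_db  # output at a2 (b3 or b4 in the text)
    if in_db <= floor_db:
        return in_db
    if in_db < a1_db:
        return floor_db + (in_db - floor_db) * (b_low - floor_db) / (a1_db - floor_db)
    if in_db <= a2_db:
        return in_db + boost_db
    return b_high + (in_db - a2_db) * (0.0 - b_high) / (0.0 - a2_db)
```

With `boost_db=0.0` every segment has slope 1 and the function reduces to the identity, which matches characteristic 1.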

The control unit 14 holds a table describing these characteristics in its internal memory (not shown) and sets one of them in the compressor 192. Based on the selection instruction data input from the level comparator 174, the control unit 14 decides which characteristic to set as follows. When the selection instruction data indicates a sound collection beam signal corresponding to MB11 or MB18, the regions closest to the video conference apparatus in FIG. 3, it selects characteristic 1. When the selection instruction data indicates a sound collection beam signal corresponding to MB14 or MB15, the central regions farthest from the apparatus, it selects characteristic 2. When the selection instruction data indicates a sound collection beam signal corresponding to MB12, MB13, MB16, or MB17, the regions at intermediate distances, it selects characteristic 3.
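The selection rule above amounts to a table lookup keyed by the selected beam. A minimal sketch follows, assuming that beam signal MBn corresponds to region MB1n (the text implies but does not spell out this pairing) and that the selection instruction data can be reduced to a beam number from 1 to 8; the names are hypothetical.

```python
# Region name -> compressor characteristic number, as listed in the text.
CHARACTERISTIC_FOR_REGION = {
    "MB11": 1, "MB18": 1,                          # closest regions: no amplification
    "MB12": 3, "MB13": 3, "MB16": 3, "MB17": 3,    # intermediate regions: mild boost
    "MB14": 2, "MB15": 2,                          # farthest (central) regions: strongest boost
}


def select_characteristic(beam_number):
    """Map a selected beam number (1 to 8) to a compressor characteristic."""
    region = "MB1%d" % beam_number  # assumed pairing of signal MBn with region MB1n
    return CHARACTERISTIC_FOR_REGION[region]
```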

The purpose is to suppress changes in sound pickup level caused by the speaker's position. The farther a speaker is from the video conference apparatus, the lower the picked-up signal level, even for speech uttered at the same volume. The compressor 192 is therefore set to amplify the input audio signal more as the distance from the apparatus increases. The control unit 14 also sets the gain of the input gain controller 191 so that the audio signal input to the compressor 192 lies between levels a1 (dB) and a2 (dB) in FIG. 6. The control unit 14 detects the level of the output audio signal MSs via the input gain controller 191. If that level is below a1, it sets the gain of the input gain controller 191 so that the level becomes a1 or higher. If that level is a2 or higher, it sets the gain so that the level becomes lower than a2. In other words, the gain of the input gain controller 191 is set so that the compressor 192 operates in the region where the output level rises 1:1 with the input level, which is the region where the output sounds natural.

The output gain controller 193 suppresses the output level of the compressor 192 when it is too high. Its gain is set by the control unit 14. As described above, when the level of the output audio signal MSs is below a1, the control unit 14 sets the gain of the input gain controller 191 so that the level becomes a1 or higher, but raising the gain too far (for example, to about 2 times) makes the result sound unnatural. The control unit 14 therefore sets the gain of the output gain controller 193 so as to compensate for the gain set in the input gain controller 191. For example, when the gain of the input gain controller 191 is set to 2 times, the gain of the output gain controller 193 is set to 0.5 times.
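The interplay of the two gain controllers can be sketched in dB terms. This illustration assumes that the output gain always exactly undoes the input gain, which generalizes the single 2 times and 0.5 times example in the text; the function names and the margin that keeps the level strictly below a2 are hypothetical.

```python
def plan_gains(level_db, a1_db, a2_db, margin_db=1.0):
    """Return (input gain, output gain) in dB for a detected level.

    The input gain moves the level into [a1, a2); the output gain is the
    negative of the input gain so that the pair is neutral overall.
    Assumes a2_db - margin_db is still above a1_db.
    """
    if level_db < a1_db:
        in_gain_db = a1_db - level_db                 # raise to a1
    elif level_db >= a2_db:
        in_gain_db = (a2_db - margin_db) - level_db   # lower to just below a2
    else:
        in_gain_db = 0.0                              # already in the 1:1 region
    return in_gain_db, -in_gain_db


def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear factor."""
    return 10.0 ** (gain_db / 20.0)
```

An input gain of about 6 dB corresponds to the factor of 2 in the text, and the matching output gain of about −6 dB to the factor of 0.5.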

The number of compressor characteristics and their levels are not limited to the example of FIG. 6. They may be set as appropriate for the conditions of use of the video conference apparatus (the number of beams and the distances to the sound collection beam regions). In addition, although FIG. 6 shows examples in which the input audio signal is amplified, a characteristic that attenuates the signal may also be set.

The sound emission control unit 20 applies predetermined delay processing to the input audio signal and supplies the result to each D/A converter 211 in the D/A conversion unit 21. Each D/A converter 211 converts the supplied audio signal into an analog audio signal and supplies it to the AMP 212. The AMP 212 amplifies the analog audio signal and supplies it to the speakers SP1 to SP8, which emit the sound.

By applying delay processing to the audio signal supplied to each speaker of the speaker array, the sound emission control unit 20 can form a sound emission beam with strong directivity in a predetermined direction. It can also form a sound emission beam that is focused on a predetermined position. The actual distance from each speaker to the focal point differs, so the audio signals are delayed such that sound is emitted with the timing that would result if the speakers were arranged at equal distances from the focal point.
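The focusing rule above can be sketched as a per-speaker delay calculation. This is an illustration under stated assumptions: speaker and focal point coordinates are in metres, the speed of sound is taken as roughly 343 m/s, and the function name and geometry are hypothetical rather than taken from the description.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature (assumption)


def focus_delays(speaker_positions, focus, c=SPEED_OF_SOUND):
    """Return one delay (seconds) per speaker for a beam focused at `focus`.

    The speaker farthest from the focal point gets zero delay; nearer
    speakers are delayed by the extra travel time the farthest one needs,
    so all wavefronts reach the focal point at the same time.
    """
    distances = [math.dist(p, focus) for p in speaker_positions]
    farthest = max(distances)
    return [(farthest - d) / c for d in distances]


# Three speakers on a line, focal point 2 m in front of the middle one.
delays = focus_delays([(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)], (0.0, 2.0))
print(delays)  # the middle speaker is closest, so it is delayed the most
```

Delaying the nearer speakers is equivalent to placing every speaker on an arc centred on the focal point, which is the "equal distance" arrangement the text describes.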

Although FIG. 3 shows an example in which sound is collected from eight sound collection beam regions MB11 to MB18, four sound collection beams may instead be formed side by side, parallel to the front face of the audio conference apparatus, as shown in FIG. 7. FIG. 7 shows an example in which the conference participants are seated in a row along a conference desk parallel to the front face of the audio conference apparatus. In this case, the sound collection beam regions MB21 to MB24 are all at approximately the same distance from the audio conference apparatus. If the sound collection beam signals corresponding to the central sound collection beam regions were amplified as described above, only speech uttered in those regions would become louder. The control unit 14 therefore changes the setting of the compressor 192 to characteristic 3 or characteristic 1 described above. Specifically, when the control unit 14 receives selection instruction data from the level comparator 174 in FIG. 4, it acquires the sound collection level (or energy) of the selected sound collection beam, calculates its average value, and stores this average value in a memory (not shown). When this average value exceeds a predetermined level, or when a comparison of the average values of the sound collection beams shows that the average value of the central sound collection beam is higher than that of the other sound collection beams (for example, by a factor of about 1.5), the control unit 14 regards the conference participants as being arranged parallel to the front face of the audio conference apparatus and changes the setting of the compressor 192 to characteristic 3 or characteristic 1. This prevents only some of the speech from becoming louder.
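The decision described above might be sketched as follows. Several details are assumptions made for illustration: levels are linear (not dB), the average is a simple running mean, MB22 and MB23 are treated as the central beams, the absolute threshold is applied to the central beams, and all function names and numeric values are hypothetical.

```python
def update_mean(mean, count, level):
    """Fold one new level measurement into a running mean."""
    count += 1
    return mean + (level - mean) / count, count


def participants_in_parallel(avg_levels, center_beams, level_threshold,
                             ratio=1.5):
    """Decide whether to switch the compressor to a lower-gain characteristic.

    avg_levels maps a beam name to its averaged level. Returns True when a
    central beam's average exceeds level_threshold, or when it is at least
    `ratio` times the highest average among the other beams.
    """
    center = max(avg_levels[b] for b in center_beams)
    others = [v for b, v in avg_levels.items() if b not in center_beams]
    if center > level_threshold:
        return True
    return bool(others) and center >= ratio * max(others)


averages = {"MB21": 0.2, "MB22": 0.6, "MB23": 0.55, "MB24": 0.3}
print(participants_in_parallel(averages, {"MB22", "MB23"}, level_threshold=1.0))
```

In an apparatus of this kind, `update_mean` would be called each time a beam is selected, and a True result would trigger the change to characteristic 3 or characteristic 1.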

Note that a user interface (such as a button) may be provided on the audio conference apparatus so that the user can change the setting manually.

FIG. 1 is an external view of the video conference apparatus.
FIG. 2 is a block diagram showing the configuration of the video conference apparatus.
FIG. 3 is a diagram showing the sound collection beam regions formed by the video conference apparatus.
FIG. 4 is a block diagram showing the configuration of the signal selection unit 17 shown in FIG. 2.
FIG. 5 is a block diagram showing the configuration of the level adjustment circuit 19.
FIG. 6 is a diagram showing the level ratio between the input audio signal and the output audio signal.
FIG. 7 is a diagram showing an example in which the conference participants are seated along a conference desk parallel to the front face of the audio conference apparatus.

Explanation of symbols

11: Camera
SP1 to SP8: Speakers
M1 to M12: Microphones

Claims

A video conference apparatus in which a camera that captures video, a sound emitting unit that emits sound, and a sound collection unit that collects sound are provided in close proximity to one another, wherein
the sound collection unit comprises a microphone array in which a plurality of microphones are arranged, and a sound collection control unit that forms sound collection beams in a plurality of directions, identifies a speaker direction by comparing the intensities of the sound collection beams, selects the sound collection beam corresponding to the speaker direction, and outputs the selected sound collection beam as a sound collection signal,
the video conference apparatus further comprising:
an input signal processing unit that processes an input signal received from outside and supplies it to the sound emitting unit;
a level adjustment circuit that attenuates or amplifies the level of the sound collection signal;
a setting table that stores an adjustment amount of the level adjustment circuit for each speaker direction; and
a control unit that receives the identified speaker direction from the sound collection control unit, reads the setting table, and sets the adjustment amount corresponding to that speaker direction in the level adjustment circuit.

The video conference apparatus according to claim 1, wherein
the level adjustment circuit comprises a compressor that attenuates or amplifies the sound collection signal, and an input gain adjustment unit that adjusts the gain of the compressor, and
the control unit sets a characteristic of the compressor as the adjustment amount of the level adjustment circuit, detects the level of the sound collection signal, and sets a gain adjustment amount of the input gain adjustment unit based on the detected level.

The video conference apparatus according to claim 1 or claim 2, wherein
the control unit detects the level of a sound collection beam when the sound collection control unit selects that sound collection beam, and calculates a time average value, and
when the time average value of a sound collection beam exceeds a predetermined level, or when a comparison of the time average values of the sound collection beams shows that the time average value of a particular sound collection beam is larger by a predetermined value or more, the control unit lowers the adjustment amount corresponding to the speaker direction of that sound collection beam.

The video conference apparatus according to claim 1, claim 2, or claim 3, further comprising an adaptive echo canceller that subtracts a pseudo echo signal, obtained by processing the input signal with an adaptive filter, from the sound collection signal output by the sound collection unit, and outputs the sound collection signal after the subtraction to the level adjustment circuit.