CN106887236A

Info

Publication number: CN106887236A
Application number: CN201510973445.2A
Authority: CN
Inventors: 朱沄杰; 徐伟明; 何颋; 黄松岳
Original assignee: Ningbo Sangdena Electronic Technology Co Ltd
Current assignee: Ningbo Sangdena Electronic Technology Co Ltd
Priority date: 2015-12-16
Filing date: 2015-12-16
Publication date: 2017-06-23

Abstract

Acquiring speech at long range against background noise is difficult. Systems that rely on a camera alone, a highly directional microphone alone, or a microphone array alone have trouble determining the target speaker or require a mechanical rotation device. The invention proposes a speech acquisition device that combines a surveillance camera with microphone-array beamforming to locate the target speaker by joint sound-image positioning, thereby improving the performance of outdoor remote speech enhancement and acquisition under ambient noise.

Description

A remote speech acquisition device with joint sound-image positioning

Technical field

The present invention relates to a speech acquisition device, and more particularly to a remote speech acquisition device with joint sound-image positioning.

Background technology

In fields such as security and public safety, video surveillance systems of all kinds are in wide use. They allow the persons concerned to be confirmed and screened from long-distance video. If spoken-language and dialogue information could also be acquired at long range while a suspect is being confirmed and screened by video surveillance, operating efficiency would improve greatly. However, acquiring remote speech against real-world background noise remains very difficult.

Because of ambient noise, remote speech acquisition must rely on a highly directional acquisition device to ensure the quality of the captured speech. Current remote speech acquisition devices mainly use shotgun microphones with an interference-tube structure to achieve high directivity.

For example, Chinese patent ZL 2010101269089 discloses a sound pickup device comprising a housing, a first piezoelectric element, a second piezoelectric element and a circuit unit. The housing has a sound pickup opening. The first piezoelectric element is arranged in the housing to sense the vibration of high-frequency sound waves and output a converted signal; the second piezoelectric element is arranged in the housing to sense the vibration of low-frequency sound waves and output a converted signal. The circuit unit is electrically connected to both piezoelectric elements, receives their signals and processes them to produce a sound signal. The device is therefore said to offer better sensing sensitivity and a wider audio range, which improves sound quality.

Chinese patent ZL2010591158.2 discloses a video-positioned long-range sound pickup device. High directivity is formed by sound-focusing barrel structures fitted with 2 built-in directional microphones and mounted on a circle centered on the video camera. 2 omnidirectional microphones mounted outside the barrel side wall pick up ambient noise to provide a noise reference. The sound-focusing barrels rotate with the camera. After the operator performs video positioning based on the video image content, the device acquires only the sound signal from the direction the camera is facing, and a digital signal processor performs adaptive noise reduction.

However, the highly directional remote speech acquisition devices described above can only form a fixed, highly directional beam directly in front of the device. In actual use the device must be rotated to aim at a moving distant speaker, which adds the cost of mechanical servo control. Moreover, video surveillance has a large field of view for distant targets: although a person can be seen in the long-distance video image by focusing, it is often impossible to identify or detect the act of speaking directly from the image. As a result, the video camera and the remote pickup device are difficult to keep synchronized when both are mechanically moved to aim at the target speaker, which also makes the surveillance system inconvenient to design and use.

A microphone array consists of multiple microphones arranged in a certain topology. Through a beamforming algorithm it responds differently to signals from different directions, that is, the array has spatial directivity. This gives the array functions such as sound source localization and tracking, speech extraction and separation, and denoising. It thereby improves speech signal quality against complex backgrounds, makes up for the inability of a single microphone to obtain and use spatial information, and avoids the need for a mechanical rotation device to aim at the target speaker.

Chinese patent ZL 2013102011025 discloses a new model-domain compensation method for remote speech recognition. To address the difficulty of indoor remote speech acquisition and recognition with a microphone array, the method simulates the reverberant acoustic environment of a room and generates room impulse response sequences for different positions from the input room dimensions, so that indoor remote speech is compensated in the model domain and acquisition and recognition performance improves.

However, in outdoor remote speech acquisition for fields such as security and public safety, the speaker is much farther away than in indoor applications and the ambient noise is severe. In this case it is difficult to obtain the speaker direction, and hence to enhance and acquire the speech, by microphone array algorithms alone.

The content of the invention

Acquiring speech at long range against background noise is difficult, and systems that rely on a camera alone, a highly directional microphone alone, or a microphone array alone have trouble determining the target speaker or require a mechanical rotation device. To solve these problems, the invention proposes a speech acquisition device that combines a surveillance camera with microphone-array beamforming to locate the target speaker by joint sound-image positioning, thereby improving the performance of outdoor remote speech enhancement and acquisition under ambient noise.

A remote speech acquisition device with joint sound-image positioning comprises the following modules:

Surveillance camera: acquires long-distance video images.

Microphone array: performs multichannel acquisition, pre-processing and analog-to-digital conversion of the sound signal.

Beam scanning module, whose input is connected to the output of the microphone array: performs beam scanning to obtain the directional distribution of remote speech and noise.

Joint sound-image processing module, whose inputs are connected to the outputs of the surveillance camera and the beam scanning module respectively: applies coordinate conversion to the image information from the surveillance camera and the speech and noise direction information from the beam scanning module, then sends the result to the joint sound-image monitoring display for joint sound-image positioning display.

Joint sound-image monitoring display, whose input is connected to the output of the joint sound-image processing module: receives the joint sound-image information sent by the joint sound-image processing module and displays it on screen.

Target selection module: lets the monitoring operator select the target speaker based on the combined image and sound information on the joint sound-image monitoring display.

Beam alignment module, whose inputs are connected to the outputs of the target selection module and the microphone array respectively: steers the microphone array beam toward the direction of the target speaker chosen in the target selection module.

Speech acquisition module, whose input is connected to the output of the beam alignment module: acquires the speech information output by the beam alignment module.

The microphone array includes an enhancement module. The speech signal output of each microphone-array channel is connected through the enhancement module to the beam scanning module and the beam alignment module respectively; the enhancement module enhances the speech information of the microphone array.

The enhancement module includes a preamplifier circuit and an analog-to-digital converter.

The microphone array includes reflectors, which focus the sound signal onto the microphones.

The target selection module has a mouse input. The operator observes the joint sound-image monitoring display and selects the target speaker with the mouse; after coordinate conversion, the target selection module outputs the direction information of the target speaker to the beam alignment module.

The remote speech acquisition device with joint sound-image positioning is used in the following steps:

An initialization step: the parameters of each module are initialized.

A video acquisition step: the surveillance camera acquires long-distance video images.

A beam scanning step: the microphone array performs a directional scan of the remote speech and signals to obtain directional distribution information.

A joint sound-image processing step: the beam scanning result undergoes coordinate conversion and is then merged into the video image to form a joint sound-image video image.

A joint sound-image display step: the joint sound-image display shows the joint sound-image result.

A target selection step: the operator selects the target speaker with the mouse based on the image and sound information on the joint sound-image display, and the direction information of the target speaker is output through coordinate conversion.

A beam alignment step: the direction of the selected target speaker is fed to the microphone array for beam alignment.

A speech acquisition step: the beam-aligned microphone array signal is acquired.

Brief description of the drawings

Fig. 1 is a structural block diagram of an embodiment of the invention;

Fig. 2 is a schematic diagram of the microphone reflector of the embodiment;

Fig. 3 is a circuit diagram of the 5-element microphone array of the embodiment and its connection to the microprocessor;

Fig. 4 is a schematic diagram of the beam scanning principle of the embodiment;

Fig. 5 is a circuit diagram of the connection between the camera and the microprocessor in the embodiment.

Specific embodiment

To make the technical content, features and advantages of the invention clearer, the invention is further described in the following embodiment with reference to the drawings.

In this embodiment of the array-based remote speech acquisition device with joint sound-image positioning, the microphone array is a linear array of 5 equally spaced microphones (m0, m1, ..., m4). Each microphone in the array is fitted with the reflector shown in Fig. 2. The reflecting surface of the reflector is at 45 degrees to the axis, and the reflector is made of stainless steel to suit outdoor installation of the device. To focus remote speech, the reflector diameter is set to d₀ = 40 cm in this embodiment. The sound signals obtained by the microphone array are processed by a beam scanning algorithm to obtain the directional distribution of remote speech and noise.

The microphone array consists of microphones and hardware circuitry: omnidirectional microphones m0, ..., m4 that are compact, simple in structure and have good electroacoustic performance, a preamplifier circuit built from NJM2100 operational amplifier chips, and a MAX118 analog-to-digital converter chip (as shown in Fig. 3). To acquire remote speech, the microphone spacing is set to d = 40 cm in this embodiment.

The beam scanning module, joint sound-image processing module, beam alignment and enhancement module, target selection module and similar modules are digital signal processing modules. In this embodiment they are implemented in software on an ARM9 S3C2440 microprocessor.

The microphone array is connected to the microprocessor as follows. The 5 microphone output signals are amplified by the 2-stage preamplifier circuit built from the operational amplifiers shown in Fig. 2 and then fed to the multichannel analog-to-digital converter chip MAX118. The S3C2440 microprocessor controls the MAX118 input channel select pins A1, A2 and A3 through I/O ports GPB2, GPB3 and GPB4, and controls the MAX118 write and read pins WR and RD through timer output pins TOUT0 and TOUT1 to perform analog-to-digital conversion at a sampling rate of 16 ksps. The 8-bit conversion results are transferred to the S3C2440 microprocessor over data lines DATA0 to DATA7.
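As an illustration of what the microprocessor software might do with the 8-bit conversion results, the sketch below splits a buffer of ADC codes into 5 channels and centers them around zero. The round-robin channel order in the buffer, the midscale code of 128 and the function name `deinterleave` are assumptions made for illustration; the source does not describe the buffer layout.

```python
def deinterleave(raw, n_channels=5, midscale=128):
    """Split round-robin 8-bit ADC codes into per-channel lists of floats.

    raw is a bytes-like sequence of codes 0..255 ordered m0, m1, ..., m4, m0, ...
    (assumed layout). Each code is shifted by the assumed midscale and scaled
    to the range [-1, 1). A trailing incomplete frame is dropped.
    """
    usable = len(raw) - len(raw) % n_channels
    return [
        [(raw[k] - midscale) / 128.0 for k in range(ch, usable, n_channels)]
        for ch in range(n_channels)
    ]


# Three complete frames followed by one stray byte that is ignored.
channels = deinterleave(bytes([128, 0, 255, 192, 64] * 3 + [7]))
```

Each returned list then plays the role of one channel signal x_i in the later processing steps.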

In this embodiment, after the multichannel speech signals have been digitized and have entered the microprocessor, the data and control flow between the digital signal processing modules, which run as software, is connected as shown in Fig. 3 and described below:

The beam scanning module adjusts the delay of each microphone-array channel step by step and sums the delayed channels to obtain the beamformed signal for each beam. The beam scanning principle is described with reference to Fig. 3. In this embodiment, the horizontal line through the 5-element linear microphone array is taken as the X axis and the position of the middle microphone m2 as the coordinate origin; the element spacing of the linear array is d. Beam scanning uses the center element m2 as the reference. That is, the speech signal received by m2 is not delay-compensated, while the speech signal x_i received by each of the other channels is delay-compensated as follows to give x′_i (as shown in Fig. 4):

x′_i(k, j) = x_i(k′);

Here i is the channel index in the linear array. With a beam scanning interval of 1.25 degrees, covering the 180-degree range in front of the linear array takes 144 scans, 72 on each side, so j = 0, ±1, ±2, ±3, ..., ±72 is the beam scan index. θ_j is the scanning beam formed after each delay adjustment, C is the speed of sound in air (340 m/s in this embodiment), f_s is the sampling frequency of the microphone-array speech signal (in Hz; 16000 Hz in this embodiment), and round() denotes rounding. Summing the delay-compensated channel signals x′_i at each step then realizes beam scanning over ±90 degrees (in this embodiment, the 180-degree range in front of the linear array). Applying beam scanning to the received noisy speech within a calculation window of length L (L = 800 in this embodiment) yields the beam information E(θ_j), j = 0, ±1, ±2, ±3, ..., ±72, over the ±90-degree range, which contains the directions of the remote speech source and the noise sources.
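The expression for k′ is not reproduced in this text. A minimal sketch of the scan, assuming the usual far-field delay-and-sum form k′ = k − round((i − 2)·d·sin(θ_j)·f_s / C) with θ_j = 1.25°·j, is given below. That formula, the sign convention, the use of summed squared output as E(θ_j), and the function names are assumptions for illustration, not the patent's confirmed expressions. The numeric constants are the ones stated in the embodiment.

```python
import math

C = 340.0        # speed of sound in m/s (embodiment value)
FS = 16000       # sampling frequency in Hz (embodiment value)
D = 0.40         # microphone spacing in m (embodiment value)
STEP_DEG = 1.25  # beam scanning interval in degrees (embodiment value)
REF = 2          # index of the reference microphone m2


def delay_samples(i, theta_deg, d=D, fs=FS, c=C):
    """Assumed far-field delay of channel i relative to m2, in whole samples."""
    return round((i - REF) * d * math.sin(math.radians(theta_deg)) * fs / c)


def beam_energy(channels, theta_deg, start, length):
    """Energy of the delay-and-sum output over one window of `length` samples."""
    delays = [delay_samples(i, theta_deg) for i in range(len(channels))]
    n = len(channels[0])
    energy = 0.0
    for k in range(start, start + length):
        acc = 0.0
        for x, dly in zip(channels, delays):
            kp = k - dly          # assumed k' for this channel
            if 0 <= kp < n:       # samples outside the buffer count as zero
                acc += x[kp]
        energy += acc * acc
    return energy


def scan(channels, start, length, step=STEP_DEG, half_steps=72):
    """Return {j: E(theta_j)} for j = -half_steps .. +half_steps."""
    return {
        j: beam_energy(channels, j * step, start, length)
        for j in range(-half_steps, half_steps + 1)
    }
```

Calling `scan(channels, start=0, length=800)` on five equal-length channel lists returns 145 values, because j = 0 and both end points ±72 are included. With L = 800 samples at 16000 Hz, each scan window covers 50 ms of audio.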

Surveillance camera video acquisition: because video acquisition by a surveillance camera is common in the art, it is not described in detail in this embodiment. The embodiment uses a CMOS camera with an OV9650 chip, as is common in the art, for long-distance video acquisition. The video images captured by the camera are fed to the S3C2440 microprocessor over a USB interface, as is common in the art, for joint sound-image processing.

Joint sound-image processing: this step jointly processes the beam scanning result obtained by the microphone array and the video image obtained by the camera. In the S3C2440 microprocessor, the beam information obtained by beam scanning is coordinate-transformed according to the camera's field of view. The OV9650 camera in this embodiment is a fixed-focus camera (video image format 640 × 320, frame rate 15 fps). The embodiment uses the OV9650 camera to monitor a fixed site 60 meters away. Taking the camera axis as the center, measurement gives a horizontal angle of ±45 degrees for the fixed site at 60 meters within the camera's field of view. The beam scanning result is then converted, by the following coordinate conversion, into the beam data corresponding to the camera's field of view for joint sound-image processing:
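The conversion expression itself is not reproduced in this text. As a minimal sketch, assuming the conversion amounts to keeping the scan beams whose angle falls inside the measured ±45 degree horizontal view, the selection could look like this. The function name `crop_to_field_of_view` and the dictionary layout are hypothetical.

```python
STEP_DEG = 1.25      # beam scanning interval in degrees (embodiment value)
HALF_FOV_DEG = 45.0  # measured horizontal half-angle of the camera view


def crop_to_field_of_view(beam_energy, step=STEP_DEG, half_fov=HALF_FOV_DEG):
    """Keep only beams whose angle lies inside the camera's horizontal view.

    beam_energy maps the scan index j to E(theta_j), with theta_j = j * step
    (assumed relation). Returns (angle_in_degrees, energy) pairs sorted by angle.
    """
    return [
        (j * step, e)
        for j, e in sorted(beam_energy.items())
        if abs(j * step) <= half_fov
    ]


# Example with placeholder energies for j = -72 .. 72.
in_view = crop_to_field_of_view({j: float(abs(j)) for j in range(-72, 73)})
```

Under these assumptions 73 of the scan beams, j = −36 to 36, fall inside the camera view.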

Specifically, after joint sound-image processing the display shows, as a highlighted red curve along the bottom of the 640 × 320 video image, the acoustic energy beam at each corresponding angle of the coordinate-converted image. The monitoring operator can therefore easily select the speaker whose speech is to be acquired from the combined display of image and acoustic energy beam. After the coordinate transform, the beam data within the camera's field of view is interpolated, using an interpolation algorithm common in the art, into a 320-point beam curve, which is superimposed on the image data acquired by the camera as a highlighted red curve.
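The source does not name the interpolation algorithm. A minimal sketch using plain linear interpolation, which is one common choice and an assumption here, resamples the in-view beam energies onto 320 evenly spaced points. The function name `resample_linear` is hypothetical.

```python
def resample_linear(values, n_out=320):
    """Linearly interpolate evenly spaced `values` onto n_out evenly spaced points."""
    if len(values) < 2 or n_out < 2:
        raise ValueError("need at least two input and two output points")
    scale = (len(values) - 1) / (n_out - 1)
    out = []
    for p in range(n_out):
        pos = p * scale                       # fractional index into values
        lo = min(int(pos), len(values) - 2)   # left neighbour, clamped at the end
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out


# Example: 73 in-view beam energies stretched to a 320-point curve.
curve = resample_linear([float(v) for v in range(73)])
```

The first and last points of the curve coincide with the outermost beams, so the curve spans the same angular range as the input data.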

Joint sound-image display: the camera image with the superimposed beam curve is sent to a monitoring display common in the art. In this embodiment, the distribution of noise and signal energy over the ±45-degree horizontal range corresponding to the site at 60 meters in the OV9650 camera's field of view is thus shown intuitively, overlaid as a highlighted red curve on the 640 × 320 display.

Target selection: by observing the 640 × 320 monitoring image directly, the operator sees both the image of the persons at 60 meters and the speech and noise energy beams in the corresponding field of view. Especially when the field of view contains several people, several vehicles or other noise sources, the operator can easily determine the target speaker from the video image and the acoustic beam curve together, and confirm the target speaker with the mouse. Once the target speaker has been confirmed with the mouse, the horizontal coordinate z of the point selected on screen can be converted to the corresponding target angle θ_t using techniques common in the art. The conversion works as follows:
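The conversion expression is not reproduced in this text. A minimal sketch, assuming a linear mapping from the horizontal pixel coordinate z across the 640-pixel image width to the ±45 degree view, is shown below. The linear model and the function name `pixel_to_angle` are assumptions; a pinhole camera model would use an arctangent mapping instead.

```python
IMAGE_WIDTH = 640    # horizontal resolution from the embodiment
HALF_FOV_DEG = 45.0  # measured horizontal half-angle of the camera view


def pixel_to_angle(z, width=IMAGE_WIDTH, half_fov=HALF_FOV_DEG):
    """Map a horizontal pixel coordinate z to a target angle in degrees.

    Assumes angle varies linearly with z: the left edge maps to -half_fov,
    the image center to 0 and the right edge to +half_fov.
    """
    if not 0 <= z <= width - 1:
        raise ValueError("z lies outside the image")
    center = (width - 1) / 2.0
    return (z - center) / center * half_fov


theta_t = pixel_to_angle(480)  # a click to the right of the image center
```

The resulting θ_t can then be passed to the beam alignment stage as the steering direction.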

Beam alignment and enhancement: in this embodiment, after the direction of the distant target speaker has been determined by joint sound-image positioning, the delay of each microphone-array channel is computed from the target angle θ_t and used to align the channels. The aligned channel signals are then weighted and summed to give the beamformed output signal aimed at the target speaker, which is the enhanced remote speech.
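The alignment and weighted summation can be sketched as a delay-and-sum beamformer steered to θ_t. The delay formula (far-field, referenced to m2), the uniform default weights and the function names are assumptions for illustration; the source does not state the weights or the exact delay expression.

```python
import math


def steering_delays(theta_deg, n_mics=5, ref=2, d=0.40, fs=16000, c=340.0):
    """Assumed per-channel delays in samples, relative to microphone `ref`."""
    s = math.sin(math.radians(theta_deg))
    return [round((i - ref) * d * s * fs / c) for i in range(n_mics)]


def align_and_sum(channels, theta_deg, weights=None):
    """Align each channel for direction theta_deg, then form the weighted sum."""
    n_mics = len(channels)
    if weights is None:
        weights = [1.0 / n_mics] * n_mics   # uniform weights (assumption)
    delays = steering_delays(theta_deg, n_mics)
    n = len(channels[0])
    out = []
    for k in range(n):
        acc = 0.0
        for x, dly, w in zip(channels, delays, weights):
            kp = k - dly
            if 0 <= kp < n:                 # samples outside the buffer count as zero
                acc += w * x[kp]
        out.append(acc)
    return out
```

With uniform weights, speech arriving from θ_t adds coherently across the 5 channels, while sound from other directions adds with mismatched delays and is attenuated in the output.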

Speech acquisition: the beam-aligned, enhanced remote speech is captured using techniques common in the art and saved to the designated storage medium.

The above is only a preferred embodiment of the invention and does not limit the invention. The main feature of the disclosed array-based remote speech acquisition device with joint sound-image positioning is that it locates the distant target speaker by combining the speech and noise direction information provided by the reflector-equipped microphone array with the target video image provided by the surveillance camera. This overcomes the difficulty conventional methods have in determining the target speaker at long range under ambient noise. In particular, once the direction of the target speaker has been determined by joint sound-image positioning, the reflectors together with microphone-array speech enhancement further suppress ambient noise and improve remote speech acquisition performance.

Claims

1. A remote speech acquisition device with joint sound-image positioning, characterized by comprising the following modules:

a surveillance camera, for acquiring long-distance video images;

a beam scanning module, whose input is connected to the output of a microphone array, for performing beam scanning to obtain the directional distribution of remote speech and noise;

a joint sound-image processing module, whose inputs are connected to the outputs of the surveillance camera and the beam scanning module respectively, for applying coordinate conversion to the image information from the surveillance camera and the speech and noise direction information from the beam scanning module and sending the result to a joint sound-image monitoring display for joint sound-image positioning display;

the joint sound-image monitoring display, whose input is connected to the output of the joint sound-image processing module, for receiving the joint sound-image information sent by the joint sound-image processing module and displaying it on screen;

a target selection module, for the monitoring operator to select the target speaker based on the combined image and sound information on the joint sound-image monitoring display;

a beam alignment module, whose inputs are connected to the outputs of the target selection module and the microphone array respectively, for steering the microphone array beam toward the direction of the target speaker chosen in the target selection module;

a speech acquisition module, whose input is connected to the output of the beam alignment module, for acquiring the speech information output by the beam alignment module.

2. The remote speech acquisition device with joint sound-image positioning according to claim 1, characterized in that the microphone array includes an enhancement module, the speech signal output of each microphone-array channel is connected through the enhancement module to the beam scanning module and the beam alignment module respectively, and the enhancement module enhances the speech information of the microphone array.

3. The remote speech acquisition device with joint sound-image positioning according to claim 2, characterized in that the enhancement module includes a preamplifier circuit and an analog-to-digital converter.

4. The remote speech acquisition device with joint sound-image positioning according to claim 1, characterized in that the microphone array includes reflectors for focusing the sound signal onto the microphones.

5. The remote speech acquisition device with joint sound-image positioning according to claim 1, characterized in that the target selection module has a mouse input, the operator observes the joint sound-image monitoring display and selects the target speaker with the mouse, and the target selection module outputs the direction information of the target speaker to the beam alignment module after coordinate conversion.

6. Steps for using the remote speech acquisition device with joint sound-image positioning according to any one of claims 1 to 5:

an initialization step: the parameters of each module are initialized;

a beam scanning step: the microphone array performs a directional scan of the remote speech and signals to obtain directional distribution information;

a joint sound-image processing step: the beam scanning result undergoes coordinate conversion and is then merged into the video image to form a joint sound-image video image;

a target selection step: the operator selects the target speaker with the mouse based on the image and sound information on the joint sound-image display, and the direction information of the target speaker is output through coordinate conversion;

a beam alignment step: the direction of the selected target speaker is fed to the microphone array for beam alignment;