CN207149252U - Speech processing system - Google Patents

Speech processing system
Download PDF

Info

Publication number
CN207149252U
Authority
CN
China
Prior art keywords
voice
equipment
unit
sound pickup
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201720953479.XU
Other languages
Chinese (zh)
Inventor
李飞
程旭
赵珣
袁俊杰
吕文杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Hear Technology Co Ltd
Original Assignee
Anhui Hear Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Hear Technology Co Ltd
Priority to CN201720953479.XU
Application granted
Publication of CN207149252U
Status: Active
Anticipated expiration

Abstract

The utility model proposes a speech processing system. The system includes at least a first sound pickup device and a second sound pickup device, both connected to a processing unit. The first sound pickup device collects the first voice information of a first user, and the second sound pickup device collects the second voice information of a second user. The processing unit recognizes the first voice and the second voice to obtain the corresponding text content and the corresponding user, and records the text content in paragraphs segmented by user. In this embodiment, by means of speech recognition, the voice signal is automatically converted into a text signal and recorded, removing the reliance on manually recognizing and transcribing voice information; this improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors. In particular, during the interrogation of a case, it can relieve the pressure on the prosecutor handling the case, so that the case handler can devote more energy to the trial and the interrogation quality is improved.

Description

Speech processing system
Technical field
The utility model relates to the technical field of speech recognition, and in particular to a speech processing system.
Background technology
At present, in conference, interrogation, or interview scenarios, recording documents are mostly produced through audio/video recording combined with manual note-taking, so that they can be checked and traced later. However, during a meeting, interrogation, or interview, the recording personnel must not only listen to the speech but also manually type the spoken content of the participants, suspects, or interviewees into a computer for signature confirmation, archiving, and subsequent business circulation. Having to listen, record, and check at the same time on the spot not only leaves the recorder exhausted, but also leads to situations where details or key content are omitted.
In particular, during an actual inquiry or interrogation, the case handler carries out interrogation, recording, and verification at the same time, which not only leaves the case handler exhausted but also leads to omissions of details or key testimony.
Utility model content
The utility model is intended to solve, at least to some extent, one of the technical problems in the related art.
Therefore, a first purpose of the utility model is to propose a speech processing system that relieves the pressure of recording during a meeting, interrogation, or interview, so that the participants can devote more energy to the meeting, interrogation, or interview itself. It addresses the problem that existing recording personnel, who must listen, record, and check at the same time, are not only exhausted but also prone to omitting details or key content.
To this end, an embodiment of the first aspect of the utility model proposes a speech processing system, including:
at least two sound pickup devices and a processing unit for processing voice, the sound pickup devices including a first sound pickup device and a second sound pickup device;
wherein the first sound pickup device and the second sound pickup device are both connected to the processing unit;
the first sound pickup device is configured to collect a first voice of a first user;
the second sound pickup device is configured to collect a second voice of a second user;
the processing unit is configured to obtain the first voice or the second voice, recognize it to obtain the corresponding text content and the corresponding user, and record the text content in paragraphs segmented by user.
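The segmented recording that the processing unit performs can be sketched in a few lines. This is a hypothetical illustration rather than the patent's implementation; the `Utterance` type and function name are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    user: str  # user identified for this piece of recognized speech
    text: str  # text content obtained by speech recognition

def segment_by_user(utterances):
    """Group consecutive utterances of the same user into one paragraph."""
    paragraphs = []
    for u in utterances:
        if paragraphs and paragraphs[-1][0] == u.user:
            paragraphs[-1][1].append(u.text)       # same user: extend paragraph
        else:
            paragraphs.append((u.user, [u.text]))  # user changed: new paragraph
    return [(user, " ".join(texts)) for user, texts in paragraphs]
```

Feeding in alternating interrogator/suspect utterances yields one labelled paragraph per speaking turn, which is the "record in segments according to the user" behaviour described above.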
In a possible implementation of the embodiment of the first aspect of the utility model, the system further includes a sound card connected to the first sound pickup device, the second sound pickup device, and the processing unit respectively;
the sound card is configured to identify the user corresponding to the currently received voice and send the recognition result to the processing unit for processing.
In a possible implementation of the embodiment of the first aspect of the utility model, the sound card is integrated in the second sound pickup device, and the first sound pickup device is connected to the processing unit through the second sound pickup device.
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit includes a pickup unit, a transcription unit, and a display screen, wherein the pickup unit is connected to the second sound pickup device, and the transcription unit is connected to the pickup unit and the display screen respectively;
the pickup unit is configured to receive the first voice or the second voice and to perform automatic noise reduction and de-reverberation on the received voice;
the transcription unit is configured to perform speech recognition on the processed voice, convert the content carried in the voice into text content, determine the user corresponding to the text content, associate the text content with that user, judge from the identified user whether the text content and the preceding paragraph belong to the same user, and, if not, record the text content in a new paragraph;
the display screen is configured to display the recorded text content.
In a possible implementation of the embodiment of the first aspect of the utility model, the transcription unit includes:
a speech recognition subunit, configured to perform speech recognition on the voice processed by the pickup unit, convert the content carried in the voice into text content, and extract a voiceprint feature from the voice;
a comparison subunit, configured to compare the extracted voiceprint feature with the voiceprint features in a voiceprint memory; when the extracted voiceprint feature is not present in the voiceprint memory, the feature is stored there and a user label is created, and the text content is associated with the user label;
the voiceprint memory, configured to store the voiceprint feature of each user when it is extracted for the first time.
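As a rough sketch of how the comparison subunit and voiceprint memory could behave together, the following hypothetical Python enrolls unseen voiceprints and labels returning speakers. Cosine similarity over feature vectors, the 0.95 threshold, and all names are assumptions; the patent does not specify a matching method.

```python
import math

class VoiceprintMemory:
    """Stores first-seen voiceprints and mints user labels (illustrative)."""

    def __init__(self, threshold=0.95):
        self.prints = {}          # user label -> stored voiceprint vector
        self.threshold = threshold

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    def identify(self, voiceprint):
        """Return the label of a matching stored print, enrolling if new."""
        for label, stored in self.prints.items():
            if self._cosine(voiceprint, stored) >= self.threshold:
                return label
        label = f"user {len(self.prints) + 1}"   # e.g. "user 5" as in the text
        self.prints[label] = voiceprint          # store first-seen voiceprint
        return label
```

Each recognized text span would then be associated with the label `identify` returns, matching the enroll-on-first-appearance behaviour described above.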
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit further includes:
a storage unit connected to the transcription unit and the pickup unit, configured to store the received first voice and second voice;
the transcription unit is further configured to embed, while recording the text content, first information of the original voice corresponding to each sentence, where the first information includes the address of the received voice in the storage unit and the timestamp information of the original voice corresponding to the sentence;
a playback unit connected to the transcription unit, configured to play, when a sentence is clicked, the original voice corresponding to that sentence according to the first information.
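The "first information" attached to each sentence can be pictured as a small record: an audio address plus start and end timestamps, enough to replay the sentence's original speech. The field names, function, and the 16 kHz sample rate are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class FirstInformation:
    audio_address: str   # where the received voice sits in the storage unit
    start_s: float       # timestamp where this sentence begins
    end_s: float         # timestamp where this sentence ends

def play_sentence(info, audio_store, rate=16000):
    """Return the audio samples for one sentence (stand-in for playback)."""
    samples = audio_store[info.audio_address]
    return samples[int(info.start_s * rate):int(info.end_s * rate)]
```

Clicking a sentence would look up its embedded `FirstInformation` and hand the resulting slice to the playback unit.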
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit further includes:
the transcription unit is further configured to embed, while recording the text content, second information of the original voice corresponding to each paragraph, where the second information includes the address of the received voice in the storage unit and the timestamp information of the original voice corresponding to the paragraph;
a keyword extraction unit connected to the transcription unit, configured to extract keywords from the text content and form an association between each keyword and the paragraph in which it occurs;
the playback unit is further configured to play, after a keyword is queried or clicked, the original voice corresponding to the paragraph containing the keyword, according to the association and the second information.
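A minimal sketch of the keyword-to-paragraph association the keyword extraction unit forms, assuming the keyword list is supplied externally (the patent does not say how keywords are chosen); querying a keyword then yields the paragraphs whose "second information" would drive playback:

```python
def build_keyword_index(paragraphs, keywords):
    """Map each keyword to the indices of paragraphs that mention it."""
    index = {}
    for i, para in enumerate(paragraphs):
        for kw in keywords:
            if kw in para:
                index.setdefault(kw, []).append(i)
    return index
```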
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit further includes a database for storing the text templates and/or sentence templates used during recording;
and a selection unit connected to the transcription unit and the database, configured to select a target text template from all text templates before the transcription unit starts recording and, during recording, when the meaning stated by the current speech matches the meaning stated by a first sentence template, to send the first sentence template to the transcription unit for recording, where the first sentence template is one of the sentence templates in the database.
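One naive way to picture the selection unit's sentence-template matching is word-overlap scoring. This is purely illustrative, since the patent does not define how "the meaning stated" by the speech is matched against a template:

```python
def match_sentence_template(speech_text, sentence_templates):
    """Return the stored template whose words best overlap the speech,
    or None when nothing overlaps (assumed fallback)."""
    best, best_score = None, 0
    words = set(speech_text.split())
    for tpl in sentence_templates:
        score = len(words & set(tpl.split()))
        if score > best_score:
            best, best_score = tpl, score
    return best
```

A real system would use semantic matching rather than token overlap; the sketch only shows the select-then-substitute flow.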
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit further includes:
an editing unit connected to the transcription unit, configured to edit the text content recognized in real time;
a translation unit connected to the transcription unit, configured to receive a translation instruction from the user, the instruction including the target language to convert to, and to translate the text content from the current language into the target language according to the instruction.
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit is a terminal device.
In a possible implementation of the embodiment of the first aspect of the utility model, the first sound pickup device and the second sound pickup device each include a microphone array, where the first sound pickup device is a linear microphone array and the second sound pickup device is a disc microphone array.
In a possible implementation of the embodiment of the first aspect of the utility model, the first sound pickup device and the second sound pickup device are placed in a set positional relationship during operation.
In a possible implementation of the embodiment of the first aspect of the utility model, the pickup range of the first sound pickup device covers the first user, and the distance between the second sound pickup device and the second user is kept within a set range.
The speech processing system of the embodiment of the utility model includes at least a first sound pickup device and a second sound pickup device, both connected to a processing unit. The first sound pickup device collects the first voice information of a first user and the second sound pickup device collects the second voice information of a second user; the processing unit recognizes the first voice and the second voice to obtain the corresponding text content and corresponding user, and records the text content in paragraphs segmented by user. In this embodiment, by means of speech recognition, the voice signal is automatically converted into a text signal and recorded, removing the reliance on manually recognizing and transcribing voice information; this improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors.
In particular, during the interrogation of a case, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on the prosecutor handling the case, so that the case handler can devote more energy to the trial and the interrogation quality is improved. It solves the problem that, in the existing interrogation process, the case handler must interrogate, record, and verify at the same time, which not only leaves the case handler exhausted but also leads to omissions of details or key testimony.
Additional aspects and advantages of the utility model will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the utility model.
Brief description of the drawings
The above and additional aspects and advantages of the utility model will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic structural diagram of a speech processing system provided by an embodiment of the utility model;
Fig. 2 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 3 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 4 is a schematic application diagram of a speech processing system provided by an embodiment of the utility model;
Fig. 5 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 6 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 7 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 8 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model.
Embodiment
Embodiments of the utility model are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the utility model; they should not be construed as limiting it.
The speech processing system of the embodiments of the utility model is described below with reference to the accompanying drawings.
At present, the records of most inquiries or interrogations are still kept by hand in Word or WPS, transcribing the suspect's statements for signature confirmation, archiving, and subsequent business circulation. This not only leaves the case handler exhausted but also leads to omissions of details or key testimony.
In view of the above problems, an embodiment of the utility model proposes a speech processing system that relieves the pressure on the prosecutor during case handling, so that the case handler can devote more energy to the trial, improving interrogation quality.
Fig. 1 is a schematic structural diagram of a speech processing system provided by an embodiment of the utility model. As shown in Fig. 1, the speech processing system includes at least a first sound pickup device 10, a second sound pickup device 20, and a processing unit 30 for processing voice, where the first sound pickup device 10 and the second sound pickup device 20 are both connected to the processing unit 30.
The first sound pickup device 10 is configured to collect the first voice of a first user.
The second sound pickup device 20 is configured to collect the second voice of a second user.
The processing unit 30 is configured to obtain the first voice and the second voice, recognize them to obtain the corresponding text content and corresponding user, and record the text content in paragraphs segmented by the corresponding user.
As an example, on the basis of Fig. 1, Fig. 2 shows the structure of another speech processing system. As shown in Fig. 2, the speech processing system further includes a sound card 40, which is connected to the first sound pickup device 10, the second sound pickup device 20, and the processing unit 30 respectively.
In this embodiment, the sound card 40 can identify the user corresponding to the currently received voice and send the recognition result to the processing unit 30, so that the processing unit 30 can record the text content under the user indicated by the recognition result.
Specifically, the sound card 40 is a hardware element with a two-way input interface: one way is connected to the first sound pickup device 10 and receives the first voice it collects, and the other way is connected to the second sound pickup device 20 and receives the second voice it collects. The sound card 40 can distinguish which input interface a received voice arrived on and thereby identify the corresponding user, achieving automatic separation of speech roles.
In a conference scenario with multiple sound pickup devices, the sound card 40 should provide as many input interfaces as there are pickup devices, one per device, so that it can identify the role corresponding to the currently received voice.
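The role separation the sound card performs can be pictured as a lookup table keyed by input interface: whichever channel the audio arrives on identifies the speaker. Channel numbers and role names below are illustrative assumptions.

```python
# One entry per input interface, mirroring one pickup device per channel.
ROLE_BY_CHANNEL = {
    0: "first user",    # first pickup device, e.g. the suspect
    1: "second user",   # second pickup device, e.g. the interrogator
}

def role_for(channel):
    """Identify the role from the input interface the voice arrived on."""
    return ROLE_BY_CHANNEL.get(channel, f"user on channel {channel}")
```

In the multi-device conference case, the table simply grows to one entry per interface.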
As another example, on the basis of Fig. 2, Fig. 3 shows the structure of another speech processing system. As shown in Fig. 3, the sound card 40 is integrated in the second sound pickup device 20 and is connected to the first sound pickup device 10, the second sound pickup device 20, and the processing unit 30 respectively, so that the first sound pickup device 10 is connected to the processing unit 30 through the second sound pickup device 20. This avoids having to provide multiple interfaces on the processing unit 30 for connecting the pickup devices. Optionally, in this embodiment, the second sound pickup device 20 includes a collecting microphone (MIC) connected to the sound card 40.
Further, a voice preprocessing program or software may also be provided in the second sound pickup device 20 to perform noise filtering and analog-to-digital conversion on the received voice; the preprocessed voice is then input to the processing unit 30 for speech recognition, improving recognition accuracy.
Alternatively, the voice preprocessing program or software may be built into the processing unit 30, so that the received voice is preprocessed with noise filtering and analog-to-digital conversion before speech recognition is performed on it, improving recognition accuracy.
As an example, the processing unit 30 may be a mobile workstation, or a terminal device such as a notebook computer, an ultrabook, a personal computer (PC), a mobile phone, or an iPad. Software or hardware capable of speech recognition and of transcribing the recognition result into text content may be installed on the processing unit 30.
In this embodiment, to improve pickup quality, the first sound pickup device 10 and the second sound pickup device 20 may each be a microphone, a sound pickup, or the like; preferably, each includes a microphone array. Because a microphone array achieves directional pickup, background noise can be filtered out and the pickup quality of the devices improved.
The speech processing system provided by the utility model includes at least a first sound pickup device and a second sound pickup device, both connected to a processing unit. The first sound pickup device collects the first voice information of a first user and the second sound pickup device collects the second voice information of a second user; the processing unit recognizes the first voice and the second voice to obtain the corresponding text content and corresponding user, and records the text content in paragraphs segmented by user. In this embodiment, by means of speech recognition, the voice signal is automatically converted into a text signal and recorded, removing the reliance on manually recognizing and transcribing voice information; this improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors.
Generally, the positions where the two sound pickup devices are placed differ between application scenarios, and differently shaped pickup devices may be required. In this embodiment, the first sound pickup device 10 may be a linear microphone array and the second sound pickup device 20 a disc microphone array. Further, the positional relationship between the two devices during operation can be set in advance, and the devices placed accordingly; for example, the first sound pickup device 10 may be placed in front of, or diagonally in front of, the second sound pickup device 20.
As shown in Fig. 4, an application diagram of the utility model, the speech processing system provided by the utility model is used in the scenario of an interrogator questioning a suspect. In this scenario the suspect usually sits on a stool with no obstruction in front, so the first sound pickup device 10 can be a linear microphone array. A desk is usually placed in front of the interrogator, so the second sound pickup device 20 can be a disc microphone array.
Specifically, the linear microphone array points at the first user, here the suspect, and collects the suspect's first voice as testimony; the distance between the linear microphone array and the suspect can be up to 5 meters. The disc microphone array is placed in front of the second user, the interrogator, and collects the interrogator's second voice. The linear microphone array and the disc microphone array can each collect 8 channels of voice.
In use, the elevation angle of the linear microphone array can be adjusted up or down to suit the actual scene. The linear microphone array generally has a pickup angle of 30 degrees, so in use the suspect must be kept within the pickup range of the first sound pickup device. For example, the linear microphone array can be pointed directly at the suspect's face, or the suspect's face kept within 15 degrees either side of the array's axis.
Further, the disc microphone array is directly or diagonally behind the linear microphone array, and the second user, the interrogator, must keep a certain distance from the disc microphone array, controlled within a preset range: if the distance is too large, the interrogator's voice cannot be collected well; if it is too small, the depression angle becomes too large and pickup quality suffers.
The interrogator must stay behind the linear microphone array, for example directly behind it or offset to one side, and the linear microphone array must not be pointed at the interrogator; otherwise problems such as unclear pickup of the suspect arise.
Further, the linear microphone array is connected to the disc microphone array, and the disc microphone array is connected to an ultrabook, which serves as the processing unit 30 of the utility model. The processing unit 30 obtains and recognizes the current voice, identifies that it corresponds to the interrogator, and attributes the recognized text content to the interrogator. When the interrogator finishes a question and the suspect answers, the processing unit 30 receives voice again, recognizes that it comes from the suspect, and records the recognized text content in a new paragraph for later review; the recognized text content can be displayed on the screen of the ultrabook.
In the interrogation process, the interrogator generally begins by questioning the suspect, so the interrogator's sound characteristics can be distinguished first, after which the interrogator and the suspect can be told apart. For example, the text content of the interrogator and that of the suspect can be distinguished by the question-and-answer pattern.
In particular, during the interrogation of a case, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on the prosecutor handling the case, so that the case handler can devote more energy to the trial and the interrogation quality is improved. It solves the problem that, in the existing interrogation process, the case handler must interrogate, record, and verify at the same time, which not only leaves the case handler exhausted but also leads to omissions of details or key testimony.
As an example, on the basis of the above embodiments, Fig. 5 shows the structure of another speech processing system. As shown in Fig. 5, the processing unit 30 includes a pickup unit 301, a transcription unit 302, and a display screen 303; the pickup unit 301 is connected to the transcription unit 302, and the transcription unit 302 is connected to the display screen 303.
The pickup unit 301 receives the first voice or the second voice and performs pickup with automatic noise reduction and de-reverberation, improving the accuracy of subsequent speech recognition. The transcription unit 302 then performs speech recognition on the voice processed by the pickup unit 301, converting the content carried in the voice into text content, determining the corresponding user, and associating the text content with that user. In this embodiment, the pickup unit 301 may be a hardware interface provided in the processing unit 30 that receives voice and removes reverberation, and the transcription unit 302 may be a speech recognition chip in the processing unit 30 that recognizes received voice and converts its content into a text description.
As an example, the pickup unit 301 is connected to the sound card 40. While receiving the first voice or the second voice, it also receives the recognition result transmitted by the sound card 40 and determines the user corresponding to the currently received voice; the received voice is then recognized, its content converted into text content, and the text content associated with the corresponding user.
As an example, the transcription unit 302 can extract the voiceprint feature of the received voice and determine the user corresponding to the text content from that feature. Fig. 6 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model. The transcription unit 302 in Fig. 6 includes:
a speech recognition subunit 3021, a comparison subunit 3022, and a voiceprint memory 3023. The speech recognition subunit 3021 is connected to the pickup unit 301 and receives the processed first voice or second voice from it.
The comparison subunit 3022 is connected to the speech recognition subunit 3021, and the voiceprint memory 3023 is connected to the speech recognition subunit 3021 and the comparison subunit 3022 respectively.
The speech recognition subunit 3021 performs speech recognition on the voice after automatic noise reduction and de-reverberation, converts the content carried in the voice into text content, and extracts the voiceprint feature from the voice. It then sends the extracted voiceprint feature to the comparison subunit 3022, which compares it with the voiceprint features already in the voiceprint memory; if the extracted voiceprint feature is not present in the voiceprint memory, it is stored there and a user label is created, and the text content is associated with the user label. The user label marks the user corresponding to the text content; for example, it may be "user C" or "user 5".
In this embodiment, the voiceprint memory 3023 provided in the transcription unit 302 stores each voiceprint feature when it first appears. That is, a new voiceprint memory 3023 is established for each new usage scenario, and at the start it stores no voiceprint features. During speech recognition, whenever a new voiceprint feature appears, it is stored into the voiceprint memory 3023 and used to recognize subsequently collected voice and determine the corresponding user. When the usage scenario is switched, the voiceprint features in the voiceprint memory 3023 are not shared; they are used only for that scenario.
It should be noted that although the voiceprint features in the voiceprint memory 3023 cannot be shared between different scenarios, they can be collected as samples by a management center or a security department, such as the public security system.
Specifically, transcription unit 302 can according to corresponding to word content in recognition result user, judge the word contentWhether it is same user with the preceding paragraph word content, if non-same user, segmentation is recorded in the word in the recognition resultHold.
In the present embodiment, the word content of transcription can be sent to display screen 303 and word content exists by transcription unit 302Shown on display screen, multiple viewing areas can be divided into display screen 303, and one of viewing area is documents editingRegion, for word content recorded before showing, another region word content adds region, current real-time for showingThe word content identified.It is shown separately just to have manually by setting multiple viewing areas to realize the automatic addition of wordDebug.
In this embodiment, role separation can be performed on the basis of the sound characteristics of the extracted voice, so that the transcription unit 302 can record in a conversational mode; for example, in a one-on-one scenario the record can be made in a question-and-answer format.
Further, the transcription unit 302 can also use voice activity detection (VAD) for segmentation. For example, a time interval can be set; when a silent interval exceeds this preset interval, the text content of the same user is split at the silent point, and the following text content is recorded in the next paragraph.
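The silence-based split can be sketched as below. This is an illustrative sketch, assuming each utterance of one speaker arrives with start and end times in seconds; the 2.0-second gap is an assumed preset interval, not a value from the specification.

```python
def split_on_silence(utterances, max_gap=2.0):
    """Split one speaker's utterances into paragraphs at silent gaps
    longer than `max_gap` seconds (cf. VAD segmentation in unit 302).

    Each utterance is a (start, end, text) tuple in time order.
    """
    paragraphs, current, prev_end = [], [], None
    for start, end, text in utterances:
        if prev_end is not None and start - prev_end > max_gap:
            paragraphs.append(" ".join(current))   # silence exceeded: cut here
            current = []
        current.append(text)
        prev_end = end
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```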
On the basis of the above example, Fig. 7 shows a schematic structural diagram of another speech processing system of the present utility model. As shown in Fig. 7, the processing unit further includes a storage unit 304 and a broadcast unit 305.
The storage unit 304 is connected to the transcription unit 302 and the pickup unit 301 respectively, and can store the received first voice and second voice. When recording the text content, the transcription unit 302 embeds, for each sentence, first information about the original voice corresponding to that sentence. The first information includes the address of the received voice in the storage unit 304 and the timestamp information of the original voice corresponding to the sentence; the timestamps at which the sentence starts and ends can be recorded.
The broadcast unit 305 is connected to the transcription unit 302. When the user clicks a sentence in the recorded text, the address of the voice in the storage unit 304 can be obtained from the first information embedded in that sentence, and from this address and the timestamp information the starting point and end point of the original voice corresponding to the sentence can be determined, so that the voice within that period is played back. In this embodiment, the broadcast unit 305 can be a loudspeaker or a microphone array, such as a disc-shaped microphone array.
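The per-sentence "first information" and the click-to-replay lookup can be sketched as follows. This is a minimal sketch; the class name, the `mem://` address form, and the method names are illustrative assumptions, and a real broadcast unit would stream the returned slice to a loudspeaker.

```python
class Transcript:
    """Attach first information (audio address plus start/end timestamps)
    to each recorded sentence so that clicking a sentence can replay
    exactly its span of the original audio (cf. units 302, 304 and 305)."""

    def __init__(self):
        self.sentences = []   # (text, audio_addr, t_start, t_end)

    def add(self, text, audio_addr, t_start, t_end):
        """Record a sentence together with its embedded first information."""
        self.sentences.append((text, audio_addr, t_start, t_end))

    def playback_span(self, index):
        """Return (address, start, end) for the clicked sentence."""
        _, addr, t0, t1 = self.sentences[index]
        return addr, t0, t1
```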
Further, since the broadcast unit 305 is provided, the actually recorded text content can also be played aloud. For a suspect who cannot read, the record can therefore be read aloud by the machine for the suspect to listen to, effectively reducing the working pressure on procuratorial personnel.
In this embodiment, by embedding in the text content, for each sentence, the first information of the original voice corresponding to that sentence, the required original content can be flexibly clicked and played back.
This is especially useful in an interrogation procedure, where the original voice can be played back for each sentence in the trial record. For unreasonable demands or retractions of confession raised by a suspect later in the court trial, this provides trial evidence that can be recalled precisely. By contrast, existing synchronized audio-video recordings are long in duration and large in capacity, and the relevant audio and video passages often cannot be located promptly and accurately when a suspect retracts a confession; the present system thus solves the problem of imprecise recall in the prior art.
On the basis of Fig. 7, Fig. 8 provides a schematic structural diagram of another speech processing system of the present utility model. As shown in Fig. 8, the processing unit 30 further includes a keyword extraction unit 306, a database 307, a selection unit 308, an editing unit 309 and a translation unit 310. The keyword extraction unit 306 is connected to the transcription unit 302; the selection unit 308 is connected to the transcription unit 302 and the database 307 respectively; the editing unit 309 is connected to the transcription unit 302; and the translation unit 310 is connected to the transcription unit 302.
In this embodiment, when recording the text content, the transcription unit 302 embeds, for each paragraph, second information about the original voice corresponding to that paragraph. The second information includes the address of the received voice in the storage unit and the timestamp information of the original voice corresponding to the paragraph.
The keyword extraction unit 306 can automatically extract keywords such as time, place, person, event and cause of the incident from the recognized text content using natural language processing (NLP) technology. After the keywords are obtained, the keyword extraction unit 306 can mark them, for example by highlighting them. Further, after a keyword is obtained, an association between the keyword and the paragraph in which it occurs can be established. In this embodiment, the keywords can form a keyword set, and a positioning button can be provided for each keyword; by clicking this button, the user can quickly locate the paragraph corresponding to the keyword.
Further, the keyword extraction unit 306 can also receive a user's modification of a phrase and mark it, so that the next time the phrase appears it is displayed in the modified form. It can also count the frequency with which hot words occur in the text content and add those exceeding a certain frequency as new keywords, taking effect in real time, which can effectively improve the recognition accuracy of the keywords.
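The hot-word promotion rule can be sketched as a frequency count over the transcript. This is an illustrative sketch; the threshold of 3 occurrences is an assumption, since the specification says only "exceeding a certain frequency", and the function name is hypothetical.

```python
from collections import Counter

def promote_hot_words(transcript_tokens, keywords, min_count=3):
    """Count how often candidate words occur in the transcript and add
    any word seen at least `min_count` times to the live keyword set,
    so that it takes effect for subsequent recognition (cf. unit 306).

    `transcript_tokens` is a list of words; `keywords` is a mutable set.
    """
    counts = Counter(transcript_tokens)
    for word, n in counts.items():
        if n >= min_count and word not in keywords:
            keywords.add(word)   # new hot word becomes a keyword immediately
    return keywords
```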
Further, a database can also be provided in the processing unit 30, in which text templates and/or frequently reused phrases or sentences can be stored in advance.
An interrogator can choose a target text template from the database through the selection unit 308; once the target text template is selected, the transcription unit 302 can record the text content according to the format requirements of the target text template. Further, a draft template can be selected as the target text template from the history or from disk through the selection unit 308. In this embodiment, a new text template can be created through the selection unit 308 and stored in the database, and the selected target text template can also be edited, for example by changing the font size or font color, or by deleting items such as page footers.
Further, the selection unit 308 operates during recording. For example, during an interrogation record, when the interrogator asks for the suspect's name with "What is your name?" or "Tell me what you are called", the system recognizes that the interrogator's intention is "name", and a simple entry "name" is thus formed in the record. Further, the selection unit 308 supports user-defined editing of commonly used sentence templates.
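The question-to-field mapping described above can be sketched as a rule lookup. The patterns and field names below are illustrative assumptions only; a real selection unit 308 would use trained intent recognition rather than regular expressions.

```python
import re

# Hypothetical question patterns mapped to record fields.
INTENT_PATTERNS = [
    (re.compile(r"\b(what is your name|what are you called)\b", re.I), "name"),
    (re.compile(r"\b(how old are you|what is your age)\b", re.I), "age"),
]

def question_to_field(question):
    """Return the record field implied by an interrogator's question,
    or None when no known pattern matches."""
    for pattern, field in INTENT_PATTERNS:
        if pattern.search(question):
            return field
    return None
```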
Further, the editing unit 309 in the processing unit 30 can edit the text content transcribed by the transcription unit 302, for example typesetting the text content, or automatically checking spelling mistakes and basic grammar mistakes, helping the user proofread the record quickly. The editing unit 309 can also remove modal particles and redundant words to keep the record tidy. In this embodiment, the editing unit 309 can automatically check and arrange the text content, further reducing the working intensity of the interrogator so that the interrogator can concentrate on the interrogation.
Further, the translation unit 310 in the processing unit 30 can provide translation between multiple languages. Specifically, the translation unit 310 receives a translation instruction input by the user, where the translation instruction includes the target language after conversion, and then translates the recognized text content from the current language into the target language according to the instruction. For example, the text can be translated from Chinese into Uyghur, from Chinese into English, from Chinese into Japanese, and so on.
In this embodiment, speech recognition is used to automatically convert the voice signal into a text signal and record it, without relying on manual identification of the voice information and manual recording. This improves recording efficiency, reduces labor cost, and can reduce the probability of errors and omissions.
Especially in interrogation case handling, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on the procurator during case handling so that the case handler can devote more energy to the trial of the case and improve the quality of the interrogation. It solves the problem in existing interrogation procedures that the case handler must simultaneously interrogate, record and verify, which not only leaves the case handler exhausted but can also lead to the omission of details or key confession content.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present utility model. In this specification, schematic statements of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in an appropriate manner in any one or more embodiments or examples. In addition, where no conflict arises, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples.
In addition, the terms "first" and "second" are used only for descriptive purposes and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present utility model, "multiple" means at least two, for example two or three, unless otherwise specifically defined.
For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wirings, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise suitably processing it if necessary, and then stored in a computer memory.
It should be understood that parts of the present utility model may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques known in the art: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the related hardware through a program; the program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiment or a combination thereof.
In addition, the functional units in the embodiments of the present utility model may be integrated in one processing module, or each unit may exist physically alone, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present utility model have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present utility model; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present utility model.

Claims (13)

CN201720953479.XU | Priority date 2017-08-01 | Filing date 2017-08-01 | Speech processing system | Active | Granted as CN207149252U (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201720953479.XU | 2017-08-01 | 2017-08-01 | Speech processing system


Publications (1)

Publication Number | Publication Date
CN207149252U | 2018-03-27

Family

ID: 61674157

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201720953479.XU | Active | CN207149252U (en) | 2017-08-01 | 2017-08-01 | Speech processing system

Country Status (1)

Country | Link
CN | CN207149252U (en)

Cited By (18)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN108922525A (en)* | 2018-06-19 | 2018-11-30 | Oppo广东移动通信有限公司 | Method of speech processing, device, storage medium and electronic equipment
CN109033150A (en)* | 2018-06-12 | 2018-12-18 | 平安科技(深圳)有限公司 | Sensitive word verification method, device, computer equipment and storage medium
CN109410933A (en)* | 2018-10-18 | 2019-03-01 | 珠海格力电器股份有限公司 | Device control method and apparatus, storage medium, and electronic apparatus
CN109976700A (en)* | 2019-01-25 | 2019-07-05 | 广州富港万嘉智能科技有限公司 | A kind of method, electronic equipment and the storage medium of the transfer of recording permission
CN110211581A (en)* | 2019-05-16 | 2019-09-06 | 济南市疾病预防控制中心 | A kind of laboratory automatic speech recognition record identification system and method
CN110460798A (en)* | 2019-06-26 | 2019-11-15 | 平安科技(深圳)有限公司 | Video Interview service processing method, device, terminal and storage medium
CN110588524A (en)* | 2019-08-02 | 2019-12-20 | 精电有限公司 | A method for displaying information and a vehicle-mounted auxiliary display system
CN110751950A (en)* | 2019-10-25 | 2020-02-04 | 武汉森哲地球空间信息技术有限公司 | Police conversation voice recognition method and system based on big data
CN110858492A (en)* | 2018-08-23 | 2020-03-03 | 阿里巴巴集团控股有限公司 | Audio editing method, device, equipment and system and data processing method
CN111128132A (en)* | 2019-12-19 | 2020-05-08 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium
CN111145775A (en)* | 2019-12-19 | 2020-05-12 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium
CN111276155A (en)* | 2019-12-20 | 2020-06-12 | 上海明略人工智能(集团)有限公司 | Voice separation method, device and storage medium
CN111461946A (en)* | 2020-04-14 | 2020-07-28 | 山东致群信息技术有限公司 | Intelligent public security interrogation system
CN111627448A (en)* | 2020-05-15 | 2020-09-04 | 公安部第三研究所 | System and method for realizing trial and talk control based on voice big data
CN111953852A (en)* | 2020-07-30 | 2020-11-17 | 北京声智科技有限公司 | Call record generation method, device, terminal and storage medium
CN112307156A (en)* | 2019-07-26 | 2021-02-02 | 北京宝捷拿科技发展有限公司 | Cross-language intelligent auxiliary side inspection method and system
CN113936697A (en)* | 2020-07-10 | 2022-01-14 | 北京搜狗智能科技有限公司 | Voice processing method and device for voice processing
CN114255760A (en)* | 2021-12-15 | 2022-03-29 | 江苏税软软件科技有限公司 | Inquiry recording system and method

Cited By (23)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN109033150A (en)* | 2018-06-12 | 2018-12-18 | 平安科技(深圳)有限公司 | Sensitive word verification method, device, computer equipment and storage medium
CN109033150B (en)* | 2018-06-12 | 2024-01-30 | 平安科技(深圳)有限公司 | Sensitive word verification method, device, computer equipment and storage medium
WO2019242414A1 (en)* | 2018-06-19 | 2019-12-26 | Oppo广东移动通信有限公司 | Voice processing method and apparatus, storage medium, and electronic device
CN108922525A (en)* | 2018-06-19 | 2018-11-30 | Oppo广东移动通信有限公司 | Method of speech processing, device, storage medium and electronic equipment
CN110858492A (en)* | 2018-08-23 | 2020-03-03 | 阿里巴巴集团控股有限公司 | Audio editing method, device, equipment and system and data processing method
CN109410933A (en)* | 2018-10-18 | 2019-03-01 | 珠海格力电器股份有限公司 | Device control method and apparatus, storage medium, and electronic apparatus
CN109410933B (en)* | 2018-10-18 | 2021-02-19 | 珠海格力电器股份有限公司 | Device control method and apparatus, storage medium, and electronic apparatus
CN109976700A (en)* | 2019-01-25 | 2019-07-05 | 广州富港万嘉智能科技有限公司 | A kind of method, electronic equipment and the storage medium of the transfer of recording permission
CN110211581A (en)* | 2019-05-16 | 2019-09-06 | 济南市疾病预防控制中心 | A kind of laboratory automatic speech recognition record identification system and method
CN110460798A (en)* | 2019-06-26 | 2019-11-15 | 平安科技(深圳)有限公司 | Video Interview service processing method, device, terminal and storage medium
CN112307156A (en)* | 2019-07-26 | 2021-02-02 | 北京宝捷拿科技发展有限公司 | Cross-language intelligent auxiliary side inspection method and system
CN110588524A (en)* | 2019-08-02 | 2019-12-20 | 精电有限公司 | A method for displaying information and a vehicle-mounted auxiliary display system
CN110588524B (en)* | 2019-08-02 | 2021-01-01 | 精电有限公司 | Information display method and vehicle-mounted auxiliary display system
CN110751950A (en)* | 2019-10-25 | 2020-02-04 | 武汉森哲地球空间信息技术有限公司 | Police conversation voice recognition method and system based on big data
CN111128132A (en)* | 2019-12-19 | 2020-05-08 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium
CN111145775A (en)* | 2019-12-19 | 2020-05-12 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium
CN111276155A (en)* | 2019-12-20 | 2020-06-12 | 上海明略人工智能(集团)有限公司 | Voice separation method, device and storage medium
CN111276155B (en)* | 2019-12-20 | 2023-05-30 | 上海明略人工智能(集团)有限公司 | Voice separation method, device and storage medium
CN111461946A (en)* | 2020-04-14 | 2020-07-28 | 山东致群信息技术有限公司 | Intelligent public security interrogation system
CN111627448A (en)* | 2020-05-15 | 2020-09-04 | 公安部第三研究所 | System and method for realizing trial and talk control based on voice big data
CN113936697A (en)* | 2020-07-10 | 2022-01-14 | 北京搜狗智能科技有限公司 | Voice processing method and device for voice processing
CN111953852A (en)* | 2020-07-30 | 2020-11-17 | 北京声智科技有限公司 | Call record generation method, device, terminal and storage medium
CN114255760A (en)* | 2021-12-15 | 2022-03-29 | 江苏税软软件科技有限公司 | Inquiry recording system and method

Similar Documents

Publication | Title
CN207149252U (en) | Speech processing system
CN105100360B (en) | Call assistant method and device for voice communication
CN205647778U (en) | Intelligent conference system
US9715873B2 (en) | Method for adding realism to synthetic speech
US8407049B2 (en) | Systems and methods for conversation enhancement
GB2362745A (en) | Transcription of text from computer voice mail
US20210232776A1 (en) | Method for recording and outputting conversion between multiple parties using speech recognition technology, and device therefor
CN103678269A (en) | Information processing method and device
DE102004050785A1 (en) | Method and arrangement for processing messages in the context of an integrated messaging system
WO2005027092A1 (en) | Document creation/reading method, document creation/reading device, document creation/reading robot, and document creation/reading program
CN109754788A (en) | Voice control method, device, equipment and storage medium
EP2682931B1 (en) | Method and apparatus for recording and playing user voice in mobile terminal
CN109346057A (en) | Speech processing system for an intelligent children's toy
US12041313B2 (en) | Data processing method and apparatus, device, and medium
CN111415128A (en) | Method, system, apparatus, device and medium for controlling conference
CN110751950A (en) | Police conversation voice recognition method and system based on big data
CN117995195A (en) | Method, device, equipment and storage medium for generating broadcast play
CN117371459A (en) | Conference auxiliary system and method based on intelligent voice AI real-time translation
CN112581965A (en) | Transcription method, device, recording pen and storage medium
KR20220009319A (en) | Apparatus and method for video conferencing service
WO2021134284A1 (en) | Voice information processing method, hub device, control terminal and storage medium
KR102575038B1 (en) | Apparatus and method for video conferencing service
CN109922397A (en) | Audio intelligent processing method, storage medium, intelligent terminal and smart bluetooth earphone
CN108920470 (en) | Method for automatically detecting the language of an audio recording and translating it
CN113076747A (en) | Voice recognition recording method based on role recognition

Legal Events

Code | Title
GR01 | Patent grant
