CN207149252U - Speech processing system - Google Patents

Speech processing system
Download PDF

Info

Publication number
CN207149252U
Authority
CN
China
Prior art keywords
voice
equipment
unit
sound pickup
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201720953479.XU
Other languages
Chinese (zh)
Inventor
李飞
程旭
赵珣
袁俊杰
吕文杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Hear Technology Co Ltd
Original Assignee
Anhui Hear Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Hear Technology Co Ltd
Priority to CN201720953479.XU
Application granted
Publication of CN207149252U
Status: Active
Anticipated expiration

Abstract

The utility model proposes a speech processing system. The system includes at least a first sound pickup device and a second sound pickup device, both connected to a processing unit. The first sound pickup device collects the first voice information of a first user, and the second sound pickup device collects the second voice information of a second user. The processing unit recognizes the first voice and the second voice to obtain the corresponding text content and the corresponding user, and records the text content in paragraphs segmented by user. In this embodiment, by means of speech recognition, the voice signal is automatically converted into a text signal and recorded, removing the reliance on manually recognizing and transcribing voice information; this improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors. In particular, during the interrogation of a case, it can relieve the pressure on the prosecutor handling the case, so that the case handler can devote more energy to the trial and the interrogation quality is improved.

Description

Speech processing system
Technical field
The utility model relates to the technical field of speech recognition, and in particular to a speech processing system.
Background technology
At present, in conference, interrogation, or interview scenarios, recording documents are mostly produced through audio/video recording combined with manual note-taking, so that they can be checked and traced later. However, during a meeting, interrogation, or interview, the recording personnel must not only listen to the speech but also manually type the spoken content of the participants, suspects, or interviewees into a computer for signature confirmation, archiving, and subsequent business circulation. Having to listen, record, and check at the same time on the spot not only leaves the recorder exhausted, but also leads to situations where details or key content are omitted.
In particular, during an actual inquiry or interrogation, the case handler carries out interrogation, recording, and verification at the same time, which not only leaves the case handler exhausted but also leads to omissions of details or key testimony.
Utility model content
The utility model is intended to solve, at least to some extent, one of the technical problems in the related art.
Therefore, a first purpose of the utility model is to propose a speech processing system that relieves the pressure of recording during a meeting, interrogation, or interview, so that the participants can devote more energy to the meeting, interrogation, or interview itself. It addresses the problem that existing recording personnel, who must listen, record, and check at the same time, are not only exhausted but also prone to omitting details or key content.
To this end, an embodiment of the first aspect of the utility model proposes a speech processing system, including:
at least two sound pickup devices and a processing unit for processing voice, the sound pickup devices including a first sound pickup device and a second sound pickup device;
wherein the first sound pickup device and the second sound pickup device are both connected to the processing unit;
the first sound pickup device is configured to collect a first voice of a first user;
the second sound pickup device is configured to collect a second voice of a second user;
the processing unit is configured to obtain the first voice or the second voice, recognize it to obtain the corresponding text content and the corresponding user, and record the text content in paragraphs segmented by user.
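The segmented recording that the processing unit performs can be sketched in a few lines. This is a hypothetical illustration rather than the patent's implementation; the `Utterance` type and function name are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    user: str  # user identified for this piece of recognized speech
    text: str  # text content obtained by speech recognition

def segment_by_user(utterances):
    """Group consecutive utterances of the same user into one paragraph."""
    paragraphs = []
    for u in utterances:
        if paragraphs and paragraphs[-1][0] == u.user:
            paragraphs[-1][1].append(u.text)       # same user: extend paragraph
        else:
            paragraphs.append((u.user, [u.text]))  # user changed: new paragraph
    return [(user, " ".join(texts)) for user, texts in paragraphs]
```

Feeding in alternating interrogator/suspect utterances yields one labelled paragraph per speaking turn, which is the "record in segments according to the user" behaviour described above.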
In a possible implementation of the embodiment of the first aspect of the utility model, the system further includes a sound card connected to the first sound pickup device, the second sound pickup device, and the processing unit respectively;
the sound card is configured to identify the user corresponding to the currently received voice and send the recognition result to the processing unit for processing.
In a possible implementation of the embodiment of the first aspect of the utility model, the sound card is integrated in the second sound pickup device, and the first sound pickup device is connected to the processing unit through the second sound pickup device.
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit includes a pickup unit, a transcription unit, and a display screen, wherein the pickup unit is connected to the second sound pickup device, and the transcription unit is connected to the pickup unit and the display screen respectively;
the pickup unit is configured to receive the first voice or the second voice and to perform automatic noise reduction and de-reverberation on the received voice;
the transcription unit is configured to perform speech recognition on the processed voice, convert the content carried in the voice into text content, determine the user corresponding to the text content, associate the text content with that user, judge from the identified user whether the text content and the preceding paragraph belong to the same user, and, if not, record the text content in a new paragraph;
the display screen is configured to display the recorded text content.
In a possible implementation of the embodiment of the first aspect of the utility model, the transcription unit includes:
a speech recognition subunit, configured to perform speech recognition on the voice processed by the pickup unit, convert the content carried in the voice into text content, and extract a voiceprint feature from the voice;
a comparison subunit, configured to compare the extracted voiceprint feature with the voiceprint features in a voiceprint memory; when the extracted voiceprint feature is not present in the voiceprint memory, the feature is stored there and a user label is created, and the text content is associated with the user label;
the voiceprint memory, configured to store the voiceprint feature of each user when it is extracted for the first time.
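As a rough sketch of how the comparison subunit and voiceprint memory could behave together, the following hypothetical Python enrolls unseen voiceprints and labels returning speakers. Cosine similarity over feature vectors, the 0.95 threshold, and all names are assumptions; the patent does not specify a matching method.

```python
import math

class VoiceprintMemory:
    """Stores first-seen voiceprints and mints user labels (illustrative)."""

    def __init__(self, threshold=0.95):
        self.prints = {}          # user label -> stored voiceprint vector
        self.threshold = threshold

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    def identify(self, voiceprint):
        """Return the label of a matching stored print, enrolling if new."""
        for label, stored in self.prints.items():
            if self._cosine(voiceprint, stored) >= self.threshold:
                return label
        label = f"user {len(self.prints) + 1}"   # e.g. "user 5" as in the text
        self.prints[label] = voiceprint          # store first-seen voiceprint
        return label
```

Each recognized text span would then be associated with the label `identify` returns, matching the enroll-on-first-appearance behaviour described above.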
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit further includes:
a storage unit connected to the transcription unit and the pickup unit, configured to store the received first voice and second voice;
the transcription unit is further configured to embed, while recording the text content, first information of the original voice corresponding to each sentence, where the first information includes the address of the received voice in the storage unit and the timestamp information of the original voice corresponding to the sentence;
a playback unit connected to the transcription unit, configured to play, when a sentence is clicked, the original voice corresponding to that sentence according to the first information.
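The "first information" attached to each sentence can be pictured as a small record: an audio address plus start and end timestamps, enough to replay the sentence's original speech. The field names, function, and the 16 kHz sample rate are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class FirstInformation:
    audio_address: str   # where the received voice sits in the storage unit
    start_s: float       # timestamp where this sentence begins
    end_s: float         # timestamp where this sentence ends

def play_sentence(info, audio_store, rate=16000):
    """Return the audio samples for one sentence (stand-in for playback)."""
    samples = audio_store[info.audio_address]
    return samples[int(info.start_s * rate):int(info.end_s * rate)]
```

Clicking a sentence would look up its embedded `FirstInformation` and hand the resulting slice to the playback unit.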
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit further includes:
the transcription unit is further configured to embed, while recording the text content, second information of the original voice corresponding to each paragraph, where the second information includes the address of the received voice in the storage unit and the timestamp information of the original voice corresponding to the paragraph;
a keyword extraction unit connected to the transcription unit, configured to extract keywords from the text content and form an association between each keyword and the paragraph in which it occurs;
the playback unit is further configured to play, after a keyword is queried or clicked, the original voice corresponding to the paragraph containing the keyword, according to the association and the second information.
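A minimal sketch of the keyword-to-paragraph association the keyword extraction unit forms, assuming the keyword list is supplied externally (the patent does not say how keywords are chosen); querying a keyword then yields the paragraphs whose "second information" would drive playback:

```python
def build_keyword_index(paragraphs, keywords):
    """Map each keyword to the indices of paragraphs that mention it."""
    index = {}
    for i, para in enumerate(paragraphs):
        for kw in keywords:
            if kw in para:
                index.setdefault(kw, []).append(i)
    return index
```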
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit further includes a database for storing the text templates and/or sentence templates used during recording;
and a selection unit connected to the transcription unit and the database, configured to select a target text template from all text templates before the transcription unit starts recording and, during recording, when the meaning stated by the current speech matches the meaning stated by a first sentence template, to send the first sentence template to the transcription unit for recording, where the first sentence template is one of the sentence templates in the database.
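One naive way to picture the selection unit's sentence-template matching is word-overlap scoring. This is purely illustrative, since the patent does not define how "the meaning stated" by the speech is matched against a template:

```python
def match_sentence_template(speech_text, sentence_templates):
    """Return the stored template whose words best overlap the speech,
    or None when nothing overlaps (assumed fallback)."""
    best, best_score = None, 0
    words = set(speech_text.split())
    for tpl in sentence_templates:
        score = len(words & set(tpl.split()))
        if score > best_score:
            best, best_score = tpl, score
    return best
```

A real system would use semantic matching rather than token overlap; the sketch only shows the select-then-substitute flow.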
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit further includes:
an editing unit connected to the transcription unit, configured to edit the text content recognized in real time;
a translation unit connected to the transcription unit, configured to receive a translation instruction from the user, the instruction including the target language to convert to, and to translate the text content from the current language into the target language according to the instruction.
In a possible implementation of the embodiment of the first aspect of the utility model, the processing unit is a terminal device.
In a possible implementation of the embodiment of the first aspect of the utility model, the first sound pickup device and the second sound pickup device each include a microphone array, where the first sound pickup device is a linear microphone array and the second sound pickup device is a disc microphone array.
In a possible implementation of the embodiment of the first aspect of the utility model, the first sound pickup device and the second sound pickup device are placed in a set positional relationship during operation.
In a possible implementation of the embodiment of the first aspect of the utility model, the pickup range of the first sound pickup device covers the first user, and the distance between the second sound pickup device and the second user is kept within a set range.
The speech processing system of the embodiment of the utility model includes at least a first sound pickup device and a second sound pickup device, both connected to a processing unit. The first sound pickup device collects the first voice information of a first user and the second sound pickup device collects the second voice information of a second user; the processing unit recognizes the first voice and the second voice to obtain the corresponding text content and corresponding user, and records the text content in paragraphs segmented by user. In this embodiment, by means of speech recognition, the voice signal is automatically converted into a text signal and recorded, removing the reliance on manually recognizing and transcribing voice information; this improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors.
In particular, during the interrogation of a case, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on the prosecutor handling the case, so that the case handler can devote more energy to the trial and the interrogation quality is improved. It solves the problem that, in the existing interrogation process, the case handler must interrogate, record, and verify at the same time, which not only leaves the case handler exhausted but also leads to omissions of details or key testimony.
Additional aspects and advantages of the utility model will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the utility model.
Brief description of the drawings
The above and additional aspects and advantages of the utility model will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic structural diagram of a speech processing system provided by an embodiment of the utility model;
Fig. 2 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 3 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 4 is a schematic application diagram of a speech processing system provided by an embodiment of the utility model;
Fig. 5 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 6 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 7 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model;
Fig. 8 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model.
Embodiment
Embodiments of the utility model are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the utility model; they should not be construed as limiting it.
The speech processing system of the embodiments of the utility model is described below with reference to the accompanying drawings.
At present, the records of most inquiries or interrogations are still kept by hand in Word or WPS, transcribing the suspect's statements for signature confirmation, archiving, and subsequent business circulation. This not only leaves the case handler exhausted but also leads to omissions of details or key testimony.
In view of the above problems, an embodiment of the utility model proposes a speech processing system that relieves the pressure on the prosecutor during case handling, so that the case handler can devote more energy to the trial, improving interrogation quality.
Fig. 1 is a schematic structural diagram of a speech processing system provided by an embodiment of the utility model. As shown in Fig. 1, the speech processing system includes at least a first sound pickup device 10, a second sound pickup device 20, and a processing unit 30 for processing voice, where the first sound pickup device 10 and the second sound pickup device 20 are both connected to the processing unit 30.
The first sound pickup device 10 is configured to collect the first voice of a first user.
The second sound pickup device 20 is configured to collect the second voice of a second user.
The processing unit 30 is configured to obtain the first voice and the second voice, recognize them to obtain the corresponding text content and corresponding user, and record the text content in paragraphs segmented by the corresponding user.
As an example, on the basis of Fig. 1, Fig. 2 shows the structure of another speech processing system. As shown in Fig. 2, the speech processing system further includes a sound card 40, which is connected to the first sound pickup device 10, the second sound pickup device 20, and the processing unit 30 respectively.
In this embodiment, the sound card 40 can identify the user corresponding to the currently received voice and send the recognition result to the processing unit 30, so that the processing unit 30 can record the text content under the user indicated by the recognition result.
Specifically, the sound card 40 is a hardware element with a two-way input interface: one way is connected to the first sound pickup device 10 and receives the first voice it collects, and the other way is connected to the second sound pickup device 20 and receives the second voice it collects. The sound card 40 can distinguish which input interface a received voice arrived on and thereby identify the corresponding user, achieving automatic separation of speech roles.
In a conference scenario with multiple sound pickup devices, the sound card 40 should provide as many input interfaces as there are pickup devices, one per device, so that it can identify the role corresponding to the currently received voice.
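The role separation the sound card performs can be pictured as a lookup table keyed by input interface: whichever channel the audio arrives on identifies the speaker. Channel numbers and role names below are illustrative assumptions.

```python
# One entry per input interface, mirroring one pickup device per channel.
ROLE_BY_CHANNEL = {
    0: "first user",    # first pickup device, e.g. the suspect
    1: "second user",   # second pickup device, e.g. the interrogator
}

def role_for(channel):
    """Identify the role from the input interface the voice arrived on."""
    return ROLE_BY_CHANNEL.get(channel, f"user on channel {channel}")
```

In the multi-device conference case, the table simply grows to one entry per interface.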
As another example, on the basis of Fig. 2, Fig. 3 shows the structure of another speech processing system. As shown in Fig. 3, the sound card 40 is integrated in the second sound pickup device 20 and is connected to the first sound pickup device 10, the second sound pickup device 20, and the processing unit 30 respectively, so that the first sound pickup device 10 is connected to the processing unit 30 through the second sound pickup device 20. This avoids having to provide multiple interfaces on the processing unit 30 for connecting the pickup devices. Optionally, in this embodiment, the second sound pickup device 20 includes a collecting microphone (MIC) connected to the sound card 40.
Further, a voice preprocessing program or software may also be provided in the second sound pickup device 20 to perform noise filtering and analog-to-digital conversion on the received voice; the preprocessed voice is then input to the processing unit 30 for speech recognition, improving recognition accuracy.
Alternatively, the voice preprocessing program or software may be built into the processing unit 30, so that the received voice is preprocessed with noise filtering and analog-to-digital conversion before speech recognition is performed on it, improving recognition accuracy.
As an example, the processing unit 30 may be a mobile workstation, or a terminal device such as a notebook computer, an ultrabook, a personal computer (PC), a mobile phone, or an iPad. Software or hardware capable of speech recognition and of transcribing the recognition result into text content may be installed on the processing unit 30.
In this embodiment, to improve pickup quality, the first sound pickup device 10 and the second sound pickup device 20 may each be a microphone, a sound pickup, or the like; preferably, each includes a microphone array. Because a microphone array achieves directional pickup, background noise can be filtered out and the pickup quality of the devices improved.
The speech processing system provided by the utility model includes at least a first sound pickup device and a second sound pickup device, both connected to a processing unit. The first sound pickup device collects the first voice information of a first user and the second sound pickup device collects the second voice information of a second user; the processing unit recognizes the first voice and the second voice to obtain the corresponding text content and corresponding user, and records the text content in paragraphs segmented by user. In this embodiment, by means of speech recognition, the voice signal is automatically converted into a text signal and recorded, removing the reliance on manually recognizing and transcribing voice information; this improves recording efficiency, reduces labor cost, and lowers the probability of omissions and errors.
Generally, the positions where the two sound pickup devices are placed differ between application scenarios, and differently shaped pickup devices may be required. In this embodiment, the first sound pickup device 10 may be a linear microphone array and the second sound pickup device 20 a disc microphone array. Further, the positional relationship between the two devices during operation can be set in advance, and the devices placed accordingly; for example, the first sound pickup device 10 may be placed in front of, or diagonally in front of, the second sound pickup device 20.
As shown in Fig. 4, an application diagram of the utility model, the speech processing system provided by the utility model is used in the scenario of an interrogator questioning a suspect. In this scenario the suspect usually sits on a stool with no obstruction in front, so the first sound pickup device 10 can be a linear microphone array. A desk is usually placed in front of the interrogator, so the second sound pickup device 20 can be a disc microphone array.
Specifically, the linear microphone array points at the first user, here the suspect, and collects the suspect's first voice as testimony; the distance between the linear microphone array and the suspect can be up to 5 meters. The disc microphone array is placed in front of the second user, the interrogator, and collects the interrogator's second voice. The linear microphone array and the disc microphone array can each collect 8 channels of voice.
In use, the elevation angle of the linear microphone array can be adjusted up or down to suit the actual scene. The linear microphone array generally has a pickup angle of 30 degrees, so in use the suspect must be kept within the pickup range of the first sound pickup device. For example, the linear microphone array can be pointed directly at the suspect's face, or the suspect's face kept within 15 degrees either side of the array's axis.
Further, the disc microphone array is directly or diagonally behind the linear microphone array, and the second user, the interrogator, must keep a certain distance from the disc microphone array, controlled within a preset range: if the distance is too large, the interrogator's voice cannot be collected well; if it is too small, the depression angle becomes too large and pickup quality suffers.
The interrogator must stay behind the linear microphone array, for example directly behind it or offset to one side, and the linear microphone array must not be pointed at the interrogator; otherwise problems such as unclear pickup of the suspect arise.
Further, the linear microphone array is connected to the disc microphone array, and the disc microphone array is connected to an ultrabook, which serves as the processing unit 30 of the utility model. The processing unit 30 obtains and recognizes the current voice, identifies that it corresponds to the interrogator, and attributes the recognized text content to the interrogator. When the interrogator finishes a question and the suspect answers, the processing unit 30 receives voice again, recognizes that it comes from the suspect, and records the recognized text content in a new paragraph for later review; the recognized text content can be displayed on the screen of the ultrabook.
In the interrogation process, the interrogator generally begins by questioning the suspect, so the interrogator's sound characteristics can be distinguished first, after which the interrogator and the suspect can be told apart. For example, the text content of the interrogator and that of the suspect can be distinguished by the question-and-answer pattern.
In particular, during the interrogation of a case, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on the prosecutor handling the case, so that the case handler can devote more energy to the trial and the interrogation quality is improved. It solves the problem that, in the existing interrogation process, the case handler must interrogate, record, and verify at the same time, which not only leaves the case handler exhausted but also leads to omissions of details or key testimony.
As an example, on the basis of the above embodiments, Fig. 5 shows the structure of another speech processing system. As shown in Fig. 5, the processing unit 30 includes a pickup unit 301, a transcription unit 302, and a display screen 303; the pickup unit 301 is connected to the transcription unit 302, and the transcription unit 302 is connected to the display screen 303.
The pickup unit 301 receives the first voice or the second voice and performs pickup with automatic noise reduction and de-reverberation, improving the accuracy of subsequent speech recognition. The transcription unit 302 then performs speech recognition on the voice processed by the pickup unit 301, converting the content carried in the voice into text content, determining the corresponding user, and associating the text content with that user. In this embodiment, the pickup unit 301 may be a hardware interface provided in the processing unit 30 that receives voice and removes reverberation, and the transcription unit 302 may be a speech recognition chip in the processing unit 30 that recognizes received voice and converts its content into a text description.
As an example, the pickup unit 301 is connected to the sound card 40. While receiving the first voice or the second voice, it also receives the recognition result transmitted by the sound card 40 and determines the user corresponding to the currently received voice; the received voice is then recognized, its content converted into text content, and the text content associated with the corresponding user.
As an example, the transcription unit 302 can extract the voiceprint feature of the received voice and determine the user corresponding to the text content from that feature. Fig. 6 is a schematic structural diagram of another speech processing system provided by an embodiment of the utility model. The transcription unit 302 in Fig. 6 includes:
a speech recognition subunit 3021, a comparison subunit 3022, and a voiceprint memory 3023. The speech recognition subunit 3021 is connected to the pickup unit 301 and receives the processed first voice or second voice from it.
The comparison subunit 3022 is connected to the speech recognition subunit 3021, and the voiceprint memory 3023 is connected to the speech recognition subunit 3021 and the comparison subunit 3022 respectively.
The speech recognition subunit 3021 performs speech recognition on the voice after automatic noise reduction and de-reverberation, converts the content carried in the voice into text content, and extracts the voiceprint feature from the voice. It then sends the extracted voiceprint feature to the comparison subunit 3022, which compares it with the voiceprint features already in the voiceprint memory; if the extracted voiceprint feature is not present in the voiceprint memory, it is stored there and a user label is created, and the text content is associated with the user label. The user label marks the user corresponding to the text content; for example, it may be "user C" or "user 5".
In this embodiment, the voiceprint memory 3023 provided in the transcription unit 302 stores each voiceprint feature when it first appears. That is, a new voiceprint memory 3023 is established for each new usage scenario, and at the start it stores no voiceprint features. During speech recognition, whenever a new voiceprint feature appears, it is stored into the voiceprint memory 3023 and used to recognize subsequently collected voice and determine the corresponding user. When the usage scenario is switched, the voiceprint features in the voiceprint memory 3023 are not shared; they are used only for that scenario.
It should be noted that although the voiceprint features in the voiceprint memory 3023 cannot be shared between different scenarios, they can be collected as samples by a management center or a security department, such as the public security system.
Specifically, transcription unit 302 can according to corresponding to word content in recognition result user, judge the word contentWhether it is same user with the preceding paragraph word content, if non-same user, segmentation is recorded in the word in the recognition resultHold.
In the present embodiment, the word content of transcription can be sent to display screen 303 and word content exists by transcription unit 302Shown on display screen, multiple viewing areas can be divided into display screen 303, and one of viewing area is documents editingRegion, for word content recorded before showing, another region word content adds region, current real-time for showingThe word content identified.It is shown separately just to have manually by setting multiple viewing areas to realize the automatic addition of wordDebug.
In this embodiment, role separation can be performed on the basis of the sound characteristics of the extracted voice, so that the transcription unit 302 can record in a conversational mode; for example, in a one-on-one scenario the record can be made in a question-and-answer format.
Further, the transcription unit 302 can also use voice activity detection (VAD) for segmentation. For example, a time interval can be set; when a silent interval exceeds this preset interval, the text content of the same user is split at the silent point, and the following text content is recorded in the next paragraph.
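The silence-based split can be sketched as below. This is an illustrative sketch, assuming each utterance of one speaker arrives with start and end times in seconds; the 2.0-second gap is an assumed preset interval, not a value from the specification.

```python
def split_on_silence(utterances, max_gap=2.0):
    """Split one speaker's utterances into paragraphs at silent gaps
    longer than `max_gap` seconds (cf. VAD segmentation in unit 302).

    Each utterance is a (start, end, text) tuple in time order.
    """
    paragraphs, current, prev_end = [], [], None
    for start, end, text in utterances:
        if prev_end is not None and start - prev_end > max_gap:
            paragraphs.append(" ".join(current))   # silence exceeded: cut here
            current = []
        current.append(text)
        prev_end = end
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```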
On the basis of the above example, Fig. 7 shows a schematic structural diagram of another speech processing system of the present utility model. As shown in Fig. 7, the processing unit further includes a storage unit 304 and a broadcast unit 305.
The storage unit 304 is connected to the transcription unit 302 and the pickup unit 301 respectively, and can store the received first voice and second voice. When recording the text content, the transcription unit 302 embeds, for each sentence, first information about the original voice corresponding to that sentence. The first information includes the address of the received voice in the storage unit 304 and the timestamp information of the original voice corresponding to the sentence; the timestamps at which the sentence starts and ends can be recorded.
The broadcast unit 305 is connected to the transcription unit 302. When the user clicks a sentence in the recorded text, the address of the voice in the storage unit 304 can be obtained from the first information embedded in that sentence, and from this address and the timestamp information the starting point and end point of the original voice corresponding to the sentence can be determined, so that the voice within that period is played back. In this embodiment, the broadcast unit 305 can be a loudspeaker or a microphone array, such as a disc-shaped microphone array.
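The per-sentence "first information" and the click-to-replay lookup can be sketched as follows. This is a minimal sketch; the class name, the `mem://` address form, and the method names are illustrative assumptions, and a real broadcast unit would stream the returned slice to a loudspeaker.

```python
class Transcript:
    """Attach first information (audio address plus start/end timestamps)
    to each recorded sentence so that clicking a sentence can replay
    exactly its span of the original audio (cf. units 302, 304 and 305)."""

    def __init__(self):
        self.sentences = []   # (text, audio_addr, t_start, t_end)

    def add(self, text, audio_addr, t_start, t_end):
        """Record a sentence together with its embedded first information."""
        self.sentences.append((text, audio_addr, t_start, t_end))

    def playback_span(self, index):
        """Return (address, start, end) for the clicked sentence."""
        _, addr, t0, t1 = self.sentences[index]
        return addr, t0, t1
```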
Further, since the broadcast unit 305 is provided, the actually recorded text content can also be played aloud. For a suspect who cannot read, the record can therefore be read aloud by the machine for the suspect to listen to, effectively reducing the working pressure on procuratorial personnel.
In this embodiment, by embedding in the text content, for each sentence, the first information of the original voice corresponding to that sentence, the required original content can be flexibly clicked and played back.
This is especially useful in an interrogation procedure, where the original voice can be played back for each sentence in the trial record. For unreasonable demands or retractions of confession raised by a suspect later in the court trial, this provides trial evidence that can be recalled precisely. By contrast, existing synchronized audio-video recordings are long in duration and large in capacity, and the relevant audio and video passages often cannot be located promptly and accurately when a suspect retracts a confession; the present system thus solves the problem of imprecise recall in the prior art.
On the basis of Fig. 7, Fig. 8 provides a schematic structural diagram of another speech processing system of the present utility model. As shown in Fig. 8, the processing unit 30 further includes a keyword extraction unit 306, a database 307, a selection unit 308, an editing unit 309 and a translation unit 310. The keyword extraction unit 306 is connected to the transcription unit 302; the selection unit 308 is connected to the transcription unit 302 and the database 307 respectively; the editing unit 309 is connected to the transcription unit 302; and the translation unit 310 is connected to the transcription unit 302.
In this embodiment, when recording the text content, the transcription unit 302 embeds, for each paragraph, second information about the original voice corresponding to that paragraph. The second information includes the address of the received voice in the storage unit and the timestamp information of the original voice corresponding to the paragraph.
The keyword extraction unit 306 can automatically extract keywords such as time, place, person, event and cause of the incident from the recognized text content using natural language processing (NLP) technology. After the keywords are obtained, the keyword extraction unit 306 can mark them, for example by highlighting them. Further, after a keyword is obtained, an association between the keyword and the paragraph in which it occurs can be established. In this embodiment, the keywords can form a keyword set, and a positioning button can be provided for each keyword; by clicking this button, the user can quickly locate the paragraph corresponding to the keyword.
Further, the keyword extraction unit 306 can also receive a user's modification of a phrase and mark it, so that the next time the phrase appears it is displayed in the modified form. It can also count the frequency with which hot words occur in the text content and add those exceeding a certain frequency as new keywords, taking effect in real time, which can effectively improve the recognition accuracy of the keywords.
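The hot-word promotion rule can be sketched as a frequency count over the transcript. This is an illustrative sketch; the threshold of 3 occurrences is an assumption, since the specification says only "exceeding a certain frequency", and the function name is hypothetical.

```python
from collections import Counter

def promote_hot_words(transcript_tokens, keywords, min_count=3):
    """Count how often candidate words occur in the transcript and add
    any word seen at least `min_count` times to the live keyword set,
    so that it takes effect for subsequent recognition (cf. unit 306).

    `transcript_tokens` is a list of words; `keywords` is a mutable set.
    """
    counts = Counter(transcript_tokens)
    for word, n in counts.items():
        if n >= min_count and word not in keywords:
            keywords.add(word)   # new hot word becomes a keyword immediately
    return keywords
```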
Further, a database can also be provided in the processing unit 30, in which text templates and/or frequently reused phrases or sentences can be stored in advance.
An interrogator can choose a target text template from the database through the selection unit 308; once the target text template is selected, the transcription unit 302 can record the text content according to the format requirements of the target text template. Further, a draft template can be selected as the target text template from the history or from disk through the selection unit 308. In this embodiment, a new text template can be created through the selection unit 308 and stored in the database, and the selected target text template can also be edited, for example by changing the font size or font color, or by deleting items such as page footers.
Further, the selection unit 308 operates during recording. For example, during an interrogation record, when the interrogator asks for the suspect's name with "What is your name?" or "Tell me what you are called", the system recognizes that the interrogator's intention is "name", and a simple entry "name" is thus formed in the record. Further, the selection unit 308 supports user-defined editing of commonly used sentence templates.
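The question-to-field mapping described above can be sketched as a rule lookup. The patterns and field names below are illustrative assumptions only; a real selection unit 308 would use trained intent recognition rather than regular expressions.

```python
import re

# Hypothetical question patterns mapped to record fields.
INTENT_PATTERNS = [
    (re.compile(r"\b(what is your name|what are you called)\b", re.I), "name"),
    (re.compile(r"\b(how old are you|what is your age)\b", re.I), "age"),
]

def question_to_field(question):
    """Return the record field implied by an interrogator's question,
    or None when no known pattern matches."""
    for pattern, field in INTENT_PATTERNS:
        if pattern.search(question):
            return field
    return None
```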
Further, the editing unit 309 in the processing unit 30 can edit the text content transcribed by the transcription unit 302, for example typesetting the text content, or automatically checking spelling mistakes and basic grammar mistakes, helping the user proofread the record quickly. The editing unit 309 can also remove modal particles and redundant words to keep the record tidy. In this embodiment, the editing unit 309 can automatically check and arrange the text content, further reducing the working intensity of the interrogator so that the interrogator can concentrate on the interrogation.
Further, the translation unit 310 in the processing unit 30 can provide translation between multiple languages. Specifically, the translation unit 310 receives a translation instruction input by the user, where the translation instruction includes the target language after conversion, and then translates the recognized text content from the current language into the target language according to the instruction. For example, the text can be translated from Chinese into Uyghur, from Chinese into English, from Chinese into Japanese, and so on.
In this embodiment, speech recognition is used to automatically convert the voice signal into a text signal and record it, without relying on manual identification of the voice information and manual recording. This improves recording efficiency, reduces labor cost, and can reduce the probability of errors and omissions.
Especially in interrogation case handling, the speech processing system of this embodiment can convert dialogue into text in real time, relieving the pressure on the procurator during case handling so that the case handler can devote more energy to the trial of the case and improve the quality of the interrogation. It solves the problem in existing interrogation procedures that the case handler must simultaneously interrogate, record and verify, which not only leaves the case handler exhausted but can also lead to the omission of details or key confession content.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present utility model. In this specification, schematic statements of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in an appropriate manner in any one or more embodiments or examples. In addition, where no conflict arises, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples.
In addition, the terms "first" and "second" are used only for descriptive purposes and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present utility model, "multiple" means at least two, for example two or three, unless otherwise specifically defined.
For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wirings, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise suitably processing it if necessary, and then stored in a computer memory.
It should be understood that parts of the present utility model may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques known in the art: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the related hardware through a program; the program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiment or a combination thereof.
In addition, the functional units in the embodiments of the present utility model may be integrated in one processing module, or each unit may exist physically alone, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present utility model have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present utility model; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present utility model.

Claims (13)

CN201720953479.XU | Priority date 2017-08-01 | Filing date 2017-08-01 | Speech processing system | Active | Granted as CN207149252U (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201720953479.XU | 2017-08-01 | 2017-08-01 | Speech processing system


Publications (1)

Publication Number | Publication Date
CN207149252U | 2018-03-27

Family

ID: 61674157

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201720953479.XU | Active | CN207149252U (en) | 2017-08-01 | 2017-08-01 | Speech processing system

Country Status (1)

Country | Link
CN | CN207149252U (en)

Cited By (18)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN108922525A (en)* | 2018-06-19 | 2018-11-30 | Oppo广东移动通信有限公司 | Method of speech processing, device, storage medium and electronic equipment
CN109033150A (en)* | 2018-06-12 | 2018-12-18 | 平安科技(深圳)有限公司 | Sensitive word verification method, device, computer equipment and storage medium
CN109410933A (en)* | 2018-10-18 | 2019-03-01 | 珠海格力电器股份有限公司 | Device control method and apparatus, storage medium, and electronic apparatus
CN109976700A (en)* | 2019-01-25 | 2019-07-05 | 广州富港万嘉智能科技有限公司 | A kind of method, electronic equipment and the storage medium of the transfer of recording permission
CN110211581A (en)* | 2019-05-16 | 2019-09-06 | 济南市疾病预防控制中心 | A kind of laboratory automatic speech recognition record identification system and method
CN110460798A (en)* | 2019-06-26 | 2019-11-15 | 平安科技(深圳)有限公司 | Video Interview service processing method, device, terminal and storage medium
CN110588524A (en)* | 2019-08-02 | 2019-12-20 | 精电有限公司 | A method for displaying information and a vehicle-mounted auxiliary display system
CN110751950A (en)* | 2019-10-25 | 2020-02-04 | 武汉森哲地球空间信息技术有限公司 | Police conversation voice recognition method and system based on big data
CN110858492A (en)* | 2018-08-23 | 2020-03-03 | 阿里巴巴集团控股有限公司 | Audio editing method, device, equipment and system and data processing method
CN111128132A (en)* | 2019-12-19 | 2020-05-08 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium
CN111145775A (en)* | 2019-12-19 | 2020-05-12 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium
CN111276155A (en)* | 2019-12-20 | 2020-06-12 | 上海明略人工智能(集团)有限公司 | Voice separation method, device and storage medium
CN111461946A (en)* | 2020-04-14 | 2020-07-28 | 山东致群信息技术有限公司 | Intelligent public security interrogation system
CN111627448A (en)* | 2020-05-15 | 2020-09-04 | 公安部第三研究所 | System and method for realizing trial and talk control based on voice big data
CN111953852A (en)* | 2020-07-30 | 2020-11-17 | 北京声智科技有限公司 | Call record generation method, device, terminal and storage medium
CN112307156A (en)* | 2019-07-26 | 2021-02-02 | 北京宝捷拿科技发展有限公司 | Cross-language intelligent auxiliary side inspection method and system
CN113936697A (en)* | 2020-07-10 | 2022-01-14 | 北京搜狗智能科技有限公司 | Voice processing method and device for voice processing
CN114255760A (en)* | 2021-12-15 | 2022-03-29 | 江苏税软软件科技有限公司 | Inquiry recording system and method

Cited By (23)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN109033150A (en)* | 2018-06-12 | 2018-12-18 | 平安科技(深圳)有限公司 | Sensitive word verification method, device, computer equipment and storage medium
CN109033150B (en)* | 2018-06-12 | 2024-01-30 | 平安科技(深圳)有限公司 | Sensitive word verification method, device, computer equipment and storage medium
WO2019242414A1 (en)* | 2018-06-19 | 2019-12-26 | Oppo广东移动通信有限公司 | Voice processing method and apparatus, storage medium, and electronic device
CN108922525A (en)* | 2018-06-19 | 2018-11-30 | Oppo广东移动通信有限公司 | Method of speech processing, device, storage medium and electronic equipment
CN110858492A (en)* | 2018-08-23 | 2020-03-03 | 阿里巴巴集团控股有限公司 | Audio editing method, device, equipment and system and data processing method
CN109410933A (en)* | 2018-10-18 | 2019-03-01 | 珠海格力电器股份有限公司 | Device control method and apparatus, storage medium, and electronic apparatus
CN109410933B (en)* | 2018-10-18 | 2021-02-19 | 珠海格力电器股份有限公司 | Device control method and apparatus, storage medium, and electronic apparatus
CN109976700A (en)* | 2019-01-25 | 2019-07-05 | 广州富港万嘉智能科技有限公司 | A kind of method, electronic equipment and the storage medium of the transfer of recording permission
CN110211581A (en)* | 2019-05-16 | 2019-09-06 | 济南市疾病预防控制中心 | A kind of laboratory automatic speech recognition record identification system and method
CN110460798A (en)* | 2019-06-26 | 2019-11-15 | 平安科技(深圳)有限公司 | Video Interview service processing method, device, terminal and storage medium
CN112307156A (en)* | 2019-07-26 | 2021-02-02 | 北京宝捷拿科技发展有限公司 | Cross-language intelligent auxiliary side inspection method and system
CN110588524A (en)* | 2019-08-02 | 2019-12-20 | 精电有限公司 | A method for displaying information and a vehicle-mounted auxiliary display system
CN110588524B (en)* | 2019-08-02 | 2021-01-01 | 精电有限公司 | Information display method and vehicle-mounted auxiliary display system
CN110751950A (en)* | 2019-10-25 | 2020-02-04 | 武汉森哲地球空间信息技术有限公司 | Police conversation voice recognition method and system based on big data
CN111128132A (en)* | 2019-12-19 | 2020-05-08 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium
CN111145775A (en)* | 2019-12-19 | 2020-05-12 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium
CN111276155A (en)* | 2019-12-20 | 2020-06-12 | 上海明略人工智能(集团)有限公司 | Voice separation method, device and storage medium
CN111276155B (en)* | 2019-12-20 | 2023-05-30 | 上海明略人工智能(集团)有限公司 | Voice separation method, device and storage medium
CN111461946A (en)* | 2020-04-14 | 2020-07-28 | 山东致群信息技术有限公司 | Intelligent public security interrogation system
CN111627448A (en)* | 2020-05-15 | 2020-09-04 | 公安部第三研究所 | System and method for realizing trial and talk control based on voice big data
CN113936697A (en)* | 2020-07-10 | 2022-01-14 | 北京搜狗智能科技有限公司 | Voice processing method and device for voice processing
CN111953852A (en)* | 2020-07-30 | 2020-11-17 | 北京声智科技有限公司 | Call record generation method, device, terminal and storage medium
CN114255760A (en)* | 2021-12-15 | 2022-03-29 | 江苏税软软件科技有限公司 | Inquiry recording system and method

Similar Documents

Publication | Title
CN207149252U (en) | Speech processing system
CN105100360B (en) | Call assistant method and device for voice communication
CN205647778U (en) | Intelligent conference system
US9715873B2 (en) | Method for adding realism to synthetic speech
US8407049B2 (en) | Systems and methods for conversation enhancement
GB2362745A (en) | Transcription of text from computer voice mail
US20210232776A1 (en) | Method for recording and outputting conversion between multiple parties using speech recognition technology, and device therefor
CN103678269A (en) | Information processing method and device
DE102004050785A1 (en) | Method and arrangement for processing messages in the context of an integrated messaging system
WO2005027092A1 (en) | Document creation/reading method, document creation/reading device, document creation/reading robot, and document creation/reading program
CN109754788A (en) | Voice control method, device, equipment and storage medium
EP2682931B1 (en) | Method and apparatus for recording and playing user voice in mobile terminal
CN109346057A (en) | Speech processing system for an intelligent children's toy
US12041313B2 (en) | Data processing method and apparatus, device, and medium
CN111415128A (en) | Method, system, apparatus, device and medium for controlling conference
CN110751950A (en) | Police conversation voice recognition method and system based on big data
CN117995195A (en) | Method, device, equipment and storage medium for generating broadcast play
CN117371459A (en) | Conference auxiliary system and method based on intelligent voice AI real-time translation
CN112581965A (en) | Transcription method, device, recording pen and storage medium
KR20220009319A (en) | Apparatus and method for video conferencing service
WO2021134284A1 (en) | Voice information processing method, hub device, control terminal and storage medium
KR102575038B1 (en) | Apparatus and method for video conferencing service
CN109922397A (en) | Audio intelligent processing method, storage medium, intelligent terminal and smart bluetooth earphone
CN108920470 (en) | Method for automatically detecting the language of an audio recording and translating it
CN113076747A (en) | Voice recognition recording method based on role recognition

Legal Events

Code | Title
GR01 | Patent grant
